CS Events

Qualifying Exam

Contrastive Self-Supervised Learning and Deep Pre-trained Language Models for Entity Resolution



Tuesday, December 06, 2022, 10:30am


Speaker: Runhui Wang

Location: Virtual - The Zoom link is https://rutgers.zoom.us/j/93769571337?pwd=NC9nR3RVeDJzcHpUZHl1ZWExK0Jndz09

Committee

Dr. Yongfeng Zhang (advisor)

Dr. Hao Wang

Dr. Dong Deng

Dr. Kostas Bekris (external)

Event Type: Qualifying Exam

Abstract: Entity Resolution (ER) is the task of identifying data items that refer to the same real-world entity, and it is an essential problem in NLP and in data integration and preparation (DI&P). We propose Sudowoodo, a multi-purpose DI&P framework based on contrastive representation learning and deep pre-trained language models. Sudowoodo features a unified, matching-based problem definition that captures a wide range of DI&P tasks, including entity resolution in data integration, error correction in data cleaning, semantic type detection in data discovery, and more. Contrastive learning enables Sudowoodo to learn similarity-aware data representations from a large corpus of data items (e.g., entity entries, table columns) without using any labels. The learned representations can later be used directly or fine-tuned with only a few labels to support the ER task. Our experimental results show that Sudowoodo achieves multiple state-of-the-art results across different levels of supervision and outperforms the previous best specialized blocking and matching solutions for ER. Sudowoodo also achieves promising results on data cleaning and column matching tasks, demonstrating its versatility across DI&P applications. For the blocking step of ER, we propose Neural Locality Sensitive Hashing Blocking (NLSHBlock), which is based on pre-trained language models and fine-tuned with a novel LSH-inspired loss function. NLSHBlock outperforms existing methods on a wide range of datasets.
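
To illustrate the general idea of label-free contrastive pre-training over entity records described in the abstract, here is a minimal sketch. It assumes a DistilBERT encoder, a simple token-dropout augmentation, and a SimCLR-style NT-Xent loss; these choices are illustrative only and are not taken from Sudowoodo or NLSHBlock.

# Minimal sketch: contrastive pre-training of a language-model encoder on
# unlabeled entity records. Encoder choice, augmentation, and loss are
# assumptions for illustration, not the talk's actual implementation.
import random
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

MODEL = "distilbert-base-uncased"  # assumed pre-trained encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

def serialize(record: dict) -> str:
    # Flatten an entity record into a text sequence for the language model.
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in record.items())

def augment(text: str, drop_prob: float = 0.15) -> str:
    # Token dropout creates a second "view" of the same record.
    kept = [t for t in text.split() if random.random() > drop_prob]
    return " ".join(kept) if kept else text

def encode(texts):
    # Embed a batch of serialized records; use the first-token embedding.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state[:, 0]
    return F.normalize(out, dim=-1)

def nt_xent(z1, z2, temperature: float = 0.07):
    # Contrastive loss: two views of the same record attract,
    # all other records in the batch repel.
    z = torch.cat([z1, z2], dim=0)
    sim = z @ z.t() / temperature
    mask = torch.eye(z.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))  # ignore self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# One unlabeled training step over a toy batch of entity records.
records = [
    {"title": "iPhone 12 64GB black", "brand": "Apple"},
    {"title": "Galaxy S21 128GB", "brand": "Samsung"},
]
texts = [serialize(r) for r in records]
z1 = encode([augment(t) for t in texts])
z2 = encode([augment(t) for t in texts])
loss = nt_xent(z1, z2)
loss.backward()  # follow with an optimizer step to update the encoder

After this self-supervised stage, the resulting embeddings could be compared directly (e.g., by cosine similarity for blocking) or the encoder could be fine-tuned with a small number of labeled pairs for the matching step.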