CS Events
Qualifying Exam
Contrastive Self-Supervised Learning and Deep Pre-trained Language Models for Entity Resolution
Tuesday, December 06, 2022, 10:30am
Speaker: Runhui Wang
Location: Virtual. The Zoom link is https://rutgers.zoom.us/j/93769571337?pwd=NC9nR3RVeDJzcHpUZHl1ZWExK0Jndz09
Committee:
Dr. Yongfeng Zhang (advisor)
Dr. Hao Wang
Dr. Dong Deng
Dr. Kostas Bekris (external)
Event Type: Qualifying Exam
Abstract: Entity Resolution (ER) is a field of study dedicated to finding items that refer to the same entity, and it is an essential problem in NLP and in data integration and preparation (DI&P). We propose Sudowoodo, a multi-purpose DI&P framework based on contrastive representation learning and deep pre-trained language models. Sudowoodo features a unified, matching-based problem definition that captures a wide range of DI&P tasks, including Entity Resolution (ER) in data integration, error correction in data cleaning, semantic type detection in data discovery, and more. Contrastive learning enables Sudowoodo to learn similarity-aware data representations from a large corpus of data items (e.g., entity entries, table columns) without using any labels. The learned representations can later be used directly or can facilitate fine-tuning with only a few labels to support the ER task. Our experimental results show that Sudowoodo achieves multiple state-of-the-art results at different levels of supervision and outperforms the previous best specialized blocking or matching solutions for ER. Sudowoodo also achieves promising results in data cleaning and column matching tasks, showing its versatility in DI&P applications. For the blocking step of ER, we propose Neural Locality Sensitive Hashing Blocking (NLSHBlock), which is based on pre-trained language models and fine-tuned with a novel LSH-inspired loss function. NLSHBlock outperforms existing methods on a wide range of datasets.
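For readers unfamiliar with the blocking step mentioned in the abstract, the following is a minimal sketch of classic (non-neural) LSH blocking using MinHash over character 3-grams. It is not NLSHBlock or any part of Sudowoodo; it only illustrates the general idea of hashing similar records into shared buckets so that the matcher compares a reduced set of candidate pairs. All function names, the seeded CRC32 hash family, and the parameter values are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict
from itertools import combinations


def shingles(text, n=3):
    """Return the set of character n-grams of the lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(max(1, len(s) - n + 1))}


def minhash_signature(items, num_hashes=32):
    """MinHash signature: for each seeded hash, keep the minimum hash value over the set.

    Seeded CRC32 is a weak stand-in for a proper hash family (illustrative assumption).
    """
    return [
        min(zlib.crc32(f"{seed}:{item}".encode("utf-8")) for item in items)
        for seed in range(num_hashes)
    ]


def lsh_blocks(records, num_hashes=32, bands=16):
    """Bucket record ids by signature band; records sharing any band land in one bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(rid)
    return buckets


def candidate_pairs(records, **kwargs):
    """Return the set of (id, id) pairs that share at least one LSH bucket."""
    pairs = set()
    for ids in lsh_blocks(records, **kwargs).values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy records; "a" and "b" normalize to the same string,
# so they always share every band and are always returned as a candidate pair.
records = {
    "a": "Apple iPhone 13",
    "b": "apple  iphone 13",
    "c": "Dell XPS 15 laptop",
}
print(candidate_pairs(records))
```

Using more bands with fewer rows per band makes the blocker more permissive (higher recall, more candidate pairs), while fewer bands with more rows makes it stricter. A learned blocker such as the one described in the abstract would replace the hand-crafted shingles and hashes with representations from a fine-tuned language model.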