Although we now have vast amounts of data available to us on the Web and elsewhere, it is often not clear how to genuinely leverage the richness of all this data. For natural language text, we typically need new human-labeled training data for each new combination of language, style, and domain. A model trained on movie reviews, for instance, will not work as well when applied to news articles. In this talk, I argue that we will need to jointly draw on multiple very heterogeneous kinds of data to move towards the next level of intelligent information systems.
The talk will showcase several of our results on exploiting auxiliary kinds of data for additional supervision. These include techniques to exploit patterns in large amounts of unlabeled text, techniques to exploit large knowledge graphs as background knowledge, and techniques for self-supervision.
Gerard de Melo is an Assistant Professor in the Department of Computer Science at Rutgers University, where he heads the Deep Data Lab. Over the years, he has published around 150 papers on natural language processing, data science, and AI. His work has received Best Paper awards at CIKM 2010, ICGL 2008, and the NAACL 2015 Workshop on Vector Space Modeling, as well as the WWW 2011 Best Demonstration Award, an ACL 2014 Best Paper Honorable Mention, a Best Student Paper nomination at ESWC 2015, and a Best Paper nomination at the IEEE Conference on Semantic Computing 2018. Prior to joining Rutgers, he was a faculty member at Tsinghua University and a Post-Doctoral Research Scholar at ICSI/UC Berkeley. He received his doctoral degree from the Max Planck Institute for Informatics.