Mining structured knowledge from massive unstructured text data is a key challenge in data science. In this talk, I will discuss my proposed framework, AutoNet, which transforms unstructured text data into structured heterogeneous information networks, on which actionable knowledge can then be uncovered flexibly and effectively. AutoNet is a data-driven approach that relies on distant supervision instead of human curation and labeling. It consists of four essential steps: (1) quality phrase mining; (2) entity recognition and typing; (3) relation extraction; and (4) taxonomy construction. Along these lines, I have developed a number of state-of-the-art distantly supervised and unsupervised methods and published them in top conferences and journals. Specifically, I will present my work on phrase mining, entity recognition, and taxonomy construction in detail, while touching briefly on my other work. Finally, I will summarize the AutoNet framework with a demo video and conclude by discussing future work in collaboration with other disciplines.
Jingbo Shang is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He received his B.E. from the Computer Science Department of Shanghai Jiao Tong University, China. His research focuses on mining and constructing structured knowledge from massive text corpora with minimal human effort. His research has been recognized with several prestigious awards, including the Computer Science Excellence Scholarship from CS@Illinois, the Grand Prize of the Yelp Dataset Challenge in 2015, the Google Ph.D. Fellowship in Structured Data and Database Management in 2017, and the C.W. Gear Outstanding Graduate Award in 2018.