WWW 2016 Tutorial: Automatic Entity Recognition and Typing in Massive Text Corpora
In today's computerized and information-based society, we are inundated with vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text corpora (especially in massive, domain-specific text corpora). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., person, product, organization) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.
University of Illinois at Urbana-Champaign¹, Microsoft Research²
- Introduction to entity recognition and typing
- Entity recognition: An overview and phrase-mining approaches
- Supervised and Semi-supervised Entity Mention Detection
- Unsupervised Entity Mention Detection
- Weakly and Distantly Supervised Mention Detection
- Supervised Entity Typing
- Semi-supervised Entity Typing
- Entity linking for typing
- Weakly Supervised Entity Typing
- Distantly Supervised Entity Typing
- Xiang Ren*, Wenqi He*, Meng Qu, Heng Ji, Clare R. Voss, Jiawei Han. Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding,
- Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han, Scalable Topical Phrase Mining from Text Corpora, in Proceedings of the VLDB Endowment, vol. 8, no. 3, August 2015.
- Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, and Jiawei Han, ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering, in Proc. 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'15), ACM, August 2015.
- Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han, Mining Quality Phrases from Massive Text Corpora, in Proc. 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'15), ACM, June 2015.
Xiang Ren, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on knowledge acquisition from text data and mining linked data. In 2016, he received a Google PhD Fellowship for his work in Structured Data and Database Management. He received the C. L. and Jane W.-S. Liu Award and the Yahoo!-DAIS Research Excellence Gold Award in 2015, and a Microsoft Young Fellowship from Microsoft Research Asia in 2012.
Ahmed El-Kishky, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research interests include mining large unstructured data, text mining, and network mining. He is the recipient of both the National Science Foundation Graduate Research Fellowship and the National Defense Science and Engineering Fellowship.
Chi Wang (Ph.D. UIUC, 2014), researcher at Microsoft Research, Redmond, Washington. His research focuses on discovering knowledge from unstructured and linked data, such as topics, concepts, relations, communities, and social influence. His book Mining Latent Entity Structures was published by Morgan & Claypool Publishers in 2015, in the series Synthesis Lectures on Data Mining and Knowledge Discovery. He is a recipient of the Microsoft Research Graduate Research Fellowship.
Jiawei Han, Abel Bliss Professor, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research areas encompass data mining, data warehousing, information network analysis, and database systems, with over 600 conference and journal publications. He is a Fellow of the ACM and a Fellow of the IEEE, and received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), and the IEEE Computer Society W. Wallace McDowell Award (2009). His co-authored textbook "Data Mining: Concepts and Techniques", 3rd ed. (Morgan Kaufmann, 2011), has been widely adopted worldwide.