Ahmed El-Kishky

PhD Student in Computer Science

Name: Ahmed El-Kishky
Occupation: PhD student in the Data Mining Group at UIUC
Education: BS Computer Science and BS Mathematics, The University of Tulsa, May 2013
Campus Address: Thomas Siebel Center for Computer Science #1119, 201 N. Goodwin Ave., Urbana, IL 61801
Research Interest: Data Mining, Machine Learning, Computational Statistics, Latent Variable Models
Email: elkishk2 AT illinois.edu

I am currently a PhD student in computer science at the University of Illinois at Urbana-Champaign. Before that, I obtained my bachelor's degree with a double major in Computer Science and Mathematics at the University of Tulsa. Since starting my PhD, I have been working under the supervision of Professor Jiawei Han in the Data Mining Group. Since 2013, I have been fortunate enough to be supported by the National Science Foundation Graduate Research Fellowship, and starting Fall 2015, I'll be supported by the NDSEG fellowship.

Generally, my areas of interest lie in data mining and machine learning. More specifically, I work on problems related to text mining (topic modeling, phrase mining, entity extraction) and graph mining (the classical problems of clustering, prediction, and classification in networks). I also have an interest in computational statistics, particularly in latent variable models.

Download PDF CV

Publications

Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, Jiawei Han, "Scalable Topical Phrase Mining from Text Corpora", PVLDB Vol. 8 (Also, Proc. 2015 Int. Conf. on Very Large Data Bases (VLDB'15), Kohala Coast, Hawaii, Sept. 2015)  [Paper]

Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, Jiawei Han. “ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering”, 2015 ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD’15)

Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, Jiawei Han, “Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks”. 2015 ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD’15)

F. Tao, J. Han, H. Ji, G. Brova, C. Wang, A. El-Kishky, J. Liu, X. Ren and Y. Sun, NewsNetExplorer: Automatic Construction and Exploration of News Information Networks (system demo) Proc. of 2014 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD’14), Snowbird, UT, June 2014

D. Arbour, J. Atwood, A. El-Kishky, and D. Jensen. (2013) Agglomerative clustering of bagged data using joint distributions. In Papers from the International Conference on Machine Learning Workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs. [Paper]

El-Kishky, Ahmed. "Assessing entropy and fractal dimensions as discriminants of seizures in EEG time series." Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on. IEEE, 2012. [Paper]

Tutorials

Jiawei Han, Chi Wang, Ahmed El-Kishky, "Bringing Structure to Text: Mining Phrases, Entities, Topics, and Hierarchies", 2014 ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD'14), New York City, NY, August 2014. [Slides]

Work Experience

Data Mining Group: Graduate Researcher (08/13 - Present)

My research projects focus on clustering, prediction, and classification in large text-rich data, as well as latent variable dimensionality reduction of large-scale networks.

Facebook Inc: PhD Software Engineering Intern (Fall 2015)

Performed machine learning research and software engineering in the Applied Machine Learning team.

Google Research: Research Intern (Spring 2016)

Performed machine learning research.

Palantir Technologies: Machine Learning Intern (Summer 2015)

Applied machine learning to large-scale customer data.

Microsoft Research: Research Intern (Spring 2015)

Performed data mining research within the DMX group.

Rocket Fuel: Software Engineering / Data Science Intern (Summer 2013)

Performed data science and software engineering work.

U. of Massachusetts Amherst: NSF REU Researcher (Summer 2012)

Performed research in clustering of heterogeneous data and causal discovery from relational data while working in the Knowledge Discovery Laboratory (KDL) under the supervision of Dr. David Jensen. (Amherst, MA)

U. of Notre Dame: NSF REU Researcher (Summer 2011)

Performed research in the use of mobile sensor networks and high-performance computing to provide more accurate real-time simulation and prediction in computational fluid dynamics while working under the supervision of Dr. Christian Poellabauer and Dr. Karel Matous. (Notre Dame, IN)

Honors and Awards

National Defense Science and Engineering (NDSEG) Fellowship (2015 - 2018)

National Science Foundation Graduate Research Fellowship (GRFP) (2013 - 2018)

University of Tulsa (Full-ride) Presidential Scholarship (2009 - 2013)

Donald W. Reynolds Governor's Cup Competition: Team Leader, State Champions - $22,000 prize (2013)

Donald W. Reynolds Governor's Cup Competition: Team Leader, 3rd Place - $6,000 prize (2012)

Donald W. Reynolds Governor's Cup Competition: Team Leader, Top-6 - $1,500 prize (2011)

National Merit Scholar (2009)

Salutatorian: Robert E. Lee High School Class of 2009 (Rank 2/~650)

Ahmed El-Kishky

Champaign, Illinois

Web: http://web.engr.illinois.edu/~elkishk2/
E-mail: elkishk2 AT illinois.edu

Data Mining Research Group: http://dm1.cs.uiuc.edu/people.html

Code

All code comes with shell scripts / batch files for Mac/Linux/Windows compatibility and ease of use.

ToPMine Code: Scalable Topical Phrase Mining from Text Corpora (VLDB'15). Source code for the ToPMine framework. ToPMine is a two-step framework: step one transforms a corpus from a 'bag-of-words' into a 'bag-of-phrases', and step two applies efficient topic modeling to the resulting 'bag-of-phrases' corpus. Both steps of the framework can be run independently with this code.
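
To give a rough feel for what moving from a 'bag-of-words' to a 'bag-of-phrases' means, here is a toy sketch in Python. It is not the ToPMine implementation and does not use its code: it simply assumes that a "phrase" is any adjacent word pair that occurs at least min_count times in the corpus, and merges such pairs greedily from left to right. The function names (frequent_bigrams, to_bag_of_phrases), the threshold, and the three sample titles are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def frequent_bigrams(docs, min_count):
    """Count adjacent token pairs over all documents; keep the frequent ones.

    Assumption for this sketch: a 'phrase' is just a frequent adjacent pair.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, count in counts.items() if count >= min_count}


def to_bag_of_phrases(tokens, phrases):
    """Greedily merge adjacent tokens that form a frequent pair, left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2  # both tokens were consumed by the phrase
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical mini-corpus of tokenized titles.
docs = [
    "scalable topic modeling for text".split(),
    "topic modeling with phrases".split(),
    "mining text with topic modeling".split(),
]

phrases = frequent_bigrams(docs, min_count=3)
bags = [to_bag_of_phrases(tokens, phrases) for tokens in docs]

for bag in bags:
    print(bag)
```

In this sketch, each resulting list would be the input to the second step (topic modeling), with a merged pair such as "topic modeling" treated as a single unit rather than two independent words.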

Datasets

Titles Five CS Topics: This dataset consists of titles taken from conferences whose main focus lies broadly in the CS areas of Databases, Data Mining, Machine Learning, and Information Retrieval. (44K titles)

AP News Dataset (1989): This large dataset contains Associated Press news articles published in 1989. (106K Full articles)

DBLP Abstracts: This dataset consists of abstracts of papers published on DBLP. (529K full abstracts)