My research combines methods for text retrieval, extraction, machine learning and analytics (TREMA).
Currently, I am working on methods that automatically, in a query-driven manner, retrieve materials from the Web and compose Wikipedia-like articles. For information needs about which the user has very little prior knowledge, the web search paradigm of 10 blue hyperlinks is not sufficient. Instead, I envision providing a synthesis of Web materials that gives a comprehensive overview (TREC CAR).
My goal is to develop algorithms that find what users are looking for based on text content only. In contrast, most Web-search algorithms are based on interaction data such as query logs, clicks, or session information, none of which is available when searching private document collections. Consequently, we aim to maximize the utility of information retrieval models in combination with methods from natural language processing.
A particular emphasis of my work is to utilize information from structured knowledge bases such as Wikipedia, Freebase, or DBpedia together with text-based reasoning on general document and Web corpora (KG4IR). In my work on "Entity Query Feature Expansion" (SIGIR 2014), I demonstrate that significantly better search results are obtained when using entity linking and knowledge bases in the retrieval algorithm.
Ph.D., Computer Science, Max Planck Institute
CS 696W: Independent Study
CS 753/853: Information Retrieval
CS 780/880: Top/Information Retrieval
CS 953: DS - Knowledge Graphs and Text
CS 980: Adv Top/Data Sci w/ KnowGraphs
CS 999: Doctoral Research
Dietz, L., Xiong, C., Dalton, J., & Meij, E. (2019). Special issue on knowledge graphs and semantics in text analysis and retrieval [Special issue]. Information Retrieval Journal, 1-3.
Nanni, F., Dietz, L., & Ponzetto, S. P. (2018). Toward a computational history of universities: Evaluating text mining methods for interdisciplinarity detection from PhD dissertation abstracts. Digital Scholarship in the Humanities, 33(3), 612-620. doi:10.1093/llc/fqx062
Weiland, L., Hulpuş, I., Ponzetto, S. P., Effelsberg, W., & Dietz, L. (2018). Knowledge-rich image gist understanding beyond literal meaning. Data & Knowledge Engineering, 117, 114-132. doi:10.1016/j.datak.2018.07.006
Dietz, L., Xiong, C., & Meij, E. (2018). Overview of The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis (KG4IR). ACM SIGIR Forum, 51, 139-144.
Aliannejadi, M., Hasanain, M., Mao, J., Singh, J., Trippas, J. R., Zamani, H., & Dietz, L. (2018). ACM SIGIR Student Liaison Program. ACM SIGIR Forum, 51, 42-45.
Weiland, L., Ponzetto, S. P., Effelsberg, W., & Dietz, L. (2018). Understanding the Gist of Images - Ranking of Concepts for Multimedia Indexing. arXiv preprint arXiv:1809.08593.
Nanni, F., Ponzetto, S. P., & Dietz, L. (2018). Toward comprehensive event collections. International Journal on Digital Libraries, 1-15.
Nanni, F., Dietz, L., & Ponzetto, S. P. (2017). Data from the paper: Towards a Computational History of Universities: Evaluating Text Mining Methods for Interdisciplinarity Detection from Ph. D. Dissertation Abstracts. Digital Scholarship in the Humanities.
Nanni, F., Zhao, Y., Ponzetto, S. P., & Dietz, L. (2016). Enhancing domain-specific entity linking in DH. computational linguistics, 2, 67-88.
Nanni, F., Dietz, L., Faralli, S., Glavaš, G., & Ponzetto, S. P. (2016). Capturing interdisciplinarity in academic abstracts. D-Lib Magazine, 22, 9.