Word relatedness from Word Embedding in text analysis and information retrieval
LOCATION: TEL, Room 208 (2nd floor),
Ernst-Reuter-Platz 7, 10587 Berlin
Date/Time: 15.01.2018, 12:00-12:45
SPEAKER: Allan Hanbury (TU Wien)
Word Embedding approaches, such as word2vec, are
being increasingly used as the basis for a wide variety of text
analysis and information retrieval applications. In this talk, I
present some of the recent contributions to this area from my research
group. The first part of the talk analyses the similarity values
produced by word2vec, in particular to determine the range of
similarity values that is indicative of actual term relatedness. Based
on these results, uses of the similarity values in sentiment analysis
and information retrieval are presented. Finally, we discuss the
problem of topic shifting in information retrieval resulting from the
incorporation of word2vec term similarities, mainly due to the local
context of these similarities. A solution is presented that involves
combining the local context of word2vec with the global context
provided by Latent Semantic Indexing (LSI).
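A minimal sketch of the idea behind the first part of the talk follows. It computes cosine similarity between word vectors and keeps only those terms whose similarity falls at or above a relatedness cut-off. The three-dimensional toy vectors and the threshold of 0.7 are hypothetical, chosen only for illustration. They are not the embeddings or the threshold reported in the talk. Real vectors would come from a trained word2vec model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_terms(query, vectors, threshold):
    """Return (term, similarity) pairs at or above `threshold`, most similar first.

    `threshold` is a hypothetical cut-off standing in for the similarity range
    that indicates actual term relatedness.
    """
    q = vectors[query]
    scored = []
    for term, vec in vectors.items():
        if term == query:
            continue  # skip the query term itself
        sim = cosine_similarity(q, vec)
        if sim >= threshold:
            scored.append((term, sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings (not real word2vec output).
toy_vectors = {
    "king":  [0.90, 0.10, 0.00],
    "queen": [0.85, 0.15, 0.05],
    "apple": [0.00, 0.20, 0.95],
}

if __name__ == "__main__":
    for term, sim in related_terms("king", toy_vectors, threshold=0.7):
        print(term, round(sim, 3))
```

With these toy vectors, "queen" scores about 0.996 against "king" and passes the cut-off. "apple" scores about 0.02 and is discarded. In a retrieval setting, the surviving terms are the kind of candidates one might use for query expansion.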
Allan Hanbury is Professor for Data Intelligence at the TU Wien,
Austria, and Faculty Member of the Complexity Science Hub. He is
coordinator of the Austrian ICT Lighthouse Project, Data Market
Austria, which is creating a Data-Services Ecosystem in Austria. He
was scientific coordinator of the EU-funded Khresmoi Integrated
Project on medical and health information search and analysis, and is
co-founder of contextflow, the spin-off company commercialising the
radiology image search technology developed in the Khresmoi project.
He also coordinated the EU-funded VISCERAL project on evaluation of
algorithms on big data, and the EU-funded KConnect project on
technology for analysing medical text.
His areas of research include Data Science, Information Retrieval, Semantic Analysis and Search, Information Retrieval Evaluation, Recommender Systems, Data Mining and Machine Learning. He is author or co-author of over 140 publications in refereed journals and refereed international conferences.