Text-based Prediction of Popular Click Paths in Wikipedia

Location: Zoom link (please ask Saman Zadtootaghaj for access)

Date/Time: 01.02.2021, 14:30-15:00

Speaker: Davi de Paula (TU Berlin)

Abstract: The ever-growing amount of content available on the Internet makes it increasingly important to identify popular content. Recommender systems are widely used to filter and prioritize items for users. In the context of digital texts, these systems extract features from the body text, as well as from additional data such as titles, headers, and keywords. The goal of this thesis is to estimate the interestingness between pairs of Wikipedia articles. We define interestingness as the degree of interest a user has in a target document after or while reading a source document. To perform this task, we use the click-through rate between two articles as an approximation of interestingness. We then extract the semantic meaning from the text content of the articles and use it to predict the popularity of a target document given a source (seed) document.
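To make the click-through rate concrete, the following is a minimal sketch and not the procedure used in the thesis. It assumes the navigation data is available as (source, target, count) tuples and that the click-through rate of a pair is its click count divided by all outgoing clicks recorded for the source article. The function name, the tuple layout, and the sample article names and counts are hypothetical.

```python
from collections import defaultdict


def click_through_rates(clicks):
    """Return {(source, target): rate} from (source, target, count) tuples.

    Assumption: the rate is the share of a source article's outgoing
    clicks that lead to the given target article.
    """
    pair_counts = defaultdict(int)    # clicks per (source, target) pair
    source_totals = defaultdict(int)  # all outgoing clicks per source
    for source, target, count in clicks:
        pair_counts[(source, target)] += count
        source_totals[source] += count
    return {
        (source, target): count / source_totals[source]
        for (source, target), count in pair_counts.items()
        if source_totals[source] > 0
    }


# Illustrative counts only, not taken from the real dataset.
sample_clicks = [
    ("Berlin", "Germany", 60),
    ("Berlin", "Brandenburg_Gate", 30),
    ("Berlin", "Spree", 10),
]
rates = click_through_rates(sample_clicks)
```

Under this assumption the rates of all targets reachable from one source sum to 1, so they can be read as a ranking of the targets by interestingness.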

We explore the use of the SMASH RNN, Doc2Vec, and Wikipedia2Vec models on a dataset containing the Wikipedia user navigation data for March 2020. An analysis of the performance results shows that SMASH RNN achieved an NDCG@10 score of 0.4972 on the evaluation dataset, an improvement of 54.27% over Doc2Vec. We observed that characteristics of an article, such as word count, sentence count, paragraph count, and popularity, can affect the ability of SMASH RNN to predict popular click paths. Furthermore, an analysis of our experiments indicates that the use of non-textual features, as in Wikipedia2Vec, can have a positive impact on the performance of the models.
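For readers unfamiliar with the metric, NDCG@10 compares the ranking a model produces for the target articles of one source article with the ideal ranking by true relevance. The sketch below uses the common linear-gain form with a log2 rank discount. Which variant the thesis uses is not stated here, so the formula choice, the function names, and the sample values are assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(true_relevance, predicted_scores, k=10):
    """NDCG@k for one source article.

    true_relevance: ground-truth values (e.g. click-through rates) per target.
    predicted_scores: model scores for the same targets, in the same order.
    """
    # Order the targets by predicted score, highest first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_relevance[i] for i in order]
    ideal = sorted(true_relevance, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Illustrative values only.
truth = [0.6, 0.3, 0.1]
perfect = ndcg_at_k(truth, [0.9, 0.5, 0.1])   # same order as the truth
reversed_ = ndcg_at_k(truth, [0.1, 0.5, 0.9])  # worst possible order
```

A ranking that matches the ground truth scores 1.0 and any other ordering scores lower. A dataset-level score would typically be the mean of this value over all source articles.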



