
Abstractive Text Summarization with Neural Sequence-to-Sequence Models

Location: TEL, Room Auditorium 3 (20th floor), Ernst-Reuter-Platz 7, 10587 Berlin

Date/Time: 24.02.2020, 14:15-15:00

Speaker: Dmitrii Aksenow (TU Berlin / DFKI)



We face a constantly growing amount of unstructured information in text form, which calls for methods of automatic text summarization. This thesis concentrates on single-document abstractive text summarization with neural networks.

The thesis makes several scientific contributions. First, we explore to what extent knowledge from a pre-trained language model can benefit abstractive summarization. To this end, we experiment with conditioning the encoder, the decoder, and the generator of a Transformer-based neural model on the BERT language model. BERT conditioning yields substantial improvements when applied to the encoder and decoder, but is not useful for the generator. To alleviate BERT's input-size limitation, we then propose a method of BERT-windowing, which allows chunk-wise processing of texts longer than 512 tokens and thereby extends BERT's applicability. We also explore how locality modeling, i.e. the explicit restriction of attention computations to the local context, affects the summarization ability of the Transformer; this is done by introducing two-dimensional convolutional self-attention into the first layers of the encoder.

Our abstractive models are evaluated against state-of-the-art models on the CNN/Daily Mail dataset using ROUGE scores. We additionally train our models on the German SwissText dataset to demonstrate their suitability for languages other than English. All our models outperform the Transformer-based baseline and show their superiority in a manual qualitative analysis. Since the BERT-based model performs better than the one based on convolutional self-attention, we use it in the release version of our summarization system.

Finally, we develop an extractive sentence-level summarization module to handle documents too long to be processed efficiently by neural networks. This module is based on TF-IDF sentence-level summarization, but uses BERT's next-sentence-prediction capability to increase the consistency of the resulting summaries. In our summarization system it serves as the first step of the pipeline, before the abstractive model is applied.
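The BERT-windowing idea described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the window size and stride are illustrative assumptions, and `encode` is a stand-in for a real BERT encoder that returns one vector per input token. Overlapping windows are encoded independently and the per-token outputs are averaged where windows overlap.

```python
# Hedged sketch of chunk-wise "BERT-windowing": split a long token sequence
# into overlapping windows that each fit the 512-token limit, encode each
# window separately, then average the representations in overlap regions.
from typing import Callable, List

def window_encode(tokens: List[int],
                  encode: Callable[[List[int]], List[List[float]]],
                  window: int = 512,
                  stride: int = 256) -> List[List[float]]:
    """Encode `tokens` window by window and merge overlapping outputs."""
    n = len(tokens)
    sums: List[List[float]] = []   # running per-position vector sums
    counts: List[int] = []         # how many windows covered each position
    start = 0
    while True:
        chunk = tokens[start:start + window]
        reps = encode(chunk)  # stand-in encoder: one vector per token
        for offset, vec in enumerate(reps):
            pos = start + offset
            if pos == len(sums):
                sums.append(list(vec))
                counts.append(1)
            else:
                sums[pos] = [a + b for a, b in zip(sums[pos], vec)]
                counts[pos] += 1
        if start + window >= n:
            break
        start += stride
    # average contributions from all windows that covered each position
    return [[v / c for v in vec] for vec, c in zip(sums, counts)]
```

With a 512-token window and a 256-token stride, every position past the first window is seen in its left and right context at least once, which is the point of the chunk-wise scheme.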


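The extractive pre-processing step described in the abstract can be sketched as a plain TF-IDF sentence scorer. This is an illustrative simplification under stated assumptions: the BERT next-sentence-prediction re-ranking is omitted, the tokenization is naive whitespace splitting, and the function name is hypothetical.

```python
# Hedged sketch of the TF-IDF sentence-level extractive step: score each
# sentence by the summed TF-IDF weight of its terms, keep the top-k
# sentences, and return them in their original document order.
import math
from collections import Counter
from typing import List

def tfidf_extract(sentences: List[str], k: int = 2) -> List[str]:
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # document frequency: in how many sentences each term appears
    df = Counter(term for doc in docs for term in set(doc))

    def score(doc: List[str]) -> float:
        tf = Counter(doc)
        # term frequency weighted by inverse sentence frequency
        return sum((cnt / len(doc)) * math.log(n / df[t])
                   for t, cnt in tf.items())

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]  # restore original order
```

In the full system, a re-ranking step based on BERT's next-sentence prediction would then be applied to favor selections whose sentences read consistently in sequence.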
