Neural abstractive summarization: improvements at the sequence-level
Main Author: Ravaut, Mathieu
Other Authors: Sun Aixin
Format: Thesis - Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Natural language processing; Abstractive summarization; Deep learning; Large language models
Online Access: https://hdl.handle.net/10356/181414
Institution: Nanyang Technological University
Description:
Automatic text summarization has made a remarkable leap forward in the last five to ten years, fueled by the rise of deep learning systems. Broadly, summarization consists of compressing an input text or collection of texts (such as a scientific paper or a set of news articles) into a more concise form that retains all key elements. The task can be divided into extractive summarization, where the system must produce an output composed solely of input content, and abstractive summarization, where the model is free to generate new text, much as a human would. In this thesis, we consider only the latter, more challenging abstractive setting.

Abstractive summarization made only faltering progress until the joint rise of encoder-decoder sequence-to-sequence neural networks and large-scale annotated datasets, starting in the mid-2010s. With enough data, pre-trained Transformer-based sequence-to-sequence models can be fine-tuned to achieve strong performance on abstractive summarization benchmarks. The standard way to fine-tune is Maximum Likelihood Estimation (MLE) with a token-level cross-entropy loss: at each timestep, the model is trained to predict the exact corresponding token of the (unique) ground-truth summary. This setup, coupled with teacher forcing, is not ideal for several reasons (a generic form of the objective is sketched below). First, there may be several good summaries rather than a unique one, yet these alternatives are ignored. Second, the optimization is at the token level, while evaluation metrics score the whole summary. Third, teacher forcing differs substantially from the inference setup, in which auto-regressive decoding is performed, which may lead to error propagation. In this thesis, we focus on a nascent line of research that complements the token-level perspective with a sequence-level approach. Through an additional round of training or a second-stage model, the sequence-level approach can enrich the summarization system and bring the output summary closer to the expected target.

Our first work builds a supervised second-stage summarization model that performs re-ranking. Given a base model already fine-tuned for summarization, we sample several summary candidates, and our method SummaReranker trains a second model (a RoBERTa encoder) to select the best one. Since summarization evaluation is not straightforward, we adopt a multi-task approach powered by a mixture-of-experts, so that the system can optimize for several summarization metrics at once. By construction, such a re-ranker is bounded by the performance of the best existing summary candidate. To break this ceiling, we propose a new paradigm: our second work, SummaFusion, combines the summary candidates to produce a new, abstractive second-stage summary. It also conditions on the source document, giving the model a second chance at modeling salient content. We show that this model is particularly effective at fixing low-quality first-stage systems.

In our next work, we tackle an important desirable property of summarization systems: controllability. We propose PromptSum, a model that steers output summaries given an input list of keywords, such as named entities. Our system also adapts to several domains through parameter-efficient fine-tuning with soft prompts, limiting the storage footprint.
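For reference, the token-level MLE objective with teacher forcing criticized above can be written in its standard textbook form; the notation below is generic and not taken from the thesis itself:

```latex
% Standard MLE fine-tuning objective under teacher forcing (generic notation).
% x: source document; y = (y_1, ..., y_T): the single ground-truth summary;
% y_{<t}: the ground-truth prefix fed to the decoder at step t.
\mathcal{L}_{\mathrm{MLE}}(\theta) \;=\; -\sum_{t=1}^{T} \log p_{\theta}\!\left(y_t \,\middle|\, y_{<t},\, x\right)
```

Each term scores a single token conditioned on the gold prefix, which makes the three limitations concrete: the loss assumes one reference summary, it never evaluates the sequence as a whole, and at inference time the model must instead condition on its own, possibly erroneous, previously generated tokens.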
Our previous approaches assume a supervised setup, both to fine-tune the base model and to score the sampled summary candidates against the target. In our fourth work, we extend the re-ranking approach to the more challenging unsupervised setup. Our SummScore model compares each summary candidate to the source through several metrics, such as embedding similarities, then combines them into a single score whose averaging weights are learned by optimizing against pseudo-summaries derived from the source, such as the Lead-3 sentences (a minimal sketch of this candidate-scoring idea is given below).

Lastly, we turn our focus to summarization with Large Language Models (LLMs). LLMs have enabled great progress in summarization, producing grammatical, fluent, and relevant outputs zero-shot, without dataset-specific fine-tuning. However, they still suffer from important limitations, notably a limited context window and the more subtle issue of using their context unevenly. We study this latter problem, referred to as the middle curse, in depth on the task of abstractive summarization, and find that it sharply affects performance.
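The following is a minimal sketch of the generate-then-rescore pipeline described above, assuming the Hugging Face transformers and sentence-transformers libraries. The checkpoint names, beam settings, and the single cosine-similarity feature are illustrative choices, not the thesis's actual SummScore configuration, which combines several features with weights tuned against pseudo-summaries such as Lead-3.

```python
# Hypothetical two-stage summarization sketch (not the thesis code):
# stage 1 generates candidates, stage 2 rescores them without any reference.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer, util

DOCUMENT = (
    "The city council approved a new transit plan on Tuesday. "
    "The plan adds two bus lines and extends night service, "
    "funded by a modest increase in parking fees."
)

# Stage 1: a fine-tuned seq2seq model proposes several summary candidates.
tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
gen = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
inputs = tok(DOCUMENT, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    outputs = gen.generate(
        **inputs,
        num_beams=8,
        num_return_sequences=8,  # keep all 8 beams as candidates
        max_length=64,
    )
candidates = tok.batch_decode(outputs, skip_special_tokens=True)

# Stage 2: score each candidate against the source only, here with a single
# embedding-similarity feature (an unsupervised SummScore-style signal).
scorer = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = scorer.encode(DOCUMENT, convert_to_tensor=True)
cand_embs = scorer.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(cand_embs, doc_emb).squeeze(-1)  # one score per candidate

best_summary = candidates[int(scores.argmax())]
print(best_summary)
```

In the supervised setting of the first work, the stage-2 scorer would instead be a trained ranker (a RoBERTa encoder in SummaReranker) optimized to predict which candidate best satisfies one or more summarization metrics.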
Citation: Ravaut, M. (2024). Neural abstractive summarization: improvements at the sequence-level. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181414
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).