Neural abstractive summarization: improvements at the sequence-level
Main Author: Ravaut, Mathieu
Other Authors: Sun Aixin
Format: Thesis - Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Natural language processing; Abstractive summarization; Deep learning; Large language models
Online Access: https://hdl.handle.net/10356/181414
Institution: Nanyang Technological University
Description:
Automatic text summarization has made a remarkable leap forward in the last five to ten years, fueled by the rise of deep learning systems. Broadly, summarization consists of compressing an input text or collection of texts (such as a scientific paper or a set of news articles) into a more concise form that retains all key elements. The task can be divided into extractive summarization, where the system must produce an output composed solely of input content, and abstractive summarization, where the model is free to generate new text, much as a human would. In this thesis, we consider only the latter, more challenging abstractive setting.

Abstractive summarization made only faltering progress until the joint rise of encoder-decoder sequence-to-sequence neural networks and large-scale annotated datasets, starting in the mid-2010s. With enough data, pre-trained Transformer-based sequence-to-sequence models can be fine-tuned to achieve strong performance on abstractive summarization benchmarks. The standard way to fine-tune is Maximum Likelihood Estimation (MLE) with a token-level cross-entropy loss: at each timestep, the model is trained to predict the exact corresponding token of the (unique) ground-truth summary. This setup, coupled with teacher forcing, is not ideal for several reasons (a generic form of the objective is sketched below). First, there may be several good summaries rather than a unique one, yet these alternatives are ignored. Second, the optimization is at the token level, while evaluation metrics score the whole summary. Third, teacher forcing differs substantially from the inference setup, in which auto-regressive decoding is performed, which may lead to error propagation. In this thesis, we focus on a nascent line of research that complements the token-level perspective with a sequence-level approach. Through an additional round of training or a second-stage model, the sequence-level approach can enrich the summarization system and bring the output summary closer to the expected target.

Our first work builds a supervised second-stage summarization model that performs re-ranking. Given a base model already fine-tuned for summarization, we sample several summary candidates, and our method SummaReranker trains a second model (a RoBERTa encoder) to select the best one. Since summarization evaluation is not straightforward, we adopt a multi-task approach powered by a mixture-of-experts, so that the system can optimize for several summarization metrics at once. By construction, such a re-ranker is bounded by the performance of the best existing summary candidate. To break this ceiling, we propose a new paradigm: our second work, SummaFusion, combines the summary candidates to produce a new, abstractive second-stage summary. It also conditions on the source document, giving the model a second chance at modeling salient content. We show that this model is particularly effective at fixing low-quality first-stage systems.

In our next work, we tackle an important desirable property of summarization systems: controllability. We propose PromptSum, a model that steers output summaries given an input list of keywords, such as named entities. Our system also adapts to several domains through parameter-efficient fine-tuning with soft prompts, limiting the storage footprint.
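For reference, the token-level MLE objective with teacher forcing criticized above can be written in its standard textbook form; the notation below is generic and not taken from the thesis itself:

```latex
% Standard MLE fine-tuning objective under teacher forcing (generic notation).
% x: source document; y = (y_1, ..., y_T): the single ground-truth summary;
% y_{<t}: the ground-truth prefix fed to the decoder at step t.
\mathcal{L}_{\mathrm{MLE}}(\theta) \;=\; -\sum_{t=1}^{T} \log p_{\theta}\!\left(y_t \,\middle|\, y_{<t},\, x\right)
```

Each term scores a single token conditioned on the gold prefix, which makes the three limitations concrete: the loss assumes one reference summary, it never evaluates the sequence as a whole, and at inference time the model must instead condition on its own, possibly erroneous, previously generated tokens.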
Our previous approaches assume a supervised setup, both to fine-tune the base model and to score the sampled summary candidates against the target. In our fourth work, we extend the re-ranking approach to the more challenging unsupervised setup. Our SummScore model compares each summary candidate to the source through several metrics, such as embedding similarities, then combines them into a single score whose averaging weights are learned by optimizing against pseudo-summaries derived from the source, such as the Lead-3 sentences (a minimal sketch of this candidate-scoring idea is given below).

Lastly, we turn our focus to summarization with Large Language Models (LLMs). LLMs have enabled great progress in summarization, producing grammatical, fluent, and relevant outputs zero-shot, without dataset-specific fine-tuning. However, they still suffer from important limitations, notably a limited context window and the more subtle issue of using their context unevenly. We study this latter problem, referred to as the middle curse, in depth on the task of abstractive summarization, and find that it sharply affects performance.
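The following is a minimal sketch of the generate-then-rescore pipeline described above, assuming the Hugging Face transformers and sentence-transformers libraries. The checkpoint names, beam settings, and the single cosine-similarity feature are illustrative choices, not the thesis's actual SummScore configuration, which combines several features with weights tuned against pseudo-summaries such as Lead-3.

```python
# Hypothetical two-stage summarization sketch (not the thesis code):
# stage 1 generates candidates, stage 2 rescores them without any reference.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer, util

DOCUMENT = (
    "The city council approved a new transit plan on Tuesday. "
    "The plan adds two bus lines and extends night service, "
    "funded by a modest increase in parking fees."
)

# Stage 1: a fine-tuned seq2seq model proposes several summary candidates.
tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
gen = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
inputs = tok(DOCUMENT, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    outputs = gen.generate(
        **inputs,
        num_beams=8,
        num_return_sequences=8,  # keep all 8 beams as candidates
        max_length=64,
    )
candidates = tok.batch_decode(outputs, skip_special_tokens=True)

# Stage 2: score each candidate against the source only, here with a single
# embedding-similarity feature (an unsupervised SummScore-style signal).
scorer = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = scorer.encode(DOCUMENT, convert_to_tensor=True)
cand_embs = scorer.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(cand_embs, doc_emb).squeeze(-1)  # one score per candidate

best_summary = candidates[int(scores.argmax())]
print(best_summary)
```

In the supervised setting of the first work, the stage-2 scorer would instead be a trained ranker (a RoBERTa encoder in SummaReranker) optimized to predict which candidate best satisfies one or more summarization metrics.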
Citation: Ravaut, M. (2024). Neural abstractive summarization: improvements at the sequence-level. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181414
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).