Advancing neural text generation
Current sequence-to-sequence models with attention, despite their success, are inherently limited in their ability to encode the inductive biases most appropriate for generation tasks, which has given rise to a variety of modifications to the framework to better model each task. In particular, content selection is an important aspect of summarization, where one salient problem is the models' tendency to generate the same tokens or sequences over and over.
Main Author: | Han, Simeng |
---|---|
Other Authors: | Joty Shafiq Rayhan |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/147963 |
Institution: | Nanyang Technological University |
Language: | English |
id | sg-ntu-dr.10356-147963 |
record_format | dspace |
spelling |
sg-ntu-dr.10356-147963 2021-04-20T08:14:39Z
Advancing neural text generation
Han, Simeng; Joty Shafiq Rayhan
School of Computer Science and Engineering, srjoty@ntu.edu.sg
Engineering::Computer science and engineering
Bachelor of Engineering (Computer Science)
2021-04-20T08:14:39Z 2021-04-20T08:14:39Z 2021
Final Year Project (FYP)
Han, S. (2021). Advancing neural text generation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/147963
en SCSE20-0102 application/pdf
Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering |
spellingShingle | Engineering::Computer science and engineering Han, Simeng Advancing neural text generation |
description |
Current sequence-to-sequence models with attention, despite their success, are inherently limited in their ability to encode the inductive biases most appropriate for generation tasks, which has given rise to a variety of modifications to the framework to better model each task.
In particular, content selection is an important aspect of summarization, where one salient problem is the models' tendency to generate the same tokens or sequences over and over. Submodularity is desirable for a variety of content-selection objectives on which the current neural encoder-decoder framework falls short, yet it has so far not been explored in neural encoder-decoder systems for text generation. The greedy algorithm that approximates the solution to the submodular maximization problem is not suited to optimizing attention scores in auto-regressive generation. Therefore, instead of following the way submodular functions have typically been used, we propose a simplified yet principled solution. The resulting attention module offers an architecturally simple and empirically effective way to improve the coverage of neural text generation. We run experiments on three directed text generation tasks with different recovery rates, across two modalities, three neural model architectures, and two training-strategy variations. The results and analyses demonstrate that our method generalizes well across these settings, produces text of good quality, and outperforms state-of-the-art baselines.
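To make the notion of submodular content selection concrete, the following minimal Python sketch (an illustration only, not the attention module proposed in this project) shows a word-coverage objective with diminishing returns and the classic greedy algorithm for maximizing it under a cardinality budget; the coverage function and the toy data are assumptions made for the example.

```python
# Minimal sketch: monotone submodular coverage objective + greedy selection
# for extractive content selection. Illustrative only; NOT the attention
# module proposed in this project. Data and functions are hypothetical.

def coverage(selected_sentences, document_words):
    """f(S) = number of distinct document words covered by the selected set.
    Adding a sentence to a larger set helps at most as much as adding it to a
    smaller one (diminishing returns), so f is monotone submodular."""
    covered = set()
    for sent in selected_sentences:
        covered |= set(sent.split()) & document_words
    return len(covered)

def greedy_select(sentences, document_words, budget):
    """Repeatedly add the sentence with the largest marginal coverage gain.
    For monotone submodular f with a cardinality budget, this greedy rule
    achieves at least (1 - 1/e) of the optimal value (Nemhauser et al., 1978)."""
    selected = []
    remaining = list(sentences)
    while remaining and len(selected) < budget:
        best = max(
            remaining,
            key=lambda s: coverage(selected + [s], document_words)
                          - coverage(selected, document_words),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    doc = "the cat sat on the mat while the dog slept on the rug"
    sents = ["the cat sat on the mat",
             "the dog slept on the rug",
             "the cat sat"]
    print(greedy_select(sents, set(doc.split()), budget=2))
```

Because decoding is auto-regressive and attention weights are continuous rather than a discrete selection, such a greedy procedure cannot be applied directly to attention scores, which is the gap the simplified solution described above is meant to close.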
In this project, we also explore low-resource text generation, specifically zero-shot and few-shot text summarization. Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a general method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner that makes use of characteristics of the target dataset, such as the length and abstractiveness of the desired summaries. We achieve state-of-the-art zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional, diverse datasets. Models fine-tuned in this unsupervised manner are more robust to noisy data and also achieve better few-shot performance with 10 and 100 training examples. We perform ablation studies on the effect of the components of our unsupervised fine-tuning data and analyze the performance of these models in few-shot scenarios, together with data augmentation techniques, using both automatic and human evaluation.
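The record does not spell out how the unsupervised fine-tuning data are constructed. As a rough sketch of the general idea described above, the hypothetical helpers below build pseudo document-summary pairs from generic articles (for example, Wikipedia), matching only the two target-dataset characteristics named in the abstract: desired summary length and abstractiveness. The function names, the leading-sentence heuristic, and the overlap filter are assumptions for illustration, not the actual WikiTransfer procedure.

```python
# Hedged sketch: building unsupervised, dataset-specific fine-tuning pairs
# from generic articles, controlled by target summary length and a crude
# abstractiveness filter. Illustrative assumptions, not the project's method.

def make_pseudo_pair(article_sentences, target_summary_len):
    """Use the first `target_summary_len` sentences as a pseudo-summary and
    the remaining sentences as the pseudo-source document."""
    pseudo_summary = article_sentences[:target_summary_len]
    pseudo_source = article_sentences[target_summary_len:]
    return " ".join(pseudo_source), " ".join(pseudo_summary)

def extractive_overlap(source, summary):
    """Crude abstractiveness proxy: fraction of summary words copied from the
    source (higher means more extractive, lower means more abstractive)."""
    src_words = set(source.split())
    sum_words = summary.split()
    return sum(w in src_words for w in sum_words) / max(len(sum_words), 1)

def build_finetuning_data(articles, target_summary_len, max_overlap):
    """Keep only pairs whose extractive overlap stays below the target
    dataset's level, so the pseudo data mimics the desired abstractiveness."""
    pairs = []
    for sents in articles:
        if len(sents) <= target_summary_len:
            continue
        src, summ = make_pseudo_pair(sents, target_summary_len)
        if extractive_overlap(src, summ) <= max_overlap:
            pairs.append({"document": src, "summary": summ})
    return pairs

if __name__ == "__main__":
    articles = [["First sentence acts as a summary.",
                 "The rest of the article becomes the source document.",
                 "It continues with more detail and paraphrased content."]]
    print(build_finetuning_data(articles, target_summary_len=1, max_overlap=0.6))
```

A pretrained sequence-to-sequence summarizer could then be fine-tuned on such pairs before zero-shot evaluation, or further tuned on the 10 or 100 labeled examples mentioned above for the few-shot setting.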
The work on zero-shot and few-shot text summarization has been accepted at the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The investigation of text generation with submodularity will be submitted to the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). |
author2 | Joty Shafiq Rayhan |
author_facet | Joty Shafiq Rayhan Han, Simeng |
format | Final Year Project |
author | Han, Simeng |
author_sort | Han, Simeng |
title | Advancing neural text generation |
title_short | Advancing neural text generation |
title_full | Advancing neural text generation |
title_fullStr | Advancing neural text generation |
title_full_unstemmed | Advancing neural text generation |
title_sort | advancing neural text generation |
publisher | Nanyang Technological University |
publishDate | 2021 |
url | https://hdl.handle.net/10356/147963 |
_version_ | 1698713738409934848 |