Advancing neural text generation

Current sequence-to-sequence models with attention, despite their success, are inherently limited in their ability to encode the most appropriate inductive bias for generation tasks, which has given rise to a variety of modifications to the framework. In particular, content selection is an important aspect of summarization, where one salient problem is the tendency of models to repeatedly generate the same tokens or sequences. Submodularity is desirable for a variety of content-selection objectives for which the current neural encoder-decoder framework is inadequate, yet it has so far not been explored within neural encoder-decoder systems for text generation. The greedy algorithm that approximates the solution to the submodular maximization problem is not suited to optimizing attention scores in auto-regressive generation. Therefore, rather than following how submodular functions have typically been applied, we propose a simplified yet principled solution. The resulting attention module is an architecturally simple and empirically effective way to improve the coverage of neural text generation. We run experiments on three directed text generation tasks with different levels of recovery rate, across two modalities, three neural model architectures and two training-strategy variations. The results and analyses demonstrate that our method generalizes well across these settings, produces text of good quality and outperforms state-of-the-art baselines.
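
The abstract does not give the exact form of the proposed attention module. As a rough, assumed illustration of the kind of coverage-oriented, submodularity-inspired modification it describes, the sketch below applies a concave transform to the attention mass accumulated over decoding steps, so that already-covered source positions yield diminishing returns. The function names, the log1p penalty and the strength parameter are placeholders for illustration, not the thesis's actual formulation.

    # Illustrative sketch only: a coverage-aware attention step in which the
    # marginal gain of attending to a source position diminishes as its
    # cumulative attention grows (a concave, diminishing-returns transform).
    # The exact method in the thesis is not given in the abstract; everything
    # below is an assumed, simplified stand-in.
    import numpy as np

    def softmax(x):
        x = x - np.max(x)
        e = np.exp(x)
        return e / e.sum()

    def coverage_attention(scores, coverage, strength=1.0):
        """scores: raw attention logits over source positions (1-D array).
        coverage: attention mass accumulated over previous decoding steps.
        Returns normalized attention weights and the updated coverage."""
        # Penalize positions that are already well covered, pushing new
        # attention toward uncovered source content.
        penalized = scores - strength * np.log1p(coverage)
        attn = softmax(penalized)
        return attn, coverage + attn

    # Toy decoding loop: 3 steps over 4 source positions.
    rng = np.random.default_rng(0)
    coverage = np.zeros(4)
    for step in range(3):
        logits = rng.normal(size=4)
        attn, coverage = coverage_attention(logits, coverage)
        print(step, np.round(attn, 3))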

In this project, we also explore low-resource text generation, specifically zero-shot and few-shot text summarization. Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. We introduce a general method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner that makes use of characteristics of the target dataset, such as the length and abstractiveness of the desired summaries. We achieve state-of-the-art zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional, diverse datasets. Models fine-tuned in this unsupervised manner are more robust to noisy data and also achieve better few-shot performance with 10 and 100 training examples. We perform ablation studies on the components of our unsupervised fine-tuning data and analyze the performance of these models in few-shot scenarios, along with data augmentation techniques, using both automatic and human evaluation. The work on zero- and few-shot text summarization has been accepted to the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The investigation of text generation with submodularity will be submitted to the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).
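
Similarly, the following is only an assumed sketch of the kind of unsupervised, dataset-specific fine-tuning data the abstract alludes to: pseudo document-summary pairs built from generic articles and filtered to match the target dataset's summary length and abstractiveness. The helper names, the lead-sentence heuristic and the thresholds are illustrative, not WikiTransfer's actual procedure.

    # Illustrative sketch only: constructing pseudo (document, summary) pairs
    # so that they mimic a target dataset's summary length and degree of
    # abstractiveness. The concrete construction used by WikiTransfer is not
    # specified in the abstract; the heuristics below are assumptions.

    def novel_token_ratio(summary, document):
        """Crude abstractiveness proxy: fraction of summary tokens absent from the document."""
        doc_tokens = set(document.lower().split())
        summ_tokens = summary.lower().split()
        if not summ_tokens:
            return 0.0
        return sum(t not in doc_tokens for t in summ_tokens) / len(summ_tokens)

    def make_pseudo_pair(sentences, target_summary_sents=3, min_abstractiveness=0.0):
        """Treat the first few sentences as a pseudo-summary of the remainder,
        keeping the pair only if it roughly matches the target characteristics."""
        if len(sentences) <= target_summary_sents:
            return None
        summary = " ".join(sentences[:target_summary_sents])
        document = " ".join(sentences[target_summary_sents:])
        if novel_token_ratio(summary, document) < min_abstractiveness:
            return None
        return document, summary

    article = [
        "A new bridge opened in the city on Monday.",
        "Officials said it will cut commuting times in half.",
        "Construction took four years and cost 200 million dollars.",
        "Residents gathered for the opening ceremony.",
        "The mayor called it a milestone for local infrastructure.",
    ]
    print(make_pseudo_pair(article, target_summary_sents=2))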

Bibliographic Details
Main Author: Han, Simeng
Other Authors: Joty, Shafiq Rayhan
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering
Online Access:https://hdl.handle.net/10356/147963
Record ID: sg-ntu-dr.10356-147963 (DSpace)
School: School of Computer Science and Engineering
Contact: srjoty@ntu.edu.sg
Degree: Bachelor of Engineering (Computer Science)
Project code: SCSE20-0102
Date issued: 2021; record created: 2021-04-20
File format: application/pdf
Citation: Han, S. (2021). Advancing neural text generation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/147963
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)