Screening through a broad pool: Towards better diversity for lexically constrained text generation
Lexically constrained text generation (CTG) is the task of generating text that contains a given set of constraint keywords. However, the diversity of the text generated by existing models remains unsatisfactory. In this paper, we propose a lightweight dynamic refinement strategy that aims at increasing the randomness of inference to...
Main Authors: | YUAN, Changsen; HUANG, Heyan; CAO, Yixin; CAO, Qianwen |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
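Subjects: | Constrained text generation; Pre-trained language models; Randomly insert; Randomly mask; Text diversity; Databases and Information Systems; Theory and Algorithms |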
Online Access: | https://ink.library.smu.edu.sg/sis_research/8478 https://ink.library.smu.edu.sg/context/sis_research/article/9481/viewcontent/ScreeningBroadPool_av.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-9481 |
---|---|
record_format | dspace |
datestamp | 2024-01-04T09:11:31Z |
date | 2024-03-01T08:00:00Z |
mime_type | application/pdf |
doi | 10.1016/j.ipm.2023.103602 |
rights | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
series | Research Collection School Of Computing and Information Systems |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Constrained text generation; Pre-trained language models; Randomly insert; Randomly mask; Text diversity; Databases and Information Systems; Theory and Algorithms |
description | Lexically constrained text generation (CTG) is the task of generating text that contains a given set of constraint keywords. However, the diversity of the text generated by existing models remains unsatisfactory. In this paper, we propose a lightweight dynamic refinement strategy that increases the randomness of inference to improve the richness and diversity of the generated text while maintaining a high level of fluency and integrity. Our basic idea is to enlarge the number and length of candidate sentences in each iteration and to choose the best candidate for subsequent refinement. First, unlike previous works, which carefully insert one token between two words per action, we insert a random number of tokens drawn from a well-designed distribution. To ensure high-quality decoding, the number of inserted tokens increases as more words are generated. Second, we randomly mask an increasing number of generated words, forcing pre-trained language models (PLMs) to examine the whole sentence through reconstruction. We conducted extensive experiments and designed four dimensions for human evaluation. Compared with the important baseline CBART (He, 2021), our method improves B-2 by 1.3%, B-4 by 0.1%, N-2 by 0.016, N-4 by 0.016, M by 5.7%, SB-4 by 1.9%, D-2 by 0.6%, and D-4 by 0.5% on the One-Billion-Word dataset (Chelba et al., 2014), and B-2 by 1.6%, B-4 by 0.1%, N-2 by 0.121, N-4 by 0.120, M by 0.0%, SB-4 by 6.7%, D-2 by 2.7%, and D-4 by 3.8% on the Yelp dataset (Cho et al., 2018). The results demonstrate that our method generates text that is more diverse and plausible. |
format | text |
author | YUAN, Changsen; HUANG, Heyan; CAO, Yixin; CAO, Qianwen |
title | Screening through a broad pool: Towards better diversity for lexically constrained text generation |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2024 |
url | https://ink.library.smu.edu.sg/sis_research/8478 https://ink.library.smu.edu.sg/context/sis_research/article/9481/viewcontent/ScreeningBroadPool_av.pdf |
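The refinement loop summarized in the abstract (enlarge the number and length of candidates in each iteration, randomly insert and mask tokens, then keep the best candidate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the uniform insertion distribution, the one-more-mask-per-iteration schedule, the `max_len` cap, and every name here (`insertion_count`, `mask_count`, `propose`, `refine`, `toy_fill`, `toy_score`) are hypothetical, and the toy filler and scorer merely stand in for a pre-trained language model that would fill masks and score candidates.

```python
import random
from typing import Callable, List, Sequence, Set

MASK = "<mask>"  # placeholder token that a PLM would later fill in


def insertion_count(num_generated: int, rng: random.Random,
                    base: int = 1, growth: float = 0.25) -> int:
    """Hypothetical schedule: sample how many tokens to insert at one position.

    The upper bound grows with the number of words generated so far. The uniform
    distribution is an assumption, not the paper's designed distribution.
    """
    upper = base + int(growth * num_generated)
    return rng.randint(0, upper)


def mask_count(num_generated: int, step: int, max_ratio: float = 0.5) -> int:
    """Hypothetical schedule: mask one more word per iteration, capped at a ratio."""
    return min(int(num_generated * max_ratio), step)


def propose(tokens: List[str], constraints: Set[str], step: int,
            rng: random.Random, max_len: int) -> List[str]:
    """Randomly mask generated (non-constraint) words, then insert random-length gaps."""
    maskable = [i for i, tok in enumerate(tokens) if tok not in constraints]
    k = min(len(maskable), mask_count(len(tokens), step))
    masked = set(rng.sample(maskable, k))  # constraint keywords are never masked
    room = max(0, max_len - len(tokens))   # insertion budget so length stays bounded
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.append(MASK if i in masked else tok)
        n = min(insertion_count(len(tokens), rng), room)
        room -= n
        out.extend([MASK] * n)
    return out


def refine(constraints: Sequence[str], iterations: int, pool_size: int,
           fill: Callable[[List[str], random.Random], List[str]],
           score: Callable[[List[str]], float],
           seed: int = 0, max_len: int = 20) -> List[str]:
    """Each iteration builds a pool of filled candidates and keeps the best one."""
    rng = random.Random(seed)
    keep = set(constraints)
    tokens = list(constraints)
    for step in range(1, iterations + 1):
        pool = [fill(propose(tokens, keep, step, rng, max_len), rng)
                for _ in range(pool_size)]
        tokens = max(pool, key=score)  # screen the pool: keep the top-scoring candidate
    return tokens


# Toy stand-ins for a PLM (hypothetical): a random filler and a diversity-style score.
VOCAB = ["the", "a", "runs", "in", "happy", "big", "small", "near"]


def toy_fill(tokens: List[str], rng: random.Random) -> List[str]:
    return [rng.choice(VOCAB) if tok == MASK else tok for tok in tokens]


def toy_score(tokens: List[str]) -> float:
    # Type-token ratio; a real system would use a PLM fluency score instead.
    return len(set(tokens)) / max(1, len(tokens))


if __name__ == "__main__":
    print(" ".join(refine(["dog", "park"], iterations=3, pool_size=4,
                          fill=toy_fill, score=toy_score)))
```

Because constraint keywords are never masked and insertions only add new positions, the keywords survive every iteration in their original order. The `max_len` budget is needed because the schedule inserts more tokens as the sentence grows; without it, candidates would keep lengthening across iterations.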