Morphologically-aware vocabulary reduction of word embeddings
We propose SubText, a compression mechanism via vocabulary reduction. The crux is to judiciously select a subset of word embeddings which support the reconstruction of the remaining word embeddings based on their form alone. The proposed algorithm considers the preservation of the original embedding...
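The abstract describes the mechanism only at a high level: keep a subset of word embeddings and rebuild the remaining ones from each word's form. The Python sketch below is a minimal, hypothetical illustration of that general idea, built on simple assumptions that do not come from the paper. Words are represented by fastText-style character n-grams (3 to 5 characters, with `<` and `>` boundary markers). An n-gram's vector is the mean of the kept embeddings whose spelling contains it. A dropped word is rebuilt as the mean of its known n-gram vectors. The kept subset is chosen by greedy forward selection on squared reconstruction error. The names `char_ngrams`, `fit_ngram_vectors`, `reconstruct`, and `greedy_select` are invented for this example. This is not SubText's algorithm, which, according to the record's abstract, also takes into account a word's relationship to morphologically or semantically similar words.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Vector = List[float]


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """fastText-style character n-grams of '<word>' (boundary markers added)."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def fit_ngram_vectors(kept: Dict[str, Vector]) -> Dict[str, Vector]:
    """Assumed heuristic: an n-gram's vector is the mean of the kept word
    vectors whose spelling contains that n-gram."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for word, vec in kept.items():
        for g in set(char_ngrams(word)):
            prev = sums.get(g, [0.0] * len(vec))
            sums[g] = [s + v for s, v in zip(prev, vec)]
            counts[g] += 1
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


def reconstruct(word: str, ngram_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Rebuild a vector from the word's form alone: the mean of its n-grams
    that have a vector, or a zero vector if none do."""
    known = [ngram_vecs[g] for g in set(char_ngrams(word)) if g in ngram_vecs]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def sq_error(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_select(emb: Dict[str, Vector], k: int) -> List[str]:
    """Assumed selection rule: repeatedly keep the word whose addition gives
    the lowest total squared error when rebuilding every word not kept."""
    if not emb:
        return []
    dim = len(next(iter(emb.values())))
    kept: List[str] = []
    for _ in range(min(k, len(emb))):
        best_word: Optional[str] = None
        best_err = float("inf")
        for cand in emb:
            if cand in kept:
                continue
            trial = {w: emb[w] for w in kept + [cand]}
            ngram_vecs = fit_ngram_vectors(trial)
            err = sum(sq_error(reconstruct(w, ngram_vecs, dim), emb[w])
                      for w in emb if w not in trial)
            if err < best_err:  # strict: ties keep the earlier candidate
                best_word, best_err = cand, err
        kept.append(best_word)
    return kept


if __name__ == "__main__":
    # Toy 2-d embeddings: each inflected form shares its stem's vector.
    toy = {"play": [1.0, 0.0], "played": [1.0, 0.0],
           "walk": [0.0, 1.0], "walked": [0.0, 1.0]}
    print(greedy_select(toy, 2))  # ['play', 'walk']
```

On this toy vocabulary, keeping one form per stem ("play", "walk") lets "played" and "walked" be rebuilt exactly from their spelling, while the kept words retain their original vectors. The greedy loop refits the n-gram table for every candidate, so it is only practical for tiny vocabularies.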
Main Authors: | CHIA, Chong Cher; TKACHENKO, Maksim; LAUW, Hady Wirawan |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2023 |
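Subjects: | Word embeddings; compression; vocabulary reduction; Databases and Information Systems; Numerical Analysis and Scientific Computing; Theory and Algorithms |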
Online Access: | https://ink.library.smu.edu.sg/sis_research/7608 https://ink.library.smu.edu.sg/context/sis_research/article/8611/viewcontent/wiiat22.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-8611 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-8611 2023-08-04T03:29:43Z Morphologically-aware vocabulary reduction of word embeddings CHIA, Chong Cher TKACHENKO, Maksim LAUW, Hady Wirawan We propose SubText, a compression mechanism via vocabulary reduction. The crux is to judiciously select a subset of word embeddings which support the reconstruction of the remaining word embeddings based on their form alone. The proposed algorithm considers the preservation of the original embeddings, as well as a word’s relationship to other words that are morphologically or semantically similar. Comprehensive evaluation of the compressed vocabulary reveals SubText’s efficacy on diverse tasks over traditional vocabulary reduction techniques, as validated on English, as well as a collection of inflected languages. 2023-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7608 info:doi/10.1109/WI-IAT55865.2022.00018 https://ink.library.smu.edu.sg/context/sis_research/article/8611/viewcontent/wiiat22.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Word embeddings compression vocabulary reduction Databases and Information Systems Numerical Analysis and Scientific Computing Theory and Algorithms |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Word embeddings; compression; vocabulary reduction; Databases and Information Systems; Numerical Analysis and Scientific Computing; Theory and Algorithms |
description | We propose SubText, a compression mechanism via vocabulary reduction. The crux is to judiciously select a subset of word embeddings which support the reconstruction of the remaining word embeddings based on their form alone. The proposed algorithm considers the preservation of the original embeddings, as well as a word’s relationship to other words that are morphologically or semantically similar. Comprehensive evaluation of the compressed vocabulary reveals SubText’s efficacy on diverse tasks over traditional vocabulary reduction techniques, as validated on English, as well as a collection of inflected languages. |
format | text |
author | CHIA, Chong Cher; TKACHENKO, Maksim; LAUW, Hady Wirawan |
title | Morphologically-aware vocabulary reduction of word embeddings |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2023 |
url | https://ink.library.smu.edu.sg/sis_research/7608 https://ink.library.smu.edu.sg/context/sis_research/article/8611/viewcontent/wiiat22.pdf |