Morphologically-aware vocabulary reduction of word embeddings

We propose SubText, a compression mechanism based on vocabulary reduction. The crux is to judiciously select a subset of word embeddings that supports the reconstruction of the remaining word embeddings from their form alone. The proposed algorithm considers the preservation of the original embedding...

Bibliographic Details
Main Authors: CHIA, Chong Cher; TKACHENKO, Maksim; LAUW, Hady Wirawan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
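Subjects: Word embeddings; compression; vocabulary reduction; Databases and Information Systems; Numerical Analysis and Scientific Computing; Theory and Algorithms
Online Access: https://ink.library.smu.edu.sg/sis_research/7608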
https://ink.library.smu.edu.sg/context/sis_research/article/8611/viewcontent/wiiat22.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8611
record_format dspace
spelling sg-smu-ink.sis_research-8611
2023-08-04T03:29:43Z
Morphologically-aware vocabulary reduction of word embeddings
CHIA, Chong Cher
TKACHENKO, Maksim
LAUW, Hady Wirawan
2023-04-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/7608
info:doi/10.1109/WI-IAT55865.2022.00018
https://ink.library.smu.edu.sg/context/sis_research/article/8611/viewcontent/wiiat22.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Word embeddings
compression
vocabulary reduction
Databases and Information Systems
Numerical Analysis and Scientific Computing
Theory and Algorithms
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Word embeddings
compression
vocabulary reduction
Databases and Information Systems
Numerical Analysis and Scientific Computing
Theory and Algorithms
description We propose SubText, a compression mechanism based on vocabulary reduction. The crux is to judiciously select a subset of word embeddings that supports the reconstruction of the remaining word embeddings from their form alone. The proposed algorithm considers the preservation of the original embeddings, as well as each word’s relationship to other words that are morphologically or semantically similar. Comprehensive evaluation of the compressed vocabulary shows SubText’s efficacy over traditional vocabulary reduction techniques on diverse tasks, validated on English and on a collection of inflected languages.
format text
author CHIA, Chong Cher
TKACHENKO, Maksim
LAUW, Hady Wirawan
title Morphologically-aware vocabulary reduction of word embeddings
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/7608
https://ink.library.smu.edu.sg/context/sis_research/article/8611/viewcontent/wiiat22.pdf