PSDVec: A toolbox for incremental and scalable word embedding

PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping of words in a natural language to continuous vectors which encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding learning method based on a weighted low-rank positive semidefinite approximation. To scale up the learning process, we implement a blockwise online learning algorithm to learn the embeddings incrementally. This strategy greatly reduces the learning time of word embeddings on a large vocabulary, and can learn the embeddings of new words without re-learning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that have the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners.
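
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the PSDVec implementation or its API: the names (G, W, G_cross, embed_dim, lr) and the plain gradient-descent solver are illustrative assumptions. The first function approximates a symmetric word-correlation matrix G (e.g. a PMI-style matrix) by a low-rank positive semidefinite product V.T @ V under elementwise weights W; the second shows the incremental idea of fitting embeddings for new words against a fixed block of core embeddings, so the whole vocabulary never has to be re-learned.

    import numpy as np

    def psd_lowrank_approx(G, W, embed_dim=50, lr=1e-3, n_iters=1000, seed=0):
        """Fit V (embed_dim x n) so that V.T @ V approximates the symmetric
        matrix G under elementwise weights W, by gradient descent on the
        weighted squared error. Illustrative sketch only."""
        n = G.shape[0]
        rng = np.random.default_rng(seed)
        V = 0.1 * rng.standard_normal((embed_dim, n))
        for _ in range(n_iters):
            E = W * (V.T @ V - G)            # weighted residual, n x n
            V -= lr * 2.0 * V @ (E + E.T)    # gradient of the weighted squared error wrt V
        return V

    def extend_embeddings(V_core, G_cross, W_cross, lr=1e-3, n_iters=1000, seed=0):
        """Fit embeddings for new words against fixed core embeddings V_core,
        using only the core-vs-new correlations G_cross (n_core x n_new)."""
        embed_dim = V_core.shape[0]
        n_new = G_cross.shape[1]
        rng = np.random.default_rng(seed)
        V_new = 0.1 * rng.standard_normal((embed_dim, n_new))
        for _ in range(n_iters):
            E = W_cross * (V_core.T @ V_new - G_cross)   # residual, n_core x n_new
            V_new -= lr * 2.0 * V_core @ E               # update the new columns only
        return V_new

Writing the approximation as V.T @ V makes it positive semidefinite by construction, and restricting the update to the new columns is what keeps the blockwise, incremental scheme cheap.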

Bibliographic Details
Main Authors: Li, Shaohua; Zhu, Jun; Miao, Chunyan
Other Authors: School of Computer Science and Engineering; NTU-UBC Research Centre of Excellence in Active Living for the Elderly
Format: Journal Article
Language: English
Published: 2017
Subjects: Word embedding; Matrix factorization
Online Access: https://hdl.handle.net/10356/83166
http://hdl.handle.net/10220/42454
Institution: Nanyang Technological University
Citation: Li, S., Zhu, J., & Miao, C. (2016). PSDVec: A toolbox for incremental and scalable word embedding. Neurocomputing, 237, 405-409.
Journal: Neurocomputing
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2016.05.093
Version: Accepted version
Rights: © 2016 Elsevier B.V. This is the author-created version of a work that has been peer reviewed and accepted for publication by Neurocomputing, Elsevier. It incorporates the referees' comments, but changes resulting from the publishing process, such as copyediting and structural formatting, may not be reflected in this document. The published version is available at https://doi.org/10.1016/j.neucom.2016.05.093.
Extent: 12 p. (application/pdf)
Collection: DR-NTU (NTU Library, Singapore)