Interpretable vector language models
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175573 |
Institution: | Nanyang Technological University |
Summary: | Natural Language Processing (NLP) is an important part of Artificial Intelligence (AI) that aims to create algorithms which improve how humans understand and interpret bodies of text. In particular, word embeddings form a vital part of NLP: models such as Word2Vec and GloVe assign numeric vectors to the words in a text corpus so that the norms of, and angles between, word vectors reflect the semantic structure of the corpus. While their effectiveness is undisputed, they suffer from a major limitation: limited interpretability. Individual entries are hard to interpret because rotating all vectors simultaneously preserves norms and angles, and hence the semantic structure, while mixing up the individual entries. Hence, in this study, we proposed a novel approach for generating word embeddings with a higher degree of interpretability. We linked the interpretability of a word embedding to the optimisation of several loss functions, namely Varimax, Quartimax and the l1-norm, defined on the Lie group SO(d) of d × d rotation matrices. Our findings revealed that the l1-norm method achieved the highest level of interpretability among the three methods because, by promoting sparsity, its solutions tend to have a higher proportion of matrix elements that are close to zero. Through this study, we hope to have provided valuable insights into creating word embeddings with more interpretable entries. |
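
The rotation argument in the summary can be illustrated with a small numerical sketch. It is not the project's method: the record does not describe how the losses were optimised, so this example rests on several assumptions. It works in d = 2, where every element of SO(2) is a rotation by a single angle, so a brute-force grid search over that angle stands in for a proper optimiser on SO(d). The toy embedding is synthetic (each hypothetical word loads on one axis only), and Varimax and Quartimax are written in their common factor-analysis forms (column-wise variance of squared entries, and sum of fourth powers), negated so that all three losses are minimised. All function and variable names are illustrative.

```python
import numpy as np

def rotation(theta):
    """Return the 2 x 2 matrix in SO(2) for a rotation by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def varimax_loss(A):
    """Negated raw Varimax criterion: column-wise variance of squared entries."""
    return -np.sum(np.var(A**2, axis=0))

def quartimax_loss(A):
    """Negated Quartimax criterion: sum of fourth powers of all entries."""
    return -np.sum(A**4)

def l1_loss(A):
    """l1-norm of all entries; smaller values favour sparser matrices."""
    return np.sum(np.abs(A))

def best_rotation_angle(Y, loss, num=3600):
    """Brute-force search over SO(2): angle whose rotation minimises loss(Y @ R)."""
    thetas = np.linspace(0.0, 2 * np.pi, num, endpoint=False)
    values = [loss(Y @ rotation(t)) for t in thetas]
    return thetas[int(np.argmin(values))]

def angle_error(theta, theta0):
    """Distance of theta + theta0 to the nearest multiple of pi/2.

    Rotations by multiples of pi/2 only permute or flip the two axes, which
    leaves sparsity unchanged, so any of them counts as undoing the rotation.
    """
    r = np.mod(theta + theta0, np.pi / 2)
    return min(r, np.pi / 2 - r)

# Hypothetical toy "embedding": 40 words in d = 2, each loading on one axis only.
rng = np.random.default_rng(0)
n = 40
norms = rng.uniform(0.5, 2.0, size=n)
X = np.zeros((n, 2))
X[: n // 2, 0] = norms[: n // 2]
X[n // 2 :, 1] = norms[n // 2 :]

# Hide the interpretable axes behind an arbitrary rotation.
THETA0 = 0.6
Y = X @ rotation(THETA0)

# Norms and angles (the Gram matrix) are unchanged, but the entries are mixed.
print("Gram matrix preserved:", np.allclose(X @ X.T, Y @ Y.T))
print("l1 before / after rotation:", l1_loss(X), l1_loss(Y))

# Search SO(2) for the rotation that makes Y @ R score best under each loss.
recovered = {
    "varimax": best_rotation_angle(Y, varimax_loss),
    "quartimax": best_rotation_angle(Y, quartimax_loss),
    "l1": best_rotation_angle(Y, l1_loss),
}
for name, theta in recovered.items():
    print(f"{name:9s} angle = {theta:.4f}, error = {angle_error(theta, THETA0):.5f}")
```

On this idealised data all three losses should locate the hidden rotation up to the grid resolution (and up to axis permutations or sign flips). The record's finding that the l1-norm gives the most interpretable result concerns real embeddings and is not reproduced by this toy example. For d > 2 one would instead optimise over SO(d) directly, for example with a Riemannian gradient method; that choice is likewise an assumption here.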