Effective and efficient semantic representations and their applications

Bibliographic Details
Main Author: CHIA, Chong Cher
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access: https://ink.library.smu.edu.sg/etd_coll/536
https://ink.library.smu.edu.sg/context/etd_coll/article/1534/viewcontent/GPIS_AY2018_PhD_Chia_Chong_Cher.pdf
Institution: Singapore Management University
Description
Summary: The proliferation of affordable and compact digital storage has led to the creation of enormous databases of information, and much attention has been focused on the problem of processing unorganized and unstructured information into a form from which additional value can be extracted. Because of the sheer volume of information involved, contemporary approaches to this problem virtually necessitate complex models running on computational systems. While it is possible to feed the actual data to the model as input, a representation of the data is typically used instead. These representations are therefore of interest: they act as intermediaries through which the database information is processed, and so they affect the performance of the trained model.
This dissertation is split into two parts. We first discuss in detail the effectiveness and efficiency of semantic data representations. Effective semantic representations concern aspects generally related to the capabilities of these representations, such as task performance and interpretability. Efficient semantic representations concern aspects generally related to the utilization of these representations, such as their storage size and their generalizability across multiple tasks. Next, we explore an application of semantic representations in downstream tasks, before elaborating on several directions for future work relating to such applications.
We present two works in the first part of the dissertation, each focused on a specific form of semantic data. For textual data representations, we introduce a novel approach that improves efficiency by discarding representations while limiting the impact on downstream task effectiveness. For knowledge base representations, we explore a novel measure of node importance in knowledge graphs and present a heuristic approach for selecting such nodes in large knowledge graphs.
In the second part of the dissertation, we discuss the application of semantic representations in two downstream Natural Language Processing (NLP) tasks. We first describe the use of semantic representations generated by Large Language Models (LLMs) in an Information Retrieval (IR) system, and overcome the "cold-start" problem in the Legal NLP domain by introducing a novel heuristic for labelling "key" legal passages. We then propose a future research direction for generating summaries of long legal documents, which raises research questions regarding both the input representation of such documents and the evaluation of such summarization models.