CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION

Bibliographic Details
Main Author: Wijayanti, Rini
Format: Dissertations
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/76687
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:76687
spelling id-itb.:76687 2023-08-17T23:40:17Z CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION Wijayanti, Rini Indonesia Dissertations cross-lingual, transfer learning, cross-lingual word embedding, text summarization, low-resource language INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/76687 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Transfer learning is a learning paradigm that reuses knowledge gained from solving one problem to address a different but related problem. It has gained increasing attention because it performs well when training data are insufficient. In text processing, this capability helps low-resource languages such as Indonesian, which lack labeled data and other natural language processing tools. Although unlabeled data are abundant and freely available on the internet, the annotation process is costly and time-consuming. Given the gap in linguistic resources between languages, a cross-lingual transfer learning approach that leverages resources available in a source language (typically English) can be a solution. Knowledge is transferred across languages through cross-lingual word embedding (CLWE), which plays a role analogous to a dictionary. A static CLWE produced by a mapping method is appropriate for low-resource languages because it requires neither a parallel corpus, which is difficult to obtain, nor large computational resources. However, unsupervised initialization remains a challenge in this method because it affects the mapping between the two languages. This study therefore proposes using a shared vocabulary space during initialization, so that identical terms in both language corpora receive the same embedding; the mapping method is applied only to terms with no shared information. It also proposes a contextual CLWE based on the multilingual BERT pre-training technique; although that model is widely used in cross-lingual settings, its training does not explicitly include an alignment phase. The quality of the CLWE was evaluated both intrinsically, through Bilingual Lexicon Induction, and extrinsically, by applying it to a cross-lingual transfer learning-based text summarization task. Feature extraction was chosen as the transfer technique because it reduces computing time. The experimental results indicate that improving the initialization step raises CLWE performance to the level of the supervised approach. Using the static CLWE in a cross-lingual text summarization architecture yields higher ROUGE scores than the monolingual case. The contextual CLWE did not produce a significant increase, but it does improve multilingual BERT performance. This study is expected to help close the natural language processing research gap between high- and low-resource languages. (Illustrative sketches of the main steps follow the record below.)
format Dissertations
author Wijayanti, Rini
spellingShingle Wijayanti, Rini
CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION
author_facet Wijayanti, Rini
author_sort Wijayanti, Rini
title CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION
title_short CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION
title_full CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION
title_fullStr CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION
title_full_unstemmed CROSS-LINGUAL WORD EMBEDDING-BASED TRANSFER LEARNING FOR EXTRACTIVE TEXT SUMMARIZATION
title_sort cross-lingual word embedding-based transfer learning for extractive text summarization
url https://digilib.itb.ac.id/gdl/view/76687
_version_ 1822008053798535168
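
The static CLWE mapping with shared-vocabulary initialization described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes two monolingual embedding tables held as Python dicts of unit-normalized numpy vectors, treats identical word strings as the shared vocabulary, and fits a standard orthogonal Procrustes mapping (via SVD) that is applied only to source words with no shared counterpart.

    import numpy as np

    def fit_orthogonal_mapping(X, Y):
        # Orthogonal Procrustes: minimize ||XW - Y|| over orthogonal W,
        # solved in closed form from the SVD of X^T Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    def align_embeddings(src_vecs, tgt_vecs):
        # src_vecs, tgt_vecs: dict word -> unit-normalized vector (same dim).
        shared = sorted(set(src_vecs) & set(tgt_vecs))  # identical terms
        X = np.stack([src_vecs[w] for w in shared])
        Y = np.stack([tgt_vecs[w] for w in shared])
        W = fit_orthogonal_mapping(X, Y)
        aligned = {}
        for w, v in src_vecs.items():
            # Shared terms keep the target embedding directly (the same
            # embedding in both spaces); only unshared terms are mapped.
            aligned[w] = tgt_vecs[w] if w in tgt_vecs else v @ W
        return aligned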
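
For the contextual CLWE, the abstract builds on multilingual BERT pre-training. Below is a sketch of extracting contextual token representations, assuming the HuggingFace transformers library; the explicit alignment phase the dissertation adds on top of this is not reproduced here.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModel.from_pretrained("bert-base-multilingual-cased")
    model.eval()

    def contextual_embeddings(sentence):
        # Returns one 768-dimensional vector per subword token.
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden[0]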
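
Intrinsic evaluation via Bilingual Lexicon Induction can be sketched as nearest-neighbor retrieval: for each source word in a gold bilingual dictionary, retrieve the most similar target word in the aligned space and measure precision@1. The gold_pairs list and the aligned embeddings are assumed to come from elsewhere, and refinements such as CSLS retrieval are omitted.

    import numpy as np

    def bli_precision_at_1(aligned_src, tgt_vecs, gold_pairs):
        # gold_pairs: list of (source_word, gold_target_word) tuples.
        tgt_words = list(tgt_vecs)
        T = np.stack([tgt_vecs[w] for w in tgt_words])
        T = T / np.linalg.norm(T, axis=1, keepdims=True)
        hits = 0
        for src_word, gold_tgt in gold_pairs:
            v = aligned_src[src_word]
            v = v / np.linalg.norm(v)
            pred = tgt_words[int(np.argmax(T @ v))]  # cosine nearest neighbor
            hits += int(pred == gold_tgt)
        return hits / len(gold_pairs)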
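
Finally, the extrinsic task: extractive summarization with feature-extraction transfer, where the CLWE stays frozen and only supplies sentence features. The hypothetical centroid-based scorer below illustrates only the shape of such a pipeline; the dissertation's actual summarizer architecture and its ROUGE evaluation are not reproduced.

    import numpy as np

    def sentence_vector(tokens, emb, dim):
        # Average the (frozen) CLWE vectors of in-vocabulary tokens.
        vecs = [emb[t] for t in tokens if t in emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def extractive_summary(sentences, emb, dim, k=3):
        # sentences: list of token lists, in document order.
        S = np.stack([sentence_vector(s, emb, dim) for s in sentences])
        centroid = S.mean(axis=0)
        norms = np.linalg.norm(S, axis=1) * np.linalg.norm(centroid) + 1e-9
        scores = (S @ centroid) / norms  # cosine similarity to centroid
        top = sorted(np.argsort(-scores)[:k])  # keep original sentence order
        return [sentences[i] for i in top]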