Korean jamo-level byte-pair encoding for neural machine translation
Tokenization is the first step in most Natural Language Processing tasks. It is essential for addressing the fundamental out-of-vocabulary problem, as well as for shaping linguistic understanding. To exploit the characteristics of the Korean language for a more parameter-efficient tokenizat...
Main Author: | Lee, Junyoung |
---|---|
Other Authors: | Wang Lipo |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Online Access: | https://hdl.handle.net/10356/172737 |
Institution: | Nanyang Technological University |
Similar Items
- Machine learning for new friends recommendation in NTU
  by: Niu, Jianan
  Published: (2021)
- Building generalizable models for discourse phenomena evaluation and machine translation
  by: Jwalapuram, Prathyusha
  Published: (2023)
- Deep metric based feature engineering to Improve document-level representation for document clustering
  by: Xu, Liwen
  Published: (2022)
- Natural language translation with graph convolutional neural network
  by: Zhu, Yimin
  Published: (2018)
- Comparison of character recognition performance - bayes classifier and neural network methods
  by: Low, Siew Eng.
  Published: (2008)