Korean jamo-level byte-pair encoding for neural machine translation

Tokenization is the first step in most natural language processing tasks. It is essential for addressing the fundamental out-of-vocabulary problem, and it also influences a model's linguistic understanding. To exploit the characteristics of the Korean language for a more parameter-efficient tokenizat...

Bibliographic Details
Main Author: Lee, Junyoung
Other Authors: Wang Lipo
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/172737
Institution: Nanyang Technological University