Cross-lingual transfer learning for statistical type inference

Bibliographic Details
Main Authors: LI, Zhiming, XIE, Xiaofei, LI, Haoliang, XU, Zhengzi, LI, Yi, LIU, Yang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Online Access: https://ink.library.smu.edu.sg/sis_research/7194
https://ink.library.smu.edu.sg/context/sis_research/article/8197/viewcontent/Li2022CLT.pdf
Institution: Singapore Management University
Date: 2022-07-01
DOI: 10.1145/3533767.3534411
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Subjects: Deep Learning; Transfer Learning; Type Inference; Databases and Information Systems; Software Engineering

Description

To date, statistical type inference systems have relied heavily on supervised learning approaches, which require laborious manual effort to collect and label large amounts of data. Most Turing-complete imperative languages share similar control- and data-flow structures, which makes it possible to transfer knowledge learned from one language to another. In this paper, we propose Plato, a cross-lingual transfer learning framework for statistical type inference that leverages prior knowledge learned from the labeled dataset of one language and transfers it to others, e.g., Python to JavaScript or Java to JavaScript. Plato is powered by a novel kernelized attention mechanism that constrains the attention scope of the backbone Transformer model, forcing the model to base its predictions on features commonly shared among languages. In addition, we propose a syntax enhancement that augments learning on the feature overlap among language domains. Furthermore, Plato can also improve conventional supervised type inference by introducing cross-language augmentation, which enables the model to learn more general features across multiple languages. We evaluated Plato in two settings. 1) In the cross-domain scenario, where the target-language data is unlabeled or only partially labeled, Plato outperforms state-of-the-art domain transfer techniques by a large margin; e.g., it improves the Python-to-TypeScript baseline by +14.6% in exact match (EM) and +18.6% in weighted F1. 2) In the conventional monolingual supervised scenario, cross-lingual augmentation lets Plato improve the Python baseline by +4.10% EM and +1.90% weighted F1.
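The results above are reported as exact match (EM) and weighted F1. As a rough illustration of how these scores are commonly computed for type prediction, the sketch below assumes each annotation slot has exactly one gold type string and one predicted type string, and takes weighted F1 as the per-type F1 averaged with weights proportional to each type's gold support (the convention scikit-learn uses for `average="weighted"`). The function names `exact_match` and `weighted_f1` are hypothetical, and the paper's actual evaluation protocol (type normalization, which slots are counted) may differ.

```python
from collections import Counter


def exact_match(gold: list[str], pred: list[str]) -> float:
    """Fraction of slots whose predicted type string equals the gold type exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def weighted_f1(gold: list[str], pred: list[str]) -> float:
    """Per-type F1, averaged with weights equal to each type's gold support.

    Types that never occur in `gold` have zero support and therefore
    contribute nothing, matching the support-weighted convention.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)    # how often each type appears in the gold labels
    predicted = Counter(pred)  # how often each type was predicted
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    total = 0.0
    for label, n_gold in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_gold
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * n_gold
    return total / len(gold) if gold else 0.0
```

For example, with gold types `["int", "str", "str", "List[int]"]` and predictions `["int", "str", "int", "List[str]"]`, two of four slots match, so EM is 0.5. `int` and `str` each get an F1 of 2/3 and `List[int]` gets 0, so the support-weighted average is (2/3 + 2 x 2/3 + 0) / 4 = 0.5 as well. Because weighting follows gold frequency, common types such as `str` dominate weighted F1 more than rare user-defined types.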