AI coders are among us : Rethinking programming language grammar towards efficient code generation

Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks...

Bibliographic Details
Main Authors: SUN Zhensu, DU Xiaoning, YANG Zhou, LI Li, LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/9886
https://ink.library.smu.edu.sg/context/sis_research/article/10886/viewcontent/3650212.3680347.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10886
record_format dspace
spelling sg-smu-ink.sis_research-10886
2025-01-02T09:10:15Z
2024-09-01T07:00:00Z
text
application/pdf
info:doi/10.1145/3650212.3680347
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Code generation
Programming language
Large Language Model
Grammars and context-free language
Philosophical/theoretical foundations of artificial intelligence
AI-oriented grammar for Python
Software Engineering
description Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks, including mathematical problems. However, the grammar and layout of current programs are designed to cater to the needs of human developers -- with many grammar tokens and formatting tokens being used to make the code easier for humans to read. While this is helpful, such a design adds unnecessary computational work for LLMs, as each token they either use or produce consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar. This aims to represent code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formatting and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named Simple Python (SimPy). SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical Abstract Syntax Tree (AST) structures to those in standard Python. This allows for not only execution via a modified AST parser, but also seamless transformation between programs written in Python and SimPy, enabling human developers and LLMs to use Python and SimPy, respectively, when they need to collaborate. We also look into methods to help existing LLMs understand and use SimPy effectively. In the experiments, compared with Python, SimPy enables a reduction in token usage by 13.5% and 10.4% for CodeLlama and GPT-4, respectively, when completing the same set of code-related tasks. Additionally, these models can maintain or even improve their performance when using SimPy instead of Python for these tasks. With these promising results, we call for further contributions to the development of AI-oriented program grammar within our community.
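The property the abstract relies on, that a shorter surface form can carry the same AST, can be illustrated with standard Python alone. The sketch below is not SimPy and does not reproduce the paper's rules: it only strips a comment and optional whitespace from a small function, then checks that both versions parse to the same tree. The sample sources and the helper names (same_ast, lexer_tokens, reduction) are hypothetical, and the count comes from Python's own lexer, which is only a rough proxy for the LLM tokenizers measured in the paper.

```python
import ast
import io
import tokenize

# Formatting-heavy version, as a human might write it (illustrative sample).
READABLE = '''
def total(values):
    # Sum the positive entries.
    result = 0
    for v in values:
        if v > 0:
            result = result + v
    return result
'''

# The same program with the comment and optional whitespace removed.
# This is still ordinary Python, not SimPy.
COMPACT = (
    "def total(values):\n"
    " result=0\n"
    " for v in values:\n"
    "  if v>0:result=result+v\n"
    " return result\n"
)


def same_ast(a: str, b: str) -> bool:
    """Return True when both sources parse to structurally identical ASTs."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


def lexer_tokens(src: str) -> int:
    """Count tokens emitted by Python's lexer (assumption: a proxy only)."""
    return sum(1 for _ in tokenize.generate_tokens(io.StringIO(src).readline))


def reduction(before: int, after: int) -> float:
    """Fractional saving when going from `before` tokens to `after` tokens."""
    return 1 - after / before


if __name__ == "__main__":
    before, after = lexer_tokens(READABLE), lexer_tokens(COMPACT)
    print("same AST:", same_ast(READABLE, COMPACT))
    print("lexer tokens:", before, "->", after)
    print("reduction: {:.1%}".format(reduction(before, after)))
```

Because the two sources share one AST, either can be regenerated from the other's tree, which is the kind of round trip the abstract describes between Python and SimPy. The percentages reported in the abstract come from the authors' experiments with CodeLlama and GPT-4 tokenizers, not from this lexer-based count.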
format text
author SUN Zhensu,
DU Xiaoning,
YANG Zhou,
LI Li,
LO, David
title AI coders are among us : Rethinking programming language grammar towards efficient code generation
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9886
https://ink.library.smu.edu.sg/context/sis_research/article/10886/viewcontent/3650212.3680347.pdf