AI coders are among us : Rethinking programming language grammar towards efficient code generation

Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks...

Bibliographic Details
Main Authors:	SUN Zhensu, DU Xiaoning, YANG Zhou, LI Li, LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Code generation Programming language Large Language Model Grammars and context-free language Philosophical/theoretical foundations of artificial intelligence AI-oriented grammar for Python Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9886 https://ink.library.smu.edu.sg/context/sis_research/article/10886/viewcontent/3650212.3680347.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10886
record_format	dspace
spelling	sg-smu-ink.sis_research-10886 2025-01-02T09:10:15Z 2024-09-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9886 info:doi/10.1145/3650212.3680347 https://ink.library.smu.edu.sg/context/sis_research/article/10886/viewcontent/3650212.3680347.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Code generation Programming language Large Language Model Grammars and context-free language Philosophical/theoretical foundations of artificial intelligence AI-oriented grammar for Python Software Engineering
description	Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks, including mathematical problems. However, the grammar and layout of current programs are designed to cater to the needs of human developers, with many grammar tokens and formatting tokens being used to make the code easier for humans to read. While this is helpful, such a design adds unnecessary computational work for LLMs, as each token they either use or produce consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar. This aims to represent code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formatting and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named Simple Python (SimPy). SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical Abstract Syntax Tree (AST) structures to those in standard Python. This allows for not only execution via a modified AST parser, but also seamless transformation between programs written in Python and SimPy, enabling human developers and LLMs to use Python and SimPy, respectively, when they need to collaborate. We also look into methods to help existing LLMs understand and use SimPy effectively. In the experiments, compared with Python, SimPy enables a reduction in token usage of 13.5% and 10.4% for CodeLlama and GPT-4, respectively, when completing the same set of code-related tasks. Additionally, these models can maintain or even improve their performance when using SimPy instead of Python for these tasks. With these promising results, we call for further contributions to the development of AI-oriented program grammar within our community.
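The description states that formatting tokens carry no semantics and that a compact form can keep the same AST. A minimal sketch of that idea follows, using only standard Python and its standard library; it does not use SimPy syntax, and the names READABLE, COMPACT, ast_fingerprint, and count_tokens are hypothetical, chosen for illustration. Python lexer tokens are also only a rough stand-in for the subword tokens an LLM tokenizer produces, so the counts illustrate the direction of the effect, not the figures reported in the record.

```python
import ast
import io
import tokenize

# Two spellings of the same function: one laid out for human readers,
# one compacted. Both are ordinary Python, NOT SimPy (assumption: SimPy's
# actual syntax is not shown in this record, so it is not imitated here).
READABLE = '''
def add(a, b):
    # Sum two numbers.
    total = a + b
    return total
'''

COMPACT = "def add(a,b):\n total=a+b;return total\n"

# Lexer token types that exist only for layout or human readability.
LAYOUT_TOKENS = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.COMMENT,
}


def ast_fingerprint(source):
    """Dump the AST without line/column attributes, so layout is ignored."""
    return ast.dump(ast.parse(source), include_attributes=False)


def count_tokens(source):
    """Return (total tokens, layout-only tokens) as seen by Python's lexer."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    tokens = [t for t in tokens if t.type != tokenize.ENDMARKER]
    layout = sum(1 for t in tokens if t.type in LAYOUT_TOKENS)
    return len(tokens), layout


if __name__ == "__main__":
    print("same AST:", ast_fingerprint(READABLE) == ast_fingerprint(COMPACT))
    print("readable (total, layout):", count_tokens(READABLE))
    print("compact  (total, layout):", count_tokens(COMPACT))
```

Comparing the AST dumps rather than the source text is what makes the check layout-independent: comments, blank lines, and spacing never reach the AST, so two sources that differ only in those respects produce the same fingerprint.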
format	text
author	SUN Zhensu, DU Xiaoning, YANG Zhou, LI Li, LO, David
author_facet	SUN Zhensu, DU Xiaoning, YANG Zhou, LI Li, LO, David
author_sort	SUN Zhensu,
title	AI coders are among us : Rethinking programming language grammar towards efficient code generation
title_sort	ai coders are among us : rethinking programming language grammar towards efficient code generation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9886 https://ink.library.smu.edu.sg/context/sis_research/article/10886/viewcontent/3650212.3680347.pdf
_version_	1821237274359103488

Similar Items