Unveiling memorization in code models

The availability of large-scale datasets, advanced architectures, and powerful computational resources has led to effective code models that automate diverse software engineering activities. The datasets usually consist of billions of lines of code from both open-source and private repositories. A...

Bibliographic Details
Main Authors:	YANG, Zhou, ZHAO, Zhipeng, WANG, Chenyu, SHI, Jieke, KIM, Dongsun, HAN, DongGyun, LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Open-Source Software; Memorization; Code Generation; Programming Languages and Compilers; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9246 https://ink.library.smu.edu.sg/context/sis_research/article/10246/viewcontent/3597503.3639074.pdf
Institution:	Singapore Management University

Similar Items

Promise and peril of collaborative code generation models : Balancing effectiveness and memorization
by: CHEN, Zhi, et al.
Published: (2024)

Unveiling code pre-trained models: Investigating syntax and semantics capacities
by: MA, Wei, et al.
Published: (2024)

Code coverage and postrelease defects: A large-scale study on open source projects
by: KOCHHAR, Pavneet Singh, et al.
Published: (2017)

Code search is all you need? Improving code suggestions with code search
by: CHEN, Junkai, et al.
Published: (2024)

Multi-modal API recommendation
by: IRSAN, Ivana Clairine, et al.
Published: (2023)

Automatic code review by learning the revision of source code
by: SHI, Shu-Ting, et al.
Published: (2019)

GraphCode2Vec: Generic code embedding via lexical and program dependence analyses
by: MA, Wei, et al.
Published: (2022)

Greening large language models of code
by: SHI, Jieke, et al.
Published: (2024)

SOTorrent: Studying the origin, evolution, and usage of stack overflow code snippets
by: BALTES, Sebastian, et al.
Published: (2019)

Google summer of code: Student motivations and contributions
by: SILVA, Jefferson O., et al.
Published: (2020)

Deep code comment generation with hybrid lexical and syntactical information
by: HU, Xing, et al.
Published: (2019)

BinAlign: Alignment Padding Based Compiler Provenance Recovery
by: MALIHA ISMAIL, et al.
Published: (2023)

CodeMatcher: Searching code based on sequential semantics of important query words
by: LIU, Chao, et al.
Published: (2022)

Boosting just-in-time defect prediction with specific features of C/C++ programming languages in code changes
by: NI, Chao, et al.
Published: (2023)

ENHANCING PRIVACY IN MACHINE LEARNING THROUGH THE MINIMIZATION OF MEMORIZATION
by: ZHENG ESTELLE
Published: (2024)

Assessing generalizability of CodeBERT
by: ZHOU, Xin, et al.
Published: (2021)

Mechanism of stress memorization technique (SMT) and method to maximize its effect
by: Pandey, S.M., et al.
Published: (2014)

Learning program semantics with code representations: An empirical study
by: SIOW, Jing Kai, et al.
Published: (2022)

Encoding version history context for better code representation
by: NGUYEN, Huy, et al.
Published: (2024)

A Comparison of the Effectiveness of Four Note-Taking Methods in Memorization and Comprehension of Grades 9 to 12 Students in an Online Setup
by: Benavidez, Giann M., et al.
Published: (2022)

Can identifier splitting improve open-vocabulary language model of code?
by: SHI, Jieke, et al.
Published: (2022)

Retrieval-augmented generation for code summarization via hybrid GNN
by: LIU, Shangqing, et al.
Published: (2021)

Fixing your own smells: Adding a mistake-based familiarization step when teaching code refactoring
by: TAN, Ivan Wei Han, et al.
Published: (2024)

Mining implicit design templates for actionable code reuse
by: LIN, Yun, et al.
Published: (2017)

On locating malicious code in piggybacked Android apps
by: LI, Li, et al.
Published: (2017)

Novel deep learning methods combined with static analysis for source code processing
by: BUI, Duy Quoc Nghi
Published: (2020)

Understanding bug fixes in Ant: An observational study
by: SAHA, Shilpi, et al.
Published: (2014)

Mutation analysis for evaluating code translation
by: GUIZZO, Giovani, et al.
Published: (2024)

Gotcha! This model uses my code! Evaluating membership leakage risks in code models
by: YANG, Zhou, et al.
Published: (2024)

Assessing AI detectors in identifying AI-generated code: Implications for education
by: PAN, Wei Hung, et al.
Published: (2024)

CodeMatcher: A tool for large-scale code search based on query semantics matching
by: LIU, Chao, et al.
Published: (2022)

CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)
by: LV, Fei, et al.
Published: (2015)

KAPE: kNN-based performance testing for deep code search
by: GUO, Yuejun, et al.
Published: (2023)

Large language model for vulnerability detection: Emerging results and future directions
by: ZHOU, Xin, et al.
Published: (2024)

On the feasibility of detecting cross-platform code clones via identifier similarity
by: CHENG, Xiao, et al.
Published: (2016)

Towards expressive specification and efficient model checking
by: DONG, Jin Song, et al.
Published: (2009)

Augmenting and structuring user queries to support efficient free-form code search
by: SIRRES, Raphael, et al.
Published: (2018)

Generation-based code review automation: How far are we?
by: ZHOU, Xin, et al.
Published: (2023)

INFAR: insight extraction from app reviews
by: GAO, Cuiyun, et al.
Published: (2018)

Integrating specification and programs for system modeling and verification
by: SUN, Jun, et al.
Published: (2009)