Towards general conceptual model editing via adversarial representation engineering

As Large Language Models (LLMs) have developed rapidly and achieved remarkable success, understanding and rectifying their complex internal mechanisms has become an urgent issue. Recent research has attempted to interpret their behavior through the lens of internal representations. However, develo...

Bibliographic Details
Main Authors: ZHANG, Yihao, WEI, Zeming, SUN, Jun, SUN, Meng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Online Access: https://ink.library.smu.edu.sg/sis_research/9833
https://ink.library.smu.edu.sg/context/sis_research/article/10833/viewcontent/2404.13752v3.pdf
Institution: Singapore Management University

Similar Items