Towards general conceptual model editing via adversarial representation engineering

Since the rapid development of Large Language Models (LLMs) has achieved remarkable success, understanding and rectifying their internal complex mechanisms has become an urgent issue. Recent research has attempted to interpret their behaviors through the lens of inner representation. However, develo...

وصف كامل

محفوظ في:
التفاصيل البيبلوغرافية
المؤلفون الرئيسيون: ZHANG, Yihao, WEI, Zeming, SUN, Jun, SUN, Meng
التنسيق: text
اللغة:English
منشور في: Institutional Knowledge at Singapore Management University 2024
الموضوعات:
الوصول للمادة أونلاين:https://ink.library.smu.edu.sg/sis_research/9833
https://ink.library.smu.edu.sg/context/sis_research/article/10833/viewcontent/2404.13752v3.pdf
الوسوم: إضافة وسم
لا توجد وسوم, كن أول من يضع وسما على هذه التسجيلة!