Attack prompt generation for red teaming and defending large language models

Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, each of which has its own limitations in construction cost and quality. To address these issues, we propose...


Bibliographic Details
Main Authors: Deng, Boyi; Wang, Wenjie; Feng, Fuli; Deng, Yang; Wang, Qifan; He, Xiangnan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access:https://ink.library.smu.edu.sg/sis_research/9118
https://ink.library.smu.edu.sg/context/sis_research/article/10121/viewcontent/2023.findings_emnlp.143.pdf
Institution: Singapore Management University