Attack prompt generation for red teaming and defending large language models
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, each of which has its own limitations in construction cost and prompt quality. To address these issues, we propose...
Main Authors: DENG, Boyi; WANG, Wenjie; FENG, Fuli; DENG, Yang; WANG, Qifan; HE, Xiangnan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access:
https://ink.library.smu.edu.sg/sis_research/9118
https://ink.library.smu.edu.sg/context/sis_research/article/10121/viewcontent/2023.findings_emnlp.143.pdf
Institution: Singapore Management University
Similar Items
- Position-guided text prompt for vision-language pre-training
  by: WANG, Alex Jinpeng, et al.
  Published: (2023)
- Large language models as source planner for personalized knowledge-grounded dialogues
  by: WANG, Hongru, et al.
  Published: (2023)
- Plug-and-play policy planner for large language model powered dialogue agents
  by: DENG, Yang, et al.
  Published: (2024)
- Self-chats from large language models make small emotional support chatbot better
  by: ZHENG, Zhonghua, et al.
  Published: (2024)
- CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models
  by: ZHANG, Tong, et al.
  Published: (2024)