Attack prompt generation for red teaming and defending large language models
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. Previous research constructs attack prompts via manual or automatic methods, each of which has its own limitations in construction cost and prompt quality. To address these issues, we propose...
Main Authors: DENG, Boyi; WANG, Wenjie; FENG, Fuli; DENG, Yang; WANG, Qifan; HE, Xiangnan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
Online Access:
https://ink.library.smu.edu.sg/sis_research/9118
https://ink.library.smu.edu.sg/context/sis_research/article/10121/viewcontent/2023.findings_emnlp.143.pdf
Institution: Singapore Management University
Similar Items
- Position-guided text prompt for vision-language pre-training
  by: WANG, Alex Jinpeng, et al.
  Published: (2023)
- Large language models as source planner for personalized knowledge-grounded dialogues
  by: WANG, Hongru, et al.
  Published: (2023)
- Plug-and-play policy planner for large language model powered dialogue agents
  by: DENG, Yang, et al.
  Published: (2024)
- Self-chats from large language models make small emotional support chatbot better
  by: ZHENG, Zhonghua, et al.
  Published: (2024)
- CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models
  by: ZHANG, Tong, et al.
  Published: (2024)