On the multi-turn instruction following for conversational web agents

Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored.

Bibliographic Details
Main Authors:	DENG, Yang; ZHANG, Xuan; ZHANG, Wenxuan; YUAN, Yifei; NG, See-Kiong; CHUA, Tat-Seng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Databases and Information Systems; Programming Languages and Compilers
Online Access:	https://ink.library.smu.edu.sg/sis_research/9236 https://ink.library.smu.edu.sg/context/sis_research/article/10236/viewcontent/2024.acl_long.477.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10236
record_format	dspace
spelling	sg-smu-ink.sis_research-10236 2024-09-02T06:49:07Z 2024-08-01T07:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems; Programming Languages and Compilers
description	Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset, and validate the effectiveness of the proposed method.
format	text
author	DENG, Yang; ZHANG, Xuan; ZHANG, Wenxuan; YUAN, Yifei; NG, See-Kiong; CHUA, Tat-Seng
author_sort	DENG, Yang
title	On the multi-turn instruction following for conversational web agents
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9236 https://ink.library.smu.edu.sg/context/sis_research/article/10236/viewcontent/2024.acl_long.477.pdf

Similar Items