On the multi-turn instruction following for conversational web agents

Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effe...

Bibliographic Details
Main Authors: DENG, Yang, ZHANG, Xuan, ZHANG, Wenxuan, YUAN, Yifei, NG, See-Kiong, CHUA, Tat-Seng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: Databases and Information Systems; Programming Languages and Compilers
Online Access: https://ink.library.smu.edu.sg/sis_research/9236
https://ink.library.smu.edu.sg/context/sis_research/article/10236/viewcontent/2024.acl_long.477.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10236
record_format dspace
spelling sg-smu-ink.sis_research-10236 2024-09-02T06:49:07Z 2024-08-01T07:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Programming Languages and Compilers
description Web agents powered by Large Language Models (LLMs) have demonstrated remarkable abilities in planning and executing multi-step interactions within complex web-based environments, fulfilling a wide range of web navigation tasks. Despite these advancements, the potential for LLM-powered agents to effectively engage with sequential user instructions in real-world scenarios has not been fully explored. In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). To tackle the limited context length of LLMs and the context-dependency issue of the conversational tasks, we further propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques. Extensive experiments are conducted to benchmark the MT-Mind2Web dataset, and validate the effectiveness of the proposed method.
format text
author DENG, Yang
ZHANG, Xuan
ZHANG, Wenxuan
YUAN, Yifei
NG, See-Kiong
CHUA, Tat-Seng
title On the multi-turn instruction following for conversational web agents
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9236
https://ink.library.smu.edu.sg/context/sis_research/article/10236/viewcontent/2024.acl_long.477.pdf