Annotating videos that teach MS Excel and predicting mouse / keyboard actions


Bibliographic Details
Main Author: Tan, Genson Yao Jie
Other Authors: Li Boyang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175233
Institution: Nanyang Technological University
Description
Summary: This research paper explores the extraction of specific sentences from natural language as a foundational step towards building an Artificial Intelligence system for automating Microsoft Excel. The focus is on leveraging language models capable of extracting intention and procedure sentences from transcripts collected from YouTube. Using such a model can significantly alleviate the laborious process of manual annotation and, consequently, make it possible to acquire a dataset large enough to train a model tailored to the specific domain of procedure prediction. The research methodology involves exploring the limitations of fine-tuning Flan-T5 for this task, and applying prompt engineering to a Large Language Model (LLM) such as Llama 2 as an alternative method. The experiments are conducted on the Google Colab platform, which offers access to at most 15 GB of VRAM. This paper is centred around understanding the behaviour of Llama 2 and how it responds to different prompting techniques for information extraction. Data extracted from individual transcripts can be returned as English sentences or in a structured format such as JSON. The model's extraction quality is then evaluated against a dataset labelled manually by human annotators. This approach offers a straightforward and accessible way to acquire large databases of structured knowledge derived from unstructured text with very limited computational resources.
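
To make the extract-then-evaluate workflow in the summary concrete, the following is a minimal Python sketch, not the project's actual code. The prompt wording, the JSON keys `intention` and `procedure`, the helper names, and the choice of exact-match precision, recall and F1 are all assumptions made for illustration; the record does not specify the prompts or the metric used. The call to the language model itself is left out, so the sketch only covers building a prompt, parsing a JSON reply, and scoring it against human labels.

```python
import json
import re

# Hypothetical prompt template; the record does not give the actual prompt wording.
PROMPT_TEMPLATE = (
    "Extract the intention and the procedure sentences from the transcript below.\n"
    'Answer with JSON only, in the form {{"intention": "...", "procedure": ["..."]}}.\n\n'
    "Transcript:\n{transcript}\n"
)


def build_prompt(transcript: str) -> str:
    """Fill the (assumed) template with one video transcript."""
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def parse_extraction(raw_output: str) -> dict:
    """Pull a JSON object out of a model reply that may contain extra chatter."""
    empty = {"intention": "", "procedure": []}
    match = re.search(r"\{.*\}", raw_output, flags=re.DOTALL)
    if match is None:
        return empty
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return empty
    procedure = data.get("procedure", [])
    if not isinstance(procedure, list):
        procedure = []
    return {
        "intention": str(data.get("intention", "")),
        "procedure": [str(step) for step in procedure],
    }


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(sentence.lower().split())


def precision_recall_f1(predicted: list, gold: list) -> tuple:
    """Exact-match scoring of extracted sentences against human annotations (assumed metric)."""
    pred = {normalize(s) for s in predicted}
    ref = {normalize(s) for s in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In use, the string returned by `build_prompt` would be sent to the model, the reply passed through `parse_extraction`, and the resulting `procedure` list compared with the annotators' sentences via `precision_recall_f1`. Exact matching is strict; a softer comparison such as token overlap may suit paraphrased extractions better.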