Annotating videos that teach MS Excel and predicting mouse / keyboard actions
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/175233 |
Institution: | Nanyang Technological University |
Summary: | This research paper explores the extraction of specific sentences from natural language as a
foundational step towards building an Artificial Intelligence system for automating Microsoft
Excel. The focus is on leveraging language models with the capability to extract intention and
procedure sentences from transcripts collected from YouTube. Utilizing such a model can
significantly alleviate the laborious process of manual annotations, and consequently, this
approach can enable us to acquire a sufficiently large dataset for training a model tailored to
the specific domain of procedure prediction.
The research methodology involves exploring the limitations of fine-tuning Flan-T5 for this
task, while also utilizing prompt engineering on a Large Language Model (LLM) such as Llama
2 as an alternative method. The experiments are conducted on the Google Colab platform,
which offers access to at most 15 GB of VRAM.
This paper is centred on understanding the behaviour of Llama 2 and how it responds
to different prompting techniques for information extraction. Data extracted from
individual transcripts can be returned as English sentences or in a structured format such as
JSON. The model's extraction quality is then evaluated against a dataset manually labelled
by human annotators. This approach offers a straightforward and
accessible way to acquire large databases of structured knowledge derived from unstructured
text with very limited computational resources. |
---|