Collecting and annotating videos that teach MS PowerPoint

Bibliographic Details
Main Author: Tan, Isaac Jun Hong
Other Authors: Li Boyang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access:https://hdl.handle.net/10356/171932
Institution: Nanyang Technological University
Description
Summary: The central aim of this project is to generate a comprehensive dataset for training an artificial intelligence (AI) that can operate Microsoft PowerPoint autonomously. The project encompasses several phases. It starts with identifying videos that teach Microsoft PowerPoint, which are then downloaded in a Jupyter Notebook with the help of the Pytube library. Videos that lack closed captions are then transcribed with the Whisper model. Next comes the annotation phase, in which keystrokes and mouse clicks are labeled in Doccano using sequence labeling. The project then transitions into the model training phase, where the T5 and FLAN-T5 neural network models are compared on their ability to interpret narrated instructions and translate them into the corresponding mouse and keyboard actions, in order to decide which model achieves better performance. Given the limitations of the dataset collected from YouTube, data augmentation with ChatGPT was used to improve model training.
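The link between the annotation phase and the training phase can be illustrated with a minimal sketch that turns a Doccano sequence-labeling export into the text-to-text (input, target) pairs that T5 and FLAN-T5 are trained on. It assumes a JSONL export where each record has a `text` field and a `label` field holding `[start, end, label]` character offsets (field names differ between Doccano versions). The label names `CLICK` and `KEY`, the task prefix, and the `" ; "` separator are hypothetical choices, not the project's actual schema.

```python
from __future__ import annotations

import json
from typing import Iterable

# Hypothetical task prefix: T5-style models are usually given a short
# instruction prefix, but this exact wording is an assumption.
PREFIX = "translate narration to actions: "


def spans_to_target(text: str, spans: list) -> str:
    """Render labeled spans as an action string in reading order.

    Each span is assumed to be [start, end, label], with character
    offsets into `text`.
    """
    ordered = sorted(spans, key=lambda span: span[0])
    return " ; ".join(f"{label} {text[start:end]}" for start, end, label in ordered)


def doccano_jsonl_to_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Convert JSONL lines from a (hypothetical) Doccano export into pairs."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the export file
        record = json.loads(line)
        # Assumed field names: "label" in newer exports, "labels" in some older ones.
        spans = record.get("label", record.get("labels", []))
        if not spans:
            continue  # skip transcript segments with no annotated action
        text = record["text"]
        pairs.append((PREFIX + text, spans_to_target(text, spans)))
    return pairs
```

For a transcript segment such as `"Now click Insert and press Ctrl+M"` annotated with a `CLICK` span over `Insert` and a `KEY` span over `Ctrl+M`, the sketch yields the target string `"CLICK Insert ; KEY Ctrl+M"`. Sorting spans by start offset keeps the actions in the order they are narrated, whatever order the annotator labeled them in.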
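The comparison between T5 and FLAN-T5 needs some score for deciding which model performs better. The summary does not state which metric was used. The sketch below assumes exact-match accuracy between predicted and reference action strings, after lowercasing and collapsing whitespace. The metric, the normalization, and the function names are all assumptions made for illustration.

```python
from __future__ import annotations


def normalize(action_string: str) -> str:
    """Lowercase and collapse runs of whitespace (an assumed normalization)."""
    return " ".join(action_string.lower().split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def pick_better_model(scores: dict[str, float]) -> str:
    """Return the name of the model with the highest score."""
    return max(scores, key=scores.get)
```

Exact match is strict: a prediction with one wrong key in a multi-step sequence scores zero. A token-level or per-action metric would give partial credit, so the choice of metric can change which model appears better.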