Collecting and annotating videos that teach MS PowerPoint


Bibliographic Details
Main Author: Tan, Isaac Jun Hong
Other Authors: Li Boyang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171932
Institution: Nanyang Technological University
Description
Summary: The central aim of this project is to generate a comprehensive dataset for training an artificial intelligence (AI) that can operate Microsoft PowerPoint autonomously. The project proceeds in several phases. First, videos that teach Microsoft PowerPoint are identified and then downloaded in a Jupyter Notebook with the help of the Pytube library. Videos that lack closed captions are then transcribed with the Whisper model. Next, in the annotation phase, keystrokes and mouse clicks are labeled using sequence labeling in Doccano. The project then transitions into the model training phase, where the T5 and FLAN-T5 neural network models are compared on their ability to interpret narrated instructions and translate them into the corresponding mouse and keyboard actions, to decide which model achieves the better performance. Given the limitations of the YouTube-derived dataset, data augmentation using ChatGPT was employed to improve model training.
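
The step from Doccano annotations to text-to-text training data for T5 or FLAN-T5 can be illustrated with a minimal sketch. It is not taken from the project: the record layout (a `text` field plus a `label` list of `[start, end, label]` character spans) is an assumed Doccano-style JSONL export, and the label names `CLICK` and `KEY`, the sample sentence, and the `LABEL(span)` target format are all hypothetical.

```python
import json

# Hypothetical Doccano-style JSONL export (assumed layout): one JSON record
# per line, holding the transcript text and character-offset spans
# [start, end, label]. The label names CLICK and KEY are illustrative only.
SAMPLE_EXPORT = (
    '{"text": "Click the Insert tab and press Ctrl+M to add a slide.", '
    '"label": [[10, 20, "CLICK"], [31, 37, "KEY"]]}\n'
)


def spans_to_pair(record):
    """Turn one annotated record into a source/target pair for a text-to-text model."""
    text = record["text"]
    # Order spans by start offset so actions appear in narration order.
    spans = sorted(record.get("label", []), key=lambda span: span[0])
    # Hypothetical target format: LABEL(span text), joined with " ; ".
    actions = [f"{label}({text[start:end]})" for start, end, label in spans]
    return {"source": text, "target": " ; ".join(actions)}


def load_pairs(jsonl_text):
    """Parse JSONL text and convert every non-empty line into a training pair."""
    return [
        spans_to_pair(json.loads(line))
        for line in jsonl_text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    for pair in load_pairs(SAMPLE_EXPORT):
        print(pair["source"], "->", pair["target"])
```

Pairs in this shape could then be tokenized and passed to a sequence-to-sequence model; the actual label set and target encoding used in the project are not stated in this record.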