DEVELOPMENT OF A SPEECH TO TEXT INTERVIEW SUMMARIZATION SYSTEM BASED ON MACHINE LEARNING

Bibliographic Details
Main Author:	Hanif Raharjanto, Dwianditya
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/78150
Institution:	Institut Teknologi Bandung

id	id-itb.:78150
spelling	id-itb.:78150; 2023-09-18T09:53:45Z; Keywords: Automatic Speech Recognition, Word Error Rate, Whisper, Wav2Vec2, Transformer
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Regenerating a company's human resources is crucial to sustaining its operations and achieving its vision and mission, and it is accomplished through employee recruitment. Recruitment, however, consumes significant time and resources before suitable candidates are found. This final project aims to reduce that time and cost, particularly at the interview stage, by combining machine and human effort. It focuses on producing interview transcripts with a speech-to-text model and on selecting the appropriate model for this use case: Wav2Vec2 (Wav2Vec2-XLSR-53) or Whisper (Whisper-small and Whisper-large). According to the research consulted, Whisper performs better than Wav2Vec2. The reasons given are that Whisper is a weakly supervised model whereas Wav2Vec2 is trained with semi-supervised methods, that Whisper's training corpus is larger, and that Whisper has more parameters (1.55 billion versus 300 million for Wav2Vec2). The experimental results confirm this: Whisper-large achieves a Word Error Rate (WER) of 10.9% with an average processing time of 5 minutes 23 seconds for audio of 5-7 minutes, whereas Wav2Vec2-XLSR-53 achieves a WER of 22.2% with a processing time of 13 minutes 20 seconds. Whisper-large was therefore chosen to assist the job interview process because it delivers the required performance, being both accurate and fast.
format	Final Project
author	Hanif Raharjanto, Dwianditya
title	DEVELOPMENT OF A SPEECH TO TEXT INTERVIEW SUMMARIZATION SYSTEM BASED ON MACHINE LEARNING
url	https://digilib.itb.ac.id/gdl/view/78150
_version_	1822995643714502656
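
The abstract compares the models by Word Error Rate (WER). As a rough illustration of what that metric measures, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The record does not say how the thesis normalized text or which tool it used, so the lowercasing, the whitespace tokenization, and the function name `word_error_rate` are assumptions made here for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: text is lowercased and split on whitespace; the thesis
    may have used a different normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair, not taken from the thesis data.
    manual = "please describe your previous work experience"
    automatic = "please describe your previous experience"
    print(f"WER: {word_error_rate(manual, automatic):.3f}")
```

Under this definition, a WER of 10.9% means roughly 11 word-level errors per 100 words of the reference transcript. WER can exceed 100% when the hypothesis contains many inserted words.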

Similar Items