DEVELOPMENT OF A SPEECH TO TEXT INTERVIEW SUMMARIZATION SYSTEM BASED ON MACHINE LEARNING
Main Author: | Hanif Raharjanto, Dwianditya |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/78150 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:78150 |
---|---|
last_indexed |
2023-09-18T09:53:45Z |
keywords |
Automatic Speech Recognition, Word Error Rate, Whisper, Wav2Vec2, Transformer |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Regenerating a company's human resources is crucial to sustaining its operations
and achieving its vision and mission, and it is accomplished by recruiting new
employees. Recruitment, however, consumes significant time and resources before
suitable candidates are found. This final project aims to reduce that time and
cost, particularly during the interview stage, by combining machine and human
effort. It focuses on producing interview transcripts with a speech-to-text
model and on selecting the most suitable model for this use case from
Wav2Vec2 (Wav2Vec2-XLSR-53) and Whisper (Whisper-small and Whisper-large).
According to the research reviewed, Whisper performs better than Wav2Vec2. This
is attributed to Whisper being a weakly supervised model, whereas Wav2Vec2 is
trained with semi-supervised methods. In addition, Whisper's training corpus is
larger than that of Wav2Vec2, and Whisper has more parameters: 1.55 billion
compared with Wav2Vec2's 300 million.
The experimental results confirm that Whisper, and Whisper-large in particular,
outperforms Wav2Vec2, reaching a Word Error Rate (WER) of 10.9% with an average
processing time of 5 minutes 23 seconds for audio lasting 5-7 minutes. In
contrast, Wav2Vec2-XLSR-53 has a WER of 22.2% with a processing time of
13 minutes 20 seconds.
Whisper-large is therefore the model chosen to assist the job interview process,
because it delivers the required performance: it is both accurate and fast. |
format |
Final Project |
author |
Hanif Raharjanto, Dwianditya |
title |
DEVELOPMENT OF A SPEECH TO TEXT INTERVIEW SUMMARIZATION SYSTEM BASED ON MACHINE LEARNING |
url |
https://digilib.itb.ac.id/gdl/view/78150 |
_version_ |
1822995643714502656 |
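
The description above compares models by Word Error Rate (WER). As a minimal sketch of how that metric is commonly computed (not necessarily the procedure used in the project), WER can be taken as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: both transcripts are lowercased and split on whitespace.
    Real evaluations often also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical interview snippet, not taken from the project's data.
reference = "i have three years of experience"
hypothesis = "i have tree years experience"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")
```

In this sample, one substitution ("three" to "tree") and one deletion ("of") against six reference words give a WER of 2/6. A reported WER of 10.9% read the same way means roughly 11 word-level errors per 100 reference words.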