DEVELOPMENT OF A SPEECH TO TEXT INTERVIEW SUMMARIZATION SYSTEM BASED ON MACHINE LEARNING
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/78150 |
Institution: | Institut Teknologi Bandung |
Summary: | The regeneration of human resources within a company is crucial to maintaining
its operations and achieving its vision and mission. This regeneration is carried
out through employee recruitment. However, recruitment itself consumes a
significant amount of time and resources to find suitable candidates. This final
project aims to reduce the time and cost of recruitment, particularly during the
interview process, by combining automated and human effort. It focuses on
creating interview transcripts using a speech-to-text model and on selecting the
appropriate model for this case, either Wav2Vec2 (Wav2Vec2-XLSR-53) or
Whisper (Whisper-small and Whisper-large).
According to the literature reviewed, the Whisper model performs better than
Wav2Vec2. This is attributed to Whisper being a weakly supervised model, whereas
Wav2Vec2 is trained using semi-supervised methods. Additionally, the training
corpus used for Whisper is larger than that of Wav2Vec2, and the Whisper model
has more parameters: 1.55 billion, compared with Wav2Vec2's 300 million.
The experimental results confirm that Whisper, especially Whisper-large,
outperforms Wav2Vec2. Whisper-large achieves a Word Error Rate (WER) of 10.9%
with an average processing time of 5 minutes and 23 seconds for audio of
5-7 minutes. In contrast, Wav2Vec2-XLSR-53 has a WER of 22.2% with a processing
time of 13 minutes and 20 seconds.
Whisper-large is therefore the model chosen to assist in the job interview
process, because it provides the required performance: it is both accurate and fast. |
---|
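
The summary compares models by Word Error Rate (WER). As a minimal sketch of how that metric is commonly computed (not necessarily the procedure used in the project), WER can be taken as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical, and the sketch assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are split on whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "please tell me about your previous work experience"
    hypothesis = "please tell me about your previous experience"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a WER of 10.9% means roughly 11 word-level errors per 100 reference words, so a lower value indicates a more accurate transcript.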