INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE
Documents play a crucial role in the exchange and storage of information. Therefore, the management and storage of documents need to be carefully considered. In addition to management and storage, the retrieval of stored documents is also a common task. This research aims to develop a document st...
Main Author: | Emyr Arrosyid, Reyhan |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/73921 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:73921 |
---|---|
spelling |
id-itb.:73921; 2023-06-25T09:25:09Z; INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE; Emyr Arrosyid, Reyhan; Indonesia; Final Project; search engine, information extraction, named entity recognition, academic paper, recruitment; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/73921; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Documents play a crucial role in the exchange and storage of information, so their
management and storage need careful consideration. Retrieving stored documents is
an equally common task. This research aims to develop a document storage and
management application, the Intelligent Repository System (IRyS), equipped with a
search engine. IRyS handles documents in three domains: general, academic paper,
and recruitment. To search documents by content, the search engine must be able
to extract information from them. However, which information in a document is
important depends heavily on the document's domain, so the extraction system must
extract the information relevant to each domain. In this research, an information
extraction system is developed for general-domain documents and, specifically, for
the scientific research and recruitment domains. The system combines several
methods, including named entity recognition, rule-based methods, and machine
learning, and its implementation follows an object-oriented approach. In the
evaluation, the system correctly extracted most information elements and achieved
an F1-score above 0.75. Some limitations remain, particularly in recall for the
recruitment domain. |
format |
Final Project |
author |
Emyr Arrosyid, Reyhan |
title |
INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE |
url |
https://digilib.itb.ac.id/gdl/view/73921 |
_version_ |
1822993425349214208 |
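
The description above mentions an object-oriented extraction system that applies domain-specific rules and is evaluated with an F1-score. The record does not show the thesis's actual implementation, so the following is only a minimal sketch of that idea. All class names, field names, and regular expressions are hypothetical, and simple regex rules stand in for the named entity recognition and machine learning components.

```python
import re
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Base class: each document domain supplies its own extraction rules."""

    @abstractmethod
    def extract(self, text: str) -> dict:
        ...


class GeneralExtractor(Extractor):
    # Hypothetical general-domain rules: e-mail addresses and four-digit years.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
    YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

    def extract(self, text: str) -> dict:
        return {
            "emails": self.EMAIL.findall(text),
            "years": self.YEAR.findall(text),
        }


class LabeledLineExtractor(GeneralExtractor):
    """Adds one comma-separated field read from a 'Label: a, b, c' line."""

    label = ""   # line label to look for, set by subclasses
    field = ""   # output key, set by subclasses

    def extract(self, text: str) -> dict:
        fields = super().extract(text)
        pattern = re.compile(
            rf"^{re.escape(self.label)}:\s*(.+)$", re.IGNORECASE | re.MULTILINE
        )
        match = pattern.search(text)
        values = match.group(1).split(",") if match else []
        fields[self.field] = [v.strip() for v in values if v.strip()]
        return fields


class PaperExtractor(LabeledLineExtractor):
    label, field = "Keywords", "keywords"   # assumed academic-paper rule


class RecruitmentExtractor(LabeledLineExtractor):
    label, field = "Skills", "skills"       # assumed recruitment rule


EXTRACTORS = {
    "general": GeneralExtractor(),
    "paper": PaperExtractor(),
    "recruitment": RecruitmentExtractor(),
}


def extract_information(text: str, domain: str = "general") -> dict:
    """Dispatch to the extractor for the domain; unknown domains use the general one."""
    return EXTRACTORS.get(domain, EXTRACTORS["general"]).extract(text)


def f1_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over sets of extracted items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Calling `extract_information(text, "paper")` returns the general fields plus a `keywords` list, while `f1_score` compares a set of extracted items against a hand-labeled set. A low recall, as reported for the recruitment domain, would show up here as many gold items missing from the predicted set.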