INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE

Bibliographic Details
Main Author: Emyr Arrosyid, Reyhan
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/73921
Institution: Institut Teknologi Bandung
id id-itb.:73921
keywords search engine, information extraction, named entity recognition, academic paper, recruitment
timestamp 2023-06-25T09:25:09Z
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Documents play a crucial role in the exchange and storage of information, so their management and storage must be considered carefully. Beyond management and storage, retrieving stored documents is also a common task. This research aims to develop a document storage and management application with a built-in search engine, called the Intelligent Repository System (IRyS). IRyS handles documents in three domains: general, academic paper, and recruitment. To let users search documents by content, the search engine must be able to extract information from them. Which information matters, however, depends heavily on the document's domain, so the extraction system must extract the information relevant to each domain. This research develops an information extraction system for general documents as well as for the academic paper and recruitment domains specifically. The system combines several methods, including named entity recognition, rule-based methods, and machine learning, and is implemented with an object-oriented approach. In the evaluation, the system correctly extracts most information elements and achieves an F1-score above 0.75. Some limitations remain, particularly in recall for the recruitment domain.
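The abstract does not describe the implementation in detail, so the following is only a minimal sketch of what an object-oriented, domain-dependent extractor with rule-based methods could look like, together with the F1-score used for evaluation. All class names, field names, regular expressions, and the skill vocabulary are hypothetical illustrations and are not taken from IRyS; the named entity recognition and machine learning components mentioned in the abstract are not shown.

```python
import re
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Hypothetical base class: one subclass per document domain."""

    @abstractmethod
    def extract(self, text: str) -> dict:
        ...


class GeneralExtractor(Extractor):
    # Rule-based stand-ins (assumed): four-digit years and e-mail addresses.
    YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

    def extract(self, text: str) -> dict:
        return {
            "years": self.YEAR.findall(text),
            "emails": self.EMAIL.findall(text),
        }


class PaperExtractor(GeneralExtractor):
    # Assumed rule: the abstract runs from "Abstract" to the next blank line.
    ABSTRACT = re.compile(r"abstract[:\s]+(.+?)(?:\n\s*\n|\Z)", re.I | re.S)

    def extract(self, text: str) -> dict:
        info = super().extract(text)
        match = self.ABSTRACT.search(text)
        info["abstract"] = match.group(1).strip() if match else None
        return info


class RecruitmentExtractor(GeneralExtractor):
    # Assumed toy vocabulary; a real system would use a much larger list or a model.
    SKILLS = {"python", "sql", "java"}

    def extract(self, text: str) -> dict:
        info = super().extract(text)
        tokens = set(re.findall(r"[a-z+#]+", text.lower()))
        info["skills"] = sorted(tokens & self.SKILLS)
        return info


# Dispatch table from domain name to extractor; unknown domains fall back to general.
EXTRACTORS = {
    "general": GeneralExtractor(),
    "paper": PaperExtractor(),
    "recruitment": RecruitmentExtractor(),
}


def extract(text: str, domain: str) -> dict:
    return EXTRACTORS.get(domain, EXTRACTORS["general"]).extract(text)


def f1_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over sets of extracted items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Calling `extract(cv_text, "recruitment")` would return the general fields plus a `skills` list. Because F1 is the harmonic mean of precision and recall, low recall, as reported for the recruitment domain, pulls the score down even when precision is high.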