INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE

Bibliographic Details
Main Author: Emyr Arrosyid, Reyhan
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/73921
Institution: Institut Teknologi Bandung
Description
Summary: Documents play a crucial role in exchanging and storing information, so the way they are managed and stored must be carefully considered. Beyond management and storage, retrieving stored documents is also a common task. This research aims to develop a document storage and management application, the Intelligent Repository System (IRyS), equipped with a search engine. IRyS handles documents in three domains: general, academic paper, and recruitment. To let users search documents in IRyS by their content, the search engine must be able to extract information from those documents. However, which information in a document is important depends heavily on the document's domain, so the extraction system must extract the information that is relevant to each domain. In this research, an information extraction system is developed for documents in the general domain as well as in the specific domains of scientific papers and recruitment. The system combines several methods, including named entity recognition, rule-based methods, and machine learning, and is implemented using an object-oriented approach. In the evaluation, the extraction system correctly extracts most information elements and achieves an F1-score above 0.75. However, some limitations remain, particularly in recall for the recruitment domain.
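The summary describes an object-oriented extraction system that picks extraction logic according to a document's domain and is evaluated by F1-score. The sketch below illustrates that general idea only; it is not the author's implementation. All class names, field names, and regular expressions are hypothetical. The capitalized-phrase regex is a crude stand-in for a real named entity recognition model, and the machine learning component mentioned in the summary is not shown.

```python
import re
from abc import ABC, abstractmethod

# Hypothetical pattern for e-mail addresses (does not swallow a trailing period).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Extractor(ABC):
    """Hypothetical base class: one subclass per document domain."""

    @abstractmethod
    def extract(self, text: str) -> dict[str, list[str]]:
        ...


class GeneralExtractor(Extractor):
    # Crude stand-in for NER: runs of two or more capitalized words.
    def extract(self, text):
        entities = re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b", text)
        return {"entities": entities}


class AcademicExtractor(Extractor):
    # Hypothetical rule-based fields for an academic paper.
    def extract(self, text):
        match = re.search(r"Keywords?:\s*(.+)", text)
        keywords = []
        if match:
            parts = re.split(r"\s*[;,]\s*", match.group(1).strip())
            keywords = [p for p in parts if p]
        return {"emails": EMAIL_RE.findall(text), "keywords": keywords}


class RecruitmentExtractor(Extractor):
    # Hypothetical rule-based fields for a job posting.
    def extract(self, text):
        years = re.findall(r"(\d+)\+?\s+years?\s+of\s+experience", text)
        return {"emails": EMAIL_RE.findall(text), "years_of_experience": years}


# Domain -> extractor registry; unknown domains fall back to the general one.
EXTRACTORS: dict[str, Extractor] = {
    "general": GeneralExtractor(),
    "academic": AcademicExtractor(),
    "recruitment": RecruitmentExtractor(),
}


def extract(text: str, domain: str) -> dict[str, list[str]]:
    return EXTRACTORS.get(domain, EXTRACTORS["general"]).extract(text)


def f1_score(predicted: list[str], gold: list[str]) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # true positives: extracted items that are correct
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    posting = "Send your CV to hr@example.com. Requires 3 years of experience."
    result = extract(posting, "recruitment")
    print(result)
    print(f1_score(result["emails"], ["hr@example.com"]))
```

Adding a domain in this design means adding one subclass and one registry entry, without touching the dispatch function. The F1 helper also shows why low recall, as reported for the recruitment domain, pulls the score down even when every extracted item is correct.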