INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE

Bibliographic Details
Main Author:	Emyr Arrosyid, Reyhan
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/73921
Institution:	Institut Teknologi Bandung

id	id-itb.:73921
keywords	search engine, information extraction, named entity recognition, academic paper, recruitment
timestamp	2023-06-25T09:25:09Z
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Documents play a crucial role in exchanging and storing information, so how they are managed and stored deserves careful consideration, as does the common task of retrieving stored documents. This research develops a document storage and management application, the Intelligent Repository System (IRyS), equipped with a search engine. IRyS handles documents in three domains: general, academic papers, and recruitment. To let users search documents by their content, the search engine must be able to extract information from them. However, which information in a document is important depends heavily on the document's domain, so the extraction system must extract the information relevant to each domain. In this research, an information extraction system is developed for general-domain documents as well as, specifically, academic-paper and recruitment documents. The system is built from several methods, including named entity recognition, rule-based methods, and machine learning, and is implemented with an object-oriented approach. In the evaluation, the system correctly extracts most information elements, achieving an F1-score above 0.75. Some limitations remain, however, particularly low recall in the recruitment domain.
format	Final Project
author	Emyr Arrosyid, Reyhan
title	INFORMATION EXTRACTION COMPONENT OF THE INTELLIGENT REPOSITORY SYSTEM (IRYS) SEARCH ENGINE
url	https://digilib.itb.ac.id/gdl/view/73921
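
The abstract describes domain-dependent extraction built with an object-oriented approach and evaluated by F1-score. The Python sketch below is hypothetical and is not the thesis's code: the class names (Extractor, GeneralExtractor, AcademicExtractor, RecruitmentExtractor), the extracted fields, and the regular-expression rules are all assumptions for illustration. The named entity recognition and machine-learning components are omitted, and simple regex rules stand in for the rule-based part. It shows one way a base extractor class can be specialised per domain, plus a set-based F1-score for comparing extracted values against a gold annotation.

```python
import re
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Hypothetical base class: each document domain gets its own subclass."""

    @abstractmethod
    def extract(self, text: str) -> dict[str, list[str]]:
        """Return a mapping from field name to the values found in text."""


class GeneralExtractor(Extractor):
    # Hypothetical general-domain rules: e-mail addresses and four-digit years.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
    YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

    def extract(self, text: str) -> dict[str, list[str]]:
        return {"email": self.EMAIL.findall(text), "year": self.YEAR.findall(text)}


class AcademicExtractor(GeneralExtractor):
    # Hypothetical academic-paper rule: a DOI, on top of the general fields.
    DOI = re.compile(r"\b10\.\d{4,9}/\S+")

    def extract(self, text: str) -> dict[str, list[str]]:
        fields = super().extract(text)
        fields["doi"] = self.DOI.findall(text)
        return fields


class RecruitmentExtractor(GeneralExtractor):
    # Hypothetical recruitment rule: required years of experience (digits only).
    EXPERIENCE = re.compile(r"(\d+)\+?\s+years? of experience", re.IGNORECASE)

    def extract(self, text: str) -> dict[str, list[str]]:
        fields = super().extract(text)
        fields["min_experience_years"] = self.EXPERIENCE.findall(text)
        return fields


# Hypothetical registry that picks the extractor from the document's domain.
EXTRACTORS: dict[str, Extractor] = {
    "general": GeneralExtractor(),
    "academic": AcademicExtractor(),
    "recruitment": RecruitmentExtractor(),
}


def extract_fields(domain: str, text: str) -> dict[str, list[str]]:
    """Dispatch the text to the extractor registered for its domain."""
    return EXTRACTORS[domain].extract(text)


def f1_score(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1: harmonic mean of precision and recall over extracted values."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, extract_fields("recruitment", "Contact hr@example.com. Requires 3 years of experience. Posted 2023.") returns the e-mail address, the year 2023, and a minimum of 3 years of experience. Hand-written rules like these also show one way recall can drop: the experience pattern misses a phrasing such as "at least two years of experience", because it only matches digits.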
