Information extraction for elegislation

Information extraction (IE) is the process of transforming unstructured information in documents into a structured database. This technology allows narrower search results for documents stored in a Document Management System (DMS). An IE system was developed to augm...

Bibliographic Details
Main Authors: Lim, Brian Kent; Miranda, Angelo Crisanto; Trogo, Janine; Yap, Fe Eleanor
Format: text
Language: English
Published: Animo Repository 2010
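Subjects: Text processing (Computer science); Natural language processing (Computer science); Database management; Computer Sciences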
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/11062
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_bachelors-11707
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_bachelors-11707 2022-03-01T01:40:31Z Information extraction for elegislation Lim, Brian Kent Miranda, Angelo Crisanto Trogo, Janine Yap, Fe Eleanor 2010-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_bachelors/11062 Bachelor's Theses English Animo Repository Text processing (Computer science) Natural language processing (Computer science) Database management Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Text processing (Computer science)
Natural language processing (Computer science)
Database management
Computer Sciences
description Information extraction (IE) is the process of transforming unstructured information in documents into a structured database. This technology allows narrower search results for documents stored in a Document Management System (DMS). An IE system was developed to augment a Blue Ribbon Committee (BRC) DMS for the eParticipation Project. IE architectures were studied and related tools were identified to develop the IE system specifically for the BRC. The IE system is composed of 7 minor modules, namely Sentence Splitter, Tokenizer, Cross Reference, Part of Speech Tagger, Unknown Word, Named Entity Recognition and Preparser; 3 major modules, which are Semantic Tagger, CoReference Resolution and Template Filler; and 2 external modules, which are the Search and Evaluation modules. Through constant communication with the Blue Ribbon Committee, the researchers were able to gather documents that helped in creating the system. The output is extracted according to the client's preferences and meets the standards requested by the Blue Ribbon Committee. Overall, the system showed favorable results in the actual testing phase, with a result of 95.42%; when documents followed the initial format, the system would be 100% accurate. After the system was presented to the main stakeholders, they remarked that what they had seen was beyond their expectations and that they were very pleased with the outcome.
Parts of the system could still be improved: training the POS Tagger and the Named Entity Recognition module on the documents being fed, updating the library used to open Word document files, adding documents and templates to the system's process, adding image recognition, updating the web crawler to cover more sources, and improving the search ranking algorithm.
format text
author Lim, Brian Kent
Miranda, Angelo Crisanto
Trogo, Janine
Yap, Fe Eleanor
title Information extraction for elegislation
publisher Animo Repository
publishDate 2010
url https://animorepository.dlsu.edu.ph/etd_bachelors/11062
_version_ 1726158580054228992