EXTRACTION OF STATISTICAL METADATA INFORMATION IN SCIENTIFIC RESEARCH ARTICLES USING MACHINE LEARNING ALGORITHMS

Statistical metadata serves as a reference for planning, conducting, and evaluating statistical activities. It is divided into basic, sectoral, and special statistical metadata, distinguished by the purpose of the activity and the party that carries it out. Basic statistics are ca...

Bibliographic Details
Main Author: Winingsih, Dahlia
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/71415
Institution: Institut Teknologi Bandung
id id-itb.:71415
spelling id-itb.:71415; 2023-02-06T15:19:05Z; information extraction, statistical metadata, machine learning
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Statistical metadata serves as a reference for planning, conducting, and evaluating statistical activities. It is divided into basic, sectoral, and special statistical metadata, distinguished by the purpose of the activity and the party that carries it out: basic statistics are carried out by BPS, sectoral statistics by government agencies, and special statistics by other organizers such as private institutions and individuals. The statistical reference system holds the fewest metadata records for special statistics: 388, compared with 3,613 for sectoral statistics. One way to obtain information about special statistical activities is to search statistical research articles, through which researchers and other research organizers publicize their work. However, obtaining the information required for statistical metadata from a scientific research article involves a long series of steps. Information in a text document can be located through information extraction. The statistical metadata to be found in a research article consists of the title, organizer identity, publication, year of activity, variables, data sources and periods, units of observation, and analytical methods used. The difficulty in applying information extraction techniques is that each of these items has different characteristics and therefore requires different treatment. This study proposes a feature-based statistical metadata extraction model built with machine learning algorithms. The algorithms used are random forest, naïve Bayes, support vector machine, and decision tree. The features cover text-writing characteristics, layout, content, and linguistic patterns in the words and phrases associated with each item of statistical information. Performance measurements show that the random forest and decision tree models achieve the highest average F1-score, 0.92, while the naïve Bayes model has the lowest, 0.88.
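The feature groups and the averaged F1-score mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the feature names, the cue-word list `METHOD_CUES`, and the example labels are hypothetical, and it assumes the reported average is an unweighted mean of per-field F1-scores, which the abstract does not state.

```python
import re
from typing import Dict, List

# Hypothetical cue words for an "analytical method" field; not taken from the thesis.
METHOD_CUES = ("regression", "anova", "t-test", "chi-square")


def line_features(line: str, index: int, total: int) -> Dict[str, float]:
    """Turn one line of article text into a small numeric feature dictionary."""
    tokens = line.split()
    lowered = line.lower()
    return {
        # Text-writing characteristics: share of uppercase characters, line length.
        "upper_ratio": sum(c.isupper() for c in line) / max(len(line), 1),
        "token_count": float(len(tokens)),
        # Layout: relative position of the line within the document (0 = first line).
        "rel_position": index / max(total - 1, 1),
        # Content: does the line contain a four-digit year (19xx or 20xx)?
        "has_year": float(bool(re.search(r"\b(19|20)\d{2}\b", line))),
        # Linguistic pattern: does the line mention one of the cue words?
        "has_method_cue": float(any(cue in lowered for cue in METHOD_CUES)),
    }


def f1_per_label(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Compute the F1-score of each label from true and predicted label lists."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def average_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted (macro) mean of the per-label F1-scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Feature dictionaries of this kind could be fed to any of the four classifiers named in the abstract, and `average_f1` gives one way to compare the resulting models on a held-out set of labeled lines.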
format Theses
author Winingsih, Dahlia
title EXTRACTION OF STATISTICAL METADATA INFORMATION IN SCIENTIFIC RESEARCH ARTICLES USING MACHINE LEARNING ALGORITHMS
url https://digilib.itb.ac.id/gdl/view/71415
_version_ 1822006585742852096