EXTRACTION OF STATISTICAL METADATA INFORMATION IN SCIENTIFIC RESEARCH ARTICLES USING MACHINE LEARNING ALGORITHMS
Main Author: | Winingsih, Dahlia |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/71415 |
Institution: | Institut Teknologi Bandung |
Library: | Institut Teknologi Bandung Library |
Collection: | Digital ITB |
Country: | Indonesia |
Record ID: | id-itb.:71415 |
Record timestamp: | 2023-02-06T15:19:05Z |
Keywords: | information extraction, statistical metadata, machine learning |
Abstract:
Statistical metadata serves as a reference for planning, conducting, and evaluating statistical activities. It is divided into basic, sectoral, and special statistical metadata, which are distinguished by the purpose of the activity and by who carries it out: basic statistics are produced by BPS (Statistics Indonesia), sectoral statistics by government agencies, and special statistics by other organizers such as private institutions and individuals. Special statistics have the fewest metadata records in the statistical reference system: 388, compared with 3,613 for sectoral statistics. One way to obtain information on special statistical activities is to search scientific research articles, which researchers and other research organizers use to publicize their work. However, collecting the information required for statistical metadata from a research article involves a long series of steps. Searching a text document for specific information can be done with information extraction. The challenge in applying information extraction to statistical metadata, namely the title, organizer identity, publication, year of activity, variables, data source and period, unit of observation, and analytical method used in an article, is that each item has different characteristics and therefore requires different treatment to be extracted correctly.
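The following Python sketch illustrates why the metadata items need different treatment. It models the eight items as one record, catches years of activity with a regular expression because they have a fixed surface form, and finds analytical methods with a lexicon because their wording varies. The class `StatisticalMetadata`, the helpers `extract_years` and `match_methods`, the assumed 1900 to 2099 year range, and the tiny method lexicon are hypothetical; the thesis itself uses a feature-based machine learning model rather than hand-written rules like these.

```python
import re
from dataclasses import dataclass, field

# Hypothetical record for the eight metadata items listed in the abstract.
@dataclass
class StatisticalMetadata:
    title: str = ""
    organizer: str = ""
    publication: str = ""
    years_of_activity: list[str] = field(default_factory=list)
    variables: list[str] = field(default_factory=list)
    data_sources_and_periods: list[str] = field(default_factory=list)
    units_of_observation: list[str] = field(default_factory=list)
    analytical_methods: list[str] = field(default_factory=list)

# Years have a fixed surface form, so a pattern suffices (assumed range 1900-2099).
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

# Method names vary in wording; this hypothetical lexicon is only illustrative.
METHOD_LEXICON = ("regression", "random forest", "principal component analysis", "t-test")

def extract_years(text: str) -> list[str]:
    """Return distinct four-digit years in order of first appearance."""
    return list(dict.fromkeys(YEAR_PATTERN.findall(text)))

def match_methods(text: str) -> list[str]:
    """Return lexicon entries that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [method for method in METHOD_LEXICON if method in lowered]

sentence = ("Data from the 2019 and 2020 surveys were analysed using "
            "multiple linear regression.")
record = StatisticalMetadata(
    years_of_activity=extract_years(sentence),
    analytical_methods=match_methods(sentence),
)
print(record.years_of_activity, record.analytical_methods)
```

A lexicon like this misses unseen method names, which is one reason a learned, feature-based classifier is attractive for the more variable items.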
This study proposes a feature-based model for extracting statistical metadata, built with four machine learning algorithms: random forest, naïve Bayes, support vector machine, and decision tree. The features describe the writing characteristics, layout, content, and linguistic patterns of the words and phrases associated with each type of statistical information.
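As a rough illustration of the four feature groups, the sketch below turns one text block into a numeric feature dictionary. It assumes each block comes from a PDF parser that reports its page number and font attributes; the `TextBlock` type, the feature names, the default body font size, and the English cue phrases are hypothetical stand-ins, not the features actually used in the thesis.

```python
import re
from dataclasses import dataclass

@dataclass
class TextBlock:
    """Hypothetical text segment with layout attributes from a PDF parser."""
    text: str
    page: int           # 1-based page number
    font_size: float    # in points
    is_bold: bool

# Hypothetical cue phrases (English stand-ins for patterns that would be
# mined from the training articles in practice).
CUE_PHRASES = {
    "data_source": ("data were obtained from", "data source"),
    "method": ("analysed using", "analyzed using", "method used"),
    "unit": ("unit of observation", "unit of analysis"),
}

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def block_features(block: TextBlock, body_font_size: float = 10.0) -> dict[str, float]:
    text = block.text.strip()
    words = text.split()
    lowered = text.lower()
    features = {
        # Writing characteristics
        "n_words": float(len(words)),
        "title_case_ratio": sum(w[:1].isupper() for w in words) / max(len(words), 1),
        "all_caps": float(text.isupper()),
        # Layout
        "first_page": float(block.page == 1),
        "relative_font_size": block.font_size / body_font_size,
        "bold": float(block.is_bold),
        # Content
        "has_year": float(bool(YEAR.search(text))),
        "has_digit": float(any(ch.isdigit() for ch in text)),
    }
    # Linguistic patterns: one indicator per cue-phrase group
    for label, phrases in CUE_PHRASES.items():
        features[f"cue_{label}"] = float(any(p in lowered for p in phrases))
    return features

title = TextBlock("Determinants of Household Poverty in West Java",
                  page=1, font_size=16.0, is_bold=True)
print(block_features(title))
```

Dictionaries like these could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to `RandomForestClassifier`, `GaussianNB`, `SVC`, or `DecisionTreeClassifier` to train one model per algorithm; that tooling choice is an assumption, not a description of the thesis setup.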
Performance measurements show that the random forest and decision tree models achieve the highest average F1-score, 0.92, while the naïve Bayes model has the lowest, 0.88.
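The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The abstract does not state how the average is taken, so the sketch below assumes a macro average: an unweighted mean of per-label F1-scores over the metadata labels. The function names and the example labels are hypothetical.

```python
from collections import Counter

def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 for each label from true-positive, false-positive, false-negative counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

y_true = ["year", "year", "method", "method", "other"]
y_pred = ["year", "method", "method", "method", "other"]
print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

Computing this score once per trained model gives a single figure per algorithm that can be compared directly; a weighted average (by label frequency) would be a reasonable alternative if label counts are very unbalanced.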