EXTRACTION OF STATISTICAL METADATA INFORMATION IN SCIENTIFIC RESEARCH ARTICLES USING MACHINE LEARNING ALGORITHMS




Bibliographic Details
Main Author:	Winingsih, Dahlia
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/71415
Institution:	Institut Teknologi Bandung

id	id-itb.:71415
spelling	id-itb.:71415 2023-02-06T15:19:05Z EXTRACTION OF STATISTICAL METADATA INFORMATION IN SCIENTIFIC RESEARCH ARTICLES USING MACHINE LEARNING ALGORITHMS Winingsih, Dahlia Indonesia Theses information extraction, statistical metadata, machine learning INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/71415
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Statistical metadata serve as a reference for planning, conducting, and evaluating statistical activities. They are divided into basic, sectoral, and special statistical metadata, which are distinguished by the purpose of the activity and by who carries it out: basic statistics are produced by BPS (Statistics Indonesia), sectoral statistics by government agencies, and special statistics by other organizers such as private institutions and individuals. Special statistical metadata are the least represented in the statistical reference system, with only 388 entries compared with 3,613 sectoral statistical metadata entries. One way to obtain information about special statistical activities is to search statistical research articles, which serve as a publication medium for researchers and other research organizers. However, obtaining the information required for statistical metadata from a scientific research article involves a long series of steps. Searching for information in text documents can be carried out with information extraction. The target metadata consist of the title, organizer identity, publication, year of activity, variables, data source and period, unit of observation, and analysis method used in a research article. The difficulty in applying information extraction to these fields is that each has different characteristics and therefore requires different treatment to be extracted correctly. This study proposes a feature-based statistical metadata extraction model built with machine learning algorithms: random forest, naïve Bayes, support vector machine, and decision tree. The features capture the writing style, layout, content, and linguistic patterns of the words and phrases associated with each type of statistical information. In the performance evaluation, the random forest and decision tree models achieved the highest average F1-score (0.92), while the naïve Bayes model achieved the lowest (0.88).
format	Theses
author	Winingsih, Dahlia
title	EXTRACTION OF STATISTICAL METADATA INFORMATION IN SCIENTIFIC RESEARCH ARTICLES USING MACHINE LEARNING ALGORITHMS
url	https://digilib.itb.ac.id/gdl/view/71415
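The abstract describes a feature-based approach: text is represented by writing-style, layout, content, and linguistic-pattern features, classified into metadata fields, and scored by average F1. The following is a minimal, stdlib-only Python sketch of that idea under stated assumptions. The feature names, the cue-word lists, the use of line position as a stand-in for real layout information (font size, bold, column), and the choice of macro averaging for the "average F1-score" are all hypothetical and are not taken from the thesis.

```python
import re

# Hypothetical cue-word lists: the abstract does not publish the thesis's actual features.
METHOD_CUES = {"regression", "anova", "t-test", "chi-square", "clustering"}
SOURCE_CUES = {"data", "survey", "census", "questionnaire", "interview"}
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
TOKEN_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def extract_features(line: str, line_index: int, total_lines: int) -> dict:
    """Turn one text line of an article into numeric features (illustrative only)."""
    words = line.split()
    tokens = set(TOKEN_RE.findall(line.lower()))
    return {
        # Writing-style features
        "is_upper": int(line.isupper()),
        "title_case_ratio": sum(w[:1].isupper() for w in words) / max(len(words), 1),
        # Layout proxy (assumption): relative line position, 0 = first, 1 = last
        "relative_position": line_index / max(total_lines - 1, 1),
        "n_words": len(words),
        # Content features
        "has_year": int(bool(YEAR_RE.search(line))),
        "has_email": int("@" in line),
        # Linguistic-pattern features: whole-token cue words hinting at a metadata field
        "method_cue": int(bool(tokens & METHOD_CUES)),
        "source_cue": int(bool(tokens & SOURCE_CUES)),
    }


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-label F1-scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    lines = [
        "EXTRACTION OF STATISTICAL METADATA",
        "Data were collected from the 2019 national survey.",
        "We applied logistic regression to the responses.",
    ]
    for i, text in enumerate(lines):
        print(extract_features(text, i, len(lines)))

    # Toy gold labels vs. predictions for four lines (invented for illustration)
    gold = ["title", "method", "method", "source"]
    pred = ["title", "method", "source", "source"]
    print(round(macro_f1(gold, pred), 4))  # 0.7778
```

The feature dictionaries could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to classifiers such as `RandomForestClassifier`, `GaussianNB`, `SVC`, or `DecisionTreeClassifier`, mirroring the four algorithm families compared in the study. Matching cues on whole tokens rather than substrings keeps, for example, "metadata" from triggering the "data" cue.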


Similar Items