KNOWLEDGE GRAPH CONSTRUCTION USING INFORMATION EXTRACTION OF INDONESIA COSMETIC PRODUCT TEXT IN BAHASA INDONESIA

A knowledge graph is a semantic web technology that can be used for entity recognition in text, graph visualization, and even business process improvement, e.g. information retrieval for product search in e-commerce. One information source for building a knowledge graph is text data t...

Bibliographic Details
Main Author:	Aprilia Josephine, Deborah
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/50248
Institution:	Institut Teknologi Bandung

id	id-itb.:50248
spelling	id-itb.:50248; 2020-09-23T10:22:01Z; knowledge graph, entity, relation, model.
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	A knowledge graph is a semantic web technology that can be used for entity recognition in text, graph visualization, and even business process improvement, e.g. information retrieval for product search in e-commerce. One information source for building a knowledge graph is the text data available in many digital domains, such as e-commerce platforms. Currently, e-commerce sites do not have structured product information available on the platform, so they depend on the information supplied by sellers. Structured information, which can be represented in a knowledge graph, may improve business processes in e-commerce. This motivates research on knowledge graph construction from product text written in Bahasa Indonesia on e-commerce sites. The structured entities in the knowledge graph are extracted from the product text using NLP tools. Because labeled data is limited, entities are recognized using transfer learning with full fine-tuning of an existing pre-trained model. Observation showed that the product texts contain some English terms, so a model that handles multilingual text is needed. We used multilingual pre-trained models with the Transformer architecture, i.e. multilingual-BERT-base-cased and XLM-RoBERTa-base. The extracted entities are mapped into the knowledge graph by adopting T2KG framework components, i.e. entity mapping and triple integration. In the entity mapping component, extracted entities that have many aliases are mapped to a single unique entity. At this point, the mapped entities should be connected to one another by relations. Because the product texts contain no explicitly expressed relations, we use an ontology, a part of the semantic web, to define the relations between entities with reference to Schema.org. The mapped entities and the ontology are integrated in the triple integration component.
The goal of the information extraction experiment is to obtain a robust model that can be applied to unseen products with unseen brands from different e-commerce platforms. The training data contains 1,500 labeled product texts, while the test data contains 216 labeled product texts, evaluated in three versions of the test data and four test scenarios. The results showed that the XLM-RoBERTa model performed better than multilingual-BERT, with an average F1-score of 0.895. Knowledge graph mapping from the extracted information was evaluated manually on 1,445 product texts from two e-commerce platforms, resulting in 338 entities in the knowledge graph with a mapping precision of 0.94.
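The entity mapping and triple integration steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the alias table, the entity labels (BRAND, CATEGORY, COLOR), the product identifier, and the sample product data are all hypothetical, and the choice of Schema.org properties (brand, category, color on Product) is an assumption about how an ontology-defined relation might be attached to each entity type.

```python
# Hypothetical alias table: normalized surface form -> canonical entity name.
ALIASES = {
    "wardah": "Wardah",
    "wardah cosmetics": "Wardah",
    "lipstik": "Lipstick",
    "lipstick": "Lipstick",
    "lip stick": "Lipstick",
}

# Hypothetical ontology lookup: entity label -> Schema.org property of Product.
# The relation comes from the ontology because the text does not state it.
LABEL_TO_PROPERTY = {
    "BRAND": "schema:brand",
    "CATEGORY": "schema:category",
    "COLOR": "schema:color",
}


def map_entity(surface):
    """Entity mapping: collapse aliases of a surface form into one entity."""
    key = " ".join(surface.lower().split())  # lower-case, squeeze whitespace
    return ALIASES.get(key, surface.strip())  # unknown forms pass through


def integrate_triples(product_id, extracted):
    """Triple integration: build (subject, predicate, object) triples.

    `extracted` is a list of (label, surface form) pairs, as an entity
    recognition model might produce. Labels without an ontology property
    are skipped, and duplicate triples are merged.
    """
    triples = set()
    for label, surface in extracted:
        prop = LABEL_TO_PROPERTY.get(label)
        if prop is None:
            continue
        triples.add((product_id, prop, map_entity(surface)))
    return sorted(triples)


# Illustrative input only; not taken from the thesis data.
extracted = [
    ("BRAND", "WARDAH Cosmetics"),
    ("CATEGORY", "lip  stick"),
    ("COLOR", "Merah"),
]
triples = integrate_triples("product:001", extracted)
for triple in triples:
    print(triple)
```

Keeping the alias table and the label-to-property table as separate lookups mirrors the two components named in the abstract: aliases can be extended without touching the ontology, and the ontology can change without re-running entity mapping.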
format	Final Project
author	Aprilia Josephine, Deborah
title	KNOWLEDGE GRAPH CONSTRUCTION USING INFORMATION EXTRACTION OF INDONESIA COSMETIC PRODUCT TEXT IN BAHASA INDONESIA
url	https://digilib.itb.ac.id/gdl/view/50248
_version_	1822272299594678272

Similar Items