KNOWLEDGE GRAPH CONSTRUCTION USING INFORMATION EXTRACTION OF INDONESIA COSMETIC PRODUCT TEXT IN BAHASA INDONESIA

Bibliographic Details
Main Author: Aprilia Josephine, Deborah
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/50248
Institution: Institut Teknologi Bandung
Description
Summary: A knowledge graph is a semantic web technology that can be used for entity recognition in text, graph visualization, and even for improving business processes, e.g., information retrieval for product search in e-commerce. One source of information for building a knowledge graph is the text data available in many digital domains, such as e-commerce platforms. At present, e-commerce sites do not offer structured product information on their platforms and therefore depend on the information provided by sellers. However, structured information, which can be represented as a knowledge graph, could improve business processes in e-commerce. This motivates research on constructing a knowledge graph from the product texts, written in Bahasa Indonesia, that are available on e-commerce sites.

The structured entities in the knowledge graph are extracted from the product texts using NLP tools. Because labeled data is limited, the entities are recognized through transfer learning with full fine-tuning of an existing pre-trained model. Observation showed that the product texts contain some English terms, so a model that handles multilingual text is required. We used two multilingual pre-trained models based on the Transformer architecture: multilingual-BERT-base-cased and XLM-RoBERTa-base. The extracted entities are mapped into the knowledge graph by adopting two components of the T2KG framework: entity mapping and triple integration. In the entity mapping component, extracted entities that appear under several aliases are mapped to a single unique entity. The mapped entities must then be connected to one another through relations. Since the product texts express no explicit relations, we use an ontology, a part of the semantic web, to define the relations between entities, with reference to Schema.org. Finally, the mapped entities and the ontology are integrated in the triple integration component.
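The post-processing steps of this pipeline can be illustrated with a small, self-contained sketch. It assumes the fine-tuned model has already produced one BIO tag per token and covers only what follows: grouping tags into entity spans, resolving aliases to one canonical entity (entity mapping), and emitting triples typed with Schema.org terms (triple integration). The entity types `BRAND` and `CATEGORY`, the alias table, the brand `BrandX`, and all function names are hypothetical illustrations, not the thesis's label set or implementation; `schema:brand` and `schema:category` are real Schema.org properties of `schema:Product`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label set: each entity type maps to a Schema.org property of schema:Product.
TYPE_TO_PROPERTY: Dict[str, str] = {
    "BRAND": "schema:brand",
    "CATEGORY": "schema:category",
}

# Hypothetical alias table for entity mapping (lower-cased surface form -> canonical entity).
ALIASES: Dict[str, str] = {
    "brandx": "BrandX",
    "brand x": "BrandX",
    "brandx cosmetics": "BrandX",
    "lipstik": "Lipstick",
    "lipstick": "Lipstick",
}

Triple = Tuple[str, str, str]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, surface_text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # extend the open span
            continue
        if current_type is not None:  # any other tag closes the open span
            spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if tag.startswith("B-"):
            current_type, current_tokens = tag[2:], [token]
        # "O" tags and stray "I-" tags without an open span are skipped.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


def map_entity(surface: str) -> str:
    """Entity mapping: resolve an alias to its canonical entity, else keep the surface form."""
    return ALIASES.get(surface.lower(), surface)


def to_triples(product_id: str, tokens: List[str], tags: List[str]) -> List[Triple]:
    """Triple integration: attach mapped entities to a product via Schema.org properties."""
    triples: List[Triple] = [(product_id, "rdf:type", "schema:Product")]
    for entity_type, surface in decode_bio(tokens, tags):
        prop = TYPE_TO_PROPERTY.get(entity_type)
        if prop is None:
            continue  # types with no ontology relation are not added to the graph
        triple = (product_id, prop, map_entity(surface))
        if triple not in triples:  # aliases of one entity collapse into one triple
            triples.append(triple)
    return triples


if __name__ == "__main__":
    tokens = ["BrandX", "Cosmetics", "lipstik", "matte", "01"]
    tags = ["B-BRAND", "I-BRAND", "B-CATEGORY", "O", "O"]
    for triple in to_triples("product/1", tokens, tags):
        print(triple)
```

Running the script prints three triples: the `schema:Product` type, `schema:brand` pointing to `BrandX`, and `schema:category` pointing to `Lipstick`, because both "BrandX Cosmetics" and the Indonesian "lipstik" are resolved through the alias table. Plain strings stand in for node identifiers here; a real graph would normally use URIs.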
The goal of the information extraction experiments is a robust model that generalizes to unseen products of unseen brands from a different e-commerce platform. The training data contains 1,500 labeled product texts, and the test data contains 216 labeled product texts, organized into three versions of the test data and four test scenarios. The results show that the XLM-RoBERTa model outperformed multilingual-BERT, achieving an average F1-score of 0.895. The knowledge graph mapping of the extracted information was evaluated manually on 1,445 product texts from two e-commerce platforms, yielding 338 entities in the knowledge graph with a mapping precision of 0.94.
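The two reported metrics can be made concrete with a sketch of how entity-level F1 and mapping precision are commonly computed. It assumes exact-match span scoring in the style of the CoNLL NER evaluation, micro-averaged over one test set; the summary does not state how the thesis averaged F1 across test versions and scenarios, and mapping precision is taken here simply as the share of mappings a human judged correct. All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of exact entity spans."""
    spans: Set[Span] = set()
    start, etype = 0, None  # type: int, Optional[str]
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final open span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the open span continues
        if etype is not None:
            spans.add((etype, start, i))
            etype = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> float:
    """Micro-averaged entity-level F1: a span counts only if type and boundaries match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mapping_precision(judgements: List[bool]) -> float:
    """Share of entity mappings that a human annotator marked as correct."""
    return sum(judgements) / len(judgements) if judgements else 0.0
```

For a single gold sequence `B-BRAND I-BRAND B-CATEGORY O` and a prediction that finds only the brand, precision is 1.0 and recall is 0.5, so `entity_f1` returns about 0.667. Exact-match scoring is strict: a predicted span with the right type but a shifted boundary counts as both a false positive and a false negative.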