KNOWLEDGE GRAPH CONSTRUCTION USING INFORMATION EXTRACTION OF INDONESIA COSMETIC PRODUCT TEXT IN BAHASA INDONESIA
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/50248 |
Institution: | Institut Teknologi Bandung |
Summary: | A knowledge graph is a semantic web technology that can be used for entity
recognition in text, graph visualization, and improving business processes, e.g. information
retrieval for product search in e-commerce. One source of information for building a
knowledge graph is the text data available in many digital domains, such as e-commerce
platforms. Currently, e-commerce sites do not have structured product information available
on their platforms, so they depend on the information supplied by sellers. On the other
hand, structured information, which can be represented in a knowledge graph, may
improve e-commerce business processes. Research is therefore needed on knowledge
graph construction from product text written in Bahasa Indonesia on e-commerce
sites.
The structured entities in the knowledge graph are extracted from the product text using NLP
tools. Because labeled data is limited, the entities are recognized through transfer learning
with full fine-tuning of an existing pre-trained model. Based
on observation, the product texts contain some English terms, so a model that can
handle multilingual text is needed. We used multilingual pre-trained models with the Transformer
architecture, i.e. multilingual-BERT-base-cased and XLM-RoBERTa-base. The extracted
entities are mapped into the knowledge graph by adopting components of the T2KG framework, i.e.
entity mapping and triple integration. In the entity mapping component, extracted entities
that have many aliases are mapped to a single unique entity. The mapped entities
must then be connected to one another by relations. Because no relations are expressed
in the product texts, we use an ontology, a part of the semantic web, to define the relations between
entities with reference to Schema.org. The mapped entities and the ontology are integrated
in the triple integration component.
The goal of the information extraction experiment is to obtain a robust model that can be applied to
unseen products with unseen brands from different e-commerce platforms. The training data contains
1,500 labeled product texts, while the test data contains 216 labeled product texts; testing was conducted
with three versions of the test data and four test scenarios. The results showed that the XLM-RoBERTa
model performed better than multilingual-BERT, with an average F1-score of 0.895. The knowledge
graph mapping of the extracted information was evaluated manually on 1,445 product
texts from two e-commerce platforms, resulting in 338 entities in the knowledge graph
with a mapping precision of 0.94. |
---|
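
The entity mapping and triple integration steps described in the summary can be illustrated with a minimal sketch. This is not the thesis implementation: the alias table, the entity labels (`BRAND`, `CATEGORY`, `COLOR`), the product identifier, and the helper names `map_entity` and `integrate_triples` are all hypothetical. The choice of the Schema.org properties `brand`, `category`, and `color` is likewise an assumption about how the ontology might be referenced.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical alias table: normalized surface form -> canonical entity name.
ALIASES: Dict[str, str] = {
    "brand x": "Brand X",
    "brandx official": "Brand X",
    "lip cream": "Lip Cream",
    "lipcream": "Lip Cream",
}

# Assumed mapping from an extracted entity label to a Schema.org property.
LABEL_TO_PROPERTY: Dict[str, str] = {
    "BRAND": "https://schema.org/brand",
    "CATEGORY": "https://schema.org/category",
    "COLOR": "https://schema.org/color",
}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def map_entity(surface: str) -> str:
    """Entity mapping: resolve an alias to its canonical entity name.

    Unknown surface forms are kept as they are (whitespace-trimmed).
    """
    return ALIASES.get(normalize(surface), surface.strip())


def integrate_triples(product_id: str,
                      entities: List[Tuple[str, str]]) -> List[Triple]:
    """Triple integration: link a product to its mapped entities.

    Relations are not taken from the text; they come from the
    label-to-property table, which stands in for the ontology.
    """
    triples: List[Triple] = []
    for label, surface in entities:
        prop = LABEL_TO_PROPERTY.get(label)
        if prop is None:
            continue  # no relation defined for this label
        triple = (product_id, prop, map_entity(surface))
        if triple not in triples:  # aliases collapse to a single triple
            triples.append(triple)
    return triples


if __name__ == "__main__":
    extracted = [("BRAND", "BrandX Official"), ("CATEGORY", "lipcream")]
    for t in integrate_triples("product:1", extracted):
        print(t)
```

In this sketch, two aliases of the same brand yield one triple, which mirrors the stated goal of mapping many aliases to a unique entity before relations are attached.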