Knowledge graph embedding with deep learning

Bibliographic Details
Main Author: Chen, Chen
Other Authors: Lam Kwok Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/173397
Institution: Nanyang Technological University
Subjects: Engineering
Collection: DR-NTU (NTU Library)
Country: Singapore

Full description

Knowledge graphs (KGs) are widely used to represent structured knowledge, such as entities and the relationships between them, in applications like natural language processing, information retrieval, and recommendation systems. However, real-world domains are complex, so KGs are often incomplete and error-prone. Knowledge graph completion (KGC) addresses this by predicting missing links and thereby improving KG quality. Knowledge graph embedding (KGE) is a promising approach to KGC: it maps KG data into low-dimensional vector representations using deep learning and other techniques. This thesis focuses on deep learning methods for knowledge graph embedding.

First, we focus on graph-based KGC methods. Existing graph-based methods generally learn continuous embeddings for entities and relations with either shallow linear transformations or deep convolutional modules. These methods either suffer from limited expressiveness or impose an unnecessary image-specific inductive bias on KGC embedding models, which can degrade model performance. To avoid these issues, we propose PatReFormer, a Transformer-based Patch Refinement Model built on a “Separate-and-Aggregate” framework: it segments the input entity and relation embeddings into patches and aggregates them with a cross-attentive Transformer architecture.

Second, we incorporate textual information, such as entity and relation descriptions, into KGC and propose a method based on pre-trained language models (PLMs) with an encoder-only architecture. Recently proposed methods that fine-tune PLMs tend to focus overwhelmingly on textual information and overlook structural knowledge. To address this issue, we propose CSProm-KG (Conditional Soft Prompts for KGC), which balances structural information and textual knowledge. CSProm-KG tunes only the parameters of the Conditional Soft Prompts, which are generated from the entity and relation representations, and keeps the PLM parameters frozen. In this way, the approach can take both kinds of information into account equally and effectively during KGC.

Third, rather than relying on an encoder-only model to learn from KG textual information, we propose an approach based on the sequence-to-sequence (Seq2Seq) paradigm that directly predicts the text of the target entity. Existing KGC solutions are often tailored to specific graph structures, which leaves different KGC tasks with incompatible methods. These methodological discrepancies incur significant maintenance costs and also hinder adaptation to evolving knowledge queries, ingestion processes, and presentation requirements. To address these challenges, we leverage the strong performance and technical homogeneity that Seq2Seq PLMs have shown across a wide range of NLP tasks. We introduce KG-S2S, a simple yet effective Seq2Seq PLM framework that adapts to diverse knowledge graph structures.

Finally, we apply KGC techniques to challenges in Internet of Things (IoT) services. IoT profiling has recently gained attention as a promising way to validate the normal behavior of connected devices in these services. A significant challenge, however, is effectively processing the vast number of IoT profiles to identify suspicious devices that require closer monitoring. To tackle this challenge, we propose HABIT, a holistic framework that models the behaviors of connected devices as a KG and detects “false” knowledge using KGC techniques. By building on state-of-the-art KGC techniques, HABIT provides a comprehensive profiling approach for accurately identifying anomalous behaviors in IoT services.
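
PatReFormer's “Separate-and-Aggregate” idea, as described above, can be illustrated with a minimal, dependency-free Python sketch. This is not PatReFormer itself. The description does not specify projections, patch sizes, layer counts, or the scoring function. This sketch therefore assumes equal-size patches, a single attention head without learned projections, a per-patch residual connection, and a dot-product score against a candidate tail embedding. All function names are hypothetical.

```python
import math

def split_into_patches(vec, num_patches):
    """'Separate' step: cut a flat embedding into equal-length patches."""
    if len(vec) % num_patches != 0:
        raise ValueError("embedding size must be divisible by num_patches")
    size = len(vec) // num_patches
    return [vec[i * size:(i + 1) * size] for i in range(num_patches)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections in this sketch)."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs

def aggregate(entity_vec, relation_vec, num_patches):
    """'Aggregate' step: entity patches attend over relation patches, then flatten."""
    ent = split_into_patches(entity_vec, num_patches)
    rel = split_into_patches(relation_vec, num_patches)
    attended = cross_attention(ent, rel, rel)
    # Residual connection per patch, flattened back into a single vector.
    return [x + y for p_e, p_a in zip(ent, attended) for x, y in zip(p_e, p_a)]

def score_triple(head, relation, tail, num_patches=2):
    """Plausibility of (head, relation, tail): dot product with the refined query."""
    query = aggregate(head, relation, num_patches)
    return sum(a * b for a, b in zip(query, tail))

if __name__ == "__main__":
    h, r = [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]
    candidates = {"t1": [1.0, 1.0, 1.0, 1.0], "t2": [-1.0, 0.0, 0.0, -1.0]}
    for name, t in candidates.items():
        print(name, round(score_triple(h, r, t), 3))
```

Splitting into patches lets attention model interactions between sub-parts of the entity and relation vectors. In a real model, learned query/key/value projections and stacked layers would replace the identity mappings used here.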
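
The core mechanism of CSProm-KG, as described above, is that only the conditional soft prompts are tuned while the PLM stays frozen. The plain-Python sketch below illustrates that split. For illustration only, it assumes a single linear layer as the prompt generator, with prompts prepended at the input layer. The description does not state how CSProm-KG generates or injects its prompts, and the parameter-group names are hypothetical.

```python
def linear(weight, x):
    """Matrix-vector product; `weight` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def conditional_soft_prompts(entity_emb, relation_emb, generator_weight, num_prompts):
    """Generate `num_prompts` soft-prompt vectors conditioned on the (entity, relation) pair."""
    flat = linear(generator_weight, entity_emb + relation_emb)
    if len(flat) % num_prompts != 0:
        raise ValueError("generator output size must be divisible by num_prompts")
    hidden = len(flat) // num_prompts
    return [flat[i * hidden:(i + 1) * hidden] for i in range(num_prompts)]

def build_plm_input(prompts, token_embeddings):
    """Prepend the soft prompts to the token embeddings fed to the frozen PLM."""
    return prompts + token_embeddings

# Hypothetical parameter groups: only the prompt generator is updated during training.
PARAMETER_GROUPS = {
    "prompt_generator": True,        # trainable
    "plm_encoder_layers": False,     # frozen PLM weights
    "plm_token_embeddings": False,   # frozen PLM weights
}

def trainable_groups(groups):
    """Names of the parameter groups that an optimizer would update."""
    return sorted(name for name, trainable in groups.items() if trainable)

if __name__ == "__main__":
    identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    prompts = conditional_soft_prompts([0.5, 1.0], [2.0, -1.0], identity, num_prompts=2)
    print(build_plm_input(prompts, [[9.0, 9.0]]))
    print(trainable_groups(PARAMETER_GROUPS))
```

The prompts depend on the entity and relation embeddings, so structural information reaches the PLM through the prompts rather than through fine-tuned PLM weights.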
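
Framing KGC as Seq2Seq generation, as KG-S2S does, means each query becomes a source text and the answer entity's name becomes the target text. The sketch below shows one possible way to build such text pairs for tail and head prediction. The `entity | relation | ?` template and the "inverse of" convention for head prediction are hypothetical, since KG-S2S's actual input format is not given in the description. Training or decoding with a real Seq2Seq PLM is out of scope here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

def to_seq2seq_pair(triple, predict="tail", descriptions=None):
    """Turn one triple into a (source text, target text) pair for a Seq2Seq model.

    The layout "entity | relation | ?" is a hypothetical template.
    """
    descriptions = descriptions or {}
    if predict == "tail":
        known, relation, answer = triple.head, triple.relation, triple.tail
    elif predict == "head":
        # Hypothetical convention: query the inverse relation to predict the head.
        known, relation, answer = triple.tail, f"inverse of {triple.relation}", triple.head
    else:
        raise ValueError("predict must be 'tail' or 'head'")
    desc = descriptions.get(known)
    known_text = f"{known} ({desc})" if desc else known
    return f"{known_text} | {relation} | ?", answer

if __name__ == "__main__":
    t = Triple("Nanyang Technological University", "located in", "Singapore")
    print(to_seq2seq_pair(t))
    print(to_seq2seq_pair(t, predict="head"))
```

Because both the query and the answer are plain text, one model and one decoding procedure can serve different KGC tasks. This is the kind of uniformity the description motivates.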
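
HABIT, as described above, models device behavior as a KG and flags “false” knowledge with KGC techniques. The sketch below illustrates that idea with TransE, a classic KGE scoring function swapped in purely for illustration; the description does not say which KGC model HABIT uses. The behavior triples, device names, embeddings, and threshold are all hypothetical toy values. In practice the embeddings would be learned from observed profiles.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility -||h + r - t||: values closer to 0 are more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def flag_suspicious(triples, entity_emb, relation_emb, threshold):
    """Return behavior triples scoring below `threshold`, most suspicious first."""
    flagged = []
    for head, rel, tail in triples:
        s = transe_score(entity_emb[head], relation_emb[rel], entity_emb[tail])
        if s < threshold:
            flagged.append(((head, rel, tail), s))
    return sorted(flagged, key=lambda item: item[1])

if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would be learned from device profiles.
    entities = {
        "camera_01": [0.0, 0.0],
        "vendor_cloud": [1.0, 0.0],
        "unknown_host": [0.0, 3.0],
    }
    relations = {"connects_to": [1.0, 0.0]}
    observed = [
        ("camera_01", "connects_to", "vendor_cloud"),
        ("camera_01", "connects_to", "unknown_host"),
    ]
    for triple, s in flag_suspicious(observed, entities, relations, threshold=-1.0):
        print("suspicious:", triple, round(s, 3))
```

Ranking flagged triples by score gives analysts a prioritized list of devices for closer monitoring rather than a single yes/no verdict.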

School: School of Computer Science and Engineering
Contact: kwokyan.lam@ntu.edu.sg
Degree: Doctor of Philosophy
Date available: 2024-02-02
Last modified: 2024-03-07
Citation: Chen, C. (2024). Knowledge graph embedding with deep learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173397
DOI: 10.32657/10356/173397
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
File format: application/pdf