Constructing knowledge graph from linux kernel commit message

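With a large amount of data available, a great deal of security-related information can be extracted from it. The main problem is that a large portion of this data (80%-90%) is stored in unstructured form. One of the best-known forms of unstructured data is text. Textual data can hold a lot of information in a small amount of space, but because it is mainly written in human language, machines have difficulty extracting information from it. Much natural language processing work is therefore devoted to extracting information from text, and in this task the choice of data representation plays a major role. One of the most popular representations of information drawn from text is the knowledge graph. Constructing a knowledge graph from unstructured textual data can help machines understand the information the data contains.

This project aims to extract a knowledge graph from Linux kernel commit messages. With more than 700,000 commit messages, this is a large amount of data to process. If the information is successfully extracted, it can be of considerable benefit to computer security.

The knowledge graph extraction consists of four processes: data cleaning, entity extraction, relation extraction, and knowledge graph construction. Entity extraction is the process of recognizing named entities in the text and classifying them into predefined categories; it uses a combination of automated labeling and machine learning (a CRF classifier). Relation extraction is the process of detecting and classifying semantic relationships between the previously extracted entities; both schema-based and schema-free relations are extracted. In total, 1,247,864 entities and 1,747,009 relations are extracted. With an F-measure of 74.29%, the knowledge extraction is considered to perform well under the given circumstances.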
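
The record's abstract says entity extraction combines automated labeling with a CRF classifier and reports a 74.29% F-measure, but it does not say how either step was implemented. As a rough illustration only, the Python sketch below assumes a dictionary (gazetteer) lookup that turns commit-message tokens into BIO tags, the kind of labels a CRF classifier could be trained on, and scores predicted entity spans with an exact-match F-measure. The gazetteer, its categories (VULN, SUBSYSTEM, FUNCTION), the sample commit message, and the helper names `bio_label`, `spans` and `f_measure` are all hypothetical. CRF training itself is not shown, and treating the reported score as a span-level F-measure is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# (start, end_exclusive, label) over token positions.
Span = Tuple[int, int, str]

# HYPOTHETICAL gazetteer: these categories and phrases are illustrative
# assumptions, not the project's actual entity schema or dictionary.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("null", "pointer", "dereference"): "VULN",
    ("use-after-free",): "VULN",
    ("ext4",): "SUBSYSTEM",
    ("kfree",): "FUNCTION",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in GAZETTEER)


def bio_label(tokens: List[str]) -> List[str]:
    """Automated labeling: greedy longest-match gazetteer lookup -> BIO tags."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word entries win.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no phrase starts here; leave the token tagged "O"
    return tags


def spans(tags: List[str]) -> Set[Span]:
    """Collapse BIO tags into entity spans (a stray I- tag opens a new span)."""
    result: Set[Span] = set()
    start: int = 0
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current entity
        if label is not None:
            result.add((start, i, label))
            label = None
        if tag != "O":
            start, label = i, tag[2:]
    return result


def f_measure(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match span F-measure: harmonic mean of precision and recall."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0  # also avoids dividing by zero when either set is empty
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = "Fix null pointer dereference in ext4 after kfree".split()
    tags = bio_label(tokens)
    print(tags)
    # ['O', 'B-VULN', 'I-VULN', 'I-VULN', 'O', 'B-SUBSYSTEM', 'O', 'B-FUNCTION']
    gold = spans(tags)
    # A made-up "classifier output": one correct span, one with the wrong label.
    predicted = {(1, 4, "VULN"), (5, 6, "FUNCTION")}
    print(round(f_measure(gold, predicted), 4))  # 0.4
```

Greedy longest match prefers a multi-word entry such as "null pointer dereference" over any shorter entry starting at the same token. Under exact matching, a span with the wrong label counts as both a false positive and a false negative, which is why the example scores 0.4 (precision 1/2, recall 1/3).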

Bibliographic Details
Main Author: Pentium, Gede Bagus Bayu
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access:http://hdl.handle.net/10356/74057
Institution: Nanyang Technological University
id sg-ntu-dr.10356-74057
record_format dspace
contributor Chen Chunyang
school School of Computer Science and Engineering
degree Bachelor of Engineering (Computer Science)
date 2018-04-24T04:32:08Z
extent 62 p.
mime_type application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
author2 Liu Yang
format Final Year Project
author Pentium, Gede Bagus Bayu
title Constructing knowledge graph from linux kernel commit message
publishDate 2018
url http://hdl.handle.net/10356/74057