Extracting vocabulary for ontology learning using text mining
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/66746 |
Institution: | Nanyang Technological University |
Summary: | Ontologies are receiving growing attention because they represent knowledge explicitly, support a shared understanding of how information is structured, and make domain knowledge reusable. However, constructing new ontologies manually is time-consuming and resource-intensive. This has focused attention on ontology learning, which aims to automate the construction of new ontologies and to maintain existing ones as additional knowledge becomes available. Ontology learning for enriching existing ontologies comprises collecting domain-specific literature, selecting relevant documents, and text mining to refine the concept vocabulary. Because the World Wide Web is a rich repository of information that can feed ontology learning, the corpus for this project was built from information crawled from the web. However, the massive number of available web pages, which vary widely in content quality, makes it difficult to filter domain-relevant information from the web.
The main objective of this project is to develop a system that retrieves web pages from the internet and automatically classifies them according to their relevance to the domain. In this work, data was collected for the domain “Knowledge Management”.
This project covers crawling web data, classifying web text documents by relevance, and evaluating experiments that compare different classifiers on two feature representations: TF-IDF weights based on the bag-of-words model, and dependency-based word embeddings. |
---|
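As a rough illustration of the relevance classification step described in the summary, the sketch below computes bag-of-words TF-IDF weights and labels a document by its cosine similarity to per-class centroids. It is a minimal sketch, not the project's implementation: the record does not say which classifiers were evaluated, so the nearest-centroid rule is a swapped-in stand-in, and the function names, the smoothed IDF formula, and the four training sentences in `TRAIN` are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Return an L2-normalised TF-IDF vector (as a dict) for one document."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labelled_docs):
    """Fit IDF weights and one TF-IDF centroid per label."""
    idf = fit_idf([doc for doc, _ in labelled_docs])
    by_label = {}
    for doc, label in labelled_docs:
        by_label.setdefault(label, []).append(tfidf(doc, idf))
    return idf, {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(doc, idf, centroids):
    """Assign the label whose centroid is most similar to the document."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training set; a real corpus would come from crawled pages.
TRAIN = [
    ("knowledge management systems capture and share organizational knowledge", "relevant"),
    ("communities of practice support knowledge sharing in organizations", "relevant"),
    ("cheap flights and hotel deals for your summer holiday", "irrelevant"),
    ("football scores and match highlights from the weekend", "irrelevant"),
]

idf, centroids = train(TRAIN)
print(classify("tacit knowledge sharing within an organization", idf, centroids))
```

In a setup like the one the summary describes, the same `train` and `classify` interface could be kept while swapping the feature function (for example, averaged word embeddings in place of `tfidf`) so that classifiers can be compared across representations.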