Machine learning techniques for knowledge extraction from text

The development of machine learning techniques opens up more opportunities for users to simulate, by computer, a person’s attitude towards and evaluation of a text. Considering the increasing amount of online information, text summarization for the huge amount of documents conducted by huma...

Bibliographic Details
Main Author: Wang, Zhaochun
Other Authors: Mao Kezhi
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: http://hdl.handle.net/10356/65902
Institution: Nanyang Technological University
id sg-ntu-dr.10356-65902
record_format dspace
spelling sg-ntu-dr.10356-65902 2023-07-04T15:40:46Z Machine learning techniques for knowledge extraction from text Wang, Zhaochun Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering The development of machine learning techniques opens up more opportunities for users to simulate, by computer, a person’s attitude towards and evaluation of a text. Given the increasing amount of online information, having humans summarize the huge number of documents would be very time-consuming, if not impossible. Research on automatic document summarization (ADS) is therefore very meaningful. This paper proposes two automatic document summarization methods, based on the latent semantic analysis (LSA) and nonnegative matrix factorization (NMF) algorithms, which select sentences or words that retain the main points of the original documents to form a brief summary. Both methods aim to learn semantic features for each sentence and to select the important sentences based on the learned representation. In detail, programs assist users in decomposing each sentence into a collection of semantic features, where each semantic feature can be regarded as a high-level feature composed of the whole vocabulary. The selection of sentences is based on a clustering method that can find the latent structure at the sentence level. In addition, we evaluated our methods on DUC 2001, a public and widely used document summarization dataset. The experimental results demonstrate that the LSA and NMF methods are able to achieve high accuracy and precision. Besides that, the differences between LSA and NMF are compared, and the sensitivity of these methods to their parameters, including the reduced dimension and the length of the input summary, is analyzed.
Keywords: Automatic document summarization, Latent semantic analysis, Nonnegative matrix factorization, Semantic features, Document Understanding Conference. Master of Science (Computer Control and Automation) 2016-01-13T04:46:31Z 2016-01-13T04:46:31Z 2016 Thesis http://hdl.handle.net/10356/65902 en 75 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
author2 Mao Kezhi
author_facet Mao Kezhi
Wang, Zhaochun
format Theses and Dissertations
author Wang, Zhaochun
author_sort Wang, Zhaochun
title Machine learning techniques for knowledge extraction from text
publishDate 2016
url http://hdl.handle.net/10356/65902
_version_ 1772828423685668864
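
Illustrative sketch (not taken from the thesis): the abstract describes decomposing sentences into semantic features with LSA and NMF and then selecting important sentences from the learned representation. The Python below is one minimal way such a pipeline could look, under several assumptions that are not in the record: whitespace tokenization, raw term counts, power iteration with deflation as a stand-in for a full SVD, multiplicative updates for NMF, and ranking by the norm of each sentence's feature column in place of the clustering-based selection the thesis mentions. All function names and the sample sentences are hypothetical.

```python
import random
from collections import Counter

def term_sentence_matrix(sentences):
    """Return a term-by-sentence count matrix (rows: terms, columns: sentences)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    return [[float(c[w]) for c in counts] for w in vocab]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def lsa_features(A, k, iters=200, seed=0):
    """Approximate sigma_i * v_i for the top-k right singular vectors of A.

    Assumption: power iteration with deflation on the Gram matrix A^T A is
    used here instead of a library SVD. Returns a k x n matrix whose column j
    is the reduced representation of sentence j.
    """
    rng = random.Random(seed)
    G = matmul(transpose(A), A)
    n = len(G)
    features = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in G]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # nothing left to extract after deflation
                break
            v = [x / norm for x in w]
        Gv = [sum(g * x for g, x in zip(row, v)) for row in G]
        lam = sum(x * y for x, y in zip(v, Gv))  # eigenvalue estimate, sigma^2
        sigma = max(lam, 0.0) ** 0.5
        features.append([sigma * x for x in v])
        # Deflate: remove the component just found before the next round.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return features

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor A (m x n) into nonnegative W (m x k) and H (k x n) using
    multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

def select_sentences(sentences, features, length):
    """Rank sentences by the Euclidean norm of their feature column
    (a simple heuristic, not clustering) and keep document order."""
    n = len(sentences)
    scores = [sum(row[j] ** 2 for row in features) ** 0.5 for j in range(n)]
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:length]
    return [sentences[j] for j in sorted(top)]

# Hypothetical input; k is the reduced dimension, length the summary size.
sentences = [
    "latent semantic analysis factorizes the term sentence matrix",
    "nonnegative matrix factorization keeps every factor nonnegative",
    "the weather was pleasant",
    "both methods learn semantic features for each sentence",
]
A = term_sentence_matrix(sentences)
lsa_summary = select_sentences(sentences, lsa_features(A, k=2), length=2)
nmf_summary = select_sentences(sentences, nmf(A, k=2)[1], length=2)
print(lsa_summary)
print(nmf_summary)
```

The two parameters exposed here, k and length, correspond loosely to the reduced dimension and summary length whose sensitivity the abstract says was analyzed. The main contrast the sketch makes visible is that the LSA features may carry mixed signs while the NMF factors stay nonnegative.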