Machine learning techniques for knowledge extraction from text

The development of machine learning techniques opens up more opportunities to simulate, by computer, a person’s attitude towards and evaluation of a text. Considering the increasing amount of online information, text summarization for the huge amount of documents conducted by huma...

Bibliographic Details
Main Author:	Wang, Zhaochun
Other Authors:	Mao Kezhi
Format:	Theses and Dissertations
Language:	English
Published:	2016
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/65902
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-65902
record_format	dspace
spelling	sg-ntu-dr.10356-65902 2023-07-04T15:40:46Z Machine learning techniques for knowledge extraction from text Wang, Zhaochun Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Master of Science (Computer Control and Automation) 2016-01-13T04:46:31Z 2016-01-13T04:46:31Z 2016 Thesis http://hdl.handle.net/10356/65902 en 75 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Wang, Zhaochun Machine learning techniques for knowledge extraction from text
description	The development of machine learning techniques opens up more opportunities to simulate, by computer, a person’s attitude towards and evaluation of a text. Given the increasing amount of online information, summarization of huge numbers of documents by humans would be very time-consuming and practically impossible. It is therefore meaningful to conduct research on automatic document summarization (ADS). This paper proposes two automatic document summarization methods, based on the latent semantic analysis (LSA) and nonnegative matrix factorization (NMF) algorithms, which select sentences or words that retain the main points of the original documents to form a brief summary. Both methods aim to learn semantic features for each sentence and to select the important sentences based on the learned representation. In detail, programs assist users in decomposing each sentence into a collection of semantic features, where each semantic feature can be regarded as a high-level feature composed from the whole vocabulary. Sentences are selected with a clustering method that can find the latent structure at the sentence level. The methods were evaluated on DUC 2001, a public and widely used document summarization dataset. The experimental results demonstrate that the LSA and NMF methods are able to achieve high accuracy and precision. In addition, the differences between LSA and NMF are compared, and the sensitivity of the methods to their parameters, including the reduced dimension and the length of the input summary, is analyzed. Keywords: Automatic document summarization, Latent semantic analysis, Nonnegative matrix factorization, Semantic features, Document Understanding Conference
author2	Mao Kezhi
author_facet	Mao Kezhi Wang, Zhaochun
format	Theses and Dissertations
author	Wang, Zhaochun
author_sort	Wang, Zhaochun
title	Machine learning techniques for knowledge extraction from text
publishDate	2016
url	http://hdl.handle.net/10356/65902
_version_	1772828423685668864
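
The abstract describes representing each sentence by latent semantic features and then selecting important sentences. The record does not give the thesis's actual procedure, so the following is only a minimal sketch of LSA-based sentence selection under stated assumptions: the function name `lsa_summary`, the parameter `n_topics`, and the tokenizer are hypothetical, and the selection rule (one top-weighted sentence per latent topic) is a simple stand-in for the clustering step mentioned in the abstract, not the author's method.

```python
import re
import numpy as np


def lsa_summary(sentences, n_topics=2):
    """Pick one sentence per latent topic using an SVD of the term matrix."""
    # Build a term-by-sentence count matrix from lowercase word tokens.
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0

    # SVD: each row of Vt weights the sentences for one latent topic,
    # ordered from the strongest topic to the weakest.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(n_topics, Vt.shape[0])

    chosen = []
    for topic in range(k):
        # Take the highest-weighted sentence not already selected.
        for j in np.argsort(-np.abs(Vt[topic])):
            if int(j) not in chosen:
                chosen.append(int(j))
                break

    # Return the selected sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]
```

An NMF variant would replace the SVD with a nonnegative factorization of the same matrix and rank sentences by their nonnegative feature weights. The reduced dimension (`n_topics` here) corresponds to the kind of parameter whose sensitivity the abstract says was analyzed.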

Similar Items