Machine learning techniques for knowledge extraction from text
Main Author: | Wang, Zhaochun |
---|---|
Other Authors: | Mao Kezhi |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | http://hdl.handle.net/10356/65902 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-65902 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-65902; 2023-07-04T15:40:46Z; Machine learning techniques for knowledge extraction from text; Wang, Zhaochun; Mao Kezhi; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering; Master of Science (Computer Control and Automation); 2016-01-13T04:46:31Z; 2016-01-13T04:46:31Z; 2016; Thesis; http://hdl.handle.net/10356/65902; en; 75 p.; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
description |
The development of machine learning techniques has opened up new opportunities for using computers to simulate a person’s attitude toward, and evaluation of, a text. Given the growing volume of online information, summarizing such large collections of documents by hand is time-consuming and, in practice, infeasible. Research on automatic document summarization (ADS) is therefore worthwhile. This thesis proposes two ADS methods, based on latent semantic analysis (LSA) and nonnegative matrix factorization (NMF), that select the sentences or words retaining the main points of the original documents to form a brief summary. Both methods learn semantic features for each sentence and select the important sentences based on the learned representation. Specifically, each sentence is decomposed into a collection of semantic features, and each semantic feature can be regarded as a high-level feature composed over the whole vocabulary. Sentences are selected with a clustering method that can find latent structure at the sentence level. The methods were evaluated on DUC 2001, a public and widely used document summarization dataset. The experimental results show that the LSA and NMF methods achieve high accuracy and precision. The thesis also compares LSA with NMF and analyzes the sensitivity of both methods to their parameters, including the reduced dimension and the length of the input summary.
Keywords
Automatic document summarization, Latent semantic analysis, Nonnegative matrix factorization, Semantic features, Document Understanding Conference |
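The abstract does not give the exact algorithms. What follows is only a minimal Python sketch of one possible NMF-based variant, not the thesis's implementation, and it rests on several assumptions. Sentences become raw term-frequency columns of a terms x sentences matrix A. A is factorized as A ≈ WH using the Lee and Seung multiplicative updates. Each sentence is clustered to the semantic feature (row of H) on which it loads most heavily. The highest-weighted sentence in each cluster is kept. The names `term_sentence_matrix`, `nmf` and `summarize` are hypothetical. An LSA-based variant would swap the factorization for a truncated singular value decomposition.

```python
import random
import re
from collections import Counter


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    cols = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def term_sentence_matrix(sentences):
    """Hypothetical helper: terms x sentences matrix of raw term counts.

    Assumption: plain term frequency; the thesis may use another weighting.
    """
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokens for w in toks})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, toks in enumerate(tokens):
        for w, c in Counter(toks).items():
            A[row[w]][j] = float(c)
    return A


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factorize nonnegative A (m x n) as W (m x k) times H (k x n).

    Lee and Seung multiplicative updates for the squared Frobenius loss;
    eps guards against division by zero.
    """
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[r][c] * num[r][c] / (den[r][c] + eps) for c in range(n)]
             for r in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(m)]
    return W, H


def summarize(sentences, k):
    """Hypothetical NMF + clustering selector; returns at most k sentences."""
    H = nmf(term_sentence_matrix(sentences), k)[1]
    n = len(sentences)
    # Cluster each sentence by the semantic feature with its largest weight.
    labels = [max(range(k), key=lambda r: H[r][j]) for j in range(n)]
    chosen = []
    for r in range(k):
        members = [j for j in range(n) if labels[j] == r]
        if members:
            # Keep the sentence that loads most heavily on this feature.
            chosen.append(max(members, key=lambda j: H[r][j]))
    # Emit the selected sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]
```

For example, `summarize(sentences, 3)` returns at most three sentences in their original order. In this sketch, k serves as both the reduced dimension and the maximum summary length in sentences. The thesis may treat these as separate parameters.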
author2 |
Mao Kezhi |
author_facet |
Mao Kezhi; Wang, Zhaochun |
format |
Theses and Dissertations |
author |
Wang, Zhaochun |
author_sort |
Wang, Zhaochun |
title |
Machine learning techniques for knowledge extraction from text |
publishDate |
2016 |
url |
http://hdl.handle.net/10356/65902 |