Topic identification method for textual document
Abstract: Topic identification is a crucial task for discovering knowledge from textual documents. Existing methods for topic identification suffer from a word counting problem, as they depend on the most frequent terms in the text to produce the topic keyword. Not all frequent terms are relevant. T...
Main Authors: | Jamil, Nurul Syafidah; Ku-Mahamud, Ku Ruhana; Mohamed Din, Aniza |
---|---|
Format: | Article |
Language: | English |
Published: | JMEST 2017 |
Subjects: | QA76 Computer software |
Online Access: | http://repo.uum.edu.my/21719/1/JMEST%204%202%202017%206643%206647.pdf http://repo.uum.edu.my/21719/ http://www.jmest.org/wp-content/uploads/JMESTN42352037.pdf |
Institution: | Universiti Utara Malaysia |
id |
my.uum.repo.21719 |
---|---|
record_format |
eprints |
spelling |
my.uum.repo.21719 (2017-04-19T07:44:55Z) http://repo.uum.edu.my/21719/ Jamil, Nurul Syafidah and Ku-Mahamud, Ku Ruhana and Mohamed Din, Aniza (2017) Topic identification method for textual document. Journal of Multidisciplinary Engineering Science and Technology (JMEST), 4 (2). pp. 6643-6647. ISSN 2458-9403. Subject: QA76 Computer software. Type: Article, PeerReviewed. Format: application/pdf. Language: en. Full text: http://repo.uum.edu.my/21719/1/JMEST%204%202%202017%206643%206647.pdf http://www.jmest.org/wp-content/uploads/JMESTN42352037.pdf |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Institutional Repository |
url_provider |
http://repo.uum.edu.my/ |
language |
English |
topic |
QA76 Computer software |
description |
Abstract: Topic identification is a crucial task for discovering knowledge from textual documents. Existing methods for topic identification suffer from a word counting problem because they depend on the most frequent terms in the text to produce the topic keyword, and not all frequent terms are relevant. This paper proposes a topic identification method that filters the important terms from the preprocessed text and applies a term weighting scheme to address the synonym problem. A rule generation algorithm is used to determine the appropriate topics based on the weighted terms. The text document used in the experiment is the English translation of the Quran. The topics identified by the proposed method were compared with topics identified using Rough Set and by domain experts. The proposed method consistently identified topics close to those given by Rough Set and the experts. The comparison showed that the proposed method can be used to capture topics for textual documents. |
format |
Article |
author |
Jamil, Nurul Syafidah; Ku-Mahamud, Ku Ruhana; Mohamed Din, Aniza |
author_sort |
Jamil, Nurul Syafidah |
title |
Topic identification method for textual document |
publisher |
JMEST |
publishDate |
2017 |
url |
http://repo.uum.edu.my/21719/1/JMEST%204%202%202017%206643%206647.pdf http://repo.uum.edu.my/21719/ http://www.jmest.org/wp-content/uploads/JMESTN42352037.pdf |
_version_ |
1644283316802682880 |
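
The abstract in this record outlines a three-step pipeline: filter important terms from preprocessed text, weight them while handling synonyms, and apply rules to choose a topic. The following is a minimal sketch of that general idea, not the authors' implementation. The stop-word list, synonym table, topic rules, threshold, and the TF-IDF style weighting are all hypothetical stand-ins, since the record does not specify the paper's actual scheme or rule generation algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical resources: the record does not list the stop words, synonym
# groups, or topic rules used in the paper.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "for", "give"}
SYNONYMS = {"alms": "charity", "zakat": "charity", "supplication": "prayer"}
TOPIC_RULES = {"charity": {"charity", "poor"}, "worship": {"prayer", "fasting"}}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and fold synonyms together."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def weight_terms(docs):
    """Return one {term: weight} dict per document (assumed TF-IDF style)."""
    tokenized = [preprocess(d) for d in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def identify_topic(term_weights, rules=TOPIC_RULES, threshold=0.0):
    """Pick the topic whose rule terms carry the most weight, if any."""
    scores = {
        topic: sum(term_weights.get(t, 0.0) for t in terms)
        for topic, terms in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None


if __name__ == "__main__":
    docs = [
        "Give alms to the poor and the needy",
        "Establish prayer and observe fasting",
    ]
    for doc, w in zip(docs, weight_terms(docs)):
        print(doc, "->", identify_topic(w))
```

Folding synonyms before counting means "alms" and "zakat" contribute to a single term weight, which is one simple way to approach the synonym problem the abstract mentions. Hand-written rules are used here only for illustration; the paper generates its rules algorithmically.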