Hierarchical text classification and evaluation

Hierarchical classification refers to assigning one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down lev...
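
The top-down idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Category` class, the `own_terms` field, and the `classify_top_down` function are hypothetical names, and simple keyword overlap stands in for whatever trained classifiers the paper actually uses. The sketch only shows how a level-by-level walk can assign a document to internal categories as well as leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    own_terms: set  # toy stand-in for a trained "belongs to this category" classifier
    children: list = field(default_factory=list)

    def subtree_terms(self):
        """All terms of this category and its descendants (toy "subtree" test)."""
        terms = set(self.own_terms)
        for child in self.children:
            terms |= child.subtree_terms()
        return terms


def classify_top_down(root, doc_terms):
    """Walk the tree one level at a time and return the assigned category names."""
    assigned = []
    frontier = list(root.children)  # the root is treated as a container only
    while frontier:
        next_level = []
        for cat in frontier:
            if not (doc_terms & cat.subtree_terms()):
                continue  # rejected here, so the whole subtree is pruned
            if doc_terms & cat.own_terms:
                assigned.append(cat.name)  # may be an internal or a leaf category
            next_level.extend(cat.children)
        frontier = next_level
    return assigned


# Hypothetical category tree, loosely in the style of newswire topics.
tree = Category("root", set(), [
    Category("grain", {"grain"}, [
        Category("wheat", {"wheat"}),
        Category("corn", {"corn"}),
    ]),
    Category("metal", {"metal"}, [
        Category("gold", {"gold"}),
    ]),
])

print(classify_top_down(tree, {"grain", "harvest"}))
print(classify_top_down(tree, {"wheat", "price"}))
```

In this toy setup the first document stops at the internal category `grain`, while the second passes through `grain` without being assigned to it and lands on the leaf `wheat`. A document rejected at an upper level is never tested against the categories below it, which is the main cost of any top-down scheme.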

Bibliographic Details
Main Authors: SUN, Aixin, LIM, Ee Peng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2001
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/976
https://ink.library.smu.edu.sg/context/sis_research/article/1975/viewcontent/e229a27ac45fe58b3c1b08fa64e84bd79b56.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1975
record_format dspace
spelling sg-smu-ink.sis_research-1975 2018-06-21T08:05:45Z Hierarchical text classification and evaluation SUN, Aixin LIM, Ee Peng Hierarchical classification refers to assigning one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. As the standard performance measures assume independence between categories, they have not considered the documents incorrectly classified into categories that are similar to, or not far from, the correct ones in the category tree. We therefore propose the Category-Similarity Measures and Distance-Based Measures to consider the degree of misclassification in measuring the classification performance. An experiment has been carried out to measure the performance of our proposed hierarchical classification method. The results showed that our method performs well on the Reuters text collection when enough training documents are given, and that the new measures have indeed considered the contributions of misclassified documents. 2001-11-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/976 info:doi/10.1109/ICDM.2001.989560 https://ink.library.smu.edu.sg/context/sis_research/article/1975/viewcontent/e229a27ac45fe58b3c1b08fa64e84bd79b56.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description Hierarchical classification refers to assigning one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. As the standard performance measures assume independence between categories, they have not considered the documents incorrectly classified into categories that are similar to, or not far from, the correct ones in the category tree. We therefore propose the Category-Similarity Measures and Distance-Based Measures to consider the degree of misclassification in measuring the classification performance. An experiment has been carried out to measure the performance of our proposed hierarchical classification method. The results showed that our method performs well on the Reuters text collection when enough training documents are given, and that the new measures have indeed considered the contributions of misclassified documents.
format text
author SUN, Aixin
LIM, Ee Peng
title Hierarchical text classification and evaluation
publisher Institutional Knowledge at Singapore Management University
publishDate 2001
url https://ink.library.smu.edu.sg/sis_research/976
https://ink.library.smu.edu.sg/context/sis_research/article/1975/viewcontent/e229a27ac45fe58b3c1b08fa64e84bd79b56.pdf
_version_ 1770570811515600896