Performance measurement framework for hierarchical text classification

Hierarchical text classification or simply hierarchical classification refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we have found that the existing hierarchical classification experiments used a variety of measures to...

Bibliographic Details
Main Authors: LIM, Ee Peng, SUN, Aixin, NG, Wee-Keong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2003
Subjects: Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access: https://ink.library.smu.edu.sg/sis_research/166
https://ink.library.smu.edu.sg/context/sis_research/article/1165/viewcontent/92168ec09813f72699295ba60cbf9299b637.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-1165
record_format dspace
spelling sg-smu-ink.sis_research-1165 2018-06-25T04:03:23Z Performance measurement framework for hierarchical text classification 2003-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/166 info:doi/10.1002/asi.10298 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
description Hierarchical text classification, or simply hierarchical classification, refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we found that existing hierarchical classification experiments used a variety of measures to evaluate performance. These performance measures often assume independence between categories and do not consider documents misclassified into categories that are similar to, or not far from, the correct categories in the category tree. In this paper, we therefore propose new performance measures for hierarchical classification. The proposed performance measures consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents. Our experiments on hierarchical classification methods based on SVM classifiers and binary Naive Bayes classifiers showed that SVM classifiers perform better than Naive Bayes classifiers on the Reuters-21578 collection according to the extended measures. A new classifier-centric measure, called the blocking measure, is also defined to examine the performance of subtree classifiers in a top-down level-based hierarchical classification method.
format text
author LIM, Ee Peng
SUN, Aixin
NG, Wee-Keong
title Performance measurement framework for hierarchical text classification
publisher Institutional Knowledge at Singapore Management University
publishDate 2003
url https://ink.library.smu.edu.sg/sis_research/166
https://ink.library.smu.edu.sg/context/sis_research/article/1165/viewcontent/92168ec09813f72699295ba60cbf9299b637.pdf
_version_ 1770568908445581312