Performance measurement framework for hierarchical text classification
Main Authors: | Lim, Ee Peng; Sun, Aixin; Ng, Wee-Keong |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2003 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/166 https://ink.library.smu.edu.sg/context/sis_research/article/1165/viewcontent/92168ec09813f72699295ba60cbf9299b637.pdf |
Institution: | Singapore Management University |
Citation: | Lim, Ee Peng; Sun, Aixin; Ng, Wee-Keong (2003). Performance measurement framework for hierarchical text classification. DOI: 10.1002/asi.10298 |
License: | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Collection: | Research Collection School Of Computing and Information Systems |
Subjects: | Databases and Information Systems; Numerical Analysis and Scientific Computing |

Abstract: Hierarchical text classification, or simply hierarchical classification, refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we found that existing hierarchical classification experiments used a variety of measures to evaluate performance. These measures often assume independence between categories and do not account for documents misclassified into categories that are similar, or close, to the correct categories in the category tree. In this paper, we therefore propose new performance measures for hierarchical classification. The proposed measures consist of category similarity measures and distance-based measures that take into account the contributions of misclassified documents. Our experiments on hierarchical classification methods based on SVM classifiers and binary Naïve Bayes classifiers showed that, according to the extended measures, SVM classifiers perform better than Naïve Bayes classifiers on the Reuters-21578 collection. A new classifier-centric measure, called the blocking measure, is also defined to examine the performance of subtree classifiers in a top-down, level-based hierarchical classification method.
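The abstract describes distance-based measures that give partial credit to documents misclassified into categories near the correct one. The paper's exact definitions are not reproduced in this record, so the Python sketch below is only an illustrative variant under stated assumptions, not the authors' formulas. It assumes that category distance is the number of edges between two categories in the tree, that a label earns credit max(0, 1 - d/θ) where d is its distance to the nearest counterpart label and θ is a hypothetical tolerance, and that precision and recall average this credit over predicted and true labels respectively. The category names, the example data, and every function name are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical category tree, stored as child -> parent (the root maps to None).
# Names are illustrative only, not the Reuters-21578 hierarchy used in the paper.
TREE: Dict[str, Optional[str]] = {
    "root": None,
    "econ": "root",
    "commodities": "root",
    "trade": "econ",
    "money": "econ",
    "grain": "commodities",
    "crude": "commodities",
}


def path_to_root(tree: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    parent = tree[node]
    while parent is not None:
        path.append(parent)
        parent = tree[parent]
    return path


def tree_distance(tree: Dict[str, Optional[str]], a: str, b: str) -> int:
    """Number of edges between categories a and b, via their lowest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(path_to_root(tree, b))}
    for i, n in enumerate(path_to_root(tree, a)):
        if n in depth_from_b:  # first shared node walking up from a is the LCA
            return i + depth_from_b[n]
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def _average_credit(
    tree: Dict[str, Optional[str]],
    sources: List[Set[str]],
    targets: List[Set[str]],
    theta: float,
) -> float:
    """Average over every label in `sources` of max(0, 1 - d / theta), where d is
    the tree distance to the nearest label of the same document in `targets`."""
    total, count = 0.0, 0
    for src_labels, tgt_labels in zip(sources, targets):
        for label in src_labels:
            count += 1
            if tgt_labels:  # a document with no counterpart labels earns no credit
                d = min(tree_distance(tree, label, t) for t in tgt_labels)
                total += max(0.0, 1.0 - d / theta)
    return total / count if count else 0.0


def distance_weighted_precision(
    tree: Dict[str, Optional[str]],
    predictions: List[Set[str]],
    gold: List[Set[str]],
    theta: float,
) -> float:
    """Credit each predicted label by its distance to the nearest true label."""
    return _average_credit(tree, predictions, gold, theta)


def distance_weighted_recall(
    tree: Dict[str, Optional[str]],
    predictions: List[Set[str]],
    gold: List[Set[str]],
    theta: float,
) -> float:
    """Credit each true label by its distance to the nearest predicted label."""
    return _average_credit(tree, gold, predictions, theta)


# One label set per document (hypothetical data).
GOLD = [{"trade"}, {"trade"}, {"trade"}]
PREDICTIONS = [{"trade"}, {"money"}, {"grain"}]

if __name__ == "__main__":
    print(distance_weighted_precision(TREE, PREDICTIONS, GOLD, theta=4.0))
    print(distance_weighted_recall(TREE, PREDICTIONS, GOLD, theta=4.0))
```

With θ = 4, a sibling of the correct category (two edges away) earns half credit and a category four edges away earns none, so this example scores 0.5 on both measures, whereas exact-match precision would be 1/3. A category similarity measure could be plugged into the same scaffolding by replacing `tree_distance` with a similarity score between categories; that substitution is likewise an assumption, not the paper's definition.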