An evaluation of classification models for question topic categorization

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions organized into a hierarchy of more than 1,000 categories. To the best of our knowledge, thi...

Bibliographic Details
Main Authors: Qu, Bo; Cong, Gao; Li, Cuiping; Sun, Aixin; Chen, Hong
Other Authors: School of Computer Engineering
Format: Article
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems
Online Access: https://hdl.handle.net/10356/99292
http://hdl.handle.net/10220/17203
Institution: Nanyang Technological University
id sg-ntu-dr.10356-99292
record_format dspace
spelling sg-ntu-dr.10356-99292 2020-05-28T07:17:39Z Journal Article. Qu, B., Cong, G., Li, C., Sun, A., & Chen, H. (2012). An evaluation of classification models for question topic categorization. Journal of the American Society for Information Science and Technology, 63(5), 889-903. ISSN 1532-2882. DOI 10.1002/asi.22611. https://hdl.handle.net/10356/99292 http://hdl.handle.net/10220/17203 en
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems
description We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions organized into a hierarchy of more than 1,000 categories. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification, and on short texts more generally. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including the subject, content, asker, and best answer. The experimental results show which aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
author2 School of Computer Engineering
format Article
author Qu, Bo
Cong, Gao
Li, Cuiping
Sun, Aixin
Chen, Hong
author_sort Qu, Bo
title An evaluation of classification models for question topic categorization
publishDate 2013
url https://hdl.handle.net/10356/99292
http://hdl.handle.net/10220/17203