A Novel Term Weighting Scheme for Imbalanced Text Classification

High-dimensional features are the main problem in the text domain. When class imbalance is also present, the classifier’s performance worsens. Moreover, in this setting it is very difficult to improve performance by addressing the imbalance with oversampling. In this paper, a new...

Bibliographic Details
Main Author: Tantisripreecha T.
Other Authors: Mahidol University
Format: Article
Published: 2023
Subjects: Computer Science
Online Access: https://repository.li.mahidol.ac.th/handle/123456789/84271
Institution: Mahidol University
id th-mahidol.84271
record_format dspace
spelling th-mahidol.84271
2023-06-19T00:01:50Z
A Novel Term Weighting Scheme for Imbalanced Text Classification
Tantisripreecha T.
Mahidol University
Computer Science
2023-06-18T17:01:49Z
2022-06-01
Article
Informatica (Slovenia) Vol.46 No.2 (2022), 259-268
10.31449/inf.v46i2.3523
18543871
03505596
2-s2.0-85135636469
https://repository.li.mahidol.ac.th/handle/123456789/84271
SCOPUS
institution Mahidol University
building Mahidol University Library
continent Asia
country Thailand
content_provider Mahidol University Library
collection Mahidol University Institutional Repository
topic Computer Science
description High-dimensional features are the main problem in the text domain. When class imbalance is also present, the classifier’s performance worsens. Moreover, in this setting it is very difficult to improve performance by addressing the imbalance with oversampling. In this paper, a new term weighting scheme is proposed that combines term frequency with an average of the inverse document frequency factor. We denote our scheme TFmeanIDF. The proposed method has high potential for imbalanced, high-dimensional text domains. No feature selection or oversampling is required. Extensive comparisons on 7 datasets validate the advantages of TFmeanIDF in terms of the F1 score obtained with widely used base classifiers, such as logistic regression and support vector machines. We found that the F1 score of the minority class is higher than with baseline term weighting schemes. Using TFmeanIDF for term weighting shows promising results for logistic regression and support vector machines.
author2 Mahidol University
format Article
author Tantisripreecha T.
title A Novel Term Weighting Scheme for Imbalanced Text Classification
publishDate 2023
url https://repository.li.mahidol.ac.th/handle/123456789/84271
_version_ 1781413926312869888