Semantic characterisation : knowledge discovery for training set
Main Authors: | , , |
---|---|
Format: | E-Article |
Language: | English |
Published: | International Journal of Innovation, Management and Technology, 2013 |
Subjects: | |
Online Access: | http://ir.unimas.my/id/eprint/47/1/Semantic%20Characterisation%20%28abstract%29.pdf http://ir.unimas.my/id/eprint/47/ http://ir.unimas.my/47/1/Semantic%20Characterisation%20%28abstract%29.pdf |
Institution: | Universiti Malaysia Sarawak |
Summary: | This paper proposes Semantic Characterisation (SemC), which uses Latent Semantic Indexing (LSI) to extract semantic information and make the best use of the existing knowledge contained in training sets. SemC uses LSI to capture the implicit semantic structure in documents and makes that structure explicit by directly applying the category labels imposed by experts. The training set filtered by SemC is tested on a supervised automated text categorisation system that uses a Support Vector Machine as the classifier. Category-by-category analysis shows that SemC is able to bring out the semantic characteristics of the datasets. Even with a reduced training set, SemC is able to overcome the generalisation problem because it reduces noise within individual categories. The empirical results also demonstrate that SemC improves categorisation results for heavily overlapping categories and that it is applicable to various supervised classifiers. |
---|
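
The summary describes projecting documents into an LSI space and using expert category labels to filter the training set before classification. The record does not give the actual SemC procedure, so the following is only a minimal sketch under stated assumptions: LSI is done with a truncated SVD of a toy term-document matrix, and the filter keeps a document only if its cosine similarity to its own category centroid meets a threshold. The centroid rule, the `min_sim` threshold, the function names and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Project documents (columns of term_doc) into a k-dimensional LSI space."""
    # Truncated SVD: term_doc is approximated by U_k * diag(s_k) * Vt_k.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of the result is one document in the latent space.
    return (np.diag(s[:k]) @ vt[:k]).T


def filter_training_set(doc_vecs, labels, min_sim):
    """Assumed filter rule (not from the paper): keep a document only if its
    cosine similarity to the centroid of its own category is >= min_sim."""
    labels = np.asarray(labels)
    unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    keep = np.zeros(len(labels), dtype=bool)
    for cat in np.unique(labels):
        idx = np.where(labels == cat)[0]
        centroid = unit[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        keep[idx] = unit[idx] @ centroid >= min_sim
    return keep


# Toy term-document matrix (rows = terms, columns = documents).
# Documents 0-1 use terms 0-1; documents 2-4 use terms 2-3.
term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0, 2.0],
    [0.0, 0.0, 1.0, 2.0, 2.0],
])
# Document 2 is deliberately given a label that disagrees with its content.
labels = ["A", "A", "A", "B", "B"]

doc_vecs = lsi_project(term_doc, k=2)
keep = filter_training_set(doc_vecs, labels, min_sim=0.6)
print(keep)
```

In this toy case the third document, whose terms match category B but which is labelled A, falls below the threshold and is dropped, while the other four are kept. The retained documents would then be passed to a supervised classifier such as a Support Vector Machine, which is not shown here.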