Enhancing document clustering by integrating semantic background knowledge and syntactic features into the bag of words representation
The basic Bag of Words (BOW) representation generally used in text document clustering or categorization loses important syntactic and semantic information contained in the documents. This may be particularly problematic when the texts contain many stop words or are short....
Main Authors: | , , |
---|---|
Format: | Research Report |
Language: | English |
Published: | Universiti Malaysia Sabah, 2011 |
Online Access: | https://eprints.ums.edu.my/id/eprint/22890/1/Enhancing%20document%20clustering%20by%20integrating%20semantic%20background%20knowledge%20and%20syntactic%20features%20into%20the%20bag%20of%20words%20representation.pdf https://eprints.ums.edu.my/id/eprint/22890/ |
Institution: | Universiti Malaysia Sabah |