Language identifications of Arabic script web documents using independent component analysis
We analyze language identification algorithms for Arabic script web documents, written in languages such as Arabic, Jawi, Persian and Urdu, using independent component analysis (ICA). We used a combination of the entropy term weighting scheme and class-based feature (CPBF) vectors as feature selectio...
Field | Value |
---|---|
Format: | Book Section |
Published: | Institute of Electrical and Electronics Engineers, 2008 |
Online Access: | http://eprints.utm.my/id/eprint/12612/; http://dx.doi.org/10.1109/AMS.2008.46 |
Institution: | Universiti Teknologi Malaysia |