Finding English and translated Arabic documents similarities using GHSOM

The aim of finding similar news across Arabic and English sources is to provide the audience with multiple views of broadcast news, because reading the news from a single source may not always reflect what is happening around the world, owing to the different backgrounds, cultures, and opinions of t...

Bibliographic Details
Main Authors: Selamat, Ali, Ismail, Hanadi Hassen
Format: Book Section
Published: Institute of Electrical and Electronics Engineers 2008
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/12570/
http://dx.doi.org/10.1109/ICCCE.2008.4580647
Institution: Universiti Teknologi Malaysia
id my.utm.12570
record_format eprints
spelling my.utm.12570 2017-10-02T08:51:25Z http://eprints.utm.my/id/eprint/12570/ PeerReviewed Selamat, Ali and Ismail, Hanadi Hassen (2008) Finding English and translated Arabic documents similarities using GHSOM. In: Proceedings of the International Conference on Computer and Communication Engineering 2008, ICCCE08: Global Links for Human Development. Institute of Electrical and Electronics Engineers, New York, 460-465. ISBN 978-142441692-9 http://dx.doi.org/10.1109/ICCCE.2008.4580647 DOI:10.1109/ICCCE.2008.4580647
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description The aim of finding similar news across Arabic and English sources is to provide the audience with multiple views of broadcast news, because reading the news from a single source may not always reflect what is happening around the world, owing to the different backgrounds, cultures, and opinions of readers and writers. Many techniques have been used to cluster documents with similar themes. In this paper, we analyze the similarity of the views expressed in English news texts and translated Arabic news texts using a Self-Organizing Map (SOM). However, we found some difficulties with SOM that affect its performance. To address these performance problems, we used a Growing Hierarchical Self-Organizing Map (GHSOM). The main advantage of such a mapping is the ease with which a user gains an idea of the structure of the data by analyzing the map. Thousands of news documents were collected from Arabic and English news sources on the web to train both algorithms. The experimental results show that GHSOM is better at clustering documents that share the same opinions.
format Book Section
author Selamat, Ali
Ismail, Hanadi Hassen
title Finding English and translated Arabic documents similarities using GHSOM
publisher Institute of Electrical and Electronics Engineers
publishDate 2008
url http://eprints.utm.my/id/eprint/12570/
http://dx.doi.org/10.1109/ICCCE.2008.4580647
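
The description in this record says that news documents are clustered with a Self-Organizing Map and that GHSOM is used to address SOM's difficulties. As a rough illustration only, the sketch below trains a small fixed-size SOM on term-frequency vectors and reports the mean quantization error, the kind of measure a growing hierarchical variant typically checks when deciding whether to add units. It is not the authors' implementation: the sample documents, the 2x2 map size, the decay schedules, and all function names are assumptions made for this example, and the growing and hierarchical steps of GHSOM are not implemented.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Turn each document into an L2-normalised term-frequency vector."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vec):
    """Index of the map unit whose weight vector is closest to vec."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], vec))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a fixed-size SOM (hypothetical parameters, linear decay)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays to 0
        sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighbourhood shrinks
        for vec in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vec)
            for i, w in enumerate(weights):
                grid_d2 = ((coords[i][0] - coords[bmu][0]) ** 2
                           + (coords[i][1] - coords[bmu][1]) ** 2)
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    # Pull the unit towards the input, scaled by grid proximity.
                    w[k] += lr * h * (vec[k] - w[k])
    return weights


def mean_quantization_error(weights, vectors):
    """Average distance from each input to its best-matching unit."""
    total = sum(math.sqrt(sq_dist(weights[best_matching_unit(weights, v)], v))
                for v in vectors)
    return total / len(vectors)


# Made-up sample headlines, standing in for English and translated Arabic news.
docs = [
    "oil prices rise as markets react",
    "markets react to rising oil prices",
    "football team wins the final match",
    "the final match ends with a football win",
]
vectors = bag_of_words(docs)
weights = train_som(vectors)
clusters = [best_matching_unit(weights, v) for v in vectors]
mqe = mean_quantization_error(weights, vectors)
print(clusters, round(mqe, 3))
```

Documents that land on the same unit index in `clusters` are treated as similar. A growing hierarchical map would, under this reading, start from a small map like this one and expand where the quantization error stays high, rather than fixing the map size in advance.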