Chinese text retrieval system

Bibliographic Details
Main Author: Lim, Hong Koon.
Other Authors: Foo, Schubert Shou Boon
Format: Theses and Dissertations
Language: English
Published: 2008
Online Access: http://hdl.handle.net/10356/13586
Institution: Nanyang Technological University
Description
Summary: Information retrieval (IR) systems and related fields have attracted close attention from researchers in information science. As Asian languages such as Chinese, Japanese and Korean gain popularity, they are increasingly used as the language medium of IR systems, Chinese in particular. Using Chinese in the IR domain requires first resolving a fundamental linguistic problem of Chinese text: unlike English and other European languages, written Chinese does not use spaces or other punctuation marks as word separators. To extract meaningful words from lines of Chinese text for processing, Chinese text segmentation must therefore be carried out, and it is an essential step when indexing the corpus of a Chinese text retrieval system. The primary objective of this project is to develop a prototype Chinese text retrieval system that can be used for future research. Instead of building one from scratch, the project builds on the mg system, a retrieval system capable of fast and efficient indexing and retrieval over both textual and graphical document collections.
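
The summary does not say which segmentation method the project uses or how the mg system is extended. As an illustration only, the following Python sketch shows one common dictionary-based approach, forward maximum matching, feeding a small in-memory inverted index. The function names, the toy dictionary, the sample documents and the four-character word limit are all assumptions made for this example. None of them comes from the thesis or from mg.

```python
from collections import defaultdict


def segment_fmm(text, dictionary, max_word_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def build_index(documents, dictionary):
    """Map each segmented word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in segment_fmm(text, dictionary):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Toy dictionary and documents, invented for illustration (not from the thesis).
TOY_DICTIONARY = {"中文", "文本", "检索", "系统", "信息"}
TOY_DOCUMENTS = {1: "中文检索", 2: "信息检索系统"}

if __name__ == "__main__":
    print(segment_fmm("中文文本检索系统", TOY_DICTIONARY))
    print(build_index(TOY_DOCUMENTS, TOY_DICTIONARY)["检索"])
```

With this toy dictionary, the string 中文文本检索系统 is split into the four two-character words 中文, 文本, 检索 and 系统, and a query for 检索 matches both sample documents. Greedy matching of this kind is simple but can mis-segment ambiguous strings, which is one reason segmentation is treated as a research problem in its own right.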