Chinese text retrieval system
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2008 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/13586 |
Institution: | Nanyang Technological University |
Summary: | In this fast-growing information age, information retrieval (IR) systems and
related fields have attracted close attention from researchers in information
science. Recently, as Asian languages such as Chinese, Japanese and Korean have
gained popularity, they have increasingly been used as the language medium of IR
systems, Chinese in particular. To use Chinese in the IR domain, a fundamental
linguistic problem of Chinese text must first be resolved. Unlike English and other
European languages, written Chinese does not use spaces or other marks as word
separators. Therefore, to extract meaningful words from lines of Chinese text
for text processing, the text must first be segmented. Segmentation is an
essential step when indexing the corpus of a Chinese text retrieval system.
The primary objective of this project is to develop a prototype Chinese
text retrieval system that can be used for future research. Instead of building
one from scratch, the project builds on the mg system. Because mg is a
retrieval system capable of fast and efficient indexing and retrieval
on both textual and graphical document collections, it was chosen as the base
on which our Chinese text retrieval system is built. |
---|
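The summary states that Chinese text must be segmented into words before it can be indexed, but this record does not say which segmentation method the thesis uses. As an illustration only, the sketch below shows one common dictionary-based approach, forward maximum matching. The function name `segment_fmm`, the `max_word_len` parameter and the toy lexicon are all assumptions made for this example and are not taken from the thesis or from the mg system.

```python
def segment_fmm(text, lexicon, max_word_len=4):
    """Split text into words by forward maximum matching (illustrative sketch)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until the lexicon matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so the loop is guaranteed to advance.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon; a real system would load a large dictionary.
lexicon = {"中文", "文本", "检索", "系统", "信息"}
print(segment_fmm("中文文本检索系统", lexicon))
```

The resulting word list is what an indexer would treat as terms, in place of the space-separated tokens it would get from English text. Greedy matching of this kind can mis-segment ambiguous strings, which is one reason segmentation is treated as a research problem in its own right.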