Developing a new statistical method for Chinese text segmentation

A new statistical formula for Chinese text segmentation, called the Contextual Information Formula (CIF), was developed empirically for identifying 2- and 3-character words. It was developed by performing stepwise logistic regression on a sample of manually segmented sentences. 300 sentence...

Bibliographic Details
Main Author:	Dai, Yubin
Other Authors:	Khoo, Christopher Soo Guan
Format:	Theses and Dissertations
Published:	2008
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
Online Access:	http://hdl.handle.net/10356/2614

id	sg-ntu-dr.10356-2614
record_format	dspace
spelling	sg-ntu-dr.10356-2614 2023-03-04T00:38:07Z Developing a new statistical method for Chinese text segmentation Dai, Yubin Khoo, Christopher Soo Guan School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity A new statistical formula for Chinese text segmentation, called the Contextual Information Formula (CIF), was developed empirically for identifying 2- and 3-character words. It was developed by performing stepwise logistic regression on a sample of manually segmented sentences. 300 sentences were used for model building and 100 sentences were set aside for model validation and evaluation. Relative frequencies, document frequencies, weighted document frequencies, and within-document frequencies of characters, bigrams and trigrams were included in the study. Master of Applied Science 2008-09-17T09:06:16Z 2008-09-17T09:06:16Z 1999 1999 Thesis http://hdl.handle.net/10356/2614 Nanyang Technological University application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Theory of computation::Analysis of algorithms and problem complexity
description	A new statistical formula for Chinese text segmentation, called the Contextual Information Formula (CIF), was developed empirically for identifying 2- and 3-character words. It was developed by performing stepwise logistic regression on a sample of manually segmented sentences. 300 sentences were used for model building and 100 sentences were set aside for model validation and evaluation. Relative frequencies, document frequencies, weighted document frequencies, and within-document frequencies of characters, bigrams and trigrams were included in the study.
author2	Khoo, Christopher Soo Guan
author_facet	Khoo, Christopher Soo Guan Dai, Yubin
format	Theses and Dissertations
author	Dai, Yubin
author_sort	Dai, Yubin
title	Developing a new statistical method for Chinese text segmentation
title_sort	developing a new statistical method for chinese text segmentation
publishDate	2008
url	http://hdl.handle.net/10356/2614
_version_	1759855961030262784
