Writing style modelling based on grapheme distributions : application to on-line writer identification.
The increasingly pervasive spread of mobile digital devices, such as smartphones and digital tablets that use digital pens, has brought about a new class of documents: online handwritten documents. The rapid increase in the number of online handwritten documents using such mobile de...
Main Author: | Tan, Guoxian |
---|---|
Other Authors: | Kot Chichung, Alex |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/54630 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-54630 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing |
description |
The increasingly pervasive spread of mobile digital devices, such as smartphones and digital tablets that use digital pens, has brought about a new class of documents: online handwritten documents. The rapid increase in the number of such documents creates mounting pressure to find innovative solutions for faster processing, indexing and retrieval of these documents from databases. One way to address this is to extract writer information from the raw ink signal for indexing and retrieval of the documents. Online writer identification is therefore a topic of renewed interest because of its importance in applications such as writer adaptation, document routing and forensic document analysis.
This thesis proposes an automatic text-independent writer identification framework that integrates an industrial handwriting recognition system, which is used to segment an online handwritten document automatically at the character level. The method places no constraints on the content written or on the writing styles of the writers. It extracts writer information at the character level from online handwritten documents and presents a novel approach to clustering and classifying the resulting character prototypes for writer identification. The approach is novel because prototypes are trained as characters using adapted Information Retrieval models, instead of the typical grapheme approach.
Subsequently, a fuzzy c-means approach is adopted to estimate statistical distributions of character prototypes over the different letters of the alphabet. Character prototypes give a more intuitive prototype model than grapheme prototypes, which are often only part of a character and are not meaningful by themselves as prototypes. Furthermore, character prototypes allow more robust and consistent prototypes to be built in the recognition process. These distributions model the unique handwriting styles of the writers. The proposed system attained an accuracy of 99.2% on a database of 120 French writers.
In addition, the framework can be extended to any language that uses an alphabetic writing system, such as the Latin, Greek or Cyrillic alphabets. To handle this, the framework is modified to examine the character prototypes at a deeper level. We hypothesize that the alphabet knowledge inherent in such character prototypes can provide additional information about the writers' styles of writing and their identities. This thesis uses the character prototype approach described above to establish evidence that knowledge of the alphabet offers additional clues that help in the writer identification process. An Alphabet Information Coefficient is consequently introduced to better exploit such alphabet knowledge for writer identification. Our experiments showed an increase in writer identification accuracy from 66.0% to 87.0% on a database of 200 reference writers on a Reuters-21578 dataset of English writers when alphabet knowledge was used. Experiments on reducing the dimensionality of the writer identification system are also reported. Our results show that the discriminative power of different letters of the alphabet can be used to reduce complexity while maintaining the same level of performance for the writer identification system. |
author2 |
Kot Chichung, Alex |
author_facet |
Kot Chichung, Alex Tan, Guoxian. |
format |
Theses and Dissertations |
author |
Tan, Guoxian. |
author_sort |
Tan, Guoxian. |
title |
Writing style modelling based on grapheme distributions : application to on-line writer identification. |
title_short |
Writing style modelling based on grapheme distributions : application to on-line writer identification. |
title_full |
Writing style modelling based on grapheme distributions : application to on-line writer identification. |
title_fullStr |
Writing style modelling based on grapheme distributions : application to on-line writer identification. |
title_full_unstemmed |
Writing style modelling based on grapheme distributions : application to on-line writer identification. |
title_sort |
writing style modelling based on grapheme distributions : application to on-line writer identification. |
publishDate |
2013 |
url |
https://hdl.handle.net/10356/54630 |
_version_ |
1772825597992501248 |
spelling |
sg-ntu-dr.10356-54630; 2023-07-04T16:22:31Z; Writing style modelling based on grapheme distributions : application to on-line writer identification.; Tan, Guoxian.; Kot Chichung, Alex; School of Electrical and Electronic Engineering; Centre for Information Security; DOCTOR OF PHILOSOPHY (EEE); 2013-07-01T02:04:07Z; 2013; Thesis; Tan, G. (2013). Writing style modelling based on grapheme distributions : application to on-line writer identification. Doctoral thesis, Nanyang Technological University, Singapore.; https://hdl.handle.net/10356/54630; doi:10.32657/10356/54630; en; 152 p.; application/pdf |
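The abstract mentions a fuzzy c-means approach for estimating a writer's distribution over character prototypes. The following is a minimal illustrative sketch of that general idea, not the thesis's implementation: it shows only the standard fuzzy c-means membership formula applied to fixed prototypes, and omits the iterative prototype update of full fuzzy c-means. The 2-D feature vectors, the function names, the fuzzifier `m = 2.0`, and the L1 distance used for matching are all assumptions made for illustration.

```python
import math


def fuzzy_memberships(sample, prototypes, m=2.0):
    """Standard fuzzy c-means membership of one sample in each prototype.

    `m` is the fuzzifier (m > 1); the memberships sum to 1.
    """
    dists = [math.dist(sample, p) for p in prototypes]
    # A sample that coincides with a prototype belongs to it entirely.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def prototype_distribution(samples, prototypes, m=2.0):
    """Average the memberships of a writer's samples of one letter.

    The result is a distribution over the prototypes of that letter
    (hypothetical stand-in for a writer's style model).
    """
    totals = [0.0] * len(prototypes)
    for sample in samples:
        for k, u in enumerate(fuzzy_memberships(sample, prototypes, m)):
            totals[k] += u
    return [t / len(samples) for t in totals]


def identify(query, references):
    """Return the reference writer whose distribution is closest in L1 distance.

    The L1 distance is an assumption for this sketch, not the thesis's measure.
    """
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    return min(references, key=lambda writer: l1(query, references[writer]))


# Hypothetical prototypes for one letter, in a made-up 2-D feature space.
prototypes = [(0.0, 0.0), (1.0, 0.0)]
references = {
    "writer_A": prototype_distribution([(0.1, 0.0), (0.2, 0.1)], prototypes),
    "writer_B": prototype_distribution([(0.9, 0.0), (0.8, 0.1)], prototypes),
}
query = prototype_distribution([(0.15, 0.05)], prototypes)
best_match = identify(query, references)
```

In this toy setup the query sample lies near the first prototype, so its distribution resembles that of `writer_A`. A system along the lines the abstract describes would presumably combine such distributions across all letters of the alphabet; how the thesis does so, and how the Alphabet Information Coefficient is defined, is not stated in this record.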