The old newspaper project
Optical Character Recognition (OCR) is now commonly used to convert printouts and documents in sociology, communication and education studies. Traditional OCR models extract text sequentially across the whole page. In newspapers, text is arranged in columns based on...
Main Author: | Mao, Junke |
---|---|
Other Authors: | Ling Keck Voon |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University 2022 |
Subjects: | Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/157550 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-157550 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-157550 2023-07-07T19:17:39Z The old newspaper project Mao, Junke Ling Keck Voon School of Electrical and Electronic Engineering EKVLING@ntu.edu.sg Engineering::Electrical and electronic engineering Optical Character Recognition (OCR) is now commonly used to convert printouts and documents in sociology, communication and education studies. Traditional OCR models extract text sequentially across the whole page. In newspapers, text is arranged in columns based on articles, with images embedded. Such a complex layout, with multi-column text, headlines, embedded figures, etc., can degrade the OCR results. To improve the efficiency of converting newspaper images, we built a specialized model for newspaper recognition. The integrated model performs object segmentation to extract the relevant components of the image, e.g., the headlines and embedded figures, and then performs OCR on these components accordingly. The output is a text document logically arranged with headlines, the text body in a single column, and embedded images appended at the end. Bachelor of Engineering (Electrical and Electronic Engineering) 2022-05-19T13:03:56Z 2022-05-19T13:03:56Z 2022 Final Year Project (FYP) Mao, J. (2022). The old newspaper project. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157550 https://hdl.handle.net/10356/157550 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering |
spellingShingle |
Engineering::Electrical and electronic engineering Mao, Junke The old newspaper project |
description |
Optical Character Recognition (OCR) is now commonly used to convert printouts and documents in sociology, communication and education studies. Traditional OCR models extract text sequentially across the whole page. In newspapers, text is arranged in columns based on articles, with images embedded. Such a complex layout, with multi-column text, headlines, embedded figures, etc., can degrade the OCR results. To improve the efficiency of converting newspaper images, we built a specialized model for newspaper recognition. The integrated model performs object segmentation to extract the relevant components of the image, e.g., the headlines and embedded figures, and then performs OCR on these components accordingly. The output is a text document logically arranged with headlines, the text body in a single column, and embedded images appended at the end. |
author2 |
Ling Keck Voon |
author_facet |
Ling Keck Voon Mao, Junke |
format |
Final Year Project |
author |
Mao, Junke |
author_sort |
Mao, Junke |
title |
The old newspaper project |
title_short |
The old newspaper project |
title_full |
The old newspaper project |
title_fullStr |
The old newspaper project |
title_full_unstemmed |
The old newspaper project |
title_sort |
old newspaper project |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/157550 |
_version_ |
1772828667202764800 |
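
The output arrangement described in the abstract (headlines first, body text flattened into a single column, embedded images appended at the end) can be illustrated with a minimal Python sketch. It is not the project's implementation: it assumes segmentation and OCR have already run upstream, and the `Region` class, the `kind` labels, the `tolerance` parameter, and the file name `fig1.png` are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """One segmented page component (hypothetical structure, not from the project)."""
    kind: str       # assumed labels: "headline", "body", or "figure"
    x: int          # left edge of the bounding box, in pixels
    y: int          # top edge of the bounding box, in pixels
    text: str = ""  # OCR result for text regions; empty for figures
    ref: str = ""   # hypothetical file name of a cropped figure


def by_position(region: Region) -> Tuple[int, int]:
    """Sort key: top to bottom, then left to right."""
    return (region.y, region.x)


def group_into_columns(regions: List[Region], tolerance: int) -> List[List[Region]]:
    """Group regions into columns by comparing left edges, left to right."""
    columns: List[List[Region]] = []
    for region in sorted(regions, key=lambda r: r.x):
        # A region joins the current column if its left edge lies within
        # `tolerance` pixels of that column's leftmost region.
        if columns and region.x - columns[-1][0].x <= tolerance:
            columns[-1].append(region)
        else:
            columns.append([region])
    return columns


def linearize(regions: List[Region], tolerance: int = 20) -> List[str]:
    """Return headlines, then single-column body text, then figure markers."""
    headlines = sorted((r for r in regions if r.kind == "headline"), key=by_position)
    figures = sorted((r for r in regions if r.kind == "figure"), key=by_position)
    body = [r for r in regions if r.kind == "body"]

    lines = [r.text for r in headlines]
    for column in group_into_columns(body, tolerance):
        # Within a column, read from top to bottom.
        lines.extend(r.text for r in sorted(column, key=lambda r: r.y))
    # Figures are appended at the end, as the abstract describes.
    lines.extend(f"[figure: {r.ref}]" for r in figures)
    return lines


# Made-up regions for a two-column page with one headline and one figure.
sample = [
    Region("body", 310, 120, text="Second column text."),
    Region("headline", 10, 10, text="CITY NEWS"),
    Region("figure", 310, 400, ref="fig1.png"),
    Region("body", 10, 300, text="First column, lower block."),
    Region("body", 14, 120, text="First column, upper block."),
]
document = linearize(sample)
```

Grouping by left edge is only one simple way to recover column order. A page where a single article spans several columns under its own headline would need the regions grouped per article first, which this sketch does not attempt.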