Database retrieval for novelty detection

Research on optimizing databases in Database Management Systems (DBMS) is evolving constantly. Today, programming languages are being integrated into database systems to help professional programmers develop software quickly to meet deadlines. Therefore, the design of a datab...

Bibliographic Details
Main Author:	Ong, Chun Lin.
Other Authors:	Tsai Flora S
Format:	Final Year Project
Language:	English
Published:	2009
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	http://hdl.handle.net/10356/17157
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-17157
record_format	dspace
spelling	sg-ntu-dr.10356-17157 2023-07-07T17:06:17Z Database retrieval for novelty detection Ong, Chun Lin. Tsai Flora S School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering 2009-06-01T03:26:49Z 2009-06-01T03:26:49Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/17157 en Nanyang Technological University 66 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Ong, Chun Lin. Database retrieval for novelty detection
description	Research on optimizing databases in Database Management Systems (DBMS) is evolving constantly. Today, programming languages are being integrated into database systems to help professional programmers develop software quickly to meet deadlines. Therefore, the design of a database must cater to both the needs of the customers and the efficiency of the database processes. In this report, a database application, Novelty Detection, detects new documents for readers who do not want to read repeated documents again. A database is therefore needed to store both historical and current documents. In this research, records are replicated to test optimizations of the time to retrieve and insert data and of the space needed to store the data. Experiments are run with 8k, 100k, 200k, 500k, 2 million, 5 million, and 10 million records, at both the Sentence Level and the Document Level. At both levels, data optimization and the use of proper indexing are investigated. In MySQL, a B-Tree index was used to speed up data selection. In addition, EXPLAIN was used to identify the correct data columns to index and to avoid redundant indexing. The overall result was an improvement of over 90% in time when selecting data. Optimization of data types was also investigated to ensure that MySQL did not incur extra workload when selecting data. Overall, the combined optimization gave an improvement of 90%. A technique known as batching was also introduced to speed up the insertion of results after novelty detection has been done. Inserting results in batches rather than as single inserts also improved insertion time by over 90%. The results obtained were benchmarked for the real Novelty Detection application.
author2	Tsai Flora S
author_facet	Tsai Flora S Ong, Chun Lin.
format	Final Year Project
author	Ong, Chun Lin.
author_sort	Ong, Chun Lin.
title	Database retrieval for novelty detection
title_short	Database retrieval for novelty detection
title_full	Database retrieval for novelty detection
title_fullStr	Database retrieval for novelty detection
title_full_unstemmed	Database retrieval for novelty detection
title_sort	database retrieval for novelty detection
publishDate	2009
url	http://hdl.handle.net/10356/17157
_version_	1772826939934900224
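
The two techniques named in the abstract, indexing a lookup column after checking the query plan and inserting results in batches, can be illustrated with a minimal sketch. It is not the project's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, with EXPLAIN QUERY PLAN as a rough analogue of MySQL's EXPLAIN. The table name `sentence`, its columns, the index name, and the batch size are all hypothetical.

```python
import sqlite3


def build_demo(num_rows=1000, batch_size=100):
    """Create a hypothetical sentence table and fill it in batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sentence (id INTEGER PRIMARY KEY, doc_id INTEGER, body TEXT)"
    )
    rows = [(i, i % 50, f"sentence {i}") for i in range(num_rows)]
    # Batching: one executemany call and one commit per batch,
    # rather than one statement and one commit per row.
    for start in range(0, len(rows), batch_size):
        conn.executemany(
            "INSERT INTO sentence (id, doc_id, body) VALUES (?, ?, ?)",
            rows[start:start + batch_size],
        )
        conn.commit()
    return conn


def query_plan(conn, sql, params=()):
    """Return the plan detail strings reported by EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = build_demo()
lookup = "SELECT body FROM sentence WHERE doc_id = ?"

# Inspect the plan before and after adding an index on the filtered column.
plan_before = query_plan(conn, lookup, (7,))
conn.execute("CREATE INDEX idx_sentence_doc_id ON sentence (doc_id)")  # B-tree index
plan_after = query_plan(conn, lookup, (7,))

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows whether the lookup switches from a full table scan to an index search, which is the kind of check the abstract attributes to EXPLAIN. Timings and the reported 90% improvements depend on the MySQL setup and data volumes used in the project and are not reproduced by this sketch.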
