Database retrieval for novelty detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/17157 |
Institution: | Nanyang Technological University |
Summary: | Research on optimizing databases in Database Management Systems (DBMS)
has been evolving constantly. Today, programming languages are being
integrated into database systems to help professional programmers develop software
quickly to meet deadlines. Therefore, the design of a database must cater to both the needs
of the customers and the efficiency of the database processes.
In this report, a database application, namely Novelty Detection, detects new
documents for readers who do not want to read repeated documents again. A database
is therefore needed to store both historical and current documents. In this research,
records are replicated to test the optimization of the time needed to retrieve and insert
data and of the space needed to store the data. The experiments use 8k, 100k, 200k,
500k, 2 million, 5 million, and 10 million records.
The experiments are done at both the Sentence Level and the Document Level. At
both levels, data optimization and the use of proper indexing are investigated. In
MySQL, the B-Tree index was used to speed up the selection of data. In
addition, EXPLAIN was used to identify the correct data columns to index
and to avoid redundant indexing. The overall result was an improvement of over 90% in
time when selecting data. Optimization of data types was also investigated to ensure that
MySQL did not incur extra workload when selecting data. Overall, the combined
optimization gave an improvement of 90%.
A technique known as batching was also introduced to speed up the
insertion of results after novelty detection has been done. Inserting results in
batches rather than as single inserts also improved insertion time by over 90%.
As such, the results obtained have been benchmarked for the real Novelty Detection
application. |
---|
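
The two techniques named in the summary, checking a query plan before and after adding an index, and inserting rows in batches, can be illustrated with a minimal sketch. This is not the report's code. It assumes Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and SQLite's EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN. The table name `sentences`, its columns, and the index name `idx_sentences_doc_id` are hypothetical.

```python
import sqlite3


def build_db(n_rows=10000):
    """Create an in-memory table and fill it with one batched insert."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per sentence, keyed to its document.
    conn.execute(
        "CREATE TABLE sentences ("
        "id INTEGER PRIMARY KEY, doc_id INTEGER, body TEXT)"
    )
    rows = [(i % 500, f"sentence {i}") for i in range(n_rows)]
    # Batching: a single executemany call inside one transaction,
    # instead of one statement and one commit per row.
    with conn:
        conn.executemany(
            "INSERT INTO sentences (doc_id, body) VALUES (?, ?)", rows
        )
    return conn


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in plan_rows)


conn = build_db()
sql = "SELECT body FROM sentences WHERE doc_id = ?"

# Without an index, the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# Index only the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_sentences_doc_id ON sentences (doc_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows whether the new index is actually used, which is the kind of check that helps avoid indexing columns no query benefits from. Timings are omitted on purpose, since they depend on the engine, the hardware, and the data volume.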