Mining HIV-1 information from literature

The HIV-1 virus frequently mutates, increasing its resistance to certain drugs. These mutations are partly due to histone modifications in the patient’s genome. Information on histone modifications is not easily accessible. Online databases contain a large number of documents abou...

Bibliographic Details
Main Author: Lim, Clarence Jia Xian
Other Authors: School of Computer Engineering
Format: Final Year Project
Language: English
Published: 2014
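Subjects: DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval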
Online Access: http://hdl.handle.net/10356/59053
Institution: Nanyang Technological University
id sg-ntu-dr.10356-59053
record_format dspace
spelling sg-ntu-dr.10356-59053 2023-03-03T20:59:51Z Mining HIV-1 information from literature Lim, Clarence Jia Xian School of Computer Engineering Kim Jung-Jae DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering (Computer Science) 2014-04-22T02:05:31Z 2014-04-22T02:05:31Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/59053 en Nanyang Technological University 54 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
description The HIV-1 virus frequently mutates, increasing its resistance to certain drugs. These mutations are partly due to histone modifications in the patient’s genome. Information on histone modifications is not easily accessible: online databases contain a large number of documents about histone modifications, but retrieving the information manually is very time-consuming for biologists. This project therefore attempts to automate the retrieval of this information from the databases and to integrate it into a single source for ease of access. The program consists of several components that together build the information source. The Document Collection System is the first component; it collects documents and abstracts from the online databases and cleans them for the next stage. TEES, the next component, takes the cleaned documents and extracts proteins and histone modification events from them. The TEEStoCSV Convertor program takes the output of TEES and converts each file’s data into CSV format. The Histone Events Compilation program combines the individual CSV files into one overall CSV file and filters out invalid histones. The Sampling Program takes the overall CSV file and randomly selects 100 samples for the verification process. The Normalization Program takes the overall CSV file and normalizes the terms for the visualization program, Graphviz. The GeneToUniprot program takes the overall CSV file and converts the gene names to Swiss-Prot IDs. Lastly, the XML Constructor program combines the output of the GeneToUniprot program with an extracted histone file to construct the XML file. The overall design architecture uses a pipe-and-filter style to allow extensibility and ease of modification of individual components. The verification results were satisfactory overall, as more than half of the samples were correct. Some of the error types found could also be resolved.
The final result of the program is an XML file that allows the information to be easily distributed and accessed. The project recommends improving the TEES system’s event detection to increase the quality of the results.
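
The pipe-and-filter flow described in the abstract can be illustrated with a minimal Python sketch. It is not the project's code: the column names (gene, histone, modification), the set of valid histones, the placeholder IDs, and the XML element names are all assumptions made for illustration, and TEES itself is not reproduced. Each function stands in for one filter (compilation, sampling, gene-to-ID conversion, XML construction) and passes plain rows to the next.

```python
import csv
import io
import random
import xml.etree.ElementTree as ET

# Assumed validity rule; the record does not say how invalid histones are detected.
VALID_HISTONES = {"H2A", "H2B", "H3", "H4"}


def compile_events(csv_texts):
    """Merge per-document CSV data and drop rows whose histone is not valid."""
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if row["histone"] in VALID_HISTONES:
                rows.append(row)
    return rows


def sample_events(rows, k=100, seed=0):
    """Draw up to k rows at random (without replacement) for manual verification."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def map_genes(rows, gene_to_id):
    """Replace gene names using a lookup table; unknown names are left unchanged."""
    return [{**row, "gene": gene_to_id.get(row["gene"], row["gene"])} for row in rows]


def build_xml(rows):
    """Serialize the rows as one XML document, one <event> element per row."""
    root = ET.Element("histone_events")  # hypothetical element name
    for row in rows:
        ET.SubElement(root, "event", attrib=row)
    return ET.tostring(root, encoding="unicode")


# Made-up input: two per-document CSV files with placeholder gene names.
doc_a = "gene,histone,modification\nGENE_A,H3,acetylation\nGENE_B,H9,methylation\n"
doc_b = "gene,histone,modification\nGENE_C,H4,methylation\n"

events = compile_events([doc_a, doc_b])          # the H9 row is filtered out
to_verify = sample_events(events)                # at most 100 rows
mapped = map_genes(events, {"GENE_A": "ID_A"})   # placeholder ID, not a real Swiss-Prot entry
print(build_xml(mapped))
```

Because each stage only consumes and returns a list of rows, a stage can be replaced or a new one inserted without touching the others, which is the extensibility the pipe-and-filter style is meant to provide.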
title Mining HIV-1 information from literature