Mining HIV-1 information from literature
HIV-1 frequently mutates to increase its resistance to certain drugs. The mutations are partly due to histone modifications in the patient’s genome. Information on histone modifications is not easily accessible. There are online databases that contain a large number of documents abou...
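Main Author: | Lim, Clarence Jia Xian |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Final Year Project |
Language: | English |
Published: | 2014 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
Online Access: | http://hdl.handle.net/10356/59053 |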
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-59053 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-59053 2023-03-03T20:59:51Z Mining HIV : 1 information from literature Lim, Clarence Jia Xian School of Computer Engineering Kim Jung-Jae DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering (Computer Science) 2014-04-22T02:05:31Z 2014-04-22T02:05:31Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/59053 en Nanyang Technological University 54 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
description |
HIV-1 frequently mutates to increase its resistance to certain drugs. The mutations are partly due to histone modifications in the patient’s genome. Information on histone modifications is not easily accessible: online databases contain a large number of documents about histone modification, but retrieving the information from them manually is very time-consuming for biologists. This project therefore aims to automate the retrieval of this information from the databases and integrate it into a single source for ease of access. The program consists of several components that together build this information source. The Document Collection System, the first component, collects documents and abstracts from the online databases and cleans them for the next stage. TEES, the next component, takes the cleaned documents and extracts proteins and histone modification events from them. The TEEStoCSV Convertor program converts the TEES output for each file into CSV format. The Histone Events Compilation program combines the individual CSV files into one overall CSV file and filters out invalid histones. The Sampling Program randomly selects 100 samples from the overall CSV file for the verification process. The Normalization Program normalizes the terms in the overall CSV file for the visualization program, Graphviz. The GeneToUniprot program converts the gene names in the overall CSV file to Swiss-Prot IDs. Lastly, the XML Constructor program combines the output of the GeneToUniprot program with an extracted histone file to construct the XML file. The overall design uses a pipe-and-filter architecture so that the system is extensible and individual components are easy to modify. The verification results were satisfactory overall, as more than half of the samples were correct, and some of the error types found could be resolved.
The final result of the program is an XML file that allows the information to be easily distributed and accessed. The project also recommends improving the event detection of the TEES system to increase the quality of the results. |
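The description above outlines a pipe-and-filter design in which each program consumes the previous program's output. A minimal Python sketch of the middle of such a pipeline (compiling per-document CSV files into one stream, filtering out invalid histones, and randomly sampling up to 100 rows for verification) might look like the following. It is not the project's code: the column names, the `VALID_HISTONES` whitelist, the normalization rule, the toy rows, and all function names are hypothetical assumptions for illustration.

```python
import csv
import io
import random
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# One extracted event per CSV row. The column names used here
# ("protein", "event_type", "histone") are hypothetical; the real
# TEEStoCSV output format is not given in the abstract.
Row = Dict[str, str]
Filter = Callable[[Iterable[Row]], Iterator[Row]]

# Hypothetical whitelist standing in for the project's "valid histone" check.
VALID_HISTONES = {"H1", "H2A", "H2B", "H3", "H4"}


def read_csv_texts(texts: Iterable[str]) -> Iterator[Row]:
    """Source: stream rows from several per-document CSV texts, in order."""
    for text in texts:
        yield from csv.DictReader(io.StringIO(text))


def normalize_histone(rows: Iterable[Row]) -> Iterator[Row]:
    """Filter: trim and upper-case the histone name (illustrative rule only)."""
    for row in rows:
        yield {**row, "histone": row["histone"].strip().upper()}


def drop_invalid_histones(rows: Iterable[Row]) -> Iterator[Row]:
    """Filter: discard events whose histone is not in VALID_HISTONES."""
    for row in rows:
        if row["histone"] in VALID_HISTONES:
            yield row


def run_pipeline(source: Iterable[Row], filters: List[Filter]) -> List[Row]:
    """Chain the filters; each one consumes the previous one's output stream."""
    stream: Iterable[Row] = source
    for f in filters:
        stream = f(stream)
    return list(stream)


def to_csv(rows: List[Row]) -> str:
    """Serialize the compiled rows as one overall CSV document."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def sample_rows(rows: List[Row], k: int = 100, seed: Optional[int] = None) -> List[Row]:
    """Draw up to k rows at random, without replacement, for manual checking."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Toy per-document CSV outputs (made-up rows, for illustration only).
per_document_csv = [
    "protein,event_type,histone\nSIRT1,Deacetylation,H3\n",
    "protein,event_type,histone\nEZH2,Methylation, h3\nTP53,Phosphorylation,p53\n",
]

events = run_pipeline(
    read_csv_texts(per_document_csv),
    [normalize_histone, drop_invalid_histones],
)
print([e["protein"] for e in events])  # ['SIRT1', 'EZH2']
print(len(sample_rows(events, k=100, seed=0)))  # 2
```

Because every stage has the same stream-in, stream-out shape, adding or replacing a stage means changing one entry in the filter list, which is the extensibility argument the description makes for the pipe-and-filter style.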
author2 |
School of Computer Engineering |
format |
Final Year Project |
author |
Lim, Clarence Jia Xian |
title |
Mining HIV-1 information from literature
publishDate |
2014 |
url |
http://hdl.handle.net/10356/59053 |
_version_ |
1759857117756391424 |
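The last two stages in the description above, GeneToUniprot and the XML Constructor, could be sketched in Python with the standard `xml.etree.ElementTree` module as below. This is an assumption-laden illustration rather than the project's implementation: the gene-to-accession dictionary stands in for a real Swiss-Prot lookup, `ACC00001` is a placeholder rather than a real accession, and the element names and the `histone_notes` stand-in for the extracted histone file are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

Row = Dict[str, str]


def genes_to_accessions(events: List[Row], gene_to_acc: Dict[str, str]) -> List[Row]:
    """Attach an accession to each event; events with unmapped genes are skipped.

    gene_to_acc is a hypothetical in-memory mapping standing in for a real
    gene-name-to-Swiss-Prot lookup.
    """
    mapped = []
    for ev in events:
        acc = gene_to_acc.get(ev["protein"])
        if acc is not None:
            mapped.append({**ev, "accession": acc})
    return mapped


def build_events_xml(events: List[Row], histone_notes: Dict[str, str]) -> str:
    """Combine mapped events with per-histone notes into one XML document."""
    root = ET.Element("histoneEvents")
    for ev in events:
        node = ET.SubElement(root, "event", type=ev["event_type"])
        protein = ET.SubElement(node, "protein", accession=ev["accession"])
        protein.text = ev["protein"]
        histone = ET.SubElement(node, "histone", name=ev["histone"])
        note = histone_notes.get(ev["histone"])
        if note is not None:
            histone.text = note
    return ET.tostring(root, encoding="unicode")


# Made-up events; "UNKNOWN1" has no mapping and is therefore dropped.
events = [
    {"protein": "SIRT1", "event_type": "Deacetylation", "histone": "H3"},
    {"protein": "UNKNOWN1", "event_type": "Methylation", "histone": "H4"},
]
mapped = genes_to_accessions(events, {"SIRT1": "ACC00001"})  # placeholder accession
xml_text = build_events_xml(mapped, {"H3": "core histone H3"})
print(xml_text)
```

Building the file with a standard serializer rather than by string concatenation keeps characters such as `&` or `<` in gene names and notes correctly escaped.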