iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
An enhancer is a short (50–1500bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers i...
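Main Authors: | Le, Nguyen Quoc Khanh; Yapp, Edward Kien Yee; Ho, Quang-Thai; Nagasundaram, Nagarajan; Ou, Yu-Yen; Yeh, Hui-Yuan |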
---|---|
Other Authors: | School of Humanities |
Format: | Article |
Language: | English |
Published: | 2021 |
Subjects: | Science::Biological sciences; Skip Gram; Continuous Bag of Words |
Online Access: | https://hdl.handle.net/10356/150965 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-150965 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-150965 (record date 2021-05-31T08:16:39Z).
Citation: Le, N. Q. K., Yapp, E. K. Y., Ho, Q., Nagasundaram, N., Ou, Y. & Yeh, H. (2019). iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding. Analytical Biochemistry, 571, 53-61. https://dx.doi.org/10.1016/j.ab.2019.02.017
Type: Journal Article. Journal: Analytical Biochemistry, volume 571, pages 53-61. ISSN: 0003-2697. Language: en.
Identifiers: https://hdl.handle.net/10356/150965; DOI 10.1016/j.ab.2019.02.017; PMID 30822398; Scopus 2-s2.0-85062237812.
Funding: This work has been supported by the Nanyang Technological University Start-Up Grant.
Rights: © 2019 Elsevier Inc. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Science::Biological sciences; Skip Gram; Continuous Bag of Words |
description |
An enhancer is a short (50–1500 bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Because of the importance of enhancers in genomics, their classification has become a popular area of research in computational biology. Although a few computational tools have been developed to address this problem, their performance still requires improvement. In this study, we represent enhancers using word embeddings that include sub-word information of their biological words; these embeddings serve as features fed into a support vector machine algorithm for classification. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We attain independent test accuracies of 79% and 63.5% in the two layers, respectively. Our proposed method yields superior performance compared with current predictors on the same dataset. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques to biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/. |
author2 |
School of Humanities |
format |
Article |
author |
Le, Nguyen Quoc Khanh; Yapp, Edward Kien Yee; Ho, Quang-Thai; Nagasundaram, Nagarajan; Ou, Yu-Yen; Yeh, Hui-Yuan |
author_sort |
Le, Nguyen Quoc Khanh |
title |
iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/150965 |
_version_ |
1702418253465255936 |
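
The description above outlines a pipeline: DNA sequences are treated as "biological words", embedded with sub-word information, and passed through two-layer classifiers. The following is a minimal sketch of the tokenization and the two-layer decision flow only, not the authors' implementation. The function names, the k-mer length `k = 3`, the n-gram range, the boundary markers `<` and `>` (borrowed from sub-word embedding models such as fastText), and the "strong"/"weak" labels are all assumptions for illustration; the embedding training and the support vector machine are replaced by placeholder callables.

```python
from typing import Callable, List


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("biological words").

    k = 3 is an assumed value, not one taken from the record.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def char_ngrams(word: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Character n-grams of a word wrapped in boundary markers.

    This mimics how sub-word embedding models break a word into
    smaller units; the n-gram range is an assumption.
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def two_layer_predict(
    seq: str,
    is_enhancer: Callable[[str], bool],
    is_strong: Callable[[str], bool],
) -> str:
    """Layer 1 decides enhancer vs. non-enhancer; layer 2 grades strength.

    Both classifiers are hypothetical callables standing in for
    trained models.
    """
    if not is_enhancer(seq):
        return "non-enhancer"
    return "strong enhancer" if is_strong(seq) else "weak enhancer"


if __name__ == "__main__":
    words = kmer_words("ACGTA")
    print(words)
    print(char_ngrams(words[0]))
    # Toy stand-in classifiers, for demonstration only.
    print(two_layer_predict("ACGTA", lambda s: "CG" in s, lambda s: len(s) > 10))
```

In a fuller version, each sequence's k-mer embeddings would be combined into one feature vector before classification; how that combination is done is not stated in this record.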