iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding

An enhancer is a short (50–1500bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers i...

Bibliographic Details
Main Authors: Le, Nguyen Quoc Khanh, Yapp, Edward Kien Yee, Ho, Quang-Thai, Nagasundaram, Nagarajan, Ou, Yu-Yen, Yeh, Hui-Yuan
Other Authors: School of Humanities
Format: Article
Language: English
Published: 2021
Subjects: Science::Biological sciences; Skip Gram; Continuous Bag of Words
Online Access: https://hdl.handle.net/10356/150965
Institution: Nanyang Technological University
id sg-ntu-dr.10356-150965
record_format dspace
spelling sg-ntu-dr.10356-1509652021-05-31T08:16:39Z iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding Le, Nguyen Quoc Khanh Yapp, Edward Kien Yee Ho, Quang-Thai Nagasundaram, Nagarajan Ou, Yu-Yen Yeh, Hui-Yuan School of Humanities Science::Biological sciences Skip Gram Continuous Bag of Words An enhancer is a short (50–1500bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers in genomics, the classification of enhancers has become a popular area of research in computational biology. Despite the few computational tools employed to address this problem, their resulting performance still requires improvements. In this study, we treat enhancers by the word embeddings, including sub-word information of its biological words, which then serve as features to be fed into a support vector machine algorithm to classify them. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We are able to attain an independent test accuracy of 79% and 63.5% in the two layers, respectively. Compared to current predictors on the same dataset, our proposed method is able to yield superior performance as compared to the other methods. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques in biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/. Nanyang Technological University This work has been supported by the Nanyang Technological University Start-Up Grant. 2021-05-31T08:16:39Z 2021-05-31T08:16:39Z 2019 Journal Article Le, N. Q. K., Yapp, E. K. Y., Ho, Q., Nagasundaram, N., Ou, Y. & Yeh, H. (2019). 
iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding. Analytical Biochemistry, 571, 53-61. https://dx.doi.org/10.1016/j.ab.2019.02.017 0003-2697 https://hdl.handle.net/10356/150965 10.1016/j.ab.2019.02.017 30822398 2-s2.0-85062237812 571 53 61 en Analytical Biochemistry © 2019 Elsevier Inc. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Science::Biological sciences
Skip Gram
Continuous Bag of Words
spellingShingle Science::Biological sciences
Skip Gram
Continuous Bag of Words
Le, Nguyen Quoc Khanh
Yapp, Edward Kien Yee
Ho, Quang-Thai
Nagasundaram, Nagarajan
Ou, Yu-Yen
Yeh, Hui-Yuan
iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
description An enhancer is a short (50–1500 bp) region of DNA that plays an important role in gene expression and in the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers in genomics, the classification of enhancers has become a popular area of research in computational biology. Although a few computational tools have been developed to address this problem, their performance still needs improvement. In this study, we represent enhancers using word embeddings that include sub-word information of their biological words; these embeddings then serve as features for a support vector machine algorithm that classifies them. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We attain independent test accuracies of 79% and 63.5% in the two layers, respectively. On the same dataset, the proposed method outperforms current predictors. Moreover, this study provides a basis for further research on applying natural language processing techniques to biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/.
author2 School of Humanities
author_facet School of Humanities
Le, Nguyen Quoc Khanh
Yapp, Edward Kien Yee
Ho, Quang-Thai
Nagasundaram, Nagarajan
Ou, Yu-Yen
Yeh, Hui-Yuan
format Article
author Le, Nguyen Quoc Khanh
Yapp, Edward Kien Yee
Ho, Quang-Thai
Nagasundaram, Nagarajan
Ou, Yu-Yen
Yeh, Hui-Yuan
author_sort Le, Nguyen Quoc Khanh
title iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
title_short iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
title_full iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
title_fullStr iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
title_full_unstemmed iEnhancer-5step : identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding
title_sort ienhancer-5step : identifying enhancers using hidden information of dna sequences via chou's 5-step rule and word embedding
publishDate 2021
url https://hdl.handle.net/10356/150965
_version_ 1702418253465255936
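
The description in this record mentions word embeddings that include sub-word information of "biological words". As a rough illustration only, and not the authors' implementation, the sketch below assumes that a biological word is an overlapping k-mer of the DNA sequence and that sub-word units are character n-grams taken from the word wrapped in boundary markers. The function names `kmer_words` and `subword_ngrams`, and the default values of `k`, `n_min`, and `n_max`, are hypothetical choices made for this example. The embedding training and the support vector machine step are not shown.

```python
from typing import List


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers.

    Assumption: a "biological word" is a k-mer; the record does not say so.
    """
    seq = seq.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def subword_ngrams(word: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Return character n-grams of a word wrapped in '<' and '>' markers.

    Assumption: sub-word units are boundary-marked character n-grams.
    """
    marked = f"<{word}>"
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams
```

Calling `kmer_words("ACGTAC", k=3)` tokenizes the sequence into words, and passing each word to `subword_ngrams` gives the units that a sub-word embedding model would learn vectors for. Averaging such vectors per sequence is one common way to obtain a fixed-length feature vector for a classifier, though the record does not state how the features were pooled.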