GO2Vec : transforming GO terms and proteins to vector representations via graph embeddings

Bibliographic Details
Main Authors: Zhong, Xiaoshi; Kaalia, Rama; Rajapakse, Jagath Chandana
Affiliation: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021 (deposited in DR-NTU; the journal article appeared in 2019)
Online Access: https://hdl.handle.net/10356/145882
Institution: Nanyang Technological University
Citation: Zhong, X., Kaalia, R., & Rajapakse, J. C. (2019). GO2Vec : transforming GO terms and proteins to vector representations via graph embeddings. BMC Genomics, 20, 918. doi:10.1186/s12864-019-6272-2
Journal: BMC Genomics, vol. 20, article 918 (2019)
ISSN: 1471-2164
DOI: 10.1186/s12864-019-6272-2
PubMed ID: 31874639
Scopus ID: 2-s2.0-85077123353
Author ORCID: 0000-0002-6108-272X
Date deposited: 2021-01-13
Version: Published version
File format: application/pdf
Funding: Ministry of Education (MOE). Publication of this article was funded by the Tier-2 grant MOE2016-T2-1-029 from the Ministry of Education, Singapore.
Rights: © 2020 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Collection: DR-NTU, NTU Library (Nanyang Technological University, Singapore)
Subjects: Science::Biological sciences; Graph Embeddings; Vector Representations
Abstract

Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some studies used word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that uses graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions.

Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results show that GO2Vec is more effective than both the information content-based measures and the word embedding-based measures.

Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GO annotation (GOA) graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.
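The abstract describes learning vectors from undirected GO and GOA graphs with graph embeddings. The Python snippet below is a minimal sketch of the first stage of a DeepWalk-style graph embedding under stated assumptions, not the authors' implementation. It builds one undirected graph from GO term edges plus protein-to-term annotation edges, generates uniform random walks over it, and defines the cosine similarity that could be used to compare learned vectors. The toy identifiers (`GO:A`, `P1`, ...), the function names, and the choice of uniform walks are hypothetical; the paper's actual walk strategy and hyperparameters may differ.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy data: placeholder IDs, not real GO content.
# GO term-term edges (e.g. is_a links); direction is ignored below.
GO_EDGES = [
    ("GO:A", "GO:B"),
    ("GO:A", "GO:C"),
    ("GO:B", "GO:D"),
]
# Hypothetical GOA-style annotations: protein -> annotated GO terms.
ANNOTATIONS = {
    "P1": ["GO:B", "GO:D"],
    "P2": ["GO:C"],
}


def build_undirected_graph(go_edges, annotations):
    """Adjacency sets for one undirected graph holding both
    GO term-term edges and protein-term annotation edges."""
    adj = defaultdict(set)
    for u, v in go_edges:
        adj[u].add(v)
        adj[v].add(u)
    for protein, terms in annotations.items():
        for term in terms:
            adj[protein].add(term)
            adj[term].add(protein)
    return adj


def random_walks(adj, num_walks, walk_length, seed=0):
    """Uniform (DeepWalk-style) random walks: num_walks walks
    starting from every node, each at most walk_length nodes long."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(adj[walk[-1]])
                if not neighbours:  # isolated node: stop early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    graph = build_undirected_graph(GO_EDGES, ANNOTATIONS)
    walks = random_walks(graph, num_walks=2, walk_length=5)
    print(len(walks))  # 2 walks x 6 nodes = 12
    print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.0
```

The walks would then be passed to a skip-gram trainer (for example gensim's `Word2Vec` with `sg=1`) to obtain one vector per node; because proteins are nodes in the combined graph, protein-protein similarity could be scored with `cosine` on their vectors. That training step is omitted here to keep the sketch dependency-free.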