Building and validating gene regulatory networks (GRN) with delays from multiple data sources
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/65383 |
Institution: | Nanyang Technological University |
Summary: | Cell functions are highly complex because they involve the concerted activity of many genes and their products (proteins). Each cell activity is typically coordinated by the organization of the genome into sets of genes that are co-regulated and share common functions (regulatory modules), common roles (processes), or interactions in pathways. Gene regulatory networks (GRNs) organize these related cell activities coherently into a concise network representation. Regulatory interactions are inherently asynchronous and transpire at different times. GRNs with time delays therefore represent dynamic regulatory interactions reasonably well, and they can be inferred from gene expression (GE) time series. Although genome-wide expression profiles provide important information about cellular processes, the information available in any individual data source is limited, so GRNs are better learned by combining GE data with other types of data sources. In this thesis, we proposed several novel methods to infer and validate GRNs with delays from multiple data sources, such as gene ontology databases, the literature, GE time-series data, and quantitative protein-protein interaction networks (PPIN).
Gene Ontology (GO) provides well-structured hierarchical vocabularies of biological processes and functional annotations of genes. It is commonly assumed that gene regulations and interactions occur among functionally associated genes. GO enrichment has been widely used to evaluate the functional coherence of a gene list, but has not been incorporated into evaluating similarity between genes. We incorporated GO enrichment to improve the functional similarity scores of genes and used these scores to extract functionally associated pathways. Most previous research predicted gene interactions from GO functional similarities of genes by considering the hierarchical relations of GO terms, while rarely if ever considering their biological meaning, for example the “regulate” relation. We proposed a novel method to infer gene regulations from the biological clues carried by the GO “regulate” relation. Transitive regulations were also detected by applying transitive rules of GO relations. To improve the accuracy and biological feasibility of the inferred gene regulations, we restricted the transitive regulatory path method to a functionally associated pathway in order to detect regulatory pathways and construct GRNs. The benchmark networks included five functionally associated networks/pathways and five randomly chosen networks from the E. coli network in RegulonDB and the yeast cell-cycle networks in the YEASTRACT database. Experimental results showed that our proposed functional similarity score outperformed both the previous functional similarity score and the integrated functional association score from STRING in recovering almost all benchmark networks. The experiments also showed that, on all the networks, our regulatory path method outperformed state-of-the-art GRN inference methods based on gene expression time-series data: dynamic Bayesian network (DBN), GeneReg, and LASSO, which use graphical or linear regression models to predict various delays.
Compared with the randomly selected networks, our regulatory path method performed much better on functionally associated pathways. This demonstrates that restricting transitive regulatory pathway inference to a functionally associated pathway is both useful for inferring GRNs and biologically plausible.
Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. As a state-of-the-art method, DBNs have been widely used for modeling the dynamics of gene regulation.
Previous work has extended DBN to higher-order DBN (HDBN) so that the order can represent the delays. We adopted variable-order DBN (VDBN), which presumes that each gene has its own delay. Thus, biologically more plausible GRNs with time delays can be modeled using VDBN. As the order of a DBN increases, the number of parameters in the model increases exponentially and the search space becomes exponentially large. We used GlobalMIT+, a polynomial-time algorithm, to reduce the complexity of learning the VDBN structure. The optimal time delay for each gene was learned by a variable-order Markov chain Monte Carlo (MCMC) scheme. We also explored the introduction of appropriate priors to improve the accuracy and reduce the complexity of learning DBNs. Since protein interactions and functional associations provide some evidence for gene regulation, we incorporated these priors, taken from STRING or GO, into the process of inferring GRNs via VDBN from GE time-series data. We used a Bayesian fusion framework to take the quantitative priors into account. The parameters for weighting the priors were learned automatically from their consistency with the GRN inferred from GE data. Experimental results showed that VDBN significantly outperformed DBN and HDBN with fixed orders. The prior information significantly improved the performance of all DBNs. Many predicted time-delayed regulations were validated by the literature. This demonstrates that our fusion method quantitatively captures coherent dependency relations among genes from multiple data sources, and that it is both effective and biologically reasonable.
Validating time delays from knowledge bases is a challenge because the vast majority of biological databases do not record temporal information about gene regulations. Biological knowledge and facts on gene regulations are typically extracted from bio-literature with specialized methods that depend on the regulation task. We mined evidence for time delays related to the transcriptional regulation of yeast from PubMed abstracts. Since the vast majority of abstracts lack quantitative time information, we could only collect qualitative evidence of time delays. Specifically, a speed-up or slow-down in the rate of transcriptional regulation can provide evidence for shorter or longer time delays in a GRN. Thus, we focused on deriving events related to rate changes in transcriptional regulation. A corpus of abstracts related to yeast regulation was manually labelled with such events. To capture these events automatically, we created an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also proposed effective feature extraction methods based on the created ontology to identify direct evidence with specific details of the events. Our ontologies outperformed existing state-of-the-art gene regulation ontologies in identifying rate-changing events. Experimental results showed that the machine learning method using our proposed features achieved an F1-score of 71.43% in identifying direct evidence of these events. This demonstrates the effectiveness of our methods in deriving delayed gene regulations from bio-literature.
In summary, our research has resulted in three major contributions: extracting evidence for transitive regulation from a structured biological database (GO); deriving delayed transcriptional regulation from an unstructured data source (the biological literature); and fusing multiple data sources into a time-delayed GRN. We believe that these contributions will lead to more accurate and biologically plausible in silico models of the dynamics of gene regulation. The established GRNs provide deeper insight into biological processes and can be applied in the life sciences, for example in disease analysis and clinical studies. |
---|
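
The summary mentions detecting transitive regulations by applying transitive rules of GO relations, without listing the rules. The following is a minimal sketch of that general idea, not the thesis's method: relations along a path are composed pairwise through a lookup table, and a path counts as an inferred regulation only if the composition ends in `regulates`. The composition table `COMPOSE`, the function `infer_regulates`, and the term names `T1` to `T5` are all hypothetical and chosen for illustration; the step that maps genes to GO terms through annotations is omitted.

```python
from collections import deque

# Hypothetical composition table: (first relation, second relation) -> result.
# These entries are illustrative assumptions, not the rule set from the thesis.
# Pairs that are absent (e.g. regulates followed by regulates) yield no inference.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("part_of", "part_of"): "part_of",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("regulates", "is_a"): "regulates",
    ("regulates", "part_of"): "regulates",
    ("is_a", "regulates"): "regulates",
}


def infer_regulates(edges):
    """Return (source, target) pairs joined by a path that composes to 'regulates'.

    edges: iterable of (source, relation, target) triples.
    """
    outgoing = {}
    for source, relation, target in edges:
        outgoing.setdefault(source, []).append((relation, target))

    inferred = set()
    for start in outgoing:
        seen = set()  # (relation, node) states already expanded from this start
        queue = deque(outgoing[start])
        while queue:
            relation, node = queue.popleft()
            if (relation, node) in seen:
                continue
            seen.add((relation, node))
            if relation == "regulates":
                inferred.add((start, node))
            # Extend the path by one edge if the table allows the composition.
            for next_relation, next_node in outgoing.get(node, []):
                combined = COMPOSE.get((relation, next_relation))
                if combined is not None:
                    queue.append((combined, next_node))
    return inferred


# Toy graph with made-up term identifiers.
toy_edges = [
    ("T1", "regulates", "T2"),
    ("T2", "is_a", "T3"),
    ("T3", "part_of", "T4"),
    ("T4", "regulates", "T5"),
]
result = infer_regulates(toy_edges)
print(sorted(result))
```

On the toy graph, `T1` is inferred to regulate `T2`, `T3`, and `T4`, but not `T5`, because the table deliberately has no entry for two consecutive `regulates` edges. Restricting `edges` to terms within one functionally associated pathway would mirror the restriction described in the summary.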