Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The problem of reconstructing large-scale gene regulatory networks from gene expression data has garnered considerable attention in bioinformatics over the past decade, with the graphical modeling paradigm having emerged as a popular framework for inference. Analysis in a full Bayesian setting is contingent upon the assignment of a so-called structure prior: a probability distribution on networks, encoding a priori biological knowledge either in the form of supplemental data or high-level topological features. A key topological consideration is that a wide range of cellular networks are approximately scale-free, meaning that the fraction, P(k), of nodes in a network with degree k is roughly described by a power law, P(k) ∝ k^(-γ), with exponent γ between 2 and 3. The standard practice, however, is to utilize a random structure prior, which favors networks with binomially distributed node degrees. In this paper, we introduce a scale-free structure prior for graphical models based on the formula for the probability of a network under a simple scale-free network model. Unlike the random structure prior, its scale-free counterpart requires a node labeling as a parameter. In order to use this prior for large-scale network inference, we design a novel Metropolis-Hastings sampler for graphical models that includes a node labeling as a state space variable. In a simulation study, we demonstrate that the scale-free structure prior outperforms the random structure prior at recovering scale-free networks while at the same time retaining the ability to recover random networks. We then estimate a gene association network from gene expression data taken from a breast cancer tumor study, showing that the scale-free structure prior recovers hubs, including the previously unknown hub SLC39A6, which is a zinc transporter that has been implicated in the spread of breast cancer to the lymph nodes. 
Our analysis of the breast cancer expression data underscores the value of the scale-free structure prior as an instrument to aid in the identification of candidate hub genes with the potential to direct the hypotheses of molecular biologists, and thus drive future experiments.
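
To make the contrast between the two degree distributions concrete, the following is a minimal sketch, not the authors' prior or sampler. It compares a truncated power-law degree distribution with the binomial degree distribution of a random graph. The exponent 2.5, the network size 50, and the edge probability 0.04 are illustrative assumptions, as are the function names.

```python
import math

def power_law_pmf(gamma, k_max):
    """Normalized power-law degree distribution P(k) proportional to k^(-gamma), k = 1..k_max."""
    weights = [k ** (-gamma) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def binomial_pmf(n, p):
    """Binomial(n, p) degree distribution of a random graph, k = 0..n."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# Illustrative (assumed) parameters; both distributions cover degrees up to 50.
scale_free = power_law_pmf(gamma=2.5, k_max=50)
random_graph = binomial_pmf(n=50, p=0.04)

# Probability that a node is a "hub" with degree >= 10 under each model.
hub_scale_free = sum(scale_free[9:])   # index 9 corresponds to k = 10
hub_random = sum(random_graph[10:])    # index 10 corresponds to k = 10

print(f"P(degree >= 10), power law: {hub_scale_free:.4g}")
print(f"P(degree >= 10), binomial:  {hub_random:.4g}")
```

Under these assumed parameters the power-law model assigns far more probability to high-degree nodes, which is the intuition behind preferring a scale-free prior when hubs are expected.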

2.
3.
4.
Network representations of biological systems are widespread and reconstructing unknown networks from data is a focal problem for computational biologists. For example, the series of biochemical reactions in a metabolic pathway can be represented as a network, with nodes corresponding to metabolites and edges linking reactants to products. In a different context, regulatory relationships among genes are commonly represented as directed networks with edges pointing from influential genes to their targets. Reconstructing such networks from data is a challenging problem receiving much attention in the literature. There is a particular need for approaches tailored to time-series data and not reliant on direct intervention experiments, as the former are often more readily available. In this paper, we introduce an approach to reconstructing directed networks based on dynamic systems models. Our approach generalizes commonly used ODE models based on linear or nonlinear dynamics by extending the functional class for the functions involved from parametric to nonparametric models. Concomitantly, we limit the complexity by imposing an additive structure on the estimated slope functions. Thus the submodel associated with each node is a sum of univariate functions. These univariate component functions form the basis for a novel coupling metric that we define in order to quantify the strength of proposed relationships and hence rank potential edges. We show the utility of the method by reconstructing networks using simulated data from computational models for the glycolytic pathway of Lactococcus lactis and a gene network regulating the pluripotency of mouse embryonic stem cells. For purposes of comparison, we also assess reconstruction performance using gene networks from the DREAM challenges. 
We compare our method to those that similarly rely on dynamic systems models and use the results to attempt to disentangle the distinct roles of linearity, sparsity, and derivative estimation.

5.
6.
Liu B, de la Fuente A, Hoeschele I. Genetics, 2008, 178(3): 1763-1776
Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.

7.

Background

Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system’s response after systematic perturbations are available.

Results

We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway.

Conclusions

Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state-of-the-art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-250) contains supplementary material, which is available to authorized users.

8.
9.

Background

Candidate gene prioritization aims to identify promising new genes associated with a disease or a biological process from a larger set of candidate genes. In recent years, network-based methods, which utilize a knowledge network derived from biological knowledge, have been utilized for gene prioritization. Biological knowledge can be encoded either through the network's links or nodes. Current network-based methods can only encode knowledge through links. This paper describes a new network-based method that can encode knowledge in links as well as in nodes.

Results

We developed a new network inference algorithm called the Knowledge Network Gene Prioritization (KNGP) algorithm which can incorporate both link and node knowledge. The performance of the KNGP algorithm was evaluated on both synthetic networks and on networks incorporating biological knowledge. The results showed that the combination of link knowledge and node knowledge provided a significant benefit across 19 experimental diseases over using link knowledge alone or node knowledge alone.

Conclusions

The KNGP algorithm provides an advance over current network-based algorithms, because the algorithm can encode both link and node knowledge. We hope the algorithm will aid researchers with gene prioritization.

10.
Cross-referencing experimental data with our current knowledge of signaling network topologies is one central goal of mathematical modeling of cellular signal transduction networks. We present a new methodology for data-driven interrogation and training of signaling networks. While most published methods for signaling network inference operate on Bayesian, Boolean, or ODE models, our approach uses integer linear programming (ILP) on interaction graphs to encode constraints on the qualitative behavior of the nodes. These constraints are posed by the network topology and their formulation as ILP allows us to predict the possible qualitative changes (up, down, no effect) of the activation levels of the nodes for a given stimulus. We provide four basic operations to detect and remove inconsistencies between measurements and predicted behavior: (i) find a topology-consistent explanation for responses of signaling nodes measured in a stimulus-response experiment (if none exists, find the closest explanation); (ii) determine a minimal set of nodes that need to be corrected to make an inconsistent scenario consistent; (iii) determine the optimal subgraph of the given network topology which can best reflect measurements from a set of experimental scenarios; (iv) find possibly missing edges that would improve the consistency of the graph with respect to a set of experimental scenarios the most. We demonstrate the applicability of the proposed approach by interrogating a manually curated interaction graph model of EGFR/ErbB signaling against a library of high-throughput phosphoproteomic data measured in primary hepatocytes. Our methods detect interactions that are likely to be inactive in hepatocytes and provide suggestions for new interactions that, if included, would significantly improve the goodness of fit. Our framework is highly flexible and the underlying model requires only easily accessible biological knowledge. 
All related algorithms were implemented in a freely available toolbox, SigNetTrainer, making it an appealing approach for various applications.

11.
Prolonged high-fat diet leads to the development of obesity and multiple comorbidities including non-alcoholic steatohepatitis (NASH), but the underlying molecular basis is not fully understood. We combine molecular networks and time course gene expression profiles to reveal the dynamic changes in molecular networks underlying diet-induced obesity and NASH. We also identify hub genes associated with the development of NASH. Core diet-induced obesity networks were constructed using Ingenuity pathway analysis (IPA) based on 332 high-fat diet responsive genes identified in liver by time course microarray analysis (8 time points over 24 weeks) of high-fat diet-fed mice compared to normal diet-fed mice. IPA identified five core diet-induced obesity networks with time-dependent gene expression changes in liver. These networks were associated with cell-to-cell signaling and interaction (Network 1), lipid metabolism (Network 2), hepatic system disease (Networks 3 and 5), and inflammatory response (Network 4). When we merged these core diet-induced obesity networks, Tlr2, Cd14, and Ccnd1 emerged as hub genes associated with both liver steatosis and inflammation and were altered in a time-dependent manner. Further, protein–protein interaction network analysis revealed that Tlr2, Cd14, and Ccnd1 were interrelated through the ErbB/insulin signaling pathway. Dynamic changes occur in molecular networks underlying diet-induced obesity. Tlr2, Cd14, and Ccnd1 appear to be hub genes integrating molecular interactions associated with the development of NASH. Therapeutics targeting hub genes and core diet-induced obesity networks may help ameliorate diet-induced obesity and NASH.

12.
MOTIVATION: Bayesian network methods have shown promise in gene regulatory network reconstruction because of their capability of capturing causal relationships between genes and handling the noisy data found in biological experiments. The problem of learning network structures, however, is NP-hard. Consequently, heuristic methods such as hill climbing are used for structure learning. For networks of a moderate size, hill climbing methods are not computationally efficient. Furthermore, relatively low accuracy of the learned structures may be observed. The purpose of this article is to present a novel structure learning method for gene network discovery. RESULTS: In this paper, we present a novel structure learning method to reconstruct the underlying gene networks from observational gene expression data. Unlike hill climbing approaches, the proposed method first constructs an undirected network based on mutual information between two nodes and then splits the structure into substructures. The directional orientations for the edges that connect two nodes are then obtained by optimizing a scoring function for each substructure. Our method is evaluated using two benchmark network datasets with known structures. The results show that the proposed method can identify networks that are close to the optimal structures. It outperforms hill climbing methods in terms of both computation time and predicted structure accuracy. We also apply the method to gene expression data measured during the yeast cell cycle and show the effectiveness of the proposed method for network reconstruction.
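
The first step described above, building an undirected network from pairwise mutual information, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' algorithm: expression values are assumed to be already discretized, the 0.5-bit threshold is arbitrary, and the gene names and function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

def undirected_skeleton(data, threshold):
    """Connect every pair of genes whose mutual information exceeds the threshold."""
    genes = sorted(data)
    edges = set()
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if mutual_information(data[a], data[b]) > threshold:
                edges.add((a, b))
    return edges

# Hypothetical discretized expression profiles (0 = low, 1 = high).
profiles = {
    "g1": [0, 0, 1, 1, 0, 0, 1, 1],
    "g2": [0, 0, 1, 1, 0, 0, 1, 1],  # identical to g1
    "g3": [0, 1, 0, 1, 0, 1, 0, 1],  # empirically independent of g1 and g2
}
skeleton = undirected_skeleton(profiles, threshold=0.5)
print(skeleton)
```

Edge orientation, which the abstract attributes to a per-substructure scoring function, is deliberately left out of this sketch.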

13.
Structural brain networks may be reconstructed from diffusion MRI tractography data and have great potential to further our understanding of the topological organisation of brain structure in health and disease. Network reconstruction is complex and involves a series of processing methods including anatomical parcellation, registration, fiber orientation estimation and whole-brain fiber tractography. Methodological choices at each stage can affect the anatomical accuracy and graph theoretical properties of the reconstructed networks, meaning applying different combinations in a network reconstruction pipeline may produce substantially different networks. Furthermore, the choice of which connections are considered important is unclear. In this study, we assessed the similarity between structural networks obtained using two independent state-of-the-art reconstruction pipelines. We aimed to quantify network similarity and identify the core connections emerging most robustly in both pipelines. Similarity of network connections was compared between pipelines employing different atlases by merging parcels to a common and equivalent node scale. We found a high agreement between the networks across a range of fiber density thresholds. In addition, we identified a robust core of highly connected regions coinciding with a peak in similarity across network density thresholds, and replicated these results with atlases at different node scales. The binary network properties of these core connections were similar between pipelines but showed some differences in atlases across node scales. This study demonstrates the utility of applying multiple structural network reconstruction pipelines to diffusion data in order to identify the most important connections for further study.

14.
TH Chueh, HH Lu. PLoS ONE, 2012, 7(8): e42095
One great challenge of genomic research is to efficiently and accurately identify complex gene regulatory networks. The development of high-throughput technologies provides abundant experimental data, such as DNA sequences, protein sequences, and RNA expression profiles, which makes it possible to study interactions and regulation among genes or other substances in an organism. However, it is crucial to make inference of genetic regulatory networks from gene expression profiles and protein interaction data for systems biology. This study will develop a new approach to reconstruct time delay Boolean networks as a tool for exploring biological pathways. In the inference strategy, we will compare all pairs of input genes in those basic relationships by their corresponding [Formula: see text]-scores for every output gene. Then, we will combine those consistent relationships to reveal the most probable relationship and reconstruct the genetic network. Specifically, we will prove that [Formula: see text] state transition pairs are sufficient and necessary to reconstruct the time delay Boolean network of [Formula: see text] nodes with high accuracy if the number of input genes to each gene is bounded. We have also implemented this method on simulated and empirical yeast gene expression data sets. The test results show that this proposed method is extensible for realistic networks.

15.
Clinical treatment outcomes are the quality and cost targets that health-care providers aim to improve. Most existing outcome analysis focuses on a single disease or all diseases combined. Motivated by the success of molecular and phenotypic human disease networks (HDNs), this article develops a clinical treatment network that describes the interconnections among diseases in terms of inpatient length of stay (LOS) and readmission. Here one node represents one disease, and two nodes are linked with an edge if their LOS and number of readmissions are conditionally dependent. This is the very first HDN that jointly analyzes multiple clinical treatment outcomes at the pan-disease level. To accommodate the unique data characteristics, we propose a modeling approach based on two-part generalized linear models and estimation based on penalized integrative analysis. Analysis is conducted on the Medicare inpatient data of 100,000 randomly selected subjects for the period of January 2010 to December 2018. The resulting network has 1008 edges for 106 nodes. We analyze key network properties including connectivity, module/hub, and temporal variation. The findings are biomedically sensible. For example, high connectivity and hub conditions, such as disorders of lipid metabolism and essential hypertension, are identified. There are also findings that have been little or not at all investigated in the literature. Overall, this study can provide additional insight into diseases' properties and their interconnections and assist more efficient disease management and health-care resources allocation.

16.
17.
18.
19.
The emergence of HIV-TB co-infection and multi-drug resistant strains of Mycobacterium tuberculosis (Mtb) drives the need for new therapeutics against the infectious disease tuberculosis. Among the reported putative TB targets in the literature, the identification and characterization of the most probable therapeutic targets that influence the complex infectious disease, primarily through interactions with other influenced proteins, remains a statistical and computational challenge in proteomic epidemiology. Protein interaction network analysis provides an effective way to understand the relationships between protein products of genes by interconnecting networks of essential genes and their protein-protein interactions for 5 broad functional categories in Mtb. We also investigated the substructure of the protein interaction network and focused on highly connected nodes known as cliques by giving weight to the edges using data mining algorithms. Cliques containing sulphate assimilation and shikimate pathway enzymes appeared continuously in spite of increasing constraints applied by the K-Core algorithm during network decomposition. The potential target narrowed down through systems approaches was prephenate dehydratase, present in the shikimate pathway; this gives insight for developing novel potential inhibitors through structure-based drug design with natural compounds.
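
As a rough illustration of k-core decomposition, the sketch below repeatedly peels off nodes whose degree falls below k until every remaining node has at least k neighbors. It is a generic textbook-style version under assumptions, not the tool or settings used in the study; the toy graph and its node names are hypothetical, and the adjacency is assumed to be symmetric.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core: the maximal subgraph where every node has degree >= k."""
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            if len(graph[node]) < k:
                # Remove the node and detach it from its remaining neighbors.
                for neighbor in graph[node]:
                    graph[neighbor].discard(node)
                del graph[node]
                changed = True
    return set(graph)

# Hypothetical toy network: a 4-clique (A, B, C, D) with a two-node tail (E, F).
toy_graph = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "F"},
    "F": {"E"},
}
print(sorted(k_core(toy_graph, 3)))
```

Raising k tightens the constraint: densely connected groups such as the clique survive longer than sparsely attached nodes, which is the sense in which certain cliques can persist under increasing constraints.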

20.
Genetic networks can characterize complex genetic relationships among groups of individuals, which can be used to rank nodes most important to the overall connectivity of the system. Ranking allows scarce resources to be guided toward nodes integral to connectivity. The greater sage‐grouse (Centrocercus urophasianus) is a species of conservation concern that breeds on spatially discrete leks that must remain connected by genetic exchange for population persistence. We genotyped 5,950 individuals from 1,200 greater sage‐grouse leks distributed across the entire species’ geographic range. We found a small‐world network composed of 458 nodes connected by 14,481 edges. This network was composed of hubs (nodes facilitating gene flow across the network) and spokes (nodes where connectivity is served by hubs). It is within these hubs that the greatest genetic diversity was housed. Using indices of network centrality, we identified hub nodes of greatest conservation importance. We also identified keystone nodes with elevated centrality despite low local population size. Hub and keystone nodes were found across the entire species’ contiguous range, although nodes with elevated importance to network‐wide connectivity were found more centrally, especially in northeastern, central, and southwestern Wyoming and eastern Idaho. Nodes among which genes are most readily exchanged were mostly located in Montana and northern Wyoming, as well as Utah and eastern Nevada. The loss of hub or keystone nodes could lead to the disintegration of the network into smaller, isolated subnetworks. Protecting both hub nodes and keystone nodes will conserve genetic diversity and should maintain network connections to ensure a resilient and viable population over time. Our analysis shows that network models can be used to model gene flow, offering insights into its pattern and process, with application to prioritizing landscapes for conservation.
