Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
2.
3.

Background  

A significant problem in the study of the mechanisms of an organism's development is the elucidation of interrelated factors that act at different levels of the organism, such as genes, biological molecules, cells, and cell systems. The numerous sources of heterogeneous data that exist for these subsystems are still not sufficiently integrated to give researchers a straightforward way to analyze them together within a single frame of study. Systematic application of data integration methods is also hampered by factors such as the orthogonal nature of the integrated data and naming problems.

4.
Ahmad Rasheed, Alsmadi Izzat, Alhamdani Wasim, Tawalbeh Lo'ai 《Cluster computing》2022,25(3):2125-2141
Cluster Computing - Data analytics projects span all types of domains and applications. Researchers publish results using certain datasets and classification models. They present results with a...

5.
We explore the utility of p-value weighting for enhancing the power to detect differential metabolites in a two-sample setting. Related gene expression information is used to assign an a priori importance level to each metabolite being tested. We map gene expression to each metabolite through pathways, and the gene expression information is then summarized per pathway using gene set enrichment tests. Through simulation we explore four styles of enrichment tests and four weight functions to convert the gene information into a meaningful p-value weight. We implement the p-value weighting on a prostate cancer metabolomic dataset. Gene expression on matched samples is used to construct the weights. Under certain regulatory conditions, the use of weighted p-values does not inflate the type I error above what we see for the unweighted tests, except in high-correlation situations. The power to detect differential metabolites is notably increased in situations with disjoint pathways and shows moderate improvement, relative to the proportion of enriched pathways, when pathway membership overlaps.
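The general idea of p-value weighting can be illustrated with a minimal sketch. The abstract does not state which multiple-testing procedure or weight functions the authors used, so the code below assumes a weighted Benjamini-Hochberg step-up rule: weights are rescaled to average 1, each p-value is divided by its weight, and the usual step-up cutoff is applied. The function name `weighted_bh` and the example numbers are hypothetical.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Return a list of booleans: True where the weighted test is rejected.

    Assumed procedure (not taken from the abstract): weighted
    Benjamini-Hochberg with weights rescaled to have mean 1.
    """
    if len(pvalues) != len(weights) or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per p-value")
    m = len(pvalues)
    mean_w = sum(weights) / m
    # Dividing p by (w / mean_w) favours hypotheses with large prior weight.
    adjusted = [min(p * mean_w / w, 1.0) for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    # Step-up rule: find the largest rank k with adjusted p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical metabolite p-values; the third metabolite sits in an enriched pathway.
pvals = [0.001, 0.02, 0.04, 0.5]
unweighted = weighted_bh(pvals, [1, 1, 1, 1])
weighted = weighted_bh(pvals, [1, 1, 3, 1])
```

With equal weights the function reduces to the ordinary Benjamini-Hochberg procedure, which makes it easy to compare weighted and unweighted decisions on the same p-values.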

6.
7.
8.
Purpose

Objective uncertainty quantification (UQ) of a product life-cycle assessment (LCA) is a critical step for decision-making. Environmental impacts can be measured directly or estimated using models. A model is described by underlying mathematical functions that approximate the environmental impacts during the various LCA stages. In this study, three possible sources of uncertainty in a mathematical model were investigated: input variability, model parameter uncertainty (parameters are differentiated from inputs in this study), and model-form uncertainty. A simple, easy-to-implement method is proposed to quantify each source.

Methods

Various data analytics methods were used to conduct a thorough model uncertainty analysis. (1) Interval analysis was used for input uncertainty quantification. Direct sampling by Monte Carlo (MC) simulation was used for the interval analysis, and the results were compared with those of indirect nonlinear optimization as an alternative approach. A machine learning surrogate model was developed to perform both the direct MC sampling and the indirect nonlinear optimization. (2) Bayesian inference was adopted to quantify parameter uncertainty. (3) A recently introduced model correction method based on orthogonal polynomial basis functions was used to evaluate the model-form uncertainty. The methods were applied to a pavement LCA to propagate uncertainties through an energy and global warming potential (GWP) estimation model; a pavement section in the Chicago metropolitan area was used as the case study.
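As a rough illustration of step (1), the sketch below estimates an output interval by direct Monte Carlo sampling of interval-valued inputs. The model `toy_gwp_model`, its coefficient, and the input bounds are hypothetical stand-ins, not the pavement model or data from the study. Because the toy model is monotonic in both inputs, the exact interval is available from the corner points, which stands in here for the optimization-based bound.

```python
import random


def toy_gwp_model(thickness_m, emission_factor):
    """Hypothetical stand-in model (not the study's): output grows with both inputs."""
    return 1200.0 * thickness_m * emission_factor


def mc_interval(model, bounds, n_samples=10000, seed=0):
    """Estimate [min, max] of model output by uniform sampling inside the input box."""
    rng = random.Random(seed)
    lo, hi = float("inf"), float("-inf")
    for _ in range(n_samples):
        x = [rng.uniform(a, b) for a, b in bounds]
        y = model(*x)
        lo, hi = min(lo, y), max(hi, y)
    return lo, hi


# Assumed input intervals, chosen only for illustration.
bounds = [(0.20, 0.30), (0.08, 0.12)]
lo, hi = mc_interval(toy_gwp_model, bounds)

# For this monotonic toy model the exact bounds come from the corners of the box.
exact_lo = toy_gwp_model(0.20, 0.08)
exact_hi = toy_gwp_model(0.30, 0.12)
```

Sampling can only approach the true bounds from the inside, so the MC interval is slightly narrower than the exact one. That gap is one reason to compare sampling against an optimization-based bound.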

Results and discussion

Results indicate that each uncertainty source contributes to the overall energy and GWP output of the LCA. Input uncertainty was shown to have a significant impact on the overall GWP output; for the example case study, the GWP interval was around 50%. Parameter uncertainty results showed that an assumption of ±10% uniform variation in the model parameter priors resulted in 28% variation in the GWP output. Model-form uncertainty had the lowest impact (less than 10% variation in the GWP). This is because the original energy model is relatively accurate in estimating the energy. However, a sensitivity analysis of the model-form uncertainty showed that up to 180% variation in the results can occur when the original model is less accurate.

Conclusions

Investigating each uncertainty source of the model indicated the importance of accurate characterization, propagation, and quantification of uncertainty. This study proposes independent and relatively easy-to-implement methods that provide robust grounds for objective model uncertainty analysis in LCA applications. Assumptions on inputs, parameter distributions, and model form need to be justified. Input uncertainty plays a key role in the overall pavement LCA output. The proposed model correction method and the interval analysis were relatively easy to implement. Research is still needed to develop a more generic and simplified MCMC simulation procedure that is fast to implement.


9.
10.
11.
12.

Background  

Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.

13.
14.
Single cell analytics allows quantitative investigation of single biological cells from a structural, functional and proteomics point of view, and opens possibilities for a novel unamplified cell analysis that is inherently insensitive to ensemble-averaging, cell-cycle or cell-population effects. We report on three different experimental methods and their application to cellular systems with single molecule sensitivity at the single cell level. Firstly, atomic force microscopy (AFM) can be used to elucidate the surface structure of living bacteria down to the nanometer scale, where identification of irregular surface areas and 2D arrays of regular protein S-layers is possible. Secondly, single cell manipulation and probing experiments with optical tweezers (OT) force spectroscopy allow quantitative identification of individual recognition events of membrane-bound receptors. Thirdly, a novel single cell analysis for protein fingerprinting in a structured microfluidic device format will allow future label-free, on-chip electrophoretic protein separation of single cells without preamplification.

15.
In integrative analyses of omics data, it is often of interest to extract a data representation from one data type that best reflects its relations with another data type. This task is traditionally fulfilled by linear methods such as canonical correlation analysis (CCA) and partial least squares (PLS). However, the information contained in one data type pertaining to the other may be complex and nonlinear. Deep learning provides a convenient alternative for extracting low-dimensional nonlinear data embeddings. In addition, the deep learning setup can naturally incorporate the effects of clinical confounding factors into the integrative analysis. Here we report a deep learning setup, named Autoencoder-based Integrative Multi-omics data Embedding (AIME), to extract data representations for integrative analysis of omics data. The method can adjust for confounder variables, achieve informative data embedding, rank features in terms of their contributions, and find pairs of features from the two data types that are related to each other through the data embedding. In simulation studies, the method was highly effective in extracting the major contributing features between data types. Using two real microRNA-gene expression datasets, one with confounder variables and one without, we show that AIME excluded the influence of confounders and extracted biologically plausible novel information. The R package based on Keras and the TensorFlow backend is available at https://github.com/tianwei-yu/AIME.

16.
High-throughput technologies are now used to generate more than one type of data from the same biological samples. To properly integrate such data, we propose using co-modules, which describe coherent patterns across paired data sets, and conceive several modular methods for their identification. We first test these methods using in silico data, demonstrating that the integrative scheme of our Ping-Pong Algorithm uncovers drug-gene associations more accurately when considering noisy or complex data. Second, we provide an extensive comparative study using the gene-expression and drug-response data from the NCI-60 cell lines. Using information from the DrugBank and the Connectivity Map databases we show that the Ping-Pong Algorithm predicts drug-gene associations significantly better than other methods. Co-modules provide insights into possible mechanisms of action for a wide range of drugs and suggest new targets for therapy.

17.
Leukemias are exceptionally well studied at the molecular level, and a wealth of high-throughput data has been published. However, further use of these data by researchers is severely hampered by the lack of accessible integrative tools for viewing and analysis. We developed the Leukemia Gene Atlas (LGA) as a public platform designed to support research and analysis of diverse genomic data published in the field of leukemia. With respect to leukemia research, the LGA is a unique resource with comprehensive search and browse functions. It provides extensive analysis and visualization tools for various types of molecular data. Currently, its database contains data from more than 5,800 leukemia and hematopoiesis samples generated by microarray gene expression, DNA methylation, SNP and next generation sequencing analyses. The LGA allows easy retrieval of large published data sets and thus helps to avoid redundant investigations. It is accessible at www.leukemia-gene-atlas.org.

18.
Paired-end sequencing is a common approach for identifying structural variation (SV) in genomes. Discrepancies between the observed and expected alignments indicate potential SVs. Most SV detection algorithms use only one of the possible signals and ignore reads with multiple alignments. This reduces sensitivity for detecting SVs, especially in repetitive regions. We introduce GASVPro, an algorithm that combines both paired-read and read-depth signals into a probabilistic model which can analyze multiple alignments of reads. GASVPro outperforms existing methods with a 50-90% improvement in specificity on deletions and a 50% improvement on inversions.
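The paired-read signal mentioned above can be illustrated with a small sketch. This is not GASVPro's probabilistic model; it is a simple rule-based classifier that flags read pairs whose mapped span or orientation differs from what the sequencing library is expected to produce. The `ReadPair` type, the insert-size parameters, and the category labels are all hypothetical, and a forward/reverse library layout is assumed.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    left_pos: int        # leftmost aligned position of the pair
    right_pos: int       # rightmost aligned position of the pair
    left_forward: bool   # True if the left read maps to the forward strand
    right_forward: bool  # True if the right read maps to the forward strand


def classify_pair(pair, mean_insert=400, sd_insert=40, n_sd=3):
    """Label a pair as concordant or as a candidate SV signature.

    Assumes a forward/reverse library; the insert-size mean and standard
    deviation are illustrative values, not estimates from real data.
    """
    span = pair.right_pos - pair.left_pos
    if pair.left_forward == pair.right_forward:
        # Both reads on the same strand: orientation is inconsistent with the library.
        return "inversion_candidate"
    if not pair.left_forward:
        # Reads face away from each other (reverse/forward).
        return "everted"
    if span > mean_insert + n_sd * sd_insert:
        # Reads map farther apart than expected: sequence missing in the sample.
        return "deletion_candidate"
    if span < mean_insert - n_sd * sd_insert:
        # Reads map closer together than expected: extra sequence in the sample.
        return "insertion_candidate"
    return "concordant"


labels = [
    classify_pair(ReadPair(1000, 1400, True, False)),
    classify_pair(ReadPair(1000, 2500, True, False)),
    classify_pair(ReadPair(1000, 1400, True, True)),
]
```

A rule-based filter like this uses only one signal and one alignment per read, which is the limitation the abstract describes; combining it with read depth and multiple alignments is what the probabilistic model is said to address.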

19.
Analytical testing of product quality attributes and process parameters during biologics development (process analytics) has been challenging due to the rapid growth of biomolecules with complex modalities to support unmet therapeutic needs. Thus, expanding the process analytics toolbox for rapid analytics through the deployment of cutting-edge technologies and cyber-physical systems is a necessity. We introduce the term Process Analytics 4.0, which entails not only technology aspects such as process analytical technology (PAT), assay automation, and high-throughput analytics, but also cyber-physical systems that enable data management, visualization, augmented reality, and internet of things (IoT) infrastructure for real-time analytics in the process development environment. This review focuses exclusively on dissecting high-level features of PAT, automation, and data management, with some insights into the business aspects of implementing them during process analytical testing in biologics process development. Significant technological and business advantages can be gained by implementing digitalization, automation, and real-time testing. Systematic development and use of PAT in process development workflows enable real-time analytics for better process understanding, agility, and sustainability. Robotics and liquid handling workstations allow rapid automation of assays and sample preparation to facilitate high-throughput testing of attributes and molecular properties that are otherwise challenging to monitor with PAT tools due to technological and business constraints. Cyber-physical systems for data management, visualization, and data repositories must be established as part of the Process Analytics 4.0 framework. Furthermore, we review some of the challenges in implementing these technologies based on our expertise in process analytics for biopharmaceutical drug substance development.

20.
AIMS: To develop molecular tools and examine inducible and constitutive gene expression in Thermus thermophilus. METHODS AND RESULTS: Two plasmid promoter probe vectors and an integrative promoter probe vector were constructed using a promoterless thermostable kanamycin nucleotidyltransferase (KmR) cassette. Three expression vectors were constructed based on a constitutive promoter, J17, that functions in both Thermus and Escherichia coli. An inducible expression vector was constructed using the heat-shock inducible promoter (70 to 85 °C) from the dnaK gene of T. flavus, and the malate dehydrogenase gene (mdh) from T. flavus was cloned and expressed in both E. coli and T. thermophilus HB27. CONCLUSION: This report describes the construction and use of improved promoter probe and expression vectors for use in Thermus species. The mdh gene can be used as a high temperature (85 °C) reporter gene for Thermus sp. The dnaK promoter is thermo-inducible. SIGNIFICANCE AND IMPACT OF THE STUDY: The expression vectors and molecular tools described here are significant improvements over previously reported vectors for Thermus sp. The mdh gene and the thermo-inducible dnaK promoter will facilitate high temperature studies employing Thermus species.
