Similar Literature
20 similar documents found.
1.
Gene set enrichment analysis (GSEA) aims to identify essential pathways or, more generally, sets of biologically related genes that are involved in complex human diseases. Many studies have shown that GSEA is a very useful bioinformatics tool that plays critical roles in developing disease prevention and intervention strategies. Despite its tremendous success, it is striking that conclusions of GSEA drawn from isolated studies are often sparse, and different studies may lead to inconsistent and sometimes contradictory results. Further, next‐generation sequencing technologies have made it possible to measure genome‐wide isoform‐specific expression levels, calling for innovations that can utilize this unprecedented resolution. Enormous amounts of data have now been generated by various RNA‐seq experiments. Together, these developments create a pressing need for integrative methods that explicitly use isoform‐specific expression and combine multiple enrichment studies to enhance the power, reproducibility, and interpretability of the analysis. We develop and evaluate integrative GSEA methods, based on two‐stage procedures, which, for the first time, allow statistically efficient use of isoform‐specific expression from multiple RNA‐seq experiments. Through simulation and real data analysis, we show that our methods can greatly improve performance in identifying essential gene sets compared with existing methods that can only use gene‐level expression.
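The abstract does not spell out its two‐stage procedure. As a generic illustration of what a second stage can look like, here is a minimal sketch, assuming each study has already produced a p‐value for the same gene set and the studies are independent: Fisher's combination of p‐values. This is a standard technique, not the authors' method, and `fisher_combine` is a hypothetical name.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values for one gene set with Fisher's method.

    X = -2 * sum(log p_k) follows a chi-square distribution with 2K degrees
    of freedom under the global null (independent studies assumed).
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square upper tail for even df = 2K:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{K-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


# Hypothetical per-study p-values for the same gene set in three RNA-seq studies.
stat, p_combined = fisher_combine([0.04, 0.10, 0.20])
print(f"X = {stat:.3f}, combined p = {p_combined:.4f}")
```

Fisher's method is sensitive to a strong signal in any single study. Other combination rules, such as weighted Z, trade that sensitivity for robustness.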

2.
Next‐generation sequencing technologies have afforded unprecedented characterization of low‐frequency and rare genetic variation. Because single‐variant tests have low power, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel‐machine regression and adaptive testing methods for aggregative rare‐variant association testing have been shown to be powerful approaches for pathway‐level analysis, although these methods tend to be computationally intensive at high variant dimensionality and require access to complete data. An additional analytical issue in scans of large sets of pathway definitions is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resulting correlation among test statistics on Type I error rate control in large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare‐variant analysis using component gene‐level linear kernel score test summary statistics, and derive simple estimators of the effective number of tests for family‐wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case‐control studies evaluating rare variation in hereditary prostate cancer and schizophrenia, respectively. Finally, we provide open‐source R code for public use to facilitate easy application of our methods to existing rare‐variant analysis results.
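The abstract derives its own estimators of the effective number of tests, and those are not reproduced here. For comparison only, here is a minimal sketch of Nyholt's (2004) widely used spectral estimator, M_eff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of a correlation matrix among the M test statistics. It is followed by a Šidák‐style per‐test threshold. The sketch assumes such a correlation matrix is available, and the helper names are hypothetical.

```python
def nyholt_meff(corr):
    """Nyholt (2004) effective number of tests from an M x M correlation matrix.

    M_eff = 1 + (M - 1) * (1 - Var(lambda) / M), where lambda are the eigenvalues
    of the correlation matrix and Var uses the (M - 1) denominator.
    """
    m = len(corr)
    if m == 1:
        return 1.0
    # The eigenvalues of a correlation matrix sum to its trace (= M), so their
    # mean is 1, and sum(lambda^2) = trace(R @ R) = sum of squared entries for a
    # symmetric R. Hence Var(lambda) needs no explicit eigendecomposition.
    sum_sq = sum(r * r for row in corr for r in row)
    var_eig = (sum_sq - m) / (m - 1)
    m_eff = 1.0 + (m - 1) * (1.0 - var_eig / m)
    return min(max(m_eff, 1.0), float(m))


def sidak_threshold(alpha, m_eff):
    """Per-test significance level keeping the family-wise error rate near alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


# Hypothetical correlations among three overlapping gene-set statistics.
R = [[1.0, 0.6, 0.2],
     [0.6, 1.0, 0.3],
     [0.2, 0.3, 1.0]]
m_eff = nyholt_meff(R)
print(f"M_eff = {m_eff:.2f}, per-test alpha = {sidak_threshold(0.05, m_eff):.4g}")
```

Independent tests give M_eff = M, and perfectly correlated tests give M_eff = 1. The threshold therefore lies between the Bonferroni level and the unadjusted level.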

3.
We study the problem of testing for single marker‐multiple phenotype associations based on genome‐wide association study (GWAS) summary statistics, without access to individual‐level genotype and phenotype data. For most published GWASs, summary data are substantially easier to obtain than individual‐level phenotype and genotype data, and multiple correlated traits have often been collected, so this problem has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its application to analyses of a meta‐analyzed GWAS dataset with three blood lipid traits and another with sex‐stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta‐analyzed) genome‐wide summary statistics, then extend the method to meta‐analysis of multiple sets of genome‐wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice, as it is either more powerful than or complementary to existing methods.

4.
Meta‐analysis of genome‐wide association studies (GWAS) has achieved great success in detecting loci underlying human diseases. Incorporating GWAS results from diverse ethnic populations into meta‐analysis, however, remains challenging because of possible heterogeneity across studies. Conventional fixed‐effects (FE) or random‐effects (RE) methods may not be the most suitable for aggregating multiethnic GWAS results, because of violation of the assumption of homogeneous effects across studies (FE) or low power to detect signals (RE). Three recently proposed methods, the modified RE (RE‐HE) model, the binary‐effects (BE) model and a Bayesian approach (Meta‐analysis of Transethnic Association [MANTRA]), show increased power over FE and RE methods while incorporating heterogeneity of effects when meta‐analyzing trans‐ethnic GWAS results. We propose a two‐stage approach to account for heterogeneity in trans‐ethnic meta‐analysis, in which studies are clustered using cohort‐specific ancestry information prior to meta‐analysis. In an extensive simulation study, we compare this with a no‐prior‐clustering (crude) approach, evaluating the type I error and power of the two strategies to investigate whether the two‐stage approach offers any improvement over the crude approach. We find that both the two‐stage and the crude approaches provide well‐controlled type I error for all five methods (FE, RE, RE‐HE, BE, MANTRA). However, compared with the corresponding crude approach, the two‐stage approach shows increased power for BE and RE‐HE and similar power for MANTRA and FE, especially when there is heterogeneity across the multiethnic GWAS results. These results suggest that prior clustering in the two‐stage approach can be an effective and efficient intermediate step in meta‐analysis to account for multiethnic heterogeneity.
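For reference, here is a minimal sketch of the conventional fixed‐effects (FE) inverse‐variance meta‐analysis that serves as a baseline above. It assumes each study reports an effect estimate (for example, a log odds ratio) and its standard error. This is the textbook estimator, not the two‐stage clustering approach, and `fixed_effect_meta` is a hypothetical name.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of one effect."""
    weights = [1.0 / se ** 2 for se in ses]          # w_k = 1 / SE_k^2
    w_sum = sum(weights)
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se_fe = math.sqrt(1.0 / w_sum)
    z = beta_fe / se_fe
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value
    return beta_fe, se_fe, z, p


# Hypothetical per-study log odds ratios and standard errors.
print(fixed_effect_meta([0.12, 0.20, 0.05], [0.05, 0.08, 0.06]))
```

Every study is assumed to share one true effect, which is the homogeneity assumption the abstract notes can be violated in multiethnic data.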

5.
With varying, but substantial, proportions of heritability remaining unexplained by summaries of single‐SNP genetic variation, there is a demand for methods that extract maximal information from genetic association studies. One source of variation that is difficult to assess is genetic interaction. A major challenge for naive detection methods is the large number of possible combinations, with a requisite need to correct for multiple testing. Assuming large marginal effects to reduce the search space may be restrictive and may miss higher‐order interactions with modest marginal effects. In this paper, we propose a new procedure for detecting gene‐by‐gene interactions through heterogeneity in estimated low‐order (e.g., marginal) effect sizes, by leveraging population structure, or ancestral differences, among studies in which the same phenotypes were measured. We implement this approach in a meta‐analytic framework, which offers numerous advantages, such as robustness and computational efficiency, and is necessary when data‐sharing limitations restrict joint analysis. We apply a dimension reduction procedure that scales to allow searches for higher‐order interactions. For comparison with our method, which we term phylogenY‐aware Effect‐size Tests for Interactions (YETI), we adapt to our meta‐analytic framework an existing method that assumes interacting loci will exhibit strong marginal effects. As expected, YETI excels when multiple studies are from highly differentiated populations, and it maintains its superiority in these conditions even when marginal effects are small. When these conditions are less extreme, the advantage of our method wanes. We assess the Type‐I error and power characteristics of complementary approaches to evaluate their strengths and limitations.
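The abstract does not give YETI's test statistic. A generic, related tool for quantifying heterogeneity of one SNP's marginal effect across studies is Cochran's Q with the I² index. The sketch below assumes per‐study estimates and standard errors. It includes a small chi‐square tail function, built on the recurrence for the regularized upper incomplete gamma function, so it needs only the standard library. All names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Base cases: df = 1 (erfc form) and df = 2 (exponential form).
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        sf, k = math.exp(-x / 2.0), 2
    # Step up two df at a time: Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1), y = x/2.
    while k < df:
        k += 2
        sf += (x / 2.0) ** (k / 2.0 - 1.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0)
    return sf


def cochran_q(betas, ses):
    """Cochran's Q, its chi-square p-value (K - 1 df), and the I^2 index."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, chi2_sf(q, df), i2


# Hypothetical marginal effects of one SNP in four ancestrally distinct studies.
print(cochran_q([0.02, 0.15, 0.30, 0.05], [0.04, 0.05, 0.06, 0.04]))
```

Q has low power when only a few studies are available, which is one reason specialised approaches such as the one in the abstract are of interest.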

6.
Kernel machine (KM) models are a powerful tool for exploring associations between sets of genetic variants and complex traits. Although most KM methods use a single kernel function to assess the marginal effect of a variable set, KM analyses involving multiple kernels have become increasingly popular. Multikernel analysis allows researchers to study more complex problems, such as assessing gene‐gene or gene‐environment interactions, incorporating variance‐component based methods for population substructure into rare‐variant association testing, and assessing the conditional effects of a variable set adjusting for other variable sets. The KM framework is robust and powerful and provides efficient dimension reduction for multifactor analyses, but it requires the estimation of high‐dimensional nuisance parameters. Traditional estimation techniques, including regularization and the expectation‐maximization (EM) algorithm, have a large computational cost and do not scale to the large sample sizes needed for rare‐variant analysis. Therefore, in the context of gene‐environment interaction, we propose a computationally efficient and statistically rigorous “fastKM” algorithm for multikernel analysis that is based on a low‐rank approximation to the nuisance effect kernel matrices. Our algorithm is applicable to various trait types (e.g., continuous, binary, and survival traits) and can be implemented using any existing single‐kernel analysis software. Through extensive simulation studies, we show that our algorithm has similar performance to an EM‐based KM approach for quantitative traits while running much faster. We also apply our method to the Vitamin Intervention for Stroke Prevention (VISP) clinical trial, examining gene‐by‐vitamin effects on recurrent stroke risk and gene‐by‐age effects on change in homocysteine level.

7.
For complex traits, most associated single nucleotide variants (SNVs) discovered to date have small effects, and association can be detected only with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta‐analysis has emerged as the method of choice for combining results from multiple studies. Many meta‐analysis methods are available for single‐SNV analyses. As new approaches allow the capture of low‐frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta‐analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two‐stage meta‐analysis approach to combine haplotype analysis results. In the first stage, each cohort estimates haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. In the second stage, we use a multivariate generalized least squares meta‐analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype‐specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that our approach controls the type‐I error rate and is more powerful than inverse variance weighted meta‐analysis of single‐SNV analyses when haplotype effects are present. We replicate a published haplotype association between the fasting glucose‐associated locus G6PC2 and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium, and we provide more precise haplotype effect estimates.
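The second‐stage combination described above uses multivariate generalized least squares (GLS). In its standard form, the pooled estimate is β̂ = (Σ_k V_k⁻¹)⁻¹ Σ_k V_k⁻¹ β̂_k, with covariance (Σ_k V_k⁻¹)⁻¹, where β̂_k and V_k are cohort k's haplotype effect vector and covariance matrix. Below is a minimal pure‐Python sketch, assuming every cohort reports the same set of haplotype effects with a full covariance matrix. The abstract's method additionally handles haplotypes that are not observed in every cohort, which this sketch does not. Function names are hypothetical.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def gls_meta(betas, covs):
    """Pool per-cohort effect vectors betas[k] with covariance matrices covs[k]."""
    p = len(betas[0])
    info = [[0.0] * p for _ in range(p)]   # accumulates sum_k V_k^{-1}
    score = [0.0] * p                      # accumulates sum_k V_k^{-1} beta_k
    for b, v in zip(betas, covs):
        v_inv = mat_inv(v)
        wb = mat_vec(v_inv, b)
        for i in range(p):
            score[i] += wb[i]
            for j in range(p):
                info[i][j] += v_inv[i][j]
    cov = mat_inv(info)
    return mat_vec(cov, score), cov
```

With a single haplotype effect (p = 1), this reduces to the univariate inverse‐variance weighted estimator.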

8.
Meta‐analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although standard meta‐analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case–control ratios. Here, we investigate the power loss of standard meta‐analysis methods for unbalanced studies, and further propose novel meta‐analysis methods that perform equivalently to joint analysis under both balanced and unbalanced settings. We derive improved meta‐score‐statistics that can accurately approximate the joint‐score‐statistics with combined individual‐level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In simulated gene‐level association studies under unbalanced settings, our method recovered up to 85% of the power lost with the standard methods. We further showed the power gain of our methods in gene‐level tests with 26 unbalanced studies of age‐related macular degeneration. In addition, we took the meta‐analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta‐analyzing multi‐ethnic samples. In summary, our improved meta‐score‐statistics with corrections for population stratification can be used to conduct both single‐variant and gene‐level association tests, providing a useful framework for ensuring well‐powered, convenient, cross‐study analyses.
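For context, the standard score‐based meta‐analysis that serves as the starting point above sums per‐study score statistics U_k and their variances V_k, giving Z = ΣU_k / √(ΣV_k). The sketch below implements only that standard combination, not the improved statistics or the population‐stratification correction proposed in the abstract. The effect estimate ΣU_k/ΣV_k and standard error 1/√(ΣV_k) are the usual one‐step score approximations, and `meta_score_test` is a hypothetical name.

```python
import math


def meta_score_test(scores, variances):
    """Standard score-based meta-analysis of one variant across K studies.

    scores[k] is study k's score statistic U_k and variances[k] its variance
    (Fisher information) V_k, both computed under the null of no association.
    """
    u = sum(scores)
    v = sum(variances)
    z = u / math.sqrt(v)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    beta_hat = u / v                         # one-step (score-based) effect estimate
    se_hat = 1.0 / math.sqrt(v)
    return z, p, beta_hat, se_hat


# Hypothetical score statistics and variances from three studies.
print(meta_score_test([3.1, 1.4, 2.2], [4.0, 2.5, 3.0]))
```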

9.
Rich meta‐epidemiological data sets have been collected to explore associations between intervention effect estimates and study‐level characteristics. Welton et al proposed models for the analysis of meta‐epidemiological data, but these models are restrictive because they force heterogeneity among studies with a particular characteristic to be at least as large as that among studies without the characteristic. In this paper we present alternative models that are invariant to the labels defining the 2 categories of studies. To exemplify the methods, we use a collection of meta‐analyses in which the Cochrane Risk of Bias tool has been implemented. We first investigate the influence of small trial sample sizes (fewer than 100 participants), before investigating the influence of multiple methodological flaws (inadequate or unclear sequence generation, allocation concealment, and blinding). We fit both the Welton et al model and our proposed label‐invariant model and compare the results. Estimates of mean bias associated with the trial characteristics and of between‐trial variances are not very sensitive to the choice of model. Results from fitting a univariable model show that heterogeneity variance is, on average, 88% greater among trials with fewer than 100 participants. On the basis of a multivariable model, heterogeneity variance is, on average, 25% greater among trials with inadequate/unclear sequence generation, 51% greater among trials with inadequate/unclear blinding, and 23% lower among trials with inadequate/unclear allocation concealment, although the 95% intervals for these ratios are very wide. Our proposed label‐invariant models for meta‐epidemiological data analysis facilitate investigations of between‐study heterogeneity attributable to certain study characteristics.

10.
With challenges in data harmonization and environmental heterogeneity across various data sources, meta‐analysis of gene–environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed‐effect meta‐analysis: the standard inverse‐variance weighted meta‐analysis and a meta‐regression approach. Akin to the results in Simmonds and Higgins (2007), we obtain analytic efficiency results for both methods under certain assumptions. The relative efficiency of the two methods depends on the ratio of within‐ versus between‐cohort variability of the environmental covariate. We propose an adaptively weighted estimator (AWE) of the interaction parameter that weights adaptively between meta‐analysis and meta‐regression. The AWE retains the full efficiency of the joint analysis using individual‐level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse‐variance weighted estimator retains the full efficiency of joint analysis using individual‐level data, if the estimates with full covariance matrices for all the common parameters are pooled across all studies. We show that our work is consistent with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and bypasses issues with sharing individual‐level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through a meta‐analysis of the interaction between single nucleotide polymorphisms in the FTO gene and body mass index on high‐density lipoprotein cholesterol, using data from a set of eight studies of type 2 diabetes.

11.
Genome‐wide association studies are proven tools for finding disease genes, but it is often necessary to combine many cohorts in a meta‐analysis to detect statistically significant genetic effects. Often the component studies are performed by different investigators on different populations, using different chips with minimal single nucleotide polymorphism (SNP) overlap. In some cases, raw data are not available for imputation, so only results for genotyped SNPs can be used in meta‐analysis. Even when SNP sets are comparable, different cohorts may have peak association signals at different SNPs within the same gene, due to population differences in linkage disequilibrium or environmental interactions. We hypothesize that the power to detect statistical signals in these situations will improve by using a method that simultaneously meta‐analyzes and smooths the signal over nearby markers. In this study, we propose regionally smoothed meta‐analysis methods and compare their performance on real and simulated data.

12.
Multiple papers have studied the use of gene‐environment (GE) independence to enhance power for testing gene‐environment interaction in case‐control studies. However, studies that evaluate the role of GE independence in a meta‐analysis framework are limited. In this paper, we extend the single‐study empirical Bayes type shrinkage estimators proposed by Mukherjee and Chatterjee (2008) to a meta‐analysis setting that adjusts for uncertainty regarding the assumption of GE independence across studies. We use the retrospective likelihood framework to derive an adaptive combination of estimators obtained under the constrained model (assuming GE independence) and the unconstrained model (without assuming GE independence), with weights determined by measures of GE association derived from multiple studies. Our simulation studies indicate that this newly proposed estimator has better average performance across different simulation scenarios than the standard alternative of inverse‐variance (covariance) weighted estimators that combine study‐specific constrained, unconstrained, or empirical Bayes estimators. The results are illustrated by meta‐analyzing 6 different studies of type 2 diabetes investigating interactions between genetic markers on the obesity‐related FTO gene and the environmental factors body mass index and age.

13.
Statistical inference for analyzing the results from several independent studies on the same quantity of interest has been investigated frequently in recent decades. Typically, any meta‐analytic inference requires that the quantity of interest be available from each study together with an estimate of its variability. The current work is motivated by a meta‐analysis comparing two treatments (thoracoscopic and open) of congenital lung malformations in young children. Quantities of interest include continuous end‐points such as length of operation or number of chest tube days. As studies report only mean values (and no standard errors or confidence intervals), the question arises of how meta‐analytic inference can be developed. We suggest two methods to estimate study‐specific variances in such a meta‐analysis, where only sample means and sample sizes are available in the treatment arms. A general likelihood ratio test is derived for testing equality of variances in two groups. By means of simulation studies, the bias and estimated standard error of the overall mean difference from both methodologies are evaluated and compared with two existing approaches: complete study analysis only and partial variance information. The performance of the test is evaluated in terms of type I error. Additionally, we illustrate these methods in the meta‐analysis comparing thoracoscopic and open surgery for congenital lung malformations and in a meta‐analysis on the change in renal function after kidney donation. Copyright © 2017 John Wiley & Sons, Ltd.

14.
A prognostic factor is any measure that is associated with the risk of future health outcomes in those with existing disease. Often, the prognostic ability of a factor is evaluated in multiple studies. However, meta‐analysis is difficult because primary studies often use different methods of measurement and/or different cut‐points to dichotomise continuous factors into ‘high’ and ‘low’ groups; selective reporting is also common. We illustrate how multivariate random effects meta‐analysis models can accommodate multiple prognostic effect estimates from the same study, relating to multiple cut‐points and/or methods of measurement. The models account for within‐study and between‐study correlations, which utilises more information and reduces the impact of unreported cut‐points and/or measurement methods in some studies. The applicability of the approach is improved with individual participant data and by assuming a functional relationship between prognostic effect and cut‐point to reduce the number of unknown parameters. The models provide important inferential results for each cut‐point and method of measurement, including the summary prognostic effect, the between‐study variance and a 95% prediction interval for the prognostic effect in new populations. Two applications are presented. The first reveals that, in a multivariate meta‐analysis using published results, the Apgar score is prognostic of neonatal mortality but effect sizes are smaller at most cut‐points than previously thought. In the second, a multivariate meta‐analysis of two methods of measurement provides weak evidence that microvessel density is prognostic of mortality in lung cancer, even when individual participant data are available so that a continuous prognostic trend is examined (rather than cut‐points). © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

15.
Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Meta‐analysis is essential to this exploration because large sample sizes are required to detect rare variants. Several methods are available to conduct meta‐analysis for rare variants under fixed‐effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis method. We propose random‐effects models that allow the genetic effects to vary among studies and develop the corresponding meta‐analysis methods for gene‐level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study design and any phenotype. We produce random‐effects versions of all commonly used gene‐level association tests, including burden, variable threshold, and variance‐component tests. We demonstrate through extensive simulation studies that our random‐effects tests are substantially more powerful than the fixed‐effects tests in the presence of moderate and high between‐study heterogeneity and achieve similar power to the latter when the heterogeneity is low. The usefulness of the proposed methods is further illustrated with data from the National Heart, Lung, and Blood Institute Exome Sequencing Project (NHLBI ESP). The relevant software is freely available.
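The random‐effects gene‐level tests above operate on score statistics and are not reproduced here. To illustrate the underlying random‐effects idea for a single effect, here is a minimal sketch of the classical DerSimonian and Laird estimator. It assumes per‐study estimates and standard errors, estimates the between‐study variance τ² by the method of moments, and re‐weights each study by 1/(SE² + τ²). This is a swapped‐in textbook technique, and `dersimonian_laird` is a hypothetical name.

```python
import math


def dersimonian_laird(betas, ses):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))   # Cochran's Q
    df = len(betas) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0               # truncated at 0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return beta_re, se_re, tau2


# Hypothetical per-study effects with visible between-study heterogeneity.
print(dersimonian_laird([0.05, 0.40, 0.20, -0.10], [0.08, 0.10, 0.07, 0.09]))
```

When the estimated τ² is 0, the result coincides with the fixed‐effects estimate. Otherwise, the pooled standard error widens to reflect heterogeneity.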

16.
A multivariate meta‐analysis of two or more correlated outcomes is expected to improve precision compared with a series of independent, univariate meta‐analyses especially when there are studies reporting some but not all outcomes. Multivariate meta‐analysis requires estimates of the within‐study correlations, which are seldom available. Existing methods for analysing multiple outcomes simultaneously are limited to pairwise treatment comparisons. We propose a model for a joint, simultaneous synthesis of multiple dichotomous outcomes in a network of interventions and introduce a simple way to elicit expert opinion for the within‐study correlations by utilizing a set of conditional probability parameters. We implement our multiple‐outcomes network meta‐analysis model within a Bayesian framework, which allows incorporation of expert information. As an example, we analyse two correlated dichotomous outcomes, response to the treatment and dropout rate, in a network of pharmacological interventions for acute mania. The produced estimates have narrower confidence intervals compared with the simple network meta‐analysis. We conclude that the proposed model and the suggested prior elicitation method for correlations constitute a useful framework for performing network meta‐analysis for multiple outcomes. Copyright © 2014 John Wiley & Sons, Ltd.

17.
Gene‐by‐environment (G × E) interactions are important in explaining the missing heritability and understanding the causation of complex diseases, but a single, moderately sized study often has limited statistical power to detect such interactions. With the increasing need for integrating data and reporting results from multiple collaborative studies or sites, debate over the choice between mega‐ and meta‐analysis continues. In principle, data from different sites can be integrated at the individual level into a “mega” data set, which can be fit by a joint “mega‐analysis.” Alternatively, analyses can be done at each site, and results across sites can be combined through a “meta‐analysis” procedure without integrating individual‐level data across sites. Although mega‐analysis has been advocated in several recent initiatives, meta‐analysis has the advantages of simplicity and feasibility, and has recently led to several important findings in identifying main genetic effects. In this paper, we conducted empirical and simulation studies, using data from a G × E study of lung cancer, to compare mega‐ and meta‐analyses in four commonly used G × E analyses, under the scenario in which the number of studies is small and the sample sizes of individual studies are relatively large. We compared the two data integration approaches in the context of fixed effect models and random effects models separately. Our investigations provide valuable insights into the differences between mega‐ and meta‐analyses when combining a small number of studies to identify G × E interactions.

18.
Kernel machine learning methods, such as the SNP‐set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single‐SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi‐SNP testing approaches, kernel machine testing can draw conclusions only at the SNP‐set level, and does not directly indicate which SNP(s) in an identified set actually drive the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, adapt the KNIFE procedure to genetic association studies, and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practical tools for prioritizing SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.
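To make the two kernels named above concrete, here is a minimal sketch assuming genotypes coded as minor‐allele counts (0, 1, 2) in an n × p matrix G. The linear kernel is K = GGᵀ, and the IBS kernel is K_ij = Σ_k (2 − |g_ik − g_jk|) / (2p). These are the common textbook forms. SKAT implementations typically add per‐SNP weights, which are omitted here. Function names are hypothetical.

```python
def linear_kernel(geno):
    """K[i][j] = sum_k g_ik * g_jk for an n x p genotype matrix (list of rows)."""
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in geno] for gi in geno]


def ibs_kernel(geno):
    """Average identity-by-state sharing: sum_k (2 - |g_ik - g_jk|) / (2p)."""
    p = len(geno[0])
    return [[sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2.0 * p) for gj in geno]
            for gi in geno]


# Hypothetical genotypes for three subjects at three SNPs.
G = [[0, 1, 2],
     [2, 1, 0],
     [0, 1, 2]]
print(linear_kernel(G))
print(ibs_kernel(G))
```

The IBS kernel is bounded in [0, 1] with ones on the diagonal, whereas linear‐kernel entries grow with the number of minor alleles carried.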

19.
Missing outcome data are a common problem in randomized controlled trials, occurring when participants leave the study before its end. Missing such important information can bias the study estimates of the relative treatment effect and consequently affect the meta‐analytic results. Therefore, methods for handling data sets with missing participants, which incorporate the missing information in the analysis so as to avoid loss of power and minimize bias, are of interest. We propose a meta‐analytic model that accounts for possible error in the effect sizes estimated in studies with last observation carried forward (LOCF) imputed patients. Assuming a dichotomous outcome, we decompose the probability of a successful unobserved outcome, taking into account the sensitivity and specificity of the LOCF imputation process for the missing participants. We fit the proposed model within a Bayesian framework, exploring different prior formulations for sensitivity and specificity. We illustrate our methods by performing a meta‐analysis of five studies comparing the efficacy of amisulpride versus conventional drugs (flupenthixol and haloperidol) in patients diagnosed with schizophrenia. Our meta‐analytic models yield estimates similar to meta‐analysis with LOCF‐imputed patients. Allowing for uncertainty in the imputation process, precision is decreased depending on the priors used for sensitivity and specificity. Results on the significance of amisulpride versus conventional drugs differ between the standard LOCF approach and our model, depending on prior beliefs about the imputation process. Our method can be regarded as a useful sensitivity analysis in the presence of concerns about the LOCF process. Copyright © 2014 John Wiley & Sons, Ltd.
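One way to read the sensitivity/specificity decomposition above rests on our own assumed definitions: sensitivity Se = P(LOCF records success | true success) and specificity Sp = P(LOCF records failure | true failure). Under these definitions, the LOCF success probability is q = Se·p + (1 − Sp)(1 − p). When Se + Sp > 1, this can be inverted to recover the true success probability p. The abstract fits a Bayesian model with priors on Se and Sp. This sketch instead fixes them at point values, as a simple deterministic sensitivity analysis. Function names are hypothetical.

```python
def locf_observed_prob(p_true, sens, spec):
    """Success probability recorded by LOCF, given the true probability p_true."""
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def locf_corrected_prob(q_obs, sens, spec):
    """Invert q = Se*p + (1 - Sp)(1 - p) for p, clipped to [0, 1]."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("need sens + spec > 1 for the correction to be identifiable")
    p = (q_obs + spec - 1.0) / denom
    return min(max(p, 0.0), 1.0)


# Hypothetical: 45% success among LOCF-imputed patients, scanned over assumed Se/Sp.
for se_, sp_ in [(0.9, 0.9), (0.8, 0.7), (0.95, 0.6)]:
    print(se_, sp_, round(locf_corrected_prob(0.45, se_, sp_), 3))
```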

20.
Genome‐wide association studies have recently identified many new loci associated with human complex diseases. These newly discovered variants typically have weak effects, requiring studies with large numbers of individuals to achieve the statistical power necessary to identify them. Likely, even more associated variants remain to be found if larger association studies can be assembled. Meta‐analysis provides a straightforward means of increasing study sample sizes without collecting new samples, by combining existing data sets. One obstacle to combining studies is that they are often performed on platforms with different marker sets. Current studies overcome this issue by imputing genotypes missing from each of the studies and then applying standard meta‐analysis techniques. We show that this approach may result in a loss of power, since errors in imputation are not accounted for. We present a new method for performing meta‐analysis over imputed single nucleotide polymorphisms, show that it is optimal with respect to power, and discuss practical implementation issues. Through simulation experiments, we show that our imputation‐aware meta‐analysis approach outperforms or matches standard meta‐analysis approaches. Genet. Epidemiol. 34: 537–542, 2010. © 2010 Wiley‐Liss, Inc.


Copyright © Beijing Qinyun Technology Development Co., Ltd.    Beijing ICP No. 09084417-23

Beijing Public Network Security Record No. 11010802026262