Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Nucleotide sequence comparisons of three house-keeping genes, adenylate kinase (adk), shikimate dehydrogenase (aroE), and glucose-6-phosphate dehydrogenase (gdh), were used to infer the phylogeny of 33 gamma-proteobacteria. Phylogenetic trees inferred from each gene, and from the concatenated sequences of all three genes, are, in general, similar to a 16S rRNA gene-inferred tree. Similar groupings of bacteria are revealed at the family, genus, species and strain levels in all five trees. The house-keeping genes, however, show a higher rate of nucleotide sequence substitutions. Consequently, they can possibly probe deeper branches of a phylogenetic tree than the 16S rRNA gene. However, because their nucleotide sequences are not as highly conserved among gamma-proteobacteria, family- or genus-specific primers would need to be designed for the amplification of any of these three house-keeping genes. Since these genes are used in multilocus sequence typing, the number of sequences publicly available for many taxa is expected to increase over time, making them very useful either for complementing 16S rRNA-inferred phylogenies or for specific, targeted phylogenetic analysis.

2.
A heuristic approach to search for the maximum-likelihood (ML) phylogenetic tree based on a genetic algorithm (GA) has been developed. It outputs the best tree as well as multiple alternative trees that are not significantly worse than the best one on the basis of the likelihood criterion. These near-optimum trees are subjected to further statistical tests. This approach enables one to infer phylogenetic trees of over 20 taxa, taking account of the rate heterogeneity among sites, on practical time scales on a PC cluster. Computer simulations were conducted to compare the efficiency of the present approach with that of several likelihood-based methods and distance-based methods, using amino acid sequence data sets with relatively large numbers (5–24) of taxa. The superiority of the ML method over distance-based methods increases as the conditions of the simulations become more realistic (an incorrect model is assumed or many taxa are involved). This approach was applied to the inference of the universal tree based on the concatenated amino acid sequences of vertically descendent genes that are shared among all genomes whose complete sequences have been reported. The inferred tree strongly supports that Archaea is paraphyletic and Eukarya is specifically related to Crenarchaeota. Apart from the paraphyly of Archaea and some minor disagreements, the universal tree based on these genes is largely consistent with the universal tree based on SSU rRNA. Received: 4 January 2001 / Accepted: 16 May 2001

3.
One of the major issues in phylogenetic analysis is that gene genealogies from different gene regions may not reflect the true species tree or history of speciation. This has led to considerable debate about whether concatenation of loci is the best approach for phylogenetic analysis. The application of Next‐generation sequencing techniques such as RAD‐seq generates thousands of relatively short sequence reads from across the genomes of the sampled taxa. These data sets are typically concatenated for phylogenetic analysis leading to data sets that contain millions of base pairs per taxon. The influence of gene region conflict among so many loci in determining the phylogenetic relationships among taxa is unclear. We simulated RAD‐seq data by sampling 100 and 500 base pairs from alignments of over 6000 coding regions that each produce one of three highly supported alternative phylogenies of seven species of Drosophila. We conducted phylogenetic analyses on different sets of these regions to vary the sampling of loci with alternative gene trees to examine the effect on detecting the species tree. Irrespective of sequence length sampled per region and which subset of regions was used, phylogenetic analyses of the concatenated data always recovered the species tree. The results suggest that concatenated alignments of Next‐generation data that consist of many short sequences are robust to gene tree/species tree conflict when the goal is to determine the phylogenetic relationships among taxa.

4.
To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multigene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largest multigene data sets from a collection of sequences. The algorithm was then tested on a set of 100,000 protein sequences of green plants and used to identify the largest multigene ortholog data sets having at least 3 genes and 6 species. The distribution of sizes of these data sets forms a hollow curve, and the largest are surprisingly small, ranging from 62 genes by 6 species, to 3 genes by 65 species, with more symmetrical data sets of around 15 taxa by 15 genes. These upper bounds to sequence concatenation have important implications for building the tree of life from large sequence databases.
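
The decision problem in this abstract (does a collection contain at least k genes sampled from at least m shared species?) can be illustrated with a brute-force check. This is only a minimal sketch under stated assumptions, not the exact algorithm the abstract reports: the function name `has_multigene_dataset`, the gene names, and the toy gene-to-species mapping are all hypothetical, and exhaustive enumeration is practical only for small inputs.

```python
from itertools import combinations


def has_multigene_dataset(gene_to_species, k, m):
    """Return (genes, shared_species) for the first set of k genes that are
    all sampled from at least m common species, or None if no set exists.

    Hypothetical helper for illustration; it enumerates every k-gene
    combination, so its cost grows combinatorially with the number of genes.
    """
    genes = sorted(gene_to_species)
    for combo in combinations(genes, k):
        # Species for which every gene in this combination has a sequence.
        shared = set.intersection(*(set(gene_to_species[g]) for g in combo))
        if len(shared) >= m:
            return combo, shared
    return None


# Toy input (invented): which species have a sequence for each gene.
toy = {
    "rbcL": {"A", "B", "C", "D"},
    "matK": {"A", "B", "C"},
    "atpB": {"B", "C", "D"},
    "ndhF": {"A", "D"},
}
result = has_multigene_dataset(toy, k=3, m=2)
```

On the toy input, three genes share two species, but no three genes share three, which mirrors the trade-off between number of genes and number of species described in the abstract.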

5.
We illustrate how recently developed large sequence-length approximations to probabilities of correct phylogenetic reconstruction for maximum likelihood estimation can be used to evaluate experimental design strategies. The specific criterion of interest is the probability of correctly resolving an a priori defined split of interest in a phylogenetic tree. Design strategies considered include increased taxon sampling and increasing sequence length. Our analyses of specific examples strongly suggest that it is better to sample taxa that connect as close as possible to the split of interest. Assuming this can be done, these examples suggest it is better to sample additional taxa than to add a comparable number of sites for the existing taxa. If the rates of evolution in the added taxa are slow, it is better to choose taxa connecting to a long edge, but if rates are comparable to a sister lineage, it is not necessarily the best strategy to sample taxa connected to a long edge. We also examined deleting taxa while increasing the number of sites. Although deleting a small number of taxa distant from the split of interest can be beneficial, deleting too many or making poor choices as to what should be deleted can lead to smaller probabilities of correct reconstruction than for the original sequence data.

6.
Complex evolution of vitellogenin genes in salmonid fishes (total citations: 2; self-citations: 0; citations by others: 2)
Vitellogenins (Vtg) are usually encoded by small multigene families containing up to six genes. With 20 tandemly arranged genes, the rainbow trout (Oncorhynchus mykiss) is an exception to this rule. PCR amplification, cloning and sequence analysis of Vtg genes in other salmonid species revealed the existence of two paralogous gene clusters, designated Vtg-A and Vtg-B. Southern hybridization showed that the number of genes varies from 2 to 30 copies from one species to another, as well as between the two gene clusters. All Coregonus, Thymallus, Salmo and Salvelinus species studied have both gene clusters, while Oncorhynchus species possess only the Vtg-A locus. Phylogenetic trees constructed from Vtg sequences revealed conflicting nodes with the consensus tree based on morphological and anatomical data. Vtg sequences support the grouping (Salmo, (Salvelinus, Oncorhynchus)) instead of the accepted consensus (Salvelinus, (Salmo, Oncorhynchus)). Structural data on gene organization also support the contention that Salvelinus and Oncorhynchus are sister taxa. Evolutionary implications for the Vtg gene clusters in salmonids are discussed.

7.
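Molecular phylogeny of Fagus (Fagaceae) based on nuclear rDNA ITS sequences (total citations: 6; self-citations: 0; citations by others: 6)
The ITS regions of six species, one subspecies, and one cultivar of Fagus (Fagaceae) were sequenced and analyzed, and the ITS region was cloned for two taxa that showed ITS sequence polymorphism. The ITS phylogenetic tree of Fagus resolves into two clades: the North American Fagus grandifolia (大叶水青冈) is basal, and the other clade comprises the European and East Asian taxa. The European and East Asian clade in turn contains two subclades: Fagus crenata (波叶水青冈) from northern Japan is basal, while Fagus hayatae (台湾水青冈) and the beeches of mainland Eurasia form the other subclade. The ITS analysis differs somewhat from the current morphology-based infrageneric classification of Fagus, but agrees well with the geographic distribution of the extant species. ITS sequence divergence among the taxa is small, indicating that the extant species of the genus diverged relatively recently.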

8.
Gary Voelker, Ibis, 2002, 144(4): 577-584
I used combined sequences of mitochondrial cytochrome b and ND2 genes to determine the molecular phylogenetic relationships of all five extant species of dipper (Cinclus), as well as the relationships of Cinclidae to postulated nearest relatives. All methods of analysis resulted in a single best tree of dipper relationships, uniting the two South American taxa (as sisters) with the single North American exemplar, and the two Eurasian taxa forming a sister clade to the New World taxa. Further, each tree identified thrushes (Turdidae) as the closest relative to Cinclidae. Based on relationships within Cinclus, a Eurasian ancestral area is proposed, with subsequent movement into the New World. Dating of species divergences suggests that dippers arose approximately 4 mya, and achieved their present continental distributions soon after.

9.
With the astonishing rate at which genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as “marker” genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities (e.g., construction of species trees, phylogeny-based assignment of metagenomic sequence reads to taxonomic groups, phylogeny-based assessment of alpha- and beta-diversity of microbial communities from metagenomic data). We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identification of the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, ability to be used to produce robust phylogenetic trees that reflect as much as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels including 40 for “all bacteria and archaea”, 114 for “all bacteria” (greatly expanding on the ∼30 commonly used), and hundreds to thousands for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than possible previously.

10.
11.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon position data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l performed poorly. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.
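
The Akaike information criterion mentioned in this abstract is defined as AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below shows that selection step only. The model names, parameter counts, and log-likelihood values are invented for illustration and are not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Pick the model name with the lowest AIC.

    `fits` maps a model name to (maximized log-likelihood, free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: substitution-model parameters only, branch lengths excluded.
fits = {
    "JC69": (-5210.4, 0),
    "HKY85": (-5105.9, 4),
    "GTR": (-5104.5, 8),
}
chosen = best_model(fits)
```

In this invented example the richest model has the highest likelihood, yet the penalty term makes the intermediate model the AIC choice. The abstract's point is a separate one: the AIC-selected model did not recover the true tree any more often than simpler models.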

12.
13.
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
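
The concatenation step described in this abstract, joining per-gene alignments into one super-gene alignment, might be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `concatenate_alignments`, the dictionary-based alignment representation, and the use of `?` for taxa missing from a gene are assumptions.

```python
def concatenate_alignments(gene_alignments, missing="?"):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end.

    Taxa absent from a gene are padded with `missing` so every row of the
    resulting supermatrix has the same length.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in gene_alignments)))
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments (invented): sp2 has no sequence for the second gene.
gene1 = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACGG"}
gene2 = {"sp1": "TTG", "sp3": "TTA"}
supermatrix = concatenate_alignments([gene1, gene2])
```

The consensus alternative would instead infer one tree per gene alignment and then summarize those trees, which needs a tree-inference library and is not shown here.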

14.
Environmental sequences have become a major source of information. High‐throughput sequencing (HTS) surveys have been used to infer biogeographic patterns and distribution of broad taxa of protists. This approach is, however, more questionable for addressing low‐rank (less inclusive) taxa such as species and genera, because of the increased chance of errors in identification due to blurry taxonomic boundaries, low sequence divergence, or sequencing errors. The speciose ciliate genus Euplotes partially escapes these limitations. It is a ubiquitous, monophyletic taxon, clearly differentiated from related genera, and with a relatively well‐developed internal systematics. It has also been the focus of several ecological studies. We present an update on Euplotes biogeography, taking into consideration for the first time environmental sequences, both traditional (Sanger) and HTS. We inferred a comprehensive small subunit rRNA gene phylogeny of the genus including a newly described marine species, Euplotes enigma, characterized by a unique question mark‐shaped macronucleus. We then added available environmental sequences to the tree, mapping associated metadata. The resulting scenario conflicts on many accounts with previously held views, suggesting, for example, that a large diversity of anaerobic Euplotes species exists, and that marine representatives of mainly freshwater lineages (and vice‐versa) might be more common than previously thought.

15.
A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa x 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa x 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed. A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.
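
The sparseness figure quoted in this abstract (the share of supermatrix cells holding gaps or question marks) is simple to compute once a matrix is in memory. The sketch below assumes a dictionary of equal-length aligned strings and treats `-` and `?` as missing; the function name `missing_fraction` and the toy matrix are hypothetical.

```python
def missing_fraction(matrix, missing_symbols="-?"):
    """Fraction of supermatrix cells that are gaps or question marks.

    `matrix` maps a taxon name to its aligned character string.
    """
    total = sum(len(seq) for seq in matrix.values())
    if total == 0:
        raise ValueError("matrix has no cells")
    missing = sum(seq.count(symbol) for seq in matrix.values() for symbol in missing_symbols)
    return missing / total


# Toy matrix (invented): two nearly disjoint blocks of data.
toy_matrix = {
    "t1": "ACGT????",
    "t2": "????ACGT",
    "t3": "AC--????",
}
fraction = missing_fraction(toy_matrix)
```

A value near 0.95, as reported for both matrices in the abstract, would mean that only about one cell in twenty carries an observed character.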

16.
Multilocus sequence analysis (MLSA) is an important method for identification of taxa that are not well differentiated by 16S rRNA gene sequences alone. In this procedure, concatenated sequences of selected genes are constructed and then analyzed. The effects that the number and the order of genes used in MLSA have on reconstruction of phylogenetic relationships were examined. The recA, rpoA, gapA, 16S rRNA gene, gyrB, and ftsZ sequences from 56 species of the genus Vibrio were used to construct molecular phylogenies, and these were evaluated individually and using various gene combinations. Phylogenies from two-gene sequences employing recA and rpoA in both possible gene orders were different. The addition of the gapA gene sequence, producing all six possible concatenated sequences, reduced the differences in phylogenies to degrees of statistical (bootstrap) support for some nodes. The overall statistical support for the phylogenetic tree, assayed on the basis of a reliability score (calculated from the number of nodes having bootstrap values of ≥80 divided by the total number of nodes) increased with increasing numbers of genes used, up to a maximum of four. No further improvement was observed from addition of the fifth gene sequence (ftsZ), and addition of the sixth gene (gyrB) resulted in lower proportions of strongly supported nodes. Reductions in the numbers of strongly supported nodes were also observed when maximum parsimony was employed for tree construction. Use of a small number of gene sequences in MLSA resulted in accurate identification of Vibrio species.
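
The reliability score defined in this abstract (nodes with bootstrap support of at least 80, divided by the total number of nodes) can be written out directly. The sketch assumes the bootstrap values have already been extracted from a tree into a plain list; the function name `reliability_score` and the example values are hypothetical.

```python
def reliability_score(bootstrap_values, threshold=80):
    """Proportion of nodes whose bootstrap support is >= threshold."""
    if not bootstrap_values:
        raise ValueError("tree has no nodes with bootstrap values")
    strong = sum(1 for value in bootstrap_values if value >= threshold)
    return strong / len(bootstrap_values)


# Invented bootstrap percentages for the eight internal nodes of a small tree.
score = reliability_score([100, 95, 80, 79, 62, 88, 45, 99])
```

Comparing this score across trees built from one, two, three, or more concatenated genes reproduces the kind of comparison the abstract describes, provided the same threshold is used throughout.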

17.
Approximately 98% of the sequence of the 18S ribosomal RNA (rRNA) of the coelacanth Latimeria chalumnae was determined by a combination of direct RNA sequencing and sequencing of rRNA genes amplified by the polymerase chain reaction. This sequence was compared with 18S rRNA sequences of similar length from seven other vertebrate species, representing the taxa Petromyzontiformes, Holocephali, Elasmobranchii, Actinopterygii, Dipnoi, Amphibia, and Amniota, in order to determine the most likely sister group of the coelacanth. Maximum parsimony analysis of these sequences resulted in a single most parsimonious tree containing a number of anomalous relationships among these groups. A bootstrap analysis showed, however, that none of the relationships in this tree was significantly supported at the 95% level. Addition of data from 15 other vertebrates (providing multiple representatives of most of the higher taxa) resulted in similarly ambiguous groupings, as did a number of methods of editing the sites compared (designed to eliminate rapidly evolving positions). These results may be due to a relatively rapid radiation of the major lineages of osteichthyans, the resolution of which will require molecular information from a larger portion of the coelacanth genome.

18.
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.

19.
20.
We present a phylogenetic hypothesis and novel, rank-free classification for all extant species of softshell turtles (Testudines: Trionychidae). Our data set included DNA sequence data from two mitochondrial protein-coding genes and an approximately 1-kb nuclear intron for 23 of 26 recognized species, and 59 previously published morphological characters for a complementary set of 24 species. The combined data set provided complete taxonomic coverage for this globally distributed clade of turtles, with incomplete data for a few taxa. Although our taxonomic sampling is complete, most of the modern taxa are representatives of old and very divergent lineages. Thus, due to biological realities, our sampling consists of one or a few representatives of several ancient lineages across a relatively deep phylogenetic tree. Our analyses of the combined data set converge on a set of well-supported relationships, which is in accord with many aspects of traditional softshell systematics including the monophyly of the Cyclanorbinae and Trionychinae. However, our results conflict with other aspects of current taxonomy and indicate that most of the currently recognized tribes are not monophyletic. We use this strong estimate of the phylogeny of softshell turtles for two purposes: (1) as the basis for a novel rank-free classification, and (2) to retrospectively examine strategies for analyzing highly homoplasious mtDNA data in deep phylogenetic problems where increased taxon sampling is not an option. Weeded and weighted parsimony, and model-based techniques, generally improved the phylogenetic performance of highly homoplasious mtDNA sequences, but no single strategy completely mitigated the problems associated with these highly homoplasious data. Many deep nodes in the softshell turtle phylogeny were confidently recovered only after the addition of largely nonhomoplasious data from the nuclear intron.
