Similar literature
 Found 20 similar articles; search took 671 ms
1.
In this paper a new approach for the prediction of protein coding gene structures is described. The principal scheme of prediction is as follows. First, the exons with the best potential are predicted in a sequence of unknown function, and a list of potential amino acid fragments coded by these exons is formed. Second, the homology between each amino acid fragment from the list and proteins from the SWISS-PROT database of amino acid sequences is tested. One protein with the best homology is chosen out of all the homologous sequences. Third, the exon-intron structure is reconstructed on the basis of its homology with the chosen protein sequence. The method was tested on an independent control set (20 genes). The results were as follows: 21% of real exons were lost and 3% of non-real exons were found. This system can be used to refine the results of gene prediction systems, especially if highly homologous proteins are found in the amino acid sequence database.

2.
Sixteen P1 and TAC clones assigned to Arabidopsis thaliana chromosome 5 were sequenced, and their sequence features were analyzed using various computer programs. The total length of the sequences determined was 1,013,767 bp. Together with the nucleotide sequences of 109 clones previously reported, the regions of chromosome 5 sequenced so far now total 9,072,622 bp, which presumably covers approximately one-third of the chromosome. A similarity search against the reported gene sequences predicted the presence of a total of 225 protein-coding genes and/or gene segments in the newly sequenced regions, indicating an average gene density of one gene per 4.5 kb. Introns were identified in 72.4% of the potential protein genes for which the entire gene structure was predicted, and the average number per gene and the average length of the introns were 3.3 and 163 bp, respectively. These sequence features are essentially identical to those in the previously reported sequences. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.
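
The gene density quoted in entry 2 can be checked directly from the two figures the abstract gives. The short sketch below is only an arithmetic check; the function name `gene_density_kb` is a hypothetical helper introduced here, not something from the paper.

```python
# Figures quoted in the abstract of entry 2.
newly_sequenced_bp = 1_013_767   # total length of the 16 newly sequenced clones
predicted_genes = 225            # predicted protein-coding genes and/or gene segments


def gene_density_kb(total_bp: int, gene_count: int) -> float:
    """Return the average number of kilobases of sequence per gene."""
    return total_bp / gene_count / 1000


density = gene_density_kb(newly_sequenced_bp, predicted_genes)
print(f"one gene per {density:.1f} kb")
```

Dividing 1,013,767 bp by 225 genes gives roughly 4,506 bp per gene, which matches the reported density of one gene per 4.5 kb.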

3.
Small nucleolar RNAs (snoRNAs) are involved in cleavage of rRNA, modification of rRNA nucleotides and, perhaps, other aspects of ribosome biogenesis in eukaryotic cells. Scores of snoRNAs have been discovered in recent years from various eukaryotes, and the total number is predicted to be up to 200 different snoRNA species per individual organism. We have created a comprehensive database for snoRNAs from the yeast Saccharomyces cerevisiae which allows easy access to detailed information about each species known (almost 70 snoRNAs are featured). The database consists of three major parts: (i) a utilities section; (ii) a master table; and (iii) a collection of tables for the individual snoRNAs. The utilities section provides an introduction to the database. The master table lists all known S. cerevisiae snoRNAs and their major properties. Information in the individual tables includes: alternate names, size, family classification, genomic organization, sequences (with major features identified), GenBank accession numbers, occurrence of homologues, gene disruption phenotypes, functional properties and associated RNAs and proteins. All information is accompanied by appropriate literature references. The database is available on the World Wide Web (http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html), and should be useful for a wide range of snoRNA studies.

4.
5.
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database.

6.
Two full-length cDNAs, gbr-2A and gbr-2B, encoding inhibitory amino acid receptor subunits have been amplified and cloned from Caenorhabditis elegans mRNA. The 5' 732 bp of the two cDNAs, encoding 237 amino acids, are identical. The 3' 758 bp of the gbr-2B cDNA are present within the 3' untranslated region of the gbr-2A clone. As a result, the two cDNAs are predicted to encode subunits which share a common extracellular N-terminal sequence of 237 amino acids, but different, though closely related, C-terminal sequences which include four predicted membrane-spanning regions. A search of the EMBL database revealed that the sequences of the two subunits are most closely related to the alpha-subunit of the C. elegans avermectin receptor. Northern blot analysis showed the presence of two related mRNAs of approximately 2.2 and 1.5 kb in a developmentally mixed population of C. elegans. The genomic DNA sequence confirms that both mRNAs were transcribed from the same gene, gbr-2, suggesting that the closely related 3' sequences have arisen as a result of a partial gene duplication event. We propose that C. elegans is utilising alternative splicing to generate receptor subunits with identical extracellular, ligand-binding domains but different transmembrane, channel forming domains.

7.
HUGE is a database for human large proteins newly identified by the Kazusa cDNA project, which aims to predict protein primary structures from sequences of human large cDNAs (>4 kb). In particular, cDNA clones capable of coding for large proteins (>50 kDa) are current targets of the project. More than 700 sequences of human cDNAs (average size, 5.1 kb) have been determined to date and deposited in the public databases. Notable information implied from the cDNAs and the predicted protein sequences can be obtained through HUGE via the World Wide Web at URL http://www.kazusa.or.jp/huge

8.
9.
Two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through a third sequence that is suitably intermediate between the two. High scores, for a sequence match between the first and third sequences and between the second and the third sequences, imply that the first and second sequences are related even though their own match score is low. We have tested the usefulness of this idea using a database that contains the sequences of 971 protein domains whose structures are known and whose residue identities with each other are some 40% or less (PDB40D). On the basis of sequence and structural information, 2143 pairs of these sequences are known to have an evolutionary relationship. FASTA, in an all-against-all comparison of the sequences in the database, detected 320 (15%) of these relationships as well as three false positives (i.e. 1% error rate). Using intermediate sequences found by FASTA matches of PDB40D sequences to those in the large non-redundant OWL database we could detect 550 evolutionary relationships with an error rate of 1%. This means the intermediate sequence procedure increases the ability to recognise the evolutionary relationships amongst the PDB40D sequences by 70%.
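
The intermediate-sequence idea in entry 9 can be illustrated with a small sketch. This is not the authors' implementation: the function `related_via_intermediate`, the score table, the sequence labels and the threshold are all hypothetical, and a plain dictionary of precomputed scores stands in for real FASTA runs. The last lines only recompute the percentages from the counts quoted in the abstract.

```python
from itertools import combinations


def related_via_intermediate(scores, queries, intermediates, threshold):
    """Split query pairs into those linked directly and those linked via an intermediate.

    scores maps frozenset({a, b}) to a match score; missing pairs count as 0.
    """
    def score(a, b):
        return scores.get(frozenset((a, b)), 0.0)

    direct, indirect = set(), set()
    for a, b in combinations(sorted(queries), 2):
        if score(a, b) >= threshold:
            # The two sequences match each other directly.
            direct.add((a, b))
        elif any(score(a, m) >= threshold and score(b, m) >= threshold
                 for m in intermediates):
            # Both match the same third sequence, so they are inferred to be related.
            indirect.add((a, b))
    return direct, indirect


# Hypothetical toy scores (not data from the study).
toy_scores = {
    frozenset(("A", "B")): 120.0,
    frozenset(("A", "C")): 30.0,   # too low for direct detection
    frozenset(("A", "X")): 95.0,
    frozenset(("C", "X")): 90.0,
}
direct, indirect = related_via_intermediate(
    toy_scores, queries={"A", "B", "C"}, intermediates={"X"}, threshold=80.0
)
print(direct)    # pairs found by direct comparison
print(indirect)  # pairs found only through the intermediate X

# Counts quoted in the abstract.
total_pairs, found_direct, found_with_intermediates = 2143, 320, 550
direct_pct = 100 * found_direct / total_pairs
gain_pct = 100 * (found_with_intermediates / found_direct - 1)
print(f"direct: {direct_pct:.0f}%, gain: {gain_pct:.0f}%")
```

In the toy data, A and C score poorly against each other but both score well against X, so the pair is recovered only through the intermediate. From the quoted counts, 320 of 2143 is about 15%, and 550 against 320 is a gain of about 72%, which the abstract reports as 70%.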

10.
tmRNA (also known as 10Sa RNA) is so-named for its dual tRNA-like and mRNA-like nature. It is employed in a remarkable trans-translation process to add a C-terminal peptide tag to the incomplete protein product of a broken mRNA; the tag targets the abnormal protein for proteolysis. tmRNA sequences have been identified in genomes of diverse bacterial phyla, including the most deeply branching. They have also been identified in plastids of the 'red' lineage. The tmRNA Website (http://www.wi.mit.edu/bartel/tmRNA/home) contains a database currently including sequences from 37 species, with provisional alignments, as well as the tentatively predicted proteolysis tag sequences. A brief review and guide to the literature is also provided.

11.
The Histone Sequence Database is an annotated and searchable collection of all available histone and histone fold sequences and structures. Particular emphasis has been placed on documenting conflicts between similar sequence entries from a number of source databases, conflicts that are not necessarily documented in the source databases themselves. New additions to the database include compilations of post-translational modifications for each of the core and linker histones, as well as genomic information in the form of map loci for the human histone gene complement, with the genetic loci linked to Online Mendelian Inheritance in Man (OMIM). The database is freely accessible through the World Wide Web at either http://genome.nhgri.nih.gov/histones/ or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

12.
13.
14.
Fibroblasts are the major cell type responsible for synthesizing matrix constituents in lung and other connective tissues. Evidence indicates that fibroblasts are heterogeneous, and that subpopulations with some distinct properties are clonally selected and expanded in fibrotic diseases. However, few distinct markers capable of demonstrating the presence of fibroblast subpopulations in tissues have been isolated so far. With the objective of identifying proteins that could detect fibroblast subpopulations, we compared the messenger RNA (mRNA) expression of two cultured human lung fibroblast subpopulations by differential display. Total RNA was obtained, complementary DNA (cDNA) was synthesized, and the polymerase chain reaction (PCR) products obtained with several primer pairs were compared. One 724-bp product, which was strongly expressed by one human lung fibroblast subpopulation, was identified and cloned. This product was poorly expressed by the other lung fibroblast subpopulation. The mRNA for the gene encoding this product was not detectable in human smooth-muscle cells, endothelial cells, or epithelial cells, although it was present in dermal fibroblasts. The mRNA was detected in normal and fibrotic human lungs. Search of the National Center for Biotechnology Information (NCBI) GenBank DNA database with the sequence obtained from this clone revealed no significant matches. However, a search of the NCBI database of expressed sequence tags (dbEST) revealed five different human expressed sequence tag (EST) clones corresponding to the LR8 cDNA sequence. Six additional mouse and one pig EST clones were identified that showed significant similarity to the human fibroblast cDNA. Composites of the entire coding sequences for the human fibroblast gene product and the mouse homologue were assembled from the respective overlapping EST sequences.
The open reading frame identified for each composite sequence predicted protein products of 270 and 263 amino acids for the human and mouse sequences, respectively, which were 52% identical, with three gaps. At the amino acid level, no significant sequence similarity was detected with any other sequences in exhaustive searches of the NCBI DNA and protein databases or the Blocks databases. A PCR product with predicted length and sequence was obtained by using a sense primer upstream to LR8 and an antisense primer within LR8. Our results indicate that this differentially displayed product represents a previously undescribed protein that could be useful for distinguishing fibroblasts, and possibly fibroblast subpopulations, from other cell types in lungs and other tissues.

15.
The hD52 gene was originally identified through its elevated expression level in human breast carcinoma. Cloning of D52 homologues from other species has indicated that D52 may play roles in calcium-mediated signal transduction and cell proliferation. Two human homologues of hD52, hD53 and hD54, have also been identified, demonstrating the existence of a novel gene/protein family. Since D52-like protein sequences are all predicted to contain a coiled-coil domain, we used the yeast two-hybrid system and glutathione S-transferase pull-down assays to investigate whether homo- and/or heteromeric interactions occur between D52-like proteins. Analyses of yeast strains co-transfected with paired D52-like constructs indicated that D52-like fusion proteins interact in homo- and heteromeric fashions through their predicted coiled-coil domains. Similarly, extensive two-hybrid screenings of a human breast carcinoma expression library identified hD53 and hD52 as potential interactors for both hD52 and hD53 baits. Thus, D52-like proteins appear to exert and/or regulate their activities through specific interactions with other D52-like proteins, which in turn may be intrinsic to potential roles of these molecules in controlling cell proliferation.

16.
The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, was established in 1988 to perform basic research in the field of computational molecular biology as well as build and distribute molecular biology databases. The basic research has led to new algorithms and analysis tools for interpreting genomic data and has been instrumental in the discovery of human disease genes for neurofibromatosis and Kallmann syndrome. The principal database responsibility is the National Institutes of Health (NIH) genetic sequence database, GenBank. NCBI, in collaboration with international partners, builds, distributes, and provides online and CD-ROM access to over 112,000 DNA sequences. Another major program is the integration of multiple sequence databases and related bibliographic information and the development of network-based retrieval systems for Internet access.

17.
18.
19.
Oocyte development within avian ovarian follicles is an intricate process involving yolk deposition and the formation of extraoocytic matrices. Of these, the perivitelline membrane (pvm) not only plays a role in sperm binding but also provides mechanical support for the large oocyte's journey through the oviduct after ovulation. To date we have focused on the mechanisms for uptake of yolk precursors into oocytes of the chicken; now we extend our studies to a detailed analysis of the pvm. In the course of characterization of its major components, we obtained partial protein sequences; comparison with the GenBank database revealed that one of the pvm proteins is the homologue of mammalian zona pellucida glycoprotein 3 (ZP3), a key component in sperm binding. Following a nomenclature based on gene structure, the protein is referred to as chicken ZPC (chZPC). The chicken protein (444 residues) and murine ZP3 (424 residues) are highly conserved, with 41% of the amino acids identical. As shown by Northern blot analysis, the avian ZPC gene is expressed exclusively in the granulosa cells surrounding the oocyte, in contrast to murine ZP3, which is synthesized by the oocyte. Upon reaching a size larger than 1.5 mm in diameter, follicles accumulate chZPC in highly polarized fashion, i.e., in the space intercalated between the oocyte and the granulosa cells, as revealed by immunohistochemistry of follicle sections. ChZPC synthesis and secretion by granulosa cells was demonstrated directly by metabolic labeling and immunoprecipitation from the culture medium of granulosa cell sheets isolated ex vivo from follicles. Immunoblot analysis and glycosidase treatment of chZPC from preovulatory and freshly ovulated oocytes, as well as laid eggs, revealed that the primary product undergoes a two-step decrease in size from follicle to laid egg that is unlikely to be due to modification of the carbohydrate moiety.

20.
The GenBank® sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (Web) or Sequin programs to format and send sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE® abstracts from published articles describing the sequences are included as an additional source of biological annotation through the PubMed search system. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, Email, and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the URL: http://www.ncbi.nlm.nih.gov
