Similar literature
Found 20 similar articles (search time: 584 ms)
1.
2.
We describe the impact of advances in mass measurement accuracy, ±10 ppm (internally calibrated), on protein identification experiments. This capability was brought about by delayed extraction techniques used in conjunction with matrix-assisted laser desorption ionization (MALDI) on a reflectron time-of-flight (TOF) mass spectrometer. This work explores the advantage of using accurate mass measurement (and thus constraint on the possible elemental composition of components in a protein digest) in strategies for searching protein, gene, and EST databases that employ (a) mass values alone, (b) fragment-ion tagging derived from MS/MS spectra, and (c) de novo interpretation of MS/MS spectra. Significant improvement in the discriminating power of database searches has been found using only molecular weight values (i.e., measured mass) of > 10 peptide masses. When MALDI-TOF instruments are able to achieve the ±0.5-5 ppm mass accuracy necessary to distinguish peptide elemental compositions, it is possible to match homologous proteins having > 70% sequence identity to the protein being analyzed. The combination of a ±10 ppm measured parent mass of a single tryptic peptide and the near-complete amino acid (AA) composition information from immonium ions generated by MS/MS is capable of tagging a peptide in a database because only a few sequence permutations > 11 AAs in length for an AA composition can ever be found in a proteome. De novo interpretation of peptide MS/MS spectra may be accomplished by altering our MS-Tag program to replace an entire database with calculation of only the sequence permutations possible from the accurate parent mass and immonium ion limited AA compositions. A hybrid strategy is employed using de novo MS/MS interpretation followed by text-based sequence similarity searching of a database.
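
The ppm tolerance that this abstract relies on can be illustrated with a minimal sketch. The function names and the example masses are hypothetical and are not taken from the MS-Tag program; the code only shows how a relative error in parts per million is computed and used as a match window.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error of a measurement, in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_within_ppm(measured, candidates, tol_ppm=10.0):
    """Return (measured, candidate) pairs whose error lies within +/- tol_ppm.

    `measured` and `candidates` are plain lists of masses in Da; a real
    search engine would index the candidates instead of scanning them.
    """
    return [
        (m, c)
        for m in measured
        for c in candidates
        if abs(ppm_error(m, c)) <= tol_ppm
    ]
```

At a peptide mass near 1000 Da, a 10 ppm window corresponds to roughly 0.01 Da, which is why tightening the tolerance removes so many chance matches.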

3.
We describe a method for generating multiple small sequences from the N-terminus of peptides in unseparated protein digests by stepwise thioacetylation and acid cleavage. The mass differences between a series of N-terminally degraded peptides give short sequences of defined length. Such short "sequence tags" together with the mass of the parent peptide can be used to identify the protein in a database. The sequence ladders are generated without the use of chain terminators or sample aliquoting and the degradation reagents are water soluble so that the chemistry can be carried out on peptides immobilized on C-18 reversed-phase supports without any peptide loss due to washing with organic solvents as occurs in Edman-type sequencing. The entire procedure can be automated, and we describe a prototype device for the parallel analysis of multiple samples. We demonstrate the effectiveness of this chemical tagging method in a comparison with Edman sequencing, peptide mass fingerprinting, and MS/MS analysis of crude protein fractions obtained from an HPLC separation of the Escherichia coli ribosome complex, which consists of 57 proteins. We show that chemical tagging is a viable first-pass high-throughput identification method to be used prior to an in-depth MS/MS analysis.
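
Reading a sequence tag from ladder mass differences can be sketched as follows. This is an illustrative example, not the authors' software: it assumes that consecutive ladder peaks differ by exactly one residue mass, uses standard monoisotopic residue masses, and the function name `read_ladder` and the tolerance are hypothetical.

```python
# Monoisotopic residue masses in Da (I and L are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder_masses, tol=0.02):
    """Translate a descending list of ladder masses into residue calls.

    Each step is compared with every residue mass; all residues within
    `tol` Da are reported together (e.g. "IL"), and "?" marks a step
    that matches no single residue.
    """
    tag = []
    for heavier, lighter in zip(ladder_masses, ladder_masses[1:]):
        delta = heavier - lighter
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(delta - m) <= tol]
        tag.append("".join(hits) if hits else "?")
    return tag
```

The resulting short tag, combined with the parent peptide mass, is the kind of query the abstract describes for database identification.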

4.
There are several computer programs that can match peptide tandem mass spectrometry data to their exactly corresponding database sequences, and in most protein identification projects, these programs are utilized in the early stages of data interpretation. However, situations frequently arise where tandem mass spectral data cannot be correlated with any database sequences. In these cases, the unmatched data could be due to peptides derived from novel proteins, allelic or species-derived variants of known proteins, or posttranslational or chemical modifications. Two additional problems are frequently encountered in high-throughput protein identification. First, it is difficult to quickly sift through large amounts of data to identify those spectra that, due to poor signal or contaminants, can be ignored. Second, it is important to find incorrect database matches (false positives). We have chosen to address these difficulties by performing automatic de novo sequencing using a computer program called Lutefisk. Sequence candidates obtained are used as input in a homology-based database search program called CIDentify to identify variants of known proteins. Comparison of database-derived sequences with de novo sequences allows for electronic validation of database matches even if the latter are not completely correct. Modifications to the original Lutefisk program have been implemented to handle data obtained from triple quadrupole, ion trap, and quadrupole/time-of-flight hybrid (Qtof) mass spectrometers. For example, the linearity of mass errors due to temperature-dependent expansion of the flight tube in a Qtof was exploited such that isobaric amino acids (glutamine/lysine and oxidized methionine/phenylalanine) can be differentiated without careful attention to mass calibration.

5.
With high mass accuracy and consecutively obtained electron transfer dissociation (ETD) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS), reliable (≥97%) and sensitive fragment ions have been extracted for identification of specific amino acid residues in peptide sequences. The analytical benefit of these specific amino acid composition (AAC) ions is to restrict the database search space and provide identification of peptides with higher confidence and reduced false negative rates. The 6706 uniquely identified peptide sequences determined with a conservative Mascot score of >30 were used to characterize the AAC ions. The loss of amino acid side chains (small neutral losses, SNLs) from the charge-reduced peptide radical cations was studied using ETD. Complementary AAC information from HCD spectra was provided by immonium ions. From the ETD/HCD mass spectra, 5162 and 6720 reliable SNLs and immonium ions were successfully extracted, respectively. Automated application of the AAC information during database searching resulted in an average 3.5-fold higher confidence level of peptide identification. In addition, 4% and 28% more peptides were identified above the significance level in a standard and extended search space, respectively.

6.
Peptide mass mapping using matrix-assisted laser desorption/ionization (MALDI) mass spectrometry in conjunction with interrogation of sequence databases is a powerful tool for the identification of proteins. Glycosylated proteins often yield poor MALDI peptide maps due to shielding of proteolytic cleavage sites and the presence of modified peptides. Here we demonstrate that enzymatic removal of N-linked glycans with simultaneous partial (50%) 18O-labeling of glycosylated asparagine residues prior to proteolysis and MALDI peptide mass mapping can overcome these problems. As a result, more peptides are observed in MALDI spectra which, in turn, increases the specificity of subsequent database searches. Furthermore, the detection of a labeled peptide directly translates into partial sequence information as N-linked carbohydrates are exclusively attached to asparagine residues that form part of the NXS/T sequence. The mass of the formerly glycosylated peptide together with the NXS/T sequence pattern represents a discriminating criterion for database searching which, on average, increases the search specificity by a factor of 100. This procedure allows the unambiguous identification of glycoproteins that would otherwise require sequencing and, at the same time, enables the identification of N-glycosylation sites with higher sensitivity than previously possible.
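
The NXS/T constraint mentioned above can be applied as a simple sequence filter. The sketch below is an assumption-laden illustration: it excludes proline at the X position, which is the common convention for the N-glycosylation sequon but is not stated in the abstract, and the function name `find_sequons` is hypothetical.

```python
import re

# Lookahead so that overlapping motifs (e.g. "NNTS") are all reported.
# Assumption: X may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str):
    """Return 0-based positions of Asn residues inside N-X-S/T motifs."""
    return [m.start() for m in SEQUON.finditer(sequence)]
```

A database search could then keep only candidate peptides for which `find_sequons` returns at least one position, in addition to the usual mass match.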

7.
With the increasing availability of de novo sequencing algorithms for interpreting high mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.
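
The idea of comparing residue masses in groups, rather than amino acid letters, can be shown with a small greedy sketch. This is not the OpenSea algorithm; it is a simplified illustration that assumes both sequences cover the same mass range, and the name `mass_groups` and the tolerance values are hypothetical.

```python
# Monoisotopic residue masses in Da.
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def mass_groups(denovo: str, database: str, tol: float = 0.01):
    """Split two sequences into consecutive groups of equal mass.

    Returns a list of (denovo_group, database_group) pairs, or None if
    the sequences cannot be partitioned into mass-equivalent groups.
    """
    i = j = start_i = start_j = 0
    mi = mj = 0.0  # running mass of the current, still open group
    groups = []
    while i < len(denovo) or j < len(database):
        # Extend whichever side currently has the smaller group mass.
        if mi <= mj and i < len(denovo):
            mi += MASS[denovo[i]]
            i += 1
        elif j < len(database):
            mj += MASS[database[j]]
            j += 1
        else:
            mi += MASS[denovo[i]]
            i += 1
        if abs(mi - mj) <= tol:
            groups.append((denovo[start_i:i], database[start_j:j]))
            start_i, start_j, mi, mj = i, j, 0.0, 0.0
    if start_i != len(denovo) or start_j != len(database):
        return None  # trailing residues never reached a common mass
    return groups
```

For example, a de novo call of "GG" and a database residue "N" differ as text but fall into one group here, because two glycines and one asparagine have nearly identical mass.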

8.
A method for rapid and unambiguous identification of proteins by sequence database searching using the accurate mass of a single peptide and specific sequence constraints is described. Peptide masses were measured using electrospray ionization-Fourier transform ion cyclotron resonance mass spectrometry to an accuracy of 1 ppm. The presence of a cysteine residue within a peptide sequence was used as a database searching constraint to reduce the number of potential database hits. Cysteine-containing peptides were detected within a mixture of peptides by incorporating chlorine into a general alkylating reagent specific for cysteine residues. Secondary search constraints included the specificity of the protease used for protein digestion and the molecular mass of the protein estimated by gel electrophoresis. The natural isotopic distribution of chlorine encoded the cysteine-containing peptide with a distinctive isotopic pattern that allowed automatic screening of mass spectra. The method is demonstrated for a peptide standard and unknown proteins from a yeast lysate using all 6118 possible yeast open reading frames as a database. As judged by calculation of codon bias, low-abundance proteins were identified from the yeast lysate using this new method but not by traditional methods such as tandem mass spectrometry via data-dependent acquisition or mass mapping.

9.
The present study reports a procedure developed for the identification of SDS-polyacrylamide gel electrophoretically separated proteins using an electrospray ionization quadrupole time-of-flight mass spectrometer (Q-TOF MS) equipped with pressurized sample introduction. It is based on in-gel digestion of the proteins without previous reduction/alkylation and on the capability of the Q-TOF MS to provide data suitable for peptide mass fingerprinting database searches and for tandem mass spectrometry (MS/MS) database searches (sequence tags). Omitting the reduction/alkylation step reduces sample contamination and sample loss, resulting in increased sensitivity. Omitting this step can leave disulfide-connected peptides in the analyte that can lead to misleading or ambiguous results from the peptide mass fingerprinting database search. This uncertainty, however, is overcome by MS/MS analysis of the peptides. Furthermore, the two complementary MS approaches increase the accuracy of the assignment of the unknown protein. This procedure is thus highly sensitive, accurate, and rapid. In combination with pressurized nanospray sample introduction, it is suitable for automated sample handling. Here, we apply this approach to identify protein contaminants observed during the purification of the yeast DNA mismatch repair protein Mlh1.

10.
The characterization of proteomes by mass spectrometry is largely limited to organisms with sequenced genomes. To identify proteins from organisms with unsequenced genomes, database sequences from related species must be employed for sequence-similarity protein identifications. Peptide sequence tags (Mann, 1994) have been used successfully for the identification of proteins in sequence databases using partially interpreted tandem mass spectra of tryptic peptides. We have extended the ability of sequence tag searching to the identification of proteins whose sequences are as yet unknown but are homologous to known database entries. The MultiTag method presented here assigns statistical significance to matches of multiple error-tolerant sequence tags to a database entry and ranks alignments by their significance. The MultiTag approach has the distinct advantage over other sequence-similarity approaches of being able to perform sequence-similarity identifications using only very short (2-4) amino acid residue stretches of peptide sequences, rather than complete peptide sequences deduced by de novo interpretation of tandem mass spectra. This feature facilitates the identification of low-abundance proteins, since noisy and low-intensity tandem mass spectra can be utilized.

11.
Electrospray ionization (ESI) tandem mass spectrometry (MS/MS) of peptides in conjunction with automated sequence database searching of the resulting collision-induced dissociation (CID) spectra has become a powerful method for the identification of purified proteins or the components of protein mixtures. The success of the method is critically dependent on the manner by which the peptides are introduced into the mass spectrometer. In this report, we describe a capillary electrophoresis-based system for the automated, sensitive analysis of complex peptide mixtures. The system consists of an ESI-MS/MS instrument, a solid-phase extraction (SPE)-capillary zone electrophoresis (CZE) device for peptide concentration and separation, and an algorithm written in Instrument Control Language (ICL) which modulates the electrophoretic conditions in a data-dependent manner to optimize available time for the generation of high-quality CID spectra of peptides in complex samples. We demonstrate that the data-dependent modulation of the electric field significantly expands the analytical window for each peptide analyzed and that the sensitivity of the SPE-CZE technique is not noticeably altered by the procedure. By applying the technique to the analysis of in vivo phosphorylation sites of endothelial nitric oxide synthase (eNOS), we demonstrate the power of this system for the MS/MS analysis of minor peptide species in complex samples such as phosphopeptides generated by the proteolytic digestion of a large protein, eNOS, phosphorylated at low stoichiometry.

12.
A new strategy for identifying proteins in sequence databases by MALDI-MS peptide mapping is reported. The strategy corrects for systematic deviations of determined peptide molecular masses using information contained in the opened database and thereby renders internal spectrum calibration unnecessary. As a result, data acquisition is simplified and less error prone. Performance of the new strategy is demonstrated by identification of a set of recombinant, human cDNA expression products as well as native proteins isolated from crude mouse brain extracts by 2-D electrophoresis. Using one set of calibration constants for the mass spectrometric analyses, 20 proteins were identified without applying any molecular weight restrictions, which was not possible without data correction. A sequence database search program has been written that performs all necessary calculations automatically, access to which will be provided to the scientific community on the Internet.

13.
Database search identification algorithms, such as Sequest and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of "context-dependent" PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
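
The inverted loop order described here (candidate first, then spectra) can be sketched with a sorted precursor list and binary search. This is an illustration of the general idea only, not DBDigger's implementation; the function names, the absolute mass tolerance, and the data layout are assumptions.

```python
from bisect import bisect_left, bisect_right


def spectra_for_candidate(sorted_precursors, candidate_mass, tol_da):
    """Index range of sorted precursor masses within +/- tol_da of a candidate."""
    lo = bisect_left(sorted_precursors, candidate_mass - tol_da)
    hi = bisect_right(sorted_precursors, candidate_mass + tol_da)
    return range(lo, hi)


def pair_candidates_with_spectra(precursor_masses, candidates, tol_da=0.5):
    """Map each candidate peptide to the spectra it could explain.

    `precursor_masses` holds one neutral precursor mass per spectrum;
    `candidates` is an iterable of (name, mass) pairs generated once.
    Returned lists contain original spectrum indices.
    """
    order = sorted(range(len(precursor_masses)), key=precursor_masses.__getitem__)
    sorted_masses = [precursor_masses[i] for i in order]
    pairs = {}
    for name, mass in candidates:
        hits = spectra_for_candidate(sorted_masses, mass, tol_da)
        pairs[name] = [order[k] for k in hits]
    return pairs
```

Because the spectra are sorted once, each candidate is generated a single time and looked up in logarithmic time, instead of regenerating candidates for every spectrum.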

14.
This paper presents application of sequential enhanced data processing procedures to high-resolution tandem mass spectra for identification of peptides using the Mascot database search algorithm. Strategies for (1) selection of fragment ion peaks from MS/MS spectra, (2) utilization of improved mass accuracy of the precursor ions, and (3) wavelet denoising of the mass spectra prior to fragment ion selection have been developed. The number of peptide identifications obtained using the enhanced processing was then compared with that obtained using software provided by the instrument manufacturer. Approximately 9000 MS/MS spectra acquired by the Applied Biosystems 4700 TOF/TOF MS instrument were used as a model data set. After application of the new processing, an increase of 33% in unique peptides and 22% in protein identifications with at least two unique peptides was found. The processing increased the percentage of false positive identifications, estimated by searching against a randomized database, from 2.7% to 3.9%, which was still below the 5% error rate specified in the Mascot search. These data processing approaches increase the amount of information that can be extracted from LC-MS analysis without the necessity of additional experiments.

15.
We describe a method for comparative quantitation and de novo peptide sequencing of proteins separated either by standard chromatographic methods or by one- and two-dimensional polyacrylamide gel electrophoresis. The approach is based on the use of an isotopically labeled reagent to quantitate (by mass spectrometry) the ratio of peptides from digests of a protein being expressed under different conditions. The method allows quantitation of the changes occurring in spots or bands that contain more than one protein and has a greater dynamic range than most staining methods. Since the reagent carries a fixed positive charge under acidic conditions and labels only the N-terminus of peptides, the interpretation of tandem mass spectra to obtain sequence information is greatly simplified. The sequences can easily be extracted for homology searches instead of using indirect mass spectral-based searches and are independent of posttranslational modifications.

16.
Mo L, Dutta D, Wan Y, Chen T. Analytical Chemistry, 2007, 79(13): 4870-4878
Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to peptide de novo sequencing, called MSNovo, which has the following advanced features. (1) It works on data generated from both LCQ and LTQ mass spectrometers and interprets singly, doubly, and triply charged ions. (2) It integrates a new probabilistic scoring function with a mass array-based dynamic programming algorithm. The simplicity of the scoring function, with only 6-10 parameters to be trained, avoids the problem of overfitting and allows MSNovo to be adopted for other machines and data sets easily. The mass array data structure explicitly encodes all possible peptides and allows the dynamic programming algorithm to find the best peptide. (3) Compared to existing programs, MSNovo predicts peptides as well as sequence tags with a higher accuracy, which is important for those applications that search protein databases using the de novo sequencing results. More specifically, we show that MSNovo outperforms other programs on various ESI ion trap data. We also show that for high-resolution data the performance of MSNovo improves significantly. Supporting Information, executable files and data sets can be found at http://msms.usc.edu/supplementary/msnovo.

17.
A new strategy for identifying proteins by MALDI-TOF-MS peptide mapping is reported. In contrast to current approaches, the strategy does not rely on a good relative or absolute mass accuracy as the criterion that discriminates false positive results. The protein sequence database is first searched for all proteins that match a minimum of five of the submitted masses within the maximum expected relative errors when the default or externally determined calibration constants are used, for instance, ±500 ppm. Typically, this search retrieves many thousand candidate sequences. Assuming initially that each of these is the correct protein, the relative errors of the matching peptide masses are calculated for each candidate sequence. Linear regression analysis is then performed of the calculated relative errors as a function of m/z for each candidate sequence, and the standard deviation to the regression is used to distinguish the correct sequence among the candidates. We show that this parameter is independent of whether the mass spectrometric data were internally or externally calibrated. The result is a search engine that renders internal spectrum calibration unnecessary and adapts to the quality of the raw data without user interference. This is made possible by a dynamic scoring algorithm, which takes into account the number of matching peptide masses, the percentage of the protein's sequence covered by these peptides and, as a new parameter, the determined standard deviation. The lower the standard deviation, the fewer cleavage peptides are required for identification and vice versa. Performance of the new strategy is demonstrated and discussed. All necessary computing has been implemented in a computer program, free access to which is provided on the Internet.
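
The regression step in this strategy can be sketched with an ordinary least-squares fit. The code below is a minimal illustration, not the published program: the function name `regression_scatter` is hypothetical, and the residual standard deviation uses n - 2 degrees of freedom, which is an assumption about how "standard deviation to the regression" is defined.

```python
import math


def regression_scatter(mz, rel_err_ppm):
    """Fit rel_err = intercept + slope * m/z and return the residual scatter.

    Returns (slope, intercept, sd), where sd is the standard deviation of
    the residuals about the fitted line (n - 2 degrees of freedom).
    """
    n = len(mz)
    if n < 3 or n != len(rel_err_ppm):
        raise ValueError("need at least three paired (m/z, error) values")
    mean_x = sum(mz) / n
    mean_y = sum(rel_err_ppm) / n
    sxx = sum((x - mean_x) ** 2 for x in mz)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mz, rel_err_ppm))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(mz, rel_err_ppm)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, sd
```

For the correct protein, a miscalibration shows up as a smooth trend in error versus m/z, so the residual scatter is small even when the absolute errors are hundreds of ppm; chance matches scatter widely and give a large value.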

18.
An improved method for peptide de novo sequencing by MALDI mass spectrometry is presented. The method couples a charge derivatization reaction with C-terminal digestion to modify tryptic peptides. The charge derivatization attaches a fixed charge group onto the N-termini of peptides, and the enzymatic digestion after the derivatization step removes C-terminal basic amino acid residues such as arginine and lysine. The fragmentation of the modified peptide(s) under low-energy CID conditions (MALDI Q-TOF mass spectrometer) yields a simplified yet complete ion series of the peptide sequence. The validity of the method is demonstrated by the results from several model protein digests, where peptide sequences were correctly deduced either manually or through an automated sequencing program.

19.
We investigated and compared three approaches for shotgun protein identification by combining MS and MS/MS information using LTQ-Orbitrap high mass accuracy data. In the first approach, we employed a unique mass identifier method where MS peaks matched to peptides predicted from proteins identified from an MS/MS database search are first subtracted before using the MS peaks as unique mass identifiers for protein identification. In the second method, we used an accurate mass and time tag method by building a potential mass and retention time database from previous MudPIT analyses. For the third method, we used a peptide mass fingerprinting-like approach in combination with a randomized database for protein identification. We show that we can improve protein identification sensitivity for low-abundance proteins by combining MS and MS/MS information. Furthermore, "one-hit wonders" from MS/MS database searching can be further substantiated by MS information and the approach improves the identification of low-abundance proteins. The advantages and disadvantages of the three approaches are then discussed.

20.
Detection and identification of pathogenic bacteria and their protein toxins play a crucial role in a proper response to natural or terrorist-caused outbreaks of infectious diseases. The recent availability of whole genome sequences of priority bacterial pathogens opens new diagnostic possibilities for identification of bacteria by retrieving their genomic or proteomic information. We describe a method for identification of bacteria based on tandem mass spectrometric (MS/MS) analysis of peptides derived from bacterial proteins. This method involves bacterial cell protein extraction, trypsin digestion, liquid chromatography MS/MS analysis of the resulting peptides, and a statistical scoring algorithm to rank MS/MS spectral matching results for bacterial identification. To facilitate spectral data searching, a proteome database was constructed by translating genomes of bacteria of interest with fully or partially determined sequences. In this work, a prototype database was constructed by the automated analysis of 87 publicly available, fully sequenced bacterial genomes with the GLIMMER gene finding software. MS/MS peptide spectral matching for peptide sequence assignment against this proteome database was done by SEQUEST. To gauge the relative significance of the SEQUEST-generated matching parameters for correct peptide assignment, discriminant function (DF) analysis of these parameters was applied and DF scores were used to calculate probabilities of correct MS/MS spectra assignment to peptide sequences in the database. The peptides with DF scores exceeding a threshold value determined by the probability of correct peptide assignment were accepted and matched to the bacterial proteomes represented in the database. Sequence filtering or removal of degenerate peptides matched with multiple bacteria was then performed to further improve identification. It is demonstrated that using a preset criterion with known distributions of discriminant function scores and probabilities of correct peptide sequence assignments, a test bacterium within the 87 database microorganisms can be unambiguously identified.
