Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Iterated sequence databank search methods were assessed from the viewpoint of someone with the sequence of a novel gene product who wishes to find distant relatives of their protein and, with the specific searches against the PDB, also hopes to find a relative of known structure. We examined three methods in detail, spanning a range from simple pattern-matching to sophisticated weighted profiles. Rather than apply these methods 'blindly' (with default parameters) to a large number of test queries, we concentrated on the globins, allowing a more detailed investigation of each method on different data subsets with different parameter settings. Despite its widespread use, regular-expression matching proved to be very limited, seldom extending beyond the sub-family from which the pattern was derived. To attain any generality, the patterns had to be 'stripped down' to include only the most highly conserved parts. The QUEST program avoided these problems by introducing a more flexible (weighted) matching. On the PDB sequences this was highly effective, missing only a few globins with probes based on each sub-family or even a single representative from each sub-family. In addition, very few false positives were encountered, and those that did match often did so for only a few cycles before being lost again. On the larger sequence collection, however, QUEST encountered problems with maintaining (or achieving) the alignment of the full globin family. psi-BLAST also recognised almost all the globins when matching against the PDB sequences, typically missing three or four of the most distantly related sequences while picking up a few false positives. In contrast to QUEST, psi-BLAST performed very well on the larger databank, retrieving almost a full collection of globins while retaining the same proportion of false positives.
SAM applied to the PDB sequences performed reasonably well with the myoglobin and hemoglobin families as probes, typically missing several of the more difficult proteins, but performed poorly with the leghemoglobin probe. Only with the full family range as a probe did it produce results comparable to psi-BLAST and QUEST. With the larger databank, SAM produced a good result but, again, this was achieved only by using the full range of sequence variation with the default regulariser; the use of Dirichlet mixtures failed completely in this situation.

2.
A platform for biological sequence comparison on parallel computers (cited 1 time in total: 0 self-citations, 1 citation by others)
We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16-node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in approximately 16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32-node Intel iPSC/860 hypercube.
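
The scoring function that this abstract describes as replaceable can be illustrated with a minimal Smith-Waterman sketch. This is not the PSCANLIB code: the function name `smith_waterman_score`, the linear gap penalty, and the score values are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Illustrative scoring only: a fixed match/mismatch score and a linear
    gap penalty, not a substitution matrix such as those used for proteins.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would need the full table or a traceback structure.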

3.
A method is developed, based on word-searching, which provides a rapid test for the statistical significance of DNA sequence similarities for use in databank searching. The method makes allowance for the lengths and dinucleotide compositions of the sequences being compared. A way is also described to calculate the power of the test, i.e. the probability of detecting a given similarity as being statistically significant. The effects on the power of the test of the scoring method, word length, sequence length, and sequence composition are examined. A novel scoring method is shown to be superior to the method currently used in most word-searching algorithms.
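
As a rough illustration of what word-searching means here, the sketch below counts the words (k-mers) that two DNA sequences have in common. It is a plain shared-word count, not the novel scoring method or the significance test of the abstract, which are not specified on this page; `shared_word_count` and the default word length are hypothetical.

```python
from collections import Counter


def shared_word_count(seq_a, seq_b, k=4):
    """Count words of length k common to both sequences (with multiplicity)."""
    words_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    words_b = Counter(seq_b[i:i + k] for i in range(len(seq_b) - k + 1))
    # A word occurring m times in one sequence and n times in the other
    # contributes min(m, n) to the count.
    return sum(min(count, words_b[word]) for word, count in words_a.items())
```

A raw count like this grows with sequence length and depends on composition, which is why a significance test that allows for both is needed before counts from different comparisons can be ranked.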

4.
In this paper, it is shown how to adapt an existing package (VODE) for solving systems of ordinary differential equations on serial computers to distributed memory parallel computers. The approach taken is based on waveform relaxation in which the problem is decomposed into a sequence of subproblems which are then solved independently using VODE on each processor. Communication between subtasks is provided by a generic software environment p4. This approach allows the development of general purpose parallel software for ODEs which is both reliable and portable.
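
The idea of waveform relaxation can be sketched on a small example. The code below is a serial toy, assuming a two-variable linear system and forward Euler in place of VODE, with no p4 communication; the system, step size, and function names are all illustrative choices rather than anything taken from the paper.

```python
def euler_coupled(steps, h):
    """Reference: forward Euler on the coupled system
    x' = -2x + y, y' = x - 2y, with x(0) = 1, y(0) = 0."""
    x, y = 1.0, 0.0
    xs, ys = [x], [y]
    for _ in range(steps):
        x, y = x + h * (-2 * x + y), y + h * (x - 2 * y)
        xs.append(x)
        ys.append(y)
    return xs, ys


def waveform_relaxation(steps, h, sweeps):
    """Jacobi waveform relaxation: each sweep integrates x and y separately
    over the whole time window, using the other variable's waveform from
    the previous sweep. The two inner loops are independent of each other,
    so they could run on different processors."""
    xs = [1.0] * (steps + 1)  # initial guess: constant waveforms
    ys = [0.0] * (steps + 1)
    for _ in range(sweeps):
        new_xs, new_ys = [1.0], [0.0]
        for n in range(steps):  # subproblem 1: x with y frozen
            new_xs.append(new_xs[n] + h * (-2 * new_xs[n] + ys[n]))
        for n in range(steps):  # subproblem 2: y with x frozen
            new_ys.append(new_ys[n] + h * (xs[n] - 2 * new_ys[n]))
        xs, ys = new_xs, new_ys
    return xs, ys
```

With this explicit discretisation, each sweep makes one more time step agree with the coupled solution, so `sweeps == steps` reproduces `euler_coupled`; in practice the iteration would be stopped by a convergence tolerance instead.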

5.
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all n computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the system. The redistribution is determined by the recovery scheme. The recovery scheme is governed by a sequence of integers modulo n. Each sequence guarantees minimal load on the computer that has maximal load even when the most unfavorable combinations of computers go down. We calculate the best possible such recovery schemes for any number of crashed computers by an exhaustive search, where brute force testing is avoided by a mathematical reformulation of the problem and a branch-and-bound algorithm. The search nevertheless has a high complexity. Optimal sequences, and thus a corresponding optimal bound, are presented for a maximum of twenty-one computers in the distributed system or cluster. Received: 26 May 2004. Published online: 14 March 2005.

6.
This paper reports on the graphing work of children, aged 8 and 9 years, who have immediate and continuous access to portable computers across the whole curriculum. They have been using their computers to generate graphs and charts from experimental data. The unit of analysis is a learning sequence in which the progress of a small group of children on a specific coherent task was recorded over a period of several weeks. The paper describes two such learning sequences to illustrate two types of graphing that can occur in computer-rich environments. In one sequence, the children collected data, after which they explored the graphing facilities on the computer, whereas in the other learning sequence graphing is used iteratively as an integral part of the ongoing task.

7.
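Li Xinguo (李新国). 《现代计算机》 (Modern Computer), 2003, (12): 37-39, 47
This paper analyses the TCP connection establishment process, examines how TCP sequence number guessing attacks are carried out, and proposes measures that should be taken to prevent attackers from using this method against us.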

8.
Computers & Chemistry, 1993, 17(2): 117-122
The problem of protein tertiary structure prediction from sequence is reviewed, emphasizing that practical solutions are most likely to come from the recognition of existing (known) structures that fit the sequence of the protein of unknown structure. Fit can be defined in terms of sequence alone: by simple alignment in the more obvious problems, or by pattern matching where the similarity is remote and fragmentary. More remote similarities can be recognized by matching the sequence directly onto a known structure. This threading method is outlined, and it is proposed how it might be used to scan collections of idealized structures (possible folds) to avoid the limited sample of structures available in the current structural databank.

9.
An algorithm to simulate DNA sequence evolution under a general stochastic model, including as particular cases all the previously used schemes of nucleotide substitution, is described. The simulation is carried out on finite, variable-length DNA sequences through a strict stochastic process, according to the particular substitution rates imposed by each scheme. Five FORTRAN programs, running on an IBM PC and compatibles, carry out all the tasks needed for the simulation. They are menu driven and interfaced to the system through a principal menu. All sequence data files used and generated by the SDSE package conform to the standard GenBank database format, thus allowing the use of any sequence retrieved from this databank, as well as the application of other packages to analyse, manipulate or retrieve simulated sequences.

10.
An interface program has been developed for users of MS-DOS computers and the GenBank(R) gene sequence files in their diskette format. With the program a user is able to produce keyword, author and entry name listings of GenBank items or to select GenBank sequences for viewing, printing or decoding. The decode option uncompresses sequence data and yields a character file which has the format used on GenBank magnetic tapes. Program options are chosen by selecting items from command menus. While the program is designed primarily for hard disk operation, it also allows users of diskette-based computers to work with GenBank files.

11.
A thermochemical assessment was performed for the system K2O–Na2O–SiO2. The modified associate species model was applied to the ternary liquid in the system. All binary subsystems remained unchanged. The new databank was used for the representation of the phase equilibria in the ternary system including the quasi-binary sections of the ternary diagram. The calculated phase relations are in good agreement with the experimental data. The phase equilibria in the experimentally uninvestigated region near the alkali oxide edge are proposed as extrapolations using the new databank.

12.
13.
We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph suitable for distributed memory computers. The presented algorithm is based on the Push-Relabel algorithm which is known to be one of the fastest algorithms for the bipartite matching problem. Previous attempts at developing parallel implementations of it have focused on shared memory computers using only a limited number of processors. We first present a straightforward adaptation of these shared memory algorithms to distributed memory computers. However, this is not a viable approach as it requires too much communication. We then develop our new algorithm by modifying the previous approach through a sequence of steps with the main goal being to reduce the amount of communication and to increase load balance. The first goal is achieved by changing the algorithm so that many push and relabel operations can be performed locally between communication rounds and also by selecting augmenting paths that cross processor boundaries infrequently. To achieve good load balance, we limit the speed at which global relabelings traverse the graph. In several experiments on a large number of instances, we study weak and strong scalability of our algorithm using up to 128 processors. The algorithm can also be used to find ε-approximate matchings quickly.

14.
The on-Line Earthnet Data Availability (LEDA) databank is a catalogue of imagery recorded by the sensors onboard the Landsat and TIROS (NOAA 9 and 10) satellites. Traditionally, online catalogues of this type have been searched by information specialists on behalf of the data user, a situation which can lead to delays. The authors feel that the LEDA databank combines the advantages of being both relatively inexpensive and easy to search. Thus the data user can search the databank.

15.
Computer benchmarking is a common method for measuring the parameters of a computational model. It helps to measure the parameters of any computer. With the emergence of multicore computers, the evaluation of computers was brought under consideration. Since these types of computers can be viewed and considered as parallel computers, the evaluation methods for parallel computers may be appropriate for multicore computers. However, because multicore architectures seriously focus on cache hierarchy, there is a need for new and different benchmarks to evaluate them correctly. To this end, this paper presents a method for measuring the parameters of one of the most famous multicore computational models, namely Multi-Bulk Synchronous Parallel (Multi-BSP). This method measures the hardware latency parameters of multicore computers, namely communication latency (g_i) and synchronization latency (L_i) for all levels of the cache memory hierarchy in a bottom-up manner. By determining the parameters, the performance of algorithms on multicore architectures can be evaluated as a sequence.

16.
The core of a 6502 machine language program for DNA sequence analysis on an Apple II microcomputer is described. Use of a binary coding of nucleotides allows interactive data manipulation on a low-cost configuration with execution times similar to those of larger computers. The PEGASE system should prove useful and easy to use in routine sequence handling and experiment design.

17.
We have improved an existing clone database management system written in FORTRAN 77 and adapted it to our software environment. Improvements are that the database can be interrogated for any type of information, not just keywords. Also, recombinant DNA constructions can be represented in a simplified 'shorthand', whereafter a program assembles the full nucleotide sequence from the contributing fragments, which may be obtained from nucleotide sequence databases. Another improvement is the replacement of the database manager by programs, running in batch to maintain the databank and verify its consistency automatically. Finally, graphic extensions are written in Graphical Kernel System, to draw linear and circular restriction maps of recombinants. Besides restriction sites, recombinant features can be presented from the feature lines of recombinant database entries, or from the feature tables of nucleotide databases. The clone database management system is fully integrated into the sequence analysis software package from the Pasteur Institute, Paris, and is made accessible through the same menu. As a result, recombinant DNA sequences can directly be analysed by the sequence analysis programs.

18.
Probabilistic models of floating point and logarithmic arithmetic are constructed using assumptions with both theoretical and empirical justification. The justification of these assumptions resolves open questions in Hamming (1970) and Bustoz et al. (1979). These models are applied to errors from sums and inner products. A comparison is made between the error analysis properties of floating point and logarithmic computers. We conclude that the logarithmic computer has smaller error confidence intervals for roundoff errors than a floating point computer with the same computer word size and approximately the same number range.
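
The roundoff errors in sums that these models describe can be observed directly. The sketch below covers only the floating point side, using Python's double precision floats; it does not model logarithmic arithmetic, and `summation_error` is a hypothetical helper that compares a naive running sum against `math.fsum`, which returns a correctly rounded sum.

```python
import math


def summation_error(values):
    """Absolute roundoff error of a left-to-right float sum."""
    naive = 0.0
    for v in values:
        naive += v  # each addition may round, and the errors accumulate
    # math.fsum tracks exact partial sums, so it serves as the reference.
    return abs(naive - math.fsum(values))
```

Summing ten copies of `0.1` this way leaves a small nonzero error, while sums of small integers stored as floats are exact and give zero.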

19.
A formalized procedure is proposed for determining the input/output sequence and the stepping of variables in systolic or semisystolic computers. The procedure is applicable to the design of systolic arrays, as well as to existing computers. Translated from Kibernetika, No. 1, pp. 51–57, January–February, 1991.

20.
We have implemented a parallel version of a dynamic programming biological sequence comparison algorithm to study the potential applicability of using parallel computers for genetic sequence comparisons. Our parallel program is built using C-Linda, a machine-independent parallel programming language, and was tested on both a 10 CPU Sequent Symmetry and a 64 CPU Intel Hypercube. C-Linda implements a shared associative memory model, "tuple space," through which multiple processes can communicate and coordinate control. In our master-worker (MW) parallel implementation, a master process creates several worker processes, extracts a test sequence and multiple library sequences from a database and stores them in tuple space. Each worker reads the test sequence and then repeatedly extracts library strings from tuple space, performs pairwise sequence comparison using a local comparison algorithm to generate a similarity score, and returns the similarity scores to tuple space. The master collects the scores from tuple space and identifies the best match over all library sequences. We also implemented a method of global interworker communication to reduce the total search time by stopping those string comparisons that had no chance of improving on the current best match. Comparisons of the total run time, speedup, and efficiency were made for parallel and sequential versions of a basic MW implementation as well as versions with the global abort threshold.
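
The master-worker pattern with a shared abort threshold can be sketched in Python. This is an analogy rather than the C-Linda program: a lock-protected dictionary stands in for tuple space, threads stand in for worker processes, and a comparison is skipped before it starts (using an upper bound on its score) instead of being stopped midway. All names and score values are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MATCH, MISMATCH, GAP = 2, -1, -2  # illustrative scoring values


def local_score(a, b):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if ca == cb else MISMATCH)
            curr.append(max(0, diag, prev[j] + GAP, curr[j - 1] + GAP))
        best = max(best, max(curr))
        prev = curr
    return best


def best_match(query, library, workers=4):
    """Return (entry, score) for the library entry that best matches query."""
    lock = threading.Lock()
    state = {"score": -1, "entry": None}  # shared best-so-far

    def worker(entry):
        # No alignment can score more than MATCH per aligned position.
        upper_bound = MATCH * min(len(query), len(entry))
        with lock:
            if upper_bound <= state["score"]:
                return  # cannot improve on the current best match
        score = local_score(query, entry)
        with lock:
            if score > state["score"]:
                state["score"], state["entry"] = score, entry

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(worker, library))  # consume results to surface errors
    return state["entry"], state["score"]
```

For example, `best_match("ACGTACGT", ["TTTT", "ACGTACGT", "ACGA", "GG"])` returns `("ACGTACGT", 16)`. Because CPython threads share one interpreter lock, this sketch shows the coordination pattern rather than a real speedup; separate processes would be needed for CPU-bound gains.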
