Similar Literature
20 similar records found
1.
《Information Systems》2005,30(4):277-298
An important problem in unstructured peer-to-peer (P2P) networks is the efficient content-based retrieval of documents shared by other peers. However, existing search mechanisms do not scale well because they either flood the network with queries or require some form of global knowledge. We propose the Intelligent Search Mechanism (ISM), an efficient, scalable, yet simple mechanism for improving information retrieval in P2P systems. Our mechanism is efficient because its cost is bounded by the number of neighbors, and scalable because no global knowledge needs to be maintained. ISM consists of four components: a Profiling Structure that logs queryhit messages coming from neighbors, a Query Similarity function that calculates the similarity of past queries to a new query, RelevanceRank, an online neighbor-ranking function, and a Search Mechanism that forwards queries to selected neighbors. We deploy and compare ISM with a number of other distributed search techniques in both static and dynamic environments. Our experiments are performed with real data over Peerware, our middleware simulation infrastructure, which is deployed on 75 workstations. Our results indicate that ISM outperforms its competitors and that in some cases it achieves a 100% recall rate while using only half the network resources required by its competitors. Its performance is also superior with respect to total query response time, and the algorithm exhibits a learning behavior as nodes acquire more knowledge. Finally, ISM works well in dynamic network topologies and in environments with replicated data sources.
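As an illustration of how the four components fit together, here is a minimal Python sketch of ISM-style neighbor ranking. The Jaccard term similarity, the weighting exponent, and the top-k forwarding policy are simplifying assumptions for illustration, not the paper's exact Qsim and RelevanceRank definitions.

```python
"""Sketch of ISM-style neighbor ranking for P2P search (simplified)."""
from collections import defaultdict

ALPHA = 1.0   # assumed similarity-weighting exponent
TOP_K = 2     # forward each query to the best TOP_K neighbors only

profile = defaultdict(list)  # neighbor -> [(past_query_terms, num_hits)]

def log_queryhit(neighbor, query, hits):
    """Profiling structure: remember which queries a neighbor answered well."""
    profile[neighbor].append((frozenset(query.lower().split()), hits))

def qsim(q1, q2):
    """Query similarity: Jaccard overlap of term sets (a simplification)."""
    return len(q1 & q2) / len(q1 | q2) if q1 | q2 else 0.0

def relevance_rank(neighbor, query):
    """Score a neighbor by how similar its past hits are to the new query."""
    q = frozenset(query.lower().split())
    return sum(qsim(q, past) ** ALPHA * hits for past, hits in profile[neighbor])

def select_neighbors(query, neighbors):
    """Search mechanism: forward only to the TOP_K highest-ranked neighbors."""
    return sorted(neighbors, key=lambda n: relevance_rank(n, query), reverse=True)[:TOP_K]

log_queryhit("peerA", "distributed hash tables", hits=7)
log_queryhit("peerB", "image compression", hits=3)
log_queryhit("peerC", "distributed search p2p", hits=5)
print(select_neighbors("distributed p2p retrieval", ["peerA", "peerB", "peerC"]))
```

Because only the profile of direct neighbors is consulted, the per-query cost stays bounded by the node degree, which is the scalability argument made above.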

2.
An Optimization Method for Irregular LDPC Codes over the AWGN Channel
We propose an optimization method for irregular low-density parity-check (LDPC) codes over the additive white Gaussian noise (AWGN) channel. Exploiting the fact that, under iterative decoding, the area between the EXIT curves of the variable nodes and the check nodes reflects the gap between a code's convergence threshold and the Shannon limit, we design a function F that measures the area between the two EXIT curves. F serves as a figure of merit for irregular LDPC codes: the smaller its value, the closer the code's convergence threshold is to the Shannon limit. Simulations show that designing the variable-node and check-node degree distributions so as to reduce F improves the performance of irregular LDPC codes.
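A small numeric sketch of the area-based figure of merit: given the variable-node EXIT curve and the inverted check-node curve, F is the area enclosed between them. The two curve shapes below are illustrative toys, not the EXIT functions of an actual LDPC ensemble.

```python
"""Numeric sketch of an EXIT-area figure of merit (toy curves)."""
import numpy as np

ia = np.linspace(0.0, 1.0, 1001)
vnd = ia ** 0.7          # variable-node EXIT curve (illustrative stand-in)
cnd_inv = ia ** 1.6      # inverted check-node EXIT curve (illustrative stand-in)

gap = vnd - cnd_inv                       # decoding-tunnel width at each point
F = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(ia)))  # trapezoid rule
tunnel_open = bool(np.all(gap >= -1e-12)) # curves must not cross for convergence
print(f"F = {F:.4f}, decoding tunnel open: {tunnel_open}")
```

A smaller F means the two curves hug each other more tightly, i.e., the threshold sits closer to capacity, which is exactly the quantity the optimization above minimizes.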

3.
In this paper we show that statistical properties of the transition graph of a system to be verified can be exploited to improve the memory or time performance of verification algorithms. We show experimentally that protocols exhibit transition locality: with respect to the levels of a breadth-first state-space exploration, state transitions tend to connect states belonging to close levels of the transition graph. We support our claim by measuring transition locality for the set of protocols included in the Murφ verifier distribution. We present a cache-based verification algorithm that exploits transition locality to decrease memory usage, and a disk-based verification algorithm that exploits transition locality to decrease disk read accesses, thus reducing the time overhead due to disk usage. Both algorithms have been implemented within the Murφ verifier. Our experimental results show that our cache-based algorithm can typically save more than 40% of memory with an average time penalty of about 50% when using (Murφ) bit compression, and of 100% when using bit compression and hash compaction, whereas our disk-based verification algorithm is typically more than ten times faster than a previously proposed disk-based verification algorithm. Even when using 10% of the memory needed to complete verification, it is only between 40% and 530% (300% on average) slower than (RAM-based) Murφ with enough memory to complete the verification task at hand. Using just 300 MB of memory, our disk-based Murφ was able to complete verification of a protocol with about 10^9 reachable states, which would require more than 5 GB of memory using standard Murφ.
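A toy illustration of the cache-based idea: breadth-first reachability with a bounded visited-state cache that evicts the least recently touched states. Under transition locality, evicted states belong to long-past levels and are rarely revisited, so few states get re-expanded. This is a sketch of the principle, not Murφ's actual implementation.

```python
"""Toy cache-based reachability exploiting transition locality."""
from collections import OrderedDict, deque

CACHE_SIZE = 4096  # bounded visited-state cache

def bfs_with_cache(initial_states, successors):
    cache = OrderedDict()              # insertion/touch order approximates level age
    frontier = deque(initial_states)
    for s in initial_states:
        cache[s] = None
    expanded = 0
    while frontier:
        state = frontier.popleft()
        expanded += 1
        for nxt in successors(state):
            if nxt in cache:
                cache.move_to_end(nxt)          # recently seen: keep it cached
                continue
            cache[nxt] = None
            if len(cache) > CACHE_SIZE:         # evict oldest state; under locality
                cache.popitem(last=False)       # it is unlikely to be reached again
            frontier.append(nxt)
    return expanded

# toy system with high transition locality: counter mod N with +/-1 transitions
N = 100_000
succ = lambda s: [(s + 1) % N, (s - 1) % N]
print("states expanded:", bfs_with_cache([0], succ))  # ~N despite a 4096-entry cache
```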

4.
Much research has focused on reducing and/or tolerating remote memory access latencies on distributed-memory parallel computers. Caching remote data is intended to reduce average access latency by handling as many remote memory accesses as possible using local copies of the data in the cache. Data-flow and multithreaded approaches help programs tolerate the latency of remote memory accesses by allowing processors to do other work while remote operations take place. The thread migration technique described here is a multithreaded architecture in which threads migrate to the remote processors that contain the data they need. By exploiting access locality, a thread often uses several data items from that processor before migrating to another processor for more data. Because the threads migrate in search of data, the approach is called Nomadic Threads. A prototype runtime system has been implemented on the CM-5 and is portable to other distributed-memory parallel computers.
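A toy cost model makes the migrate-to-data policy concrete: with good access locality, one migration amortizes over several local accesses and beats paying a remote round trip per access. The cost constants, ownership map, and access sequence are invented for illustration; the real CM-5 runtime is far more involved.

```python
"""Toy comparison of thread migration vs. per-access remote fetch."""
MIGRATE_COST, REMOTE_FETCH, LOCAL_ACCESS = 10, 5, 1   # hypothetical costs

def run(access_seq, owner, policy):
    """Replay data accesses; 'migrate' moves the thread, 'fetch' pulls data."""
    here, cost = owner[access_seq[0]], 0
    for item in access_seq:
        if owner[item] == here:
            cost += LOCAL_ACCESS
        elif policy == "migrate":
            here = owner[item]                # hop to the data's home node...
            cost += MIGRATE_COST + LOCAL_ACCESS
        else:
            cost += REMOTE_FETCH              # ...or pay a round trip per access
    return cost

owner = {i: i // 10 for i in range(40)}       # block-distributed data, 4 nodes
seq = [0, 1, 2, 3, 12, 13, 14, 15, 31, 32, 33]   # runs of accesses per node
print("migrate:", run(seq, owner, "migrate")) # 31: two hops amortized over runs
print("fetch  :", run(seq, owner, "fetch"))   # 39: every remote access pays
```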

5.
An irregular problem models the evolution of a system whose elements are irregularly distributed in a domain. The evolution modifies this distribution in a way that cannot be foreseen, and the behavior of each element depends on the elements close to it according to a problem-dependent relation. Starting from a hierarchical representation of the domain, we define a parallelization methodology that includes a load-balancing strategy that preserves this locality property, together with a strategy to collect information distributed across the processing nodes.

6.
Yuhui Deng 《Information Sciences》2009,179(14):2494-2511
Due to the widening performance gap between RAM and disk drives, a large number of I/O optimization methods have been proposed to alleviate the impact of this gap. One of the most effective approaches to improving disk access performance is enhancing data locality, because doing so increases the hit ratio of the disk cache and reduces seek time and rotational latency. Disk drives have developed dramatically since the first one was announced in 1956. This paper investigates some important characteristics of modern disk drives. Based on these characteristics, and on the observation that data access on disk drives is highly skewed, frequently accessed data blocks and correlated data blocks are clustered into objects and moved to the outer zones of a modern disk drive. The idea attempts to enhance spatial locality, improve the efficiency of aggressive sequential prefetch, and take advantage of Zoned Bit Recording (ZBR). An experimental simulation is employed to investigate the performance gains generated by the enhanced data locality. The gains are analyzed by breaking down disk access time into seek time, rotational latency, data transfer time, and the hit ratio of the disk cache. The experimental results provide useful insights into the performance behaviour of a modern disk drive with enhanced data locality.
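A minimal sketch of the frequency-based placement step: count block accesses in a trace and assign the hottest blocks the lowest logical block addresses, i.e., the fast outer zones of a ZBR disk. The zone model and trace are illustrative, not the paper's clustering algorithm.

```python
"""Sketch of frequency-based block placement toward outer (fast) disk zones."""
from collections import Counter

def remap_hot_blocks(trace, num_blocks):
    """Return old-block -> new-LBA map; low LBAs = outer, fastest ZBR zone."""
    freq = Counter(trace)
    by_heat = sorted(range(num_blocks), key=lambda b: freq[b], reverse=True)
    return {blk: new_lba for new_lba, blk in enumerate(by_heat)}

trace = [7, 7, 3, 7, 3, 9, 1, 7, 3, 7]       # block 7 is hottest, then 3
mapping = remap_hot_blocks(trace, num_blocks=10)
print(mapping[7], mapping[3])                 # 0 and 1: hottest blocks land outermost
```

Placing correlated hot blocks at adjacent outer LBAs both raises sequential-prefetch efficiency and exploits the higher transfer rate of outer ZBR zones, which is the effect the simulation above measures.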

7.
Graphical processing units (GPUs) have recently attracted attention for scientific applications such as particle simulations. This is partially driven by low commodity pricing of GPUs, but also by recent toolkit and library developments that make them more accessible to scientific programmers. We discuss the application of GPU programming to two significantly different paradigms: regular mesh field equations with unusual boundary conditions, and graph analysis algorithms. The differing optimization techniques required for these two paradigms cover many of the challenges faced when developing GPU applications. We discuss the relevance of these application paradigms to simulation engines and games. GPUs were aimed primarily at the accelerated graphics market, but since this market is often closely coupled to advanced game products, it is interesting to speculate about the future of fully integrated accelerator hardware for both visualization and simulation combined. As well as reporting the speed-up performance on selected simulation paradigms, we discuss suitable data-parallel algorithms and present code examples for exploiting GPU features such as large numbers of threads and localized texture memory. We find a surprising variation in the performance that can be achieved on GPUs for our applications, and discuss how these findings relate to known effects in parallel computing, such as memory-speed-related super-linear speed-up.

8.
9.
A Construction Method for Irregular Quasi-Cyclic LDPC Codes with Low Encoding Complexity
Irregular low-density parity-check (LDPC) codes offer good error performance but suffer from high encoding complexity. To address this, we propose a construction method for irregular LDPC codes that combines the efficient-encoding property of repeat-accumulate (RA) codes with masking techniques, yielding codes with a linear-complexity encoding algorithm. The construction first modifies the parity-check matrix of the RA code to eliminate the error floor that RA codes commonly exhibit; it then builds a new parity-check matrix based on a Vandermonde matrix. The resulting matrix has an algebraic structure and is easy to implement in hardware. Theoretical analysis and experimental results show that the constructed irregular LDPC codes have lower encoding complexity than MacKay random codes, and that over the additive white Gaussian noise (AWGN) channel they outperform MacKay random codes by about 0.4-0.6 dB at a bit error rate of 1.0×10^-4.
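For intuition, here is a generic sketch of an RA-style quasi-cyclic parity-check matrix H = [H1 | H2]: H1 is assembled from circulant permutation matrices whose shifts come from a Vandermonde-style exponent table s[i][j] = (i*j) mod p, and H2 is the dual-diagonal accumulator part that makes encoding linear-time. This is a textbook-style construction under those assumptions, not the authors' exact masked matrix.

```python
"""Generic RA-style QC-LDPC parity-check matrix H = [H1 | H2] (sketch)."""
import numpy as np

def circulant(p, shift):
    """p x p identity matrix cyclically shifted right by 'shift' columns."""
    return np.roll(np.eye(p, dtype=np.uint8), shift, axis=1)

def build_H(rows, cols, p):
    # H1: circulant permutation blocks with Vandermonde-style shifts (i*j) mod p
    H1 = np.block([[circulant(p, (i * j) % p) for j in range(cols)]
                   for i in range(rows)])
    # H2: dual-diagonal accumulator part -> back-substitution encoding in O(n)
    m = rows * p
    H2 = np.eye(m, dtype=np.uint8)
    H2[1:, :-1] |= np.eye(m - 1, dtype=np.uint8)
    return np.concatenate([H1, H2], axis=1)

H = build_H(rows=3, cols=4, p=5)
print(H.shape)               # (15, 35): 15 checks, 20 message + 15 parity columns
print(H.sum(axis=0)[:20])    # column weight 3 throughout the circulant part
```

Masking would then zero out selected circulant blocks of H1 to realize an irregular degree distribution while keeping the algebraic structure.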

10.
Temporal and spatial locality of the inputs, i.e., the property that a classifier receives the same samples over time (or samples belonging to a neighborhood) with high probability, can be translated into the design of embedded classifiers. The outcome is a computational-complexity-aware and power-aware design particularly suitable for implementation. A classifier based on the gated-parallel family has been found particularly suitable for exploiting locality properties: the subclassifiers are generally small and independent of each other, and they are controlled by a master-enabling module that ensures only one subclassifier is active at a time, the others being switched off. By exploiting locality properties we obtain classifiers with accuracy comparable to that of classifiers designed without integrating locality, but with a significant reduction in computational complexity and power consumption.
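A minimal sketch of the gated-parallel idea: a cheap master gate selects one small subclassifier per input while the rest stay switched off. The gate, the region split, and the stump subclassifiers below are toy stand-ins, not the paper's design.

```python
"""Sketch of a gated-parallel classifier with a master-enabling gate."""

def make_stump(threshold):
    """A tiny subclassifier specialized for one region of the input space."""
    return lambda x: int(x > threshold)

# one small specialist per region of a 1-D input space
subclassifiers = {"low": make_stump(2.0), "mid": make_stump(5.0), "high": make_stump(8.0)}

def master_gate(x):
    """Master-enabling module: a coarse, cheap routing decision. Input
    locality means consecutive samples usually hit the same gate output."""
    return "low" if x < 4 else ("mid" if x < 7 else "high")

def classify(x):
    active = master_gate(x)            # only this subclassifier runs;
    return subclassifiers[active](x)   # the others consume no cycles or power

for sample in [1.5, 1.7, 6.2, 9.0]:
    print(sample, "->", classify(sample))
```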

11.
Algorithmic differentiation tools can automate the adjoint transformation of parallel message-passing codes [J. Utke, L. Hascoët, P. Heimbach, C. Hill, P. Hovland, and U. Naumann, Toward Adjoinable MPI, in 2009 IEEE International Symposium on Parallel & Distributed Processing, May, IEEE, 2009, pp. 1–8] using the AMPI library. Nevertheless, a non-trivial, manual step after the differentiation is the initialization of the seed and the retrieval of the output values from the differentiated code. Ambiguities in seeding occur in programs where the user is unable to expose the complete program flow, with a single entry and a single exit point, to the AD tool. We present the ambiguities associated with seed initialization and output retrieval for the adjoint transformation of halo and zero-halo partitioned MPI programs. We introduce a general framework to eliminate ambiguities in seeding and retrieval for shared-node reduction over the + and * operators, using a conceptual master-worker model. The model shows the need for new MPI calls for retrieval, and eliminates MPI calls for seed initialization. Different implementations for seeding manually assembled adjoints were inferred from the model, namely partial and unique seeding. We successfully applied the seeding techniques to a 3D zero-halo partitioned unstructured compressible discrete adjoint solver, and we highlight the merits and demerits of each strategy.

12.
When parallelizing irregular applications on ccNUMA machines, several issues should be taken into account in order to achieve high code performance. These factors include locality exploitation and parallelism, as well as careful use of memory resources (memory overhead). A significant number of numerical simulation codes are clear examples of irregular applications. Frequently these codes include reduction operations in their core, so that an important fraction of the computational time is spent on such operations. Cloth simulation, specifically, belongs to this class of applications and is a topic of increasing interest in diverse areas, such as the multimedia industry. Moreover, when real-time simulation is the aim, parallelization becomes an important option. This paper discusses and compares different techniques for parallelizing irregular reductions on ccNUMA shared-memory machines. Broadly speaking, we may classify them into two groups: privatization-based and data-partitioning-based methods. We describe a framework, based on data affinity, that permits the development of various algorithms within the group of data-partitioning-based techniques. All these techniques and approaches are analyzed and adapted to the computational structure of a real, physically based cloth simulator.
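A minimal sketch of the privatization approach, for comparison with the data-partitioning family: each thread accumulates an irregular reduction into a private copy of the output array, and the copies are merged afterwards, trading memory overhead for lock-free updates. The mesh and index pattern are invented for illustration; a cloth simulator reduces forces over mesh nodes in the same shape.

```python
"""Sketch of a privatized irregular reduction (per-thread private arrays)."""
import threading

NODES, THREADS = 8, 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (6, 7), (5, 6), (7, 4)]

def worker(chunk, private):
    for i, j in chunk:            # irregular: i, j are not known statically
        private[i] += 1.0         # no locks needed on a private copy
        private[j] += 1.0

privates = [[0.0] * NODES for _ in range(THREADS)]  # the memory overhead
chunks = [edges[0::2], edges[1::2]]
threads = [threading.Thread(target=worker, args=(c, p))
           for c, p in zip(chunks, privates)]
for t in threads: t.start()
for t in threads: t.join()

# merge step: sum the private copies into the global array
result = [sum(p[n] for p in privates) for n in range(NODES)]
print(result)   # every node is touched by exactly two edges -> 2.0 each
```

(Python's GIL serializes these threads; the point here is the update/merge pattern, not the speed-up. Data-partitioning methods avoid the replicated arrays by assigning each output node, and the edges affine to it, to one owner thread.)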

13.
Scientific computation has unavoidable approximations built into its very fabric. One important source of error that is difficult to detect and control is round-off error propagation, which originates from the use of finite-precision arithmetic. We propose that there is a need to perform regular numerical 'health checks' on scientific codes in order to detect the cancerous effect of round-off error propagation. This is particularly important in scientific codes that are built on legacy software. We advocate the use of the CADNA library as a suitable numerical screening tool. We present a case study to illustrate the practical use of CADNA in scientific codes that are of interest to the Computer Physics Communications readership. In doing so we hope to stimulate a greater awareness of round-off error propagation and present a practical means by which it can be analyzed and managed.
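CADNA itself is a library for Fortran/C/C++ codes, but the failure mode it screens for is easy to demonstrate. The Python toy below loosely mimics stochastic sampling by perturbing the input at a few ulps and observing the spread of results; catastrophic cancellation leaves the spread as wide as the answer itself. This is an illustration of the problem, not of CADNA's actual stochastic arithmetic.

```python
"""Tiny demonstration of round-off error exposed by input perturbation."""
import random

def shaky(f, x, trials=20, eps=1e-15):
    """Run f on inputs perturbed at a few ulps; the spread of the results
    indicates how many significant digits the computation preserves."""
    vals = [f(x * (1 + random.uniform(-eps, eps))) for _ in range(trials)]
    return min(vals), max(vals)

f = lambda x: (x + 1.0) - x     # exactly 1 in real arithmetic
lo, hi = shaky(f, 1e16)         # at 1e16 the spacing between doubles is 2.0
print(lo, hi)                   # spread covers [0.0, 2.0]: no correct digits left
```

A 'health check' in this spirit flags f as numerically unhealthy at large x even though each individual run silently returns a plausible-looking number.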

14.
This paper approaches the memory-bottleneck problem in FPGA-accelerated codes that process unstructured meshes. A methodology to reduce the required memory bandwidth is presented and evaluated, based on the combined application of data sorting, coding, and compression techniques. Sorting allows efficient streaming between the memory and the FPGA, improving data locality and avoiding redundant data requests. Coding achieves a compact representation of the mesh connectivity. Differential compression reduces the size of the mesh data. We propose a hardware implementation with low resource requirements, tailored to accelerators based on reconfigurable devices. The combination of techniques reduces the memory traffic of two computational problems down to an average of 34% and 19% of their original sizes, respectively.
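A software sketch of the sort-code-compress pipeline: sorting an adjacency list shrinks the index gaps, differential coding stores the gaps instead of absolute indices, and a variable-length byte code packs small gaps compactly. The varint coder is a standard choice used here for illustration, not the paper's hardware coder.

```python
"""Sketch of sort + differential coding + varint compression for mesh indices."""

def varint(n):
    """Unsigned LEB128-style encoding: 7 data bits per byte."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def compress_indices(indices):
    idx = sorted(indices)                        # sorting shrinks the gaps
    deltas = [idx[0]] + [b - a for a, b in zip(idx, idx[1:])]
    return b"".join(varint(d) for d in deltas)   # differential + varint coding

neighbors = [1048, 1032, 1050, 1031, 1049, 1033]   # one vertex's adjacency
blob = compress_indices(neighbors)
print(len(blob), "bytes vs", 4 * len(neighbors), "raw")  # 7 vs 24 (32-bit indices)
```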

15.
Silent Data Corruptions (SDCs), errors that escape detection methods, are critical for system designers because they may result in system failures. In order to catch SDCs, mechanisms should focus on the behavioural aspects of errors in addition to their physical location or error patterns. Protection codes such as parity, Hamming, and Reed-Solomon codes, which depend heavily on the physical location of data bits, are therefore not sufficient for detecting computing errors in processors. By characterizing data behaviour during program executions, we have observed value locality in the results written to the destination register for each instruction binary code (instruction opcode and operand codes). This locality exists not only in the results of each instruction, but also in the results of instructions at different memory locations that have the same binary code. Based on this observation, an architecture called the Instruction Result History Table (IRHT) is presented, indexed by the instruction binary code. The IRHT stores a history of the values produced by each instruction binary code and is consulted during every instruction execution cycle. Any mismatch between the values stored in the IRHT and the value generated by the current execution indicates an SDC syndrome. To confirm an SDC with a high level of confidence, a second execution of the current instruction is issued; the duplicated execution confirms whether an SDC occurred. In the case of an SDC, a third instruction execution combined with majority voting frees the system of the SDC. Extensive simulations showed that up to 83.54% of SDCs are detectable with the help of this locality. Moreover, with a small 16 kB IRHT, 80.66% of SDCs can be detected on average. Note that the presented method can detect errors that escape conventional detection mechanisms, so it can be utilized in conjunction with other conventional methods.
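A toy model of the IRHT mechanism: results are looked up by instruction binary code, a history mismatch triggers a duplicate execution, and a disagreement between the two runs invokes a third run with majority voting. Table organization, match policy, and the fault model are simplified assumptions for illustration.

```python
"""Toy Instruction Result History Table (IRHT) for SDC detection."""

class IRHT:
    def __init__(self):
        self.table = {}   # instruction binary code -> set of past results

    def check(self, binary_code, execute):
        result = execute()
        history = self.table.setdefault(binary_code, set())
        if history and result not in history:
            second = execute()            # duplicate execution confirms the syndrome
            if second != result:          # disagreement: an SDC happened
                third = execute()         # third run + majority vote recovers the value
                result = second if second == third else result
                status = "SDC detected and corrected"
            else:
                status = "new legitimate value"   # reproducible: value locality grows
        else:
            status = "ok"
        history.add(result)
        return result, status

irht = IRHT()
add = lambda: 5 + 7                       # stand-in for one opcode+operand pattern
print(irht.check(0xDEAD01, add))          # first sighting: recorded
print(irht.check(0xDEAD01, add))          # matches history: fast path
flips = iter([99, 12, 12])                # transient fault corrupts the first run only
print(irht.check(0xDEAD01, lambda: next(flips)))  # (12, 'SDC detected and corrected')
```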

16.
In this work, we show that the submachine locality exposed by hierarchical bulk-synchronous computations can be efficiently turned into locality of reference on arbitrarily deep hierarchies. Specifically, we develop efficient schemes to simulate parallel programs written for the Decomposable BSP (a BSP variant which features a hierarchical decomposition into submachines) on the sequential Hierarchical Memory Model (HMM), which rewards the exploitation of temporal locality, and on its extension with block transfer, the BT model, which also rewards the exploitation of spatial locality. The simulations yield good hierarchy-conscious sequential algorithms from parallel ones, and provide evidence of the strict relation between submachine locality in parallel computation and locality of reference in the hierarchical memory setting. We also devise a generalization of the HMM result to the self-simulation of D-BSP augmented with hierarchical memory modules, which yields an interesting analog of Brent's lemma, thus proving that the enhanced model features a seamless integration of memory and network hierarchies.

17.
Multithreaded programs often exhibit erroneous behavior because of unintended interactions between concurrent threads. This paper focuses on the noninterference property of atomicity. A procedure is atomic if, for every execution, there is an equivalent serial execution in which the actions of the atomic procedure are not interleaved with actions of other threads. This key property makes atomic procedures amenable to sequential reasoning techniques, which significantly facilitates subsequent validation activities such as code inspection and testing. Several existing tools verify atomicity by using commutativity of actions to show that every execution reduces to a corresponding serial execution. However, experiments with these tools have highlighted a number of interesting procedures that, while intuitively atomic, are not reducible. In this paper, we exploit the notion of pure code blocks to verify the atomicity of such irreducible procedures. If a pure block terminates normally, then its evaluation does not change the program state and, hence, these evaluation steps can be removed from the program trace before reduction. We develop a static type-based analysis for atomicity based on this insight, and we illustrate this analysis on a number of interesting examples that could not be verified using earlier tools based purely on reduction.

18.
In face recognition, when the number of images in the training set is much smaller than the number of pixels in each image, Locality Preserving Projections (LPP) often suffers from the singularity problem. To overcome the singularity problem, principal component analysis is commonly applied as a preprocessing step, but this preprocessing may discard important discriminative information. In this paper, a novel algorithm called Optimal Locality Preserving Projections (O-LPP) is proposed. The algorithm transforms the singular eigensystem computation into eigenvalue decomposition problems without losing any discriminative information, which reduces the computational complexity. A theoretical analysis of the algorithm is also provided. Extensive experiments on face databases demonstrate that the proposed algorithm is superior to the traditional LPP algorithm.
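For context, the sketch below sets up standard LPP as the generalized eigenproblem X L X^T a = lambda X D X^T a; when the sample count n is far below the pixel count d, X D X^T (a d-by-d matrix of rank at most n) is singular, which is exactly the problem O-LPP targets. The k-NN graph with 0/1 weights is one common choice; the O-LPP decomposition itself is not reproduced here.

```python
"""Baseline LPP as a generalized eigenproblem (X L X^T) a = lambda (X D X^T) a."""
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, d, k = 100, 30, 5                  # n > d here so X D X^T is invertible;
X = rng.standard_normal((d, n))       # with n << d (face images) it is singular

# symmetric k-NN adjacency with 0/1 weights (heat kernel is the usual alternative)
dist2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
W = np.zeros((n, n))
for i in range(n):
    for j in np.argsort(dist2[i])[1:k + 1]:
        W[i, j] = W[j, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W                             # graph Laplacian of the neighborhood graph
evals, evecs = eigh(X @ L @ X.T, X @ D @ X.T)   # generalized symmetric eigenproblem
proj = evecs[:, :2]                   # smallest eigenvectors: 2-D locality-preserving map
print(proj.shape, evals[:2])
```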

19.
Research has shown that holistic face recognition methods, because they ignore local image information, perform worse in recognition than sub-pattern (modular) methods that better preserve local feature information. Based on the observation that applying manifold techniques for dimensionality reduction preserves the local data manifold structure within nonlinear submanifolds well, we propose an improved sub-pattern locality preserving projections face recognition algorithm. Its main idea is to partition the different images of the same class together into subsets, form sub-modules from the sub-images at the same position, and apply the LPP algorithm to each sub-module to learn its manifold structure; this differs from methods that partition images of different classes together when learning the manifold. Experiments show that the algorithm better preserves the local manifold structure and feature information of face images and improves the recognition rate.
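A small sketch of the sub-pattern step described above: the images of one class are cut into blocks, and same-position blocks across the class form one sub-module, each of which would then receive its own LPP (as in the previous sketch). Image and block sizes here are arbitrary choices for illustration.

```python
"""Sketch of per-class sub-module formation for sub-pattern LPP."""
import numpy as np

def submodules(class_images, block=(8, 8)):
    """class_images: list of equal-size 2-D arrays from one person."""
    bh, bw = block
    h, w = class_images[0].shape
    out = {}
    for r in range(0, h, bh):
        for c in range(0, w, bw):
            # one sub-module: same-position blocks across the class, as vectors
            out[(r, c)] = np.stack([img[r:r + bh, c:c + bw].ravel()
                                    for img in class_images])
    return out

faces = [np.random.rand(32, 32) for _ in range(10)]   # one subject, 10 images
mods = submodules(faces)
print(len(mods), mods[(0, 0)].shape)   # 16 sub-modules, each a 10 x 64 data matrix
```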

20.
Dynamic data, cache, and memory adaptation can significantly improve program performance when applied to long continuous phases of execution that have dynamic but predictable locality. To support phase-based adaptation, this paper defines the concept of locality phases and describes a four-component analysis technique. Locality-based phase detection uses locality analysis and signal-processing techniques to identify phases from the data-access trace of a program; frequency-based phase marking inserts code markers that mark the phases in all executions of the program; phase-hierarchy construction identifies the structure of multiple phases; and phase-sequence prediction predicts the phase sequence from program input parameters. The paper shows the accuracy and granularity of phase and phase-sequence prediction, as well as its uses in dynamic data packing, memory remapping, and cache resizing.
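A toy illustration of phase detection: slide a window over a data-access trace and mark a phase boundary wherever the working set changes abruptly. The real technique uses reuse-distance analysis and signal processing; the Jaccard test below only illustrates the boundary idea.

```python
"""Toy locality-phase detector over a data-access trace."""

def phase_boundaries(trace, window=50, threshold=0.2):
    bounds = []
    prev = set(trace[:window])
    for start in range(window, len(trace) - window, window):
        cur = set(trace[start:start + window])
        jaccard = len(prev & cur) / len(prev | cur)
        if jaccard < threshold:        # working set replaced: a new phase begins
            bounds.append(start)
        prev = cur
    return bounds

# synthetic trace: three long phases touching different data regions
trace = [i % 40 for i in range(300)] \
      + [100 + i % 40 for i in range(300)] \
      + [i % 40 for i in range(300)]   # the first phase recurs later
print(phase_boundaries(trace))         # [300, 600]: one boundary per phase change
```

Recurrence of the same working set (the third segment matching the first) is what the phase-hierarchy and phase-sequence components above exploit for prediction.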
