1.

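Donghui Li Litao Ye 《计算机应用》 (Journal of Computer Applications) 2004,24(12):99-101

To address the waste of resources caused by excessive transaction restarts in some concurrency control protocols, a new optimistic concurrency control (OCC) protocol is proposed. By adjusting backward the dynamic serialization order of transactions that are not in serious conflict, many unnecessary transaction restarts can be avoided. Transaction conflicts and serialization constraints need not be recorded during a transaction's read phase, and various priority-based conflict resolution methods can easily be incorporated into the protocol; the Priority-Abort-50 mechanism was selected as needed.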

2.

Efficient execution of speculative threads and transactions with hardware transactional memory

《Future Generation Computer Systems》2014

Thread-level speculation (TLS) has been researched as a way to automatically parallelize portions of serial programs, and transactional memory (TM) has been studied as a promising alternative to locks for parallel programming because of its simplicity. Both TLS and TM require similar underlying support. In this paper, we present SeTM (sequential transactional memory), a hardware-enhanced TM system that supports TLS at minor extra cost. Signatures are an effective way to buffer speculative state in TM and TLS, but their false positives in conflict detection cripple TM and TLS performance, especially for conflict-intensive TLS. SeTM uses R/W bits and signatures concurrently to mitigate this effect. Additionally, SeTM introduces a fast rollback mechanism, which provides fast abort recovery for eager log-based HTM and TLS. The most important contribution of SeTM is its conflict-tolerant mechanism, which tolerates some ambiguous data conflicts in TLS. Finally, to execute unordered transactions efficiently, we add an extra ordering mechanism to SeTM. With this ordering mechanism, transactions in TM can also gain performance from the conflict-tolerant mechanism. Our evaluation covers TM and TLS separately. For the TLS applications, six representative benchmarks are used to evaluate the above model. Our experimental results show that our scheme improves the execution performance of most tested codes at a modest hardware cost. For a set of important scientific loops, we report a highest speedup of 6.5 with 15 cores. The experimental results also show that SeTM scales well. For the TM applications, the STAMP benchmarks also gain significant performance improvements relative to LogTM-SE.

3.

Self-tuning Intel Restricted Transactional Memory

《Parallel Computing》2015

The Transactional Memory (TM) paradigm aims at simplifying the development of concurrent applications by means of the familiar abstraction of the atomic transaction. After a decade of intense research, hardware implementations of TM have recently entered the domain of mainstream computing thanks to Intel's decision to integrate TM support, codenamed RTM (Restricted Transactional Memory), in their latest generation of processors. In this work we shed light on a relevant issue with great impact on the performance of Intel's RTM: the correct tuning of the logic that regulates how to cope with failed hardware transactions. We show that the optimal tuning of this policy is strongly workload dependent, and that the relative difference in performance among the various possible configurations can be remarkable (up to 10× slowdowns). We address this issue by introducing a simple and effective approach that aims to identify the optimal RTM configuration at run-time via lightweight reinforcement learning techniques. The proposed technique requires no off-line sampling of the application, and can be applied to optimize both the cases in which a single global lock or a software TM implementation is used as the fall-back synchronization mechanism. We propose and evaluate different designs for the proposed self-tuning mechanisms, which we integrated with GCC in order to achieve full transparency for programmers. Our experimental study, based on standard TM benchmarks, demonstrates average gains of 60% over any static approach while remaining within 5% of the performance of manually identified optimal configurations.

4.

Optimised memory allocation for less false abortion and better performance in hardware transactional memory

Xiuhong Li Altenbek Gulila 《International Journal of Parallel, Emergent and Distributed Systems》2020,35(4):483-491

ABSTRACT

This paper introduces and tackles a special performance hazard in Hardware Transactional Memory (HTM): false abortion. False abortion causes many unnecessary transaction aborts in HTM and can greatly hurt performance, making HTM much less useful when it is adopted as a fast path for Software Transactional Memory. By introducing a new memory allocator design, we are able to put objects that are likely to be accessed together from different threads into different cache lines and thus avoid conflicts between hardware transactions in different threads. Experiments show that our method reduces transaction aborts by 47% and achieves a speedup of up to 1.67× (22% on average) while consuming only 14% more memory, showing great potential to enhance current HTM technology.
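
The address arithmetic behind false abortion can be illustrated with a minimal Python sketch. It is not the paper's allocator: the 64-byte line size, the function names, and the pad-every-object policy are assumptions chosen for illustration, whereas the abstract describes separating only objects that different threads are likely to access together.

```python
CACHE_LINE = 64  # bytes; an assumed line size, not taken from the paper


def lines_touched(addr, size, line=CACHE_LINE):
    """Return the set of cache-line indices covered by an object."""
    return set(range(addr // line, (addr + size - 1) // line + 1))


def share_a_line(addr_a, size_a, addr_b, size_b, line=CACHE_LINE):
    """True if two objects overlap in at least one cache line."""
    return bool(lines_touched(addr_a, size_a, line)
                & lines_touched(addr_b, size_b, line))


def packed_layout(sizes, base=0):
    """Place objects back to back and return their start addresses."""
    addrs, cur = [], base
    for size in sizes:
        addrs.append(cur)
        cur += size
    return addrs


def padded_layout(sizes, base=0, line=CACHE_LINE):
    """Start every object on a cache-line boundary (hypothetical policy)."""
    addrs, cur = [], base
    for size in sizes:
        cur = -(-cur // line) * line  # round up to the next line boundary
        addrs.append(cur)
        cur += size
    return addrs
```

With two 24-byte objects, `packed_layout` places both in the same line, so hardware transactions in two threads that touch different objects would still conflict at cache-line granularity. `padded_layout` moves the second object to its own line at the cost of unused padding bytes, which is the kind of space-for-aborts trade-off the abstract reports.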

5.

Optimistic priority-based concurrency control protocol for firm real-time database systems

Jinhwan Kim Heonshik Shin 《Information and Software Technology》1994,36(12):707-715

This paper presents an optimistic priority-based concurrency control protocol that schedules active transactions accessing firm-deadline real-time database systems. This protocol combines the forward and backward validation processes in order to control concurrent transactions with different priorities more effectively. A transaction in the validation phase can be committed successfully if the serialization order is adjusted in favour of the transactions with higher priority, and is aborted otherwise. Thus, this protocol establishes a priority ordering technique whereby a serialization order is selected and transaction execution is forced to obey this order. This priority-based protocol addresses the problem of satisfying data consistency, with the goal of increasing the number of transactions that commit by their deadlines. In addition, for desirable real-time conflict resolution, this protocol aims to meet more deadlines of higher priority transactions than of lower priority transactions.
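
The two validation directions named in the abstract can be sketched in Python as follows. This is the textbook form of backward and forward validation combined with a deliberately simplified priority rule; it does not reproduce the paper's serialization-order adjustment, and the `Txn` record, the function names, and the abort rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical transaction record: name, priority, read and write sets."""
    name: str
    priority: int = 0
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def backward_conflicts(validating, committed_since_start):
    """Committed transactions whose writes overlap the validator's reads."""
    return [t for t in committed_since_start
            if t.write_set & validating.read_set]


def forward_conflicts(validating, active):
    """Active transactions whose reads overlap the validator's writes."""
    return [t for t in active
            if t.read_set & validating.write_set]


def validate(validating, committed_since_start, active):
    """Return 'commit' or 'abort' under an assumed, simplified priority rule."""
    if backward_conflicts(validating, committed_since_start):
        # The validator read data that a committed transaction overwrote.
        return "abort"
    higher = [t for t in forward_conflicts(validating, active)
              if t.priority > validating.priority]
    # Conflicting lower-priority transactions would have to be restarted or
    # reordered; that step is not modeled in this sketch.
    return "abort" if higher else "commit"
```

In this sketch a validating transaction yields to any higher-priority active transaction that has read data it wants to write, which is one simple way to favour higher-priority deadlines.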

6.

Nebelung: Execution Environment for Transactional OpenMP

Miloš Milovanović Roger Ferrer Vladimir Gajinov Osman S. Unsal Adrian Cristal Eduard Ayguadé Mateo Valero 《International journal of parallel programming》2008,36(3):326-346

Future generations of Chip Multiprocessors (CMP) will provide dozens or even hundreds of cores inside the chip. Writing applications that benefit from the massive computational power offered by these chips is not going to be an easy task for mainstream programmers who are used to sequential algorithms rather than parallel ones. This paper explores the possibility of using Transactional Memory (TM) in OpenMP, the industrial standard for writing parallel programs on shared-memory architectures, for C, C++ and Fortran. One of the major complexities in writing OpenMP applications is the use of critical regions (locks), atomic regions and barriers to synchronize the execution of parallel activities in threads. TM has been proposed as a mechanism that abstracts some of the complexities associated with concurrent access to shared data while enabling scalable performance. The paper presents a first proof-of-concept implementation of OpenMP with TM. Some language extensions to OpenMP are proposed to express transactions. These extensions are implemented in our source-to-source OpenMP Mercurium compiler and our Software Transactional Memory (STM) runtime system Nebelung that supports the code generated by Mercurium. Hardware Transactional Memory (HTM) or Hardware-assisted STM (HaSTM) are seen as possible paths to make the tandem TM-OpenMP more scalable. In the evaluation section we show the preliminary results. The paper finishes with a set of open issues that still need to be addressed, either in OpenMP or in the hardware/software implementations of TM.

7.

Formal Reasoning About Lazy-STM Programs

Yong Li Yu Zhang Yi-Yun Chen Ming Fu 《计算机科学技术学报》 (Journal of Computer Science and Technology) 2010,25(4):841-852

Transactional memory (TM) is an easy-to-use parallel programming model that avoids common problems associated with conventional locking techniques. Researchers have proposed a large number of alternative hardware and software TM implementations. However, few of them focus on formal reasoning about TM programs. In this paper, we propose a framework at assembly level for reasoning about lazy software transactional memory (STM) programs. First, we give a software TM implementation based on lightweight locks. These locks are also part of the shared memory. Then we define the semantics of the model operationally; the lightweight locks in a transaction are non-blocking, which avoids deadlocks among transactions. Finally, we design a logic (a combination of permission accounting in separation logic and concurrent separation logic) to verify various properties of concurrent programs based on this machine model. The whole framework is formalized using a proof-carrying-code (PCC) framework.

8.

Application-oriented ping-pong benchmarking: how to assess the real communication overheads

Timo Schneider Robert Gerstenberger Torsten Hoefler 《Computing》2014,96(4):279-292

Moving data between processes has often been discussed as one of the major bottlenecks in parallel computing—there is a large body of research, striving to improve communication latency and bandwidth on different networks, measured with ping-pong benchmarks of different message sizes. In practice, the data to be communicated generally originates from application data structures and needs to be serialized before communicating it over serial network channels. This serialization is often done by explicitly copying the data to communication buffers. The message passing interface (MPI) standard defines derived datatypes to allow zero-copy formulations of non-contiguous data access patterns. However, many applications still choose to implement manual pack/unpack loops, partly because they are more efficient than some MPI implementations. MPI implementers on the other hand do not have good benchmarks that represent important application access patterns. We demonstrate that the data serialization can consume up to 80 % of the total communication overhead for important applications. This indicates that most of the current research on optimizing serial network transfer times may be targeted at the smaller fraction of the communication overhead. To support the scientific community, we extracted the send/recv-buffer access patterns of a representative set of scientific applications to build a benchmark that includes serialization and communication of application data and thus reflects all communication overheads. This can be used like traditional ping-pong benchmarks to determine the holistic communication latency and bandwidth as observed by an application. It supports serialization loops in C and Fortran as well as MPI datatypes for representative application access patterns. Our benchmark, consisting of seven micro-applications, unveils significant performance discrepancies between the MPI datatype implementations of state of the art MPI implementations. 
Our micro-applications aim to provide a standard benchmark for MPI datatype implementations to guide optimizations similarly to the established benchmarks SPEC CPU and Livermore Loops.

9.

Profiling and Optimizing Transactional Memory Applications

Ferad Zyulkyarov Srdjan Stipic Tim Harris Osman S. Unsal Adrián Cristal Ibrahim Hur Mateo Valero 《International journal of parallel programming》2012,40(1):25-56

Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and optimizing programs which use transactions. In this paper we introduce a series of profiling and optimization techniques for TM applications. The profiling techniques are of three types: (i) techniques to identify multiple potential conflicts from a single program run, (ii) techniques to identify the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize how threads spend their time and which of their transactions conflict most frequently. Altogether they provide in-depth and comprehensive information about the wasted work caused by aborting transactions. To reduce the contention between transactions we suggest several TM-specific optimizations which leverage nested transactions, transaction checkpoints, early release, and similar features. To examine the effectiveness of the profiling and optimization techniques, we provide a series of illustrations from the STAMP TM benchmark suite and from the synthetic WormBench workload. First we analyze the performance of TM applications using our profiling techniques and then we apply various optimizations to improve the performance of the Bayes, Labyrinth and Intruder applications. We discuss the design and implementation of the profiling techniques in the Bartok-STM system. We process data offline or during garbage collection, where possible, in order to minimize the probe effect introduced by profiling.

10.

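Low-Latency Transaction Processing Based on NVM and HTM

Xingda Wei Fangming Lu Rong Chen Haibo Chen Binyu Zang 《软件学报》 (Journal of Software) 2022,33(3):849-866

Hardware transactional memory (HTM) can greatly improve the throughput of multicore in-memory transaction processing. However, to avoid the impact of slow persistent storage devices on transaction throughput, existing systems commit transactions in batches, which makes transaction commit latency extremely high. The emergence of low-latency non-volatile memory (NVM) offers an opportunity to reduce HTM-based ...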

11.

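An Improved Concurrent Transaction Scheduling Algorithm in MDBS

Yuanzhen Wang Weihua Gong 《计算机工程与应用》 (Computer Engineering and Applications) 2005,41(31):11-13,40

Scheduling strategies for concurrent transactions in an MDBS must satisfy the serializability criterion. This paper analyzes TM2, a scheduling algorithm centered on the transaction commit graph. Although TM2 guarantees that the commit order of global transactions is serializable, detecting conflicts only at commit time is a drawback. An improved scheduling algorithm, TM3, is proposed; it not only guarantees the serializability of global transactions and prevents global deadlocks, but also increases the concurrency of global transaction execution. Finally, the two scheduling algorithms are simulated in a database acceleration engine and their performance is compared.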

12.

On rigorous transaction scheduling

Breitbart Y. Georgakopoulos D. Rusinkiewicz M. Silberschatz A. 《IEEE Transactions on Software Engineering》1991,17(9):954-960

The class of transaction scheduling mechanisms in which the transaction serialization order can be determined by controlling their commitment order is defined. This class of transaction management mechanisms is important because it simplifies transaction management in a multidatabase system environment. The notion of analogous execution and serialization orders of transactions is defined, and the concept of strongly recoverable and rigorous execution schedules is introduced. It is then proven that rigorous schedulers always produce analogous execution and serialization orders. It is shown that systems using rigorous scheduling can be naturally incorporated in hierarchical transaction management mechanisms. It is proven that several previously proposed multidatabase transaction management mechanisms guarantee global serializability only if all participating database systems produce rigorous schedules.

13.

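Optimization of the Fabric Ordering Mechanism Based on the Corresponding Comparison Graph

Runde Liu Zhide Chen 《计算机系统应用》 (Computer Systems & Applications) 2023,32(5):323-329

To address the shortcomings of the ordering phase in the HLF (Hyperledger Fabric) blockchain system, a graph-based ordering optimization scheme built on the corresponding comparison graph is proposed. Exploiting the graph-merging process of the corresponding comparison graph, which preserves the relevant invariant properties, and the short running time of its algorithm, a topological algorithm based on transaction importance is designed to reduce the serialization conflicts caused by the default sequential ordering. Experiments and analysis show that the scheme effectively resolves the serialization conflicts of the original scheme, reduces the proportion of invalid transactions in the system, improves transaction efficiency, and saves substantial computing and storage resources.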

14.

Performance analysis of dynamic locking with the no-waiting policy

Ryu I.K. Thomasian A. 《IEEE Transactions on Software Engineering》1990,16(7):684-698

A transaction processing system with two-phase dynamic locking with the no-waiting policy (DLNW) for concurrency control is considered. In this method, transactions making conflicting lock requests are aborted and restarted rather than blocked, thereby eliminating blocking delays (and deadlocks), but making the method susceptible to cyclic restarts. Cyclic restarts are dealt with by delaying the restart of a transaction encountering a lock conflict or replacing it with a new transaction. Analytic solution methods for evaluating the performance of the variants of the DLNW method are described. The analytic methods, validated against simulation and shown to be acceptably accurate, are used to study the effect of the following parameters on system performance: transaction size and its distribution, degree of concurrency, the throughput characteristic of the computer system, and the mixture of read-only query and update transactions. A comparison of the DLNW and dynamic locking with waiting (DLW) methods shows that DLW provides higher throughput than DLNW, except when there is no hardware resource contention and conflicted transactions can be replaced by new transactions. The DLNW method outperforms the time-stamp ordering method, as observed from simulation results as well as case-by-case analyses of possible scenarios.
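
The no-waiting policy described above can be illustrated with a small single-threaded Python sketch: a conflicting lock request raises immediately instead of blocking, and the caller drops its locks and restarts. The class and function names, the S/X lock modes, and the bounded restart loop are assumptions for illustration; the restart-delay and transaction-replacement variants analyzed in the paper are not modeled.

```python
class LockConflict(Exception):
    """Raised on a conflicting request; the caller aborts instead of waiting."""


class NoWaitLockTable:
    def __init__(self):
        # item -> (mode, set of owning transaction ids); mode is "S" or "X"
        self._holders = {}

    def acquire(self, txn, item, mode):
        held = self._holders.get(item)
        if held is None:
            self._holders[item] = (mode, {txn})
            return
        held_mode, owners = held
        if owners == {txn}:
            # Sole owner: a repeated request or an upgrade always succeeds.
            if mode == "X":
                self._holders[item] = ("X", owners)
            return
        if mode == "S" and held_mode == "S":
            owners.add(txn)  # shared locks are compatible
            return
        raise LockConflict(f"{txn} conflicts on {item}")

    def release_all(self, txn):
        for item in list(self._holders):
            _, owners = self._holders[item]
            owners.discard(txn)
            if not owners:
                del self._holders[item]


def run_with_restarts(table, txn, requests, max_restarts=3):
    """Acquire every lock; on conflict release everything and restart."""
    for attempt in range(max_restarts + 1):
        try:
            for item, mode in requests:
                table.acquire(txn, item, mode)
            return attempt  # number of restarts that were needed
        except LockConflict:
            table.release_all(txn)
    return None  # gave up after max_restarts
```

Because nothing ever waits, no deadlock can form, but a transaction that restarts immediately may hit the same conflict again, which is the cyclic-restart hazard the abstract mentions.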

15.

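A Contention Management Model Based on Conflict Correlation Detection

Caijun Chu Dasha Hu Yuming Jiang 《计算机应用》 (Journal of Computer Applications) 2013,33(7):2051-2054

In obstruction-free software transactional memory systems, the contention management policy is applied directly to resolving conflicting transactions and directly affects the performance of the whole system. To address the performance instability caused by the relatively limited decision-making methods of existing contention managers, a contention management model based on conflict correlation detection is proposed. The method analyzes past arbitration records to find correlations among conflicting transactions and uses the detected correlations as the basis for deciding the current conflict, yielding better conflict resolution. Tests on several benchmark data structures on a simulation platform show that the correlated conflicting transactions the method detects and helps commit account for up to 30% of system throughput, and that its total transaction throughput is about 11% higher than the average of the other methods compared, indicating good flexibility and applicability.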

16.

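Performance Study of Real-Time Transaction Processing in Mobile Broadcast Environments

Xiangdong Lei Yuelong Zhao Songqiao Chen 《计算机工程与应用》 (Computer Engineering and Applications) 2007,43(13):14-17

The MVOCC-DA-2PV (Multiversion Optimistic Concurrency Control with Dynamic Adjustment of serialization order using Two-Phase Validation) concurrency control protocol for mobile broadcast environments is proposed. Mobile real-time transactions are processed in two phases: the first on mobile hosts (MHs) and the second on the server. All mobile transactions executing on an MH perform partial backward validation against transactions committed at the server. A mobile transaction that passes partial validation on the MH is submitted to the server for final validation. Detecting data conflicts this early saves processing and communication resources. The protocol eliminates conflicts between mobile read-only transactions and mobile update transactions, and uses multiversion dynamic adjustment of the serialization order to avoid unnecessary transaction restarts, which reduces the response time of mobile read-only transactions. The performance of MVOCC-DA-2PV was evaluated by simulation and compared with PVTO and HP2PL. The results show that MVOCC-DA-2PV outperforms the other protocols.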

17.

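A Distributed Real-Time Transaction Commit Protocol (total citations: 2, self-citations: 1, citations by others: 2)

Yunsheng Liu Biao Qin 《计算机研究与发展》 (Journal of Computer Research and Development) 2002,39(7):827-832

In distributed real-time database systems, the only way to guarantee transaction atomicity is to develop a real-time atomic commit protocol. This paper first analyzes in detail the dependencies that arise among transactions from data access conflicts, and on that basis proposes a real-time optimistic atomic commit protocol, the 2SC protocol. The protocol reduces transaction waiting time, increases transaction concurrency, and integrates seamlessly with existing concurrency control protocols, guaranteeing transaction serializability and atomicity. Simulation experiments show that the protocol reduces the number of transactions that miss their deadlines.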

18.

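A Study of Contention Management Strategies under Multi-Transaction Contention Conflicts in STM

Lingjun Xie Dasha Hu Yuming Jiang 《现代计算机》 (Modern Computer) 2014,(10):3-6

Transactional memory is a high-level abstract parallel programming model intended to make parallel programs easier to develop. The contention management module of a transactional memory system resolves conflicts between transactions. Traditional contention management policies only arbitrate the conflict between two conflicting transactions. This paper proposes converting multiple transactions and their conflict relationships into an undirected graph and, based on the global conflict scenario, using graph vertex coloring to find a maximum independent set of the graph. The transactions in the maximum independent set do not conflict with one another, so the contention manager (CM) arbitrates to let them execute concurrently, maximizing system concurrency.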
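
The conflict-graph idea can be sketched in Python as below. The sketch builds an undirected conflict graph, colors it greedily, and takes the largest color class as a batch of mutually non-conflicting transactions. A color class is always an independent set, but this greedy heuristic is not guaranteed to find a maximum one, so it should be read as an approximation of the approach in the abstract rather than the authors' algorithm; all function names are hypothetical.

```python
def build_conflict_graph(txns, conflict_pairs):
    """Return an adjacency map: transaction -> set of conflicting transactions."""
    graph = {t: set() for t in txns}
    for a, b in conflict_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def greedy_coloring(graph):
    """Color vertices in order of decreasing degree with the smallest free color."""
    colors = {}
    for v in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        used = {colors[u] for u in graph[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def largest_color_class(colors):
    """Group vertices by color and return the biggest group."""
    classes = {}
    for v, c in colors.items():
        classes.setdefault(c, set()).add(v)
    return max(classes.values(), key=len)


def pick_concurrent_batch(txns, conflict_pairs):
    """Choose a set of transactions that can run together without conflicts."""
    graph = build_conflict_graph(txns, conflict_pairs)
    return largest_color_class(greedy_coloring(graph))
```

For example, if T1 conflicts with T2, T3 and T4, and T2 also conflicts with T3, the sketch selects T2 and T4 to run together, and the remaining transactions would be arbitrated in later rounds.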

19.

Hardware transactional memory architecture with adaptive version management for multi-processor FPGA platforms

《Journal of Systems Architecture》2017

Multiprocessor embedded systems integrate diverse dedicated processing units to handle high-performance applications such as multimedia and network processing. However, lock-based synchronization limits the efficiency of such heterogeneous concurrent systems. Hardware Transactional Memory (HTM) is a promising approach to creating an abstraction layer for multi-threaded programming. However, HTM performance is application-specific and determined by version and conflict management configurations. Most previous HTM implementations for embedded systems in the literature were built on fixed version management, which results in significant performance loss when transaction behaviour changes. In this paper, we propose an HTM targeted at embedded applications which is able to adapt its version management based on application behaviour at runtime. It is prototyped and analysed on an Altera Cyclone IV platform. Random requests at different contention levels and different transaction sizes are used to verify the performance of the proposed HTM. Based on our experiments, lazy version management obtains up to 12.82% speed-up compared to eager version management at high contention levels. Meanwhile, eager version management obtains up to 37.84% speed-up compared to lazy version management at low contention. The adaptive mechanism is able to switch configuration at runtime based on application behaviour for maximum performance.

20.

Reducing Transaction Processing Latency in Hardware Transactional Memory-based Database with Non-volatile Memory

Xingda Wei Fangming Lu Rong Chen Haibo Chen Binyu Zang 《International Journal of Software and Informatics》2022,12(1):31-53

The emergence of Hardware Transactional Memory (HTM) has greatly boosted transaction processing performance in in-memory databases. However, the group commit protocol, which aims at reducing the impact of slow storage devices, leads to high transaction commit latency. Non-Volatile Memory (NVM) opens opportunities for reducing transaction commit latency. However, HTM cannot cooperate with NVM directly: flushing data to NVM will always cause HTM to abort. In this paper, we propose a technique called parity version to decouple HTM execution from NVM writes. Thus, transactions can correctly and efficiently use NVM to reduce their commit latency with HTM. We have integrated this technique into DBX, a state-of-the-art HTM-based database, and propose DBXN: a low-latency and high-throughput in-memory transaction processing system. Evaluations using typical OLTP workloads including TPC-C show that it has 99% lower latency and 2.1 times higher throughput than DBX.