首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In a video-on-demand (VOD) environment, disk arrays are often used to support the disk bandwidth requirement. This can pose serious problems on available disk bandwidth upon disk failure. In this paper, we explore the approach of replicating frequently accessed movies to provide high data bandwidth and fault tolerance required in a disk-array-based video server. An isochronous continuous video stream imposes different requirements from a random access pattern on databases or files. Explicitly, we propose a new replica placement method, called rotational mirrored declustering (RMD), to support high data availability for disk arrays in a VOD environment. In essence, RMD is similar to the conventional mirrored declustering in that replicas are stored in different disk arrays. However, it is different from the latter in that the replica placements in different disk arrays under RMD are properly rotated. Combining the merits of prior chained and mirrored declustering methods, RMD is particularly suitable for storing multiple movie copies to support VOD applications. To assess the performance of RMD, we conduct a series of experiments by emulating the storage and delivery of movies in a VOD system. Our results show that RMD consistently outperforms the conventional methods in terms of load-balancing and fault-tolerance capability after disk failure, and is deemed a viable approach to supporting replica placement in a disk-array-based video server.  相似文献   

2.
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus, improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets. Recommended by: Ahmed Elmagarmid Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.  相似文献   

3.
海量存储网络中的虚拟盘副本容错技术   总被引:2,自引:1,他引:2  
大规模存储网络中的数据可用性和读写性能越来越重要.在海量存储虚拟化系统的基础上,实现了多副本虚拟盘技术来提高网络存储的数据容错能力.同时,通过多副本选择调度与异步副本更新以及副本盘空间布局的动态调整算法,提高了系统的数据读写能力.测试结果表明,加入虚拟盘副本后,在设备数量充足情况下的读性能可提高26%;即使少量磁盘失效,读写操作也能正确执行,且读性能仍然比无副本时提高10%以上.  相似文献   

4.
External mergesort is normally implemented so that each run is stored continuously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting  相似文献   

5.
秦啸  庞丽萍  韩宗芬  李胜利 《软件学报》1999,10(9):996-1002
文章给出一个实时非固定双头镜像磁盘系统的形式化模型.该磁盘模型中的每个双头磁盘都有两个相互独立的磁臂,能够独立地完成寻找磁道过程.针对该磁盘系统,文章研究了3种实时调度算法.模拟实验表明,“忽略超截止期调度算法”的性能最好,因为它忽略了对超截止期限实时请求的处理.文章同时分析了固定双头镜像磁盘与非固定双头镜像磁盘之间的性能差别.实验结果表明,由于非固定双头磁盘的两个磁头可以独立寻找磁道,因此非固定双头镜像磁盘的性能比固定双头镜像磁盘的性能要好.  相似文献   

6.
The performance of traditional RAID Level 5 arrays is, for many applications, unacceptably poor while one of its constituent disks is non-functional. This paper describes and evaluates mechanisms by which this disk array failure-recovery performance can be improved. The two key issues addressed are thedata layout, the mapping by which data and parity blocks are assigned to physical disk blocks in an array, and thereconstruction algorithm, which is the technique used to recover data that is lost when a component disk fails.The data layout techniques this paper investigates are instantiations of thedeclustered parity organization, a derivative of RAID Level 5 that allows a system to trade some of its data capacity for improved failure-recovery performance. We show that our instantiations of parity declustering improve the failure-mode performance of an array significantly, and that a parity-declustered architecture is preferable to an equivalent-size multiple-group RAID Level 5 organization in environments where failure-recovery performance is important. The presented analyses also include comparisons to a RAID Level 1 (mirrored disks) approach.With respect to reconstruction algorithms, this paper describes and briefly evaluates two alternatives,stripeoriented reconstruction anddisk-oriented reconstruction, and establishes that the latter is preferable as it provides faster reconstruction. The paper then revisits a set of previously-proposed reconstruction optimizations, evaluating their efficacy when used in conjunction with the disk-oriented algorithm. The paper concludes with a section on the reliability versus capacity trade-off that must be addressed when designing large arrays.Portions of this material are drawn from papers at the 5th Conference on Architectural Support for Programming Languages and Operating Systems, 1992, and at the 23rd Symposium on Fault-Tolerant Computing, 1993. The work was supported by the National Science Foundation under grant number ECD-8907068, by the Defense Advanced Research Project Agency monitored by ARPA/CMO under contract MDA972-90-C-0035, and by an IBM Graduate Fellowship.  相似文献   

7.
This paper studies workfile disk management for concurrent mergesorts ina multiprocessor database system. Specifically, we examine the impacts of workfile disk allocation and data striping on the average mergesort response time. Concurrent mergesorts in a multiprocessor system can creat severe I/O interference in which a large number of sequential write requests are continuously issued to the same workfile disk and block other read requests for a long period of time. We examine through detailed simulations a logical partitioning approach to workfile disk management and evaluate the effectiveness of datastriping. The results show that (1) without data striping, the best performance is achieved by using the entire workfile disks as a single partition if there are abundant workfile disks (or system workload is light); (2) however, if there are limited workfile disks (or system workload is heavy), the workfile disks should be partitioned into multiple groups and the optimal partition size is workload dependent; (3) data striping is beneficial only if the striping unit size is properly chosen; and (4) with a proper striping size, the best performance is generally achieved by using the entire disks as a single logical partition.  相似文献   

8.
Efficient storage and retrieval of multi-attribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. The authors derive formulas describing the scalability of two popular declustering methods-Disk Module and Fieldwise Xor-for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in the paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions  相似文献   

9.
Declustering is a common technique used to reduce query response times. Data is declustered over multiple disks and query retrieval can be parallelized. Most of the research on declustering is targeted at spatial range queries and investigates schemes with low additive error. Recently, declustering using replication has been proposed to reduce the additive overhead. Replication significantly reduces retrieval cost of arbitrary queries. In this paper, we propose a disk allocation and retrieval mechanism for arbitrary queries based on design theory. Using the proposed c-copy replicated declustering scheme, buckets can be retrieved using at most k disk accesses. Retrieval algorithm is very efficient and is asymptotically optimal with complexity for a query Q. In addition to the deterministic worst-case bound and efficient retrieval, proposed algorithm handles nonuniform data, high dimensions, supports incremental declustering and has good fault-tolerance property. Experimental results show the feasibility of the algorithm. Recommended by: Sunil Prabhakar  相似文献   

10.
一种并行数据库的动态多维数据分布方法   总被引:7,自引:0,他引:7  
李建中 《软件学报》1999,10(9):909-916
并行数据库系统的性能与数据库在多处理机之间的分布密切相关.目前已经出现一些并行数据库的数据分布方法.但是,这些方法都不能有效地支持动态数据库.文章提出了一种并行数据库的动态多维数据分布方法.该方法不仅能够有效地支持动态数据库的分布,还具有多维数据分布的诸多优点.此方法由初始数据分布机构和启发式动态数据分布调整机构组成.初始分布机构完成给定数据库文件的初始分布.动态数据分布调整机构实现动态数据库数据分布的动态调整.理论分析和实验结果表明,这种方法十分有效,并且能够有力地支持动态数据库上的各种并行数据操作算法.  相似文献   

11.
并行数据库上的并行CMD-Join算法   总被引:3,自引:1,他引:3  
李建中  都薇 《软件学报》1998,9(4):256-262
并行数据库在多处理机之间的分布方法(简称数据分布方法)对并行数据操作算法的性能影响很大.如果在设计并行数据操作算法时充分利用数据分布方法的特点,可以得到十分有效的并行算法.本文研究如何充分利用数据分布方法的特点,设计并行数据操作算法的问题,提出了基于CMD多维数据分布方法的并行CMD-Join算法.理论分析和实验结果表明,并行CMD-Join算法的效率高于其它并行Join算法.  相似文献   

12.
Supporting continuous media data-such as video and audio-imposes stringent demands on the retrieval performance of a multimedia server. In this paper, we propose and evaluate a set of data placement and retrieval algorithms to exploit the full capacity of the disks in a multimedia server. The data placement algorithm declusters every object over all of the disks in the server-using a time-based declustering unit-with the aim of balancing the disk load. As for runtime retrieval, the quintessence of the algorithm is to give each disk advance notification of the blocks that have to be fetched in the impending time periods, so that the disk can optimize its service schedule accordingly. Moreover, in processing a block request for a replicated object, the server will dynamically channel the retrieval operation to the most lightly loaded disk that holds a copy of the required block. We have implemented a multimedia server based on these algorithms. Performance tests reveal that the server achieves very high disk efficiency. Specifically, each disk is able to support up to 25 MPEG-1 streams. Moreover, experiments suggest that the aggregate retrieval capacity of the server scales almost linearly with the number of disks  相似文献   

13.
目前针对并行空间数据处理的研究主要集中在空间数据划分及其在其基础上的并行空间算法,对空间并行数据库平台本身的可用性,如应用程序的开发模式、高并发请求支持等研究较少。为此,对开源并行关系数据库查询语言进行空间查询扩展,提出一种基于代理的并行空间查询语言,并实现相应的并行数据库平台原型。基于该平台开发标准的网络地图绘图服务,在高并发环境下使用该服务对海量矢量数据进行实时渲染。实验结果表明,该平台具有与传统关系数据库一致的开发应用模式,可提供无缝的衔接方式,在海量数据高并发的情况下具有较高的可用性及查询性能。  相似文献   

14.
Disk arrays and shared-memory multiprocessors are new technologies that are rapidly becoming pervasive. They are complementary because disk arrays naturally balance the I/O workload by interleaving data across all disks while a shared-memory multiprocessor balances the processing workload across multiple processors. In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metricdata temperature (IO/s/Gbyte) as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.  相似文献   

15.
当前针对磁盘功率管理的大部分研究都是把重点放在磁盘空闲周期的利用上;人们相继研究了硬件功率节约机制(比如降速磁盘和多速磁盘)和补充性的软件策略(比如改变代码和数据布局,以提高空闲周期的长度);然而,硬件功率节约机制无法处理高能耗并行应用的短空闲周期,而代码/数据重组策略往往要求大规模的代码更改;提出一种面向编译器的数据访问(I/O调用)调度技术,以节约磁盘能量,在更短的周期内聚集了尽可能多的数据请求,进而延长了磁盘空闲周期,提升硬件功率管理机制的有效性;与先前基于软件的策略相比,该技术不需重组代码或数据;在基于集群的仿真环境下结合6种应用程序对该方法进行评估;结果表明,该方法提升了降速磁盘和多速磁盘的性能,将功率节约平均效果提升了一倍。  相似文献   

16.
Various strategies have been developed to assist in determining an effective data placement for a parallel database system. However, little work has been done on assessing the relative performance obtained from different strategies. This paper studies the effects of different data placement strategies on a shared-nothing parallel relational database system when the number of disks attached to each node is varied. Three representative strategies have been used for the study and the performance of the resulting configuration has been assessed in the context of the TPC–C benchmark running on a specific parallel system. Results show an increase in sensitivity to data placement strategy with increasing number of disks per node.  相似文献   

17.
Data redundancy has been widely used to increase data availability in critical applications and several methods have been proposed to organize redundant data across a disk array. Data redundancy consists of either total data replication or the spreading of the data across the disk array along with parity information which can be used to recover missing data in the event of disk failure. In this paper we present an extended comparative analysis, carried out by using discrete event simulation models, between two disk array architectures: the Redundant Arrays of Inexpensive Disks (RAID) level 1 architecture, based on data replication; and the RAID level 5 architecture, based on the use of parity information. The comparison takes both performance and cost aspects into account. We study the performance of these architectures simulating two application environments characterized by different sizes of the data accessed by I/O operations. In addition, several scheduling policies for I/O requests are considered and the impact of non-uniform access to data on performance is investigated.  相似文献   

18.
Parallelizing I/O operations via effective declustering of data is becoming essential to scale up the performance of parallel databases or high performance systems. Declustering has been shown to be a NP-complete problem in some contexts. Some heuristic methods have been proposed to solve this problem. However, most methods are not effective in several cases such as queries with different access frequencies or data with different sizes. In this paper, we propose a hypergraph model to formulate the declustering problem. Several interesting theoretical results are achieved by analyzing the proposed model. The proposed approach will allow modeling a wide range of declustering problems. Furthermore, the hypergraph declustering model is used as the basis to develop new heuristic methods, including a greedy method and a hybrid declustering method. Experiments show that the proposed methods can achieve better performance than several declustering methods.  相似文献   

19.
针对传统的关系数据存储系统性能不足、容错性差,无法适应海量非结构化数据管理的问题,提出一种高性能、高可用非关系型存储管理机制。首先,设计了良好的用户访问服务接口,通过高效的一致性哈希算法支持数据分发到多个存储节点;其次,采用可配置的数据副本机制改善存储系统的可用性;最后,提出查询故障处理机制,用以提升存储系统的容错性,避免节点失效导致服务中断问题。实验结果表明,在不同规模用户负载下,新的存储系统的并发访问请求能力和传统的文件系统、关系数据库相比,分别提升了30%和50%;同时,在合理响应时间内,故障状态下的存储系统的可用性损失小于14%。因此,该机制适用于海量非结构化数据的高效存储管理。  相似文献   

20.
《Information Systems》2005,30(1):47-70
Data declustering is an important issue for reducing query response times in multi-disk database systems. In this paper, we propose a declustering method that utilizes the available information on query distribution, data distribution, data-item sizes, and disk capacity constraints. The proposed method exploits the natural correspondence between a data set with a given query distribution and a hypergraph. We define an objective function that exactly represents the aggregate parallel query-response time for the declustering problem and adapt the iterative-improvement-based heuristics successfully used in hypergraph partitioning to this objective function. We propose a two-phase algorithm that first obtains an initial K-way declustering by recursively bipartitioning the data set, then applies multi-way refinement on this declustering. We provide effective gain models and efficient implementation schemes for both phases. The experimental results on a wide range of realistic data sets show that the proposed method provides a significant performance improvement compared with the state-of-the-art declustering strategy based on similarity-graph partitioning.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号