Similar literature
 20 similar documents found (search time: 31 ms)
1.
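To address virtual machine (VM) disaster-recovery backup in OpenStack, this paper proposes a VM disaster-recovery backup system based on Ceph storage snapshots. During backup, the system snapshots the VM's disks stored in Ceph, computes either the valid data or the changed data according to the backup requirements, and saves the VM's configuration information together with the disk data. During recovery, it automatically creates a VM with the same configuration and restores the disk data of the current snapshot point into that VM. Experiments show that, compared with OpenStack's snapshot backup method, the approach saves backup time and storage space, and that it supports incremental backup and multi-disk backup, which OpenStack's method lacks.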

2.
In the study of data storage and retrieval involving secondary storage devices, for example, magnetic disks, a simplified model of storage that is often used is that each access takes a constant amount of time. However, if some information about the accesses is known, the model should take into consideration the inherent characteristics of the storage devices. In this paper, we assume a more refined model of storage that takes into consideration the seek time, the latency time, and the transmission time of disk accesses separately. We analyze the time required to randomly access a set of records residing on a set of consecutive cylinders on a magnetic disk a number of times, say n, for n ≥ 1. This problem may arise, for example, in the processing of queries that involve several relations in a relational database system. We also analyze the more general situations in which the n operations may represent retrievals, insertions, or deletions, or a combination of them. We assume that the dynamic file structure linear hashing is used for locating and organizing the records. A linear hashing file does not employ any directory and its primary data buckets are assumed to be contiguous; therefore, the data area of a linear hashing file corresponds closely to the disk space on a set of consecutive cylinders.
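
The directory-free addressing that the abstract attributes to linear hashing can be illustrated with a minimal sketch. It assumes integer keys and the textbook hash family h_i(k) = k mod (N · 2^i); the function name and the parameters `level`, `split_pointer`, and `initial_buckets` are illustrative choices, not taken from the paper.

```python
def linear_hash_address(key: int, level: int, split_pointer: int,
                        initial_buckets: int = 4) -> int:
    """Return the primary bucket number for `key` in a linear hashing file.

    Assumed textbook scheme: h_level(key) = key mod (initial_buckets * 2**level).
    """
    buckets_at_level = initial_buckets * (2 ** level)
    address = key % buckets_at_level
    # Buckets below the split pointer were already split in this round,
    # so their keys are addressed with the next-level hash function.
    if address < split_pointer:
        address = key % (2 * buckets_at_level)
    return address


# With 4 initial buckets and bucket 0 already split, key 12 moves to bucket 4.
print(linear_hash_address(12, level=0, split_pointer=1))
```

Because the bucket number is computed directly from the key, bucket k can sit at a fixed offset in a contiguous data area, which is consistent with the abstract's mapping of the file onto consecutive cylinders.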

3.
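Many indexes have been designed to process nearest-neighbor queries and range queries efficiently in high-dimensional spaces. It has been shown that, when dimensionality is high, processing these two kinds of queries with a high-dimensional index is almost never faster than a linear scan. This paper proposes a two-level index that adaptively identifies clusters in the data set. When the data set is clustered, the index processes nearest-neighbor and range queries faster than existing index structures; for other data sets, its performance is roughly on par with a linear scan. The upper level organizes a set of reference points into a binary tree, and the lower level is a series of dynamic hash tables. Data points are hashed into buckets according to their relative distance to the reference points. During query processing, the distance from the query point to the reference points is used to prune the search. Experiments show that the proposed index structure performs well.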
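
The distance-based pruning described above can be sketched for the simplest case. This is a simplification under stated assumptions: a single reference point instead of a binary tree of reference points, a fixed bucket `width` instead of dynamic hash tables, and Euclidean distance. The function names are hypothetical. Pruning relies only on the triangle inequality: a point within `radius` of the query must lie at a distance from the reference point that differs from the query's own distance by at most `radius`.

```python
import math
from collections import defaultdict


def build_buckets(points, reference, width):
    """Hash each point into a bucket keyed by its quantized distance to `reference`."""
    buckets = defaultdict(list)
    for p in points:
        buckets[int(math.dist(p, reference) // width)].append(p)
    return buckets


def range_query(buckets, reference, width, query, radius):
    """Return points within `radius` of `query`, visiting only buckets that the
    triangle inequality cannot rule out."""
    d_qr = math.dist(query, reference)
    lo = int(max(0.0, d_qr - radius) // width)
    hi = int((d_qr + radius) // width)
    hits = []
    for key in range(lo, hi + 1):
        for p in buckets.get(key, ()):
            if math.dist(p, query) <= radius:
                hits.append(p)
    return hits


points = [(0, 0), (1, 0), (5, 5), (10, 10)]
buckets = build_buckets(points, reference=(0, 0), width=2.0)
print(range_query(buckets, (0, 0), 2.0, query=(1, 1), radius=1.5))
```

In this example only buckets 0 and 1 are visited; the buckets holding (5, 5) and (10, 10) are skipped without computing any distance to the query.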

4.
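Research on data organization in a high-availability object storage system   (total citations: 1; self-citations: 0; citations by others: 1)
詹玲, 张强善, 万继光. 《计算机科学》 (Computer Science), 2009, 36(11): 123-126
Based on a careful analysis of the fault tolerance of existing storage systems, this paper proposes a new high-availability object storage architecture, HAOSS (High Availability Object Storage System). HAOSS has two tiers. The upper tier achieves high reliability by replicating objects across multiple devices; all replicas can serve requests concurrently, which keeps performance high, but disk utilization is low. The lower tier uses different fault-tolerant codes, such as RAID5, RAID6, and RAID_Blaurn, to tolerate multiple disk failures; disk utilization is higher, but because the codes are increasingly complex and require heavy computation, performance suffers considerably. In terms of data organization, new and hot objects are placed in the upper tier so that most requests hit there, which preserves system performance, while the lower tier mainly holds infrequently used data and offers higher disk utilization. HAOSS was tested over 1000 Mbps Ethernet. The results show good sequential read and write performance, with a peak of 104 MB/s, reaching the theoretical maximum physical bandwidth of 1000 Mb Ethernet.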
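
The utilization trade-off between the two tiers, and the scale of the reported throughput, can be made concrete with a small arithmetic sketch. The replica count and disk counts below are hypothetical, since the abstract does not state them; the formulas are the standard ones for replication and for single- or double-parity arrays.

```python
def replication_utilization(copies: int) -> float:
    """Fraction of raw capacity holding unique data when every object has `copies` replicas."""
    return 1 / copies


def parity_utilization(disks: int, parity_disks: int) -> float:
    """Fraction of raw capacity holding data in an array with `parity_disks` worth of parity."""
    return (disks - parity_disks) / disks


# Hypothetical configurations, for illustration only.
print(replication_utilization(2))   # upper tier, 2 replicas
print(parity_utilization(8, 1))     # lower tier, RAID5 over 8 disks
print(parity_utilization(8, 2))     # lower tier, RAID6 over 8 disks

# Raw line rate of 1000 Mb/s Ethernet in MB/s, before any protocol overhead.
raw_line_rate = 1000 / 8
print(raw_line_rate, 104 / raw_line_rate)
```

For reference, 1000 Mb/s corresponds to a raw line rate of 125 MB/s, so the reported 104 MB/s is about 83% of that figure; framing and protocol overhead account for part of the difference.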

5.
Selectivity estimation is an important step of query optimization in a database management system, and multi-dimensional histogram techniques have proved promising for selectivity estimation. Recent multi-dimensional histogram techniques such as GenHist and STHoles use an arbitrary bucket layout. This layout has the advantage of requiring a smaller number of buckets to model tuple densities than those required by the traditional grid or recursive layouts. However, the arbitrary bucket layout brings an inherent disadvantage of requiring more memory to store each bucket's location information. This diminishes the advantage of requiring fewer buckets and, therefore, has an adverse effect on the resulting selectivity estimation accuracy. To our knowledge, however, no existing histogram-based technique with arbitrary layout addresses this issue. In this paper, we introduce the idea of bucket location compression and then demonstrate its effectiveness for improving selectivity estimation accuracy by proposing the STHoles+ technique. STHoles+ extends STHoles by quantizing each coordinate of a bucket relative to the coordinate of the smallest enclosing bucket. This quantization increases the number of histogram buckets that can be stored in the histogram. Our quantization scheme allows STHoles+ to trade precision of histogram bucket locations for storing more buckets. Experimental results show that STHoles+ outperforms STHoles on various data distributions, query distributions, and other factors such as available memory size, quantization resolution, and dimensionality of the data space.
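
The idea of quantizing a bucket coordinate relative to its enclosing bucket can be sketched as follows. This is not the STHoles+ algorithm itself: it assumes uniform quantization with round-to-nearest along one dimension, and the function names and the `bits` parameter are illustrative. The abstract does not specify the rounding rule, so a real scheme may, for example, round bounds outward instead.

```python
def quantize(coord: float, parent_lo: float, parent_hi: float, bits: int) -> int:
    """Encode `coord` as a `bits`-bit integer relative to the enclosing interval."""
    levels = (1 << bits) - 1
    frac = (coord - parent_lo) / (parent_hi - parent_lo)
    return round(frac * levels)


def dequantize(code: int, parent_lo: float, parent_hi: float, bits: int) -> float:
    """Recover the approximate coordinate from its quantized code."""
    levels = (1 << bits) - 1
    return parent_lo + (code / levels) * (parent_hi - parent_lo)


# An 8-bit code replaces a full-width coordinate inside the parent range [0, 100].
code = quantize(25.0, 0.0, 100.0, bits=8)
print(code, dequantize(code, 0.0, 100.0, bits=8))
```

The reconstruction error is at most half a quantization step of the parent's extent, which is the precision given up in exchange for fitting more buckets into the same memory.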

6.
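An improved multidimensional storage structure for data warehousing   (total citations: 7; self-citations: 2; citations by others: 7)
冯建华, 蒋旭东, 周立柱. 《软件学报》 (Journal of Software), 2002, 13(8): 1423-1429
Data in a data warehouse is currently physically organized in one of two main ways: relational storage or multidimensional arrays. Each has its own strengths and weaknesses. From the standpoint of improving online analytical processing (OLAP) query performance, the multidimensional array approach is comparatively better. This paper aims to address the multidimensional storage structure of data warehouses. To deal with several problems in current multidimensional array storage organization, it introduces the concepts of logical and physical Cube storage: the original multidimensional data space is first partitioned into logical subspaces (logical blocks), and each logical block is then divided into multiple physical blocks. The physical storage design takes full account of the large size and high sparsity of multidimensional arrays and adopts new methods for distributing and compressing them. These concepts and methods effectively improve the efficiency of aggregation over intra-dimension hierarchies and of Cube operations, significantly speed up aggregate queries that involve intra-dimension hierarchies, and also address the efficiency of incremental maintenance.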

7.
The improvements in disk speeds have not kept up with improvements in processor and memory speeds. Many techniques have been proposed and utilized to maximize the bandwidths of storage devices. These techniques have proven useful for conventional data, but when applied to multimedia data, they tend to be insufficient or inefficient due to the diversified data types, bandwidth requirements, file sizes and structures of complex objects of multimedia data. In this paper, we discuss the design of an efficient multimedia object allocation strategy that strives to achieve the expected retrieval rates and I/O computational requirements of objects and also effectively balances the loads on the storage devices. We define a multimedia object model and describe the multimedia object and storage device characteristics, the classification of the multimedia objects according to their I/O requirements, and the fragmentation strategies. We use a bipartite graph model for mapping of fragments to storage devices. A cost function based on the disk utilization per allocated space, the amount of free space, and the bandwidth of a storage device is used to determine the optimal allocation for an object's data.

8.
Traditional approaches for similarity-based retrieval of structured data, such as Case-Based Reasoning (CBR), have been largely implemented using centralized storage systems. In such systems, when the cases contain both numeric and free-text attributes, similarity-based retrieval cannot exploit standard speedup techniques based on multi-dimensional indexing, and the retrieval is implemented by an exhaustive comparison of the case to be solved with the whole set of stored cases. In this work, we review current research on Peer-to-Peer (P2P) and distributed CBR techniques and propose a novel approach for storage of the case-base in a decentralized Peer-to-Peer environment using the notion of Unspecified Ontology to improve the performance of the case retrieval stage and build CBR systems that can scale up to large case-bases. We develop an algorithm for efficient retrieval of approximated most-similar cases, which exploits inherent characteristics of the unspecified ontology in order to improve the performance of the case retrieval stage in the CBR problem solving cycle. The experiments show that the algorithm successfully retrieves cases close to the most-similar cases, while reducing the number of cases to be compared. Hence, it improves the performance of the retrieval stage. Moreover, the distributed nature of our approach eliminates the computational bottleneck and single point of failure of the centralized storage systems.

9.
Recent advances in computer technologies have made it feasible to provide multimedia services, such as news distribution and entertainment, via high-bandwidth networks. The storage and retrieval of large multimedia objects (e.g., video) becomes a major design issue of the multimedia information system. While most other works on multimedia storage servers assume an on-line disk storage system, we consider a two-tier storage architecture with a robotic tape library as the vast near-line storage and an on-line disk system as the front-line storage. Magnetic tapes are cheaper, more robust, and have a larger capacity; hence, they are more cost effective for large scale storage systems (e.g., video-on-demand (VOD) systems may store tens of thousands of videos). We study in detail the design issues of the tape subsystem and propose some novel tape-scheduling algorithms which give faster response and require less disk buffer space. We also study the disk-striping policy and the data layout on the tape cartridge in order to fully utilize the throughput of the robotic tape system and to minimize the on-line disk storage space.

10.
Mitra is a scalable storage manager that supports the display of continuous media data types, e.g., audio and video clips. It is a software based system that employs off-the-shelf hardware components. Its present hardware platform is a cluster of multi-disk workstations, connected using an ATM switch. Mitra supports the display of a mix of media types. To reduce the cost of storage, it supports a hierarchical organization of storage devices and stages the frequently accessed objects on the magnetic disks. For the number of displays to scale as a function of additional disks, Mitra employs staggered striping. It implements three strategies to maximize the number of simultaneous displays supported by each disk. First, the EVEREST file system allows different files (corresponding to objects of different media types) to be retrieved at different block size granularities. Second, the FIXB algorithm recognizes the different zones of a disk and guarantees a continuous display while harnessing the average disk transfer rate. Third, Mitra implements the Grouped Sweeping Scheme (GSS) to minimize the impact of disk seeks on the available disk bandwidth. In addition to reporting on implementation details of Mitra, we present performance results that demonstrate the scalability characteristics of the system. We compare the obtained results with theoretical expectations based on the bandwidth of participating disks. Mitra attains between 65% and 100% of the theoretical expectations.

11.
As databases increasingly integrate different types of information such as time-series, multimedia and scientific data, it becomes necessary to support efficient retrieval of multi-dimensional data. Both the dimensionality and the amount of data that needs to be processed are increasing rapidly. As a result of the scale and high dimensional nature, the traditional techniques have proven inadequate. In this paper, we propose search techniques that are effective especially for large high dimensional data sets. We first propose the VA+-file technique, which is based on scalar quantization of the data. The VA+-file is especially useful for searching exact nearest neighbors (NN) in non-uniform high dimensional data sets. We then discuss how to improve the search and make it progressive by allowing some approximations in the query result. We develop a general framework for approximate NN queries, discuss various approaches for progressive processing of similarity queries, and develop a metric for evaluation of such techniques. Finally, a new technique based on clustering is proposed, which merges the benefits of various approaches for progressive similarity searching. Extensive experimental evaluation is performed on several real-life data sets. The evaluation establishes the superiority of the proposed techniques over the existing techniques for high dimensional similarity searching. The techniques proposed in this paper are effective for real-life data sets, which are typically non-uniform, and they are scalable with respect to both dimensionality and size of the data set.

12.
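With advances in Earth observation technology, the accumulation of massive geoscientific spatio-temporal field data places new demands on modeling, retrieving, and analyzing such data. This paper builds a method for organizing multidimensional spatio-temporal field data based on tensor structures, establishes a data storage structure based on a space-time cube model, and defines the corresponding data operations and data interfaces. It then designs a hierarchical indexing mechanism for spatio-temporal field data and an analysis method for geoscientific spatio-temporal field data based on tensor operators. System validation with satellite altimetry data shows that the model effectively supports the representation, retrieval, and analysis of multidimensional spatio-temporal field data, and that it is a useful exploration of high-dimensional spatio-temporal field data analysis and modeling.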

13.
A new record-clustering scheme is introduced, in which the record address is determined by multiple keys. Associated with this storage scheme is a new type of index called multi-dimensional directory. Those keys which determine the record address are jointly indexed by this directory. A data base structure which combines this new technique and the file inversion technique is analyzed. The costs of retrieval, update and storage space for this data base structure are mathematically formulated. An example illustrates that this new data base structure can be superior to the classical combination of indexed sequential and file inversion techniques.

14.
We describe the organization of a general purpose data archival system for Write-Once, Read-Many (WORM) optical disks. The system has been designed for large-scale and long-term data storage and retrieval. The archival system is independent of the operating system, flat, self-consistent, does not use any write cache on magnetic disk, and allows the exploitation of auxiliary information on magnetic disk, which can be rebuilt immediately in case of a crash, to speed up file retrieval. A library in C language, called pODLIB, has been implemented as a portable interface to the archival system.

15.
It is desirable to design partitioning methods that minimize the I/O time incurred during query execution in spatial databases. This paper explores optimal partitioning for two-dimensional data for a class of queries and develops multi-disk allocation techniques that maximize the degree of I/O parallelism obtained in each case. We show that hexagonal partitioning has optimal I/O performance for circular queries among all partitioning methods that use convex non-overlapping regions. An analysis and extension of this result to all possible partitioning techniques is also given. For rectangular queries, we show that hexagonal partitioning has overall better I/O performance for a general class of range queries, except for rectilinear queries, in which case rectangular grid partitioning is superior. By using current algorithms for rectangular grid partitioning, parallel storage and retrieval algorithms for hexagonal partitioning can be constructed. Some of these results carry over to circular partitioning of the data, which is an example of a non-convex region.

16.
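In the real world, multidimensional data rarely follows a single type of distribution; instead, different regions of the data exhibit different types of distribution. This paper proposes COCA*-Hist, a hybrid multidimensional histogram method for mixed multidimensional data distributions. Under a given space budget, the method can include several different types of histogram buckets according to the distribution type in each region of the data space, which improves the overall accuracy of the histogram. Because it must traverse the tree structure used to build the multidimensional histogram a second time, in order to identify regions with different distribution types and reallocate the space budget, COCA*-Hist is slightly less time-efficient than the MHist algorithm. This cost is acceptable given the resulting gain in accuracy and the generality across different distribution types.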

17.
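Faced with massive multidimensional data in science, engineering, and business, users urgently need effective visualization tools to understand the data during knowledge discovery, information cognition, and decision making. Traditional visualization methods based on dimensionality-reduction mappings have high computational complexity and cannot convey how the dimensions are distributed. To address these shortcomings, this paper proposes RPES, a multidimensional data visualization method based on a regular 2k-gon. It establishes a low-dimensional "reference frame" for the multidimensional data space, namely the regular 2k-gon coordinate system, and reduces dimensionality with an optimization method whose criterion is to minimize the difference between an object's coordinates in the regular 2k-gon coordinate system and in the multidimensional data space. The results are plotted as a point cloud in the low-dimensional visual space, which completes the reduced-dimension visual presentation of the multidimensional data. Experiments show that the RPES dimensionality-reduction algorithm is efficient and easy to implement, and that it suits large, high-dimensional data sets. The visualizations are easy to understand and convey dimension distribution information effectively, helping users uncover implicit knowledge and supporting decisions based on multidimensional data.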

18.
Efficient storage techniques for digital continuous multimedia   (total citations: 4; self-citations: 0; citations by others: 4)
The problem of collocational storage of media strands, which are sequences of continuously recorded audio samples or video frames, on disk to support the integration of storage and transmission of multimedia data with computing is examined. A model that relates disk and device characteristics to the playback rates of media strands and derives storage patterns so as to guarantee continuous retrieval of media strands is presented. To efficiently utilize the disk space, mechanisms for merging storage patterns of multiple media strands by filling the gaps between media blocks of one strand with media blocks of other strands are developed. Both an online algorithm suitable for merging a new media strand into a set of already stored strands and an offline merging algorithm that can be applied a priori to the storage of a set of media strands before any of them have been stored on disk are proposed. As a consequence of merging, storage patterns of media strands may become perturbed slightly. To compensate for this, the read-ahead and buffering required to keep retrieval continuous are also presented.

19.
We present a new method for preprocessing and organizing discrete scalar volume data of any dimension on external storage. We describe our implementation of a visual navigation system using our method. The techniques have important applications for out-of-core visualization of volume data sets and image understanding. The applications include extracting isosurfaces in a manner that helps reduce both I/O and disk seek time, a priori topologically correct isosurface simplification (prior to extraction), and producing a visual atlas of all topologically distinct objects in the data set. The preprocessing algorithm computes regions of space that we call topological zone components, so that any isosurface component (contour) is completely contained in a zone component and all contours contained in a zone component are topologically equivalent. The algorithm also constructs a criticality tree that is related to the recently studied contour tree. However, unlike the contour tree, the zones and the criticality tree hierarchically organize the data set. We demonstrate that the techniques work on both irregularly and regularly gridded data, and can be extended to data sets with nonunique values, by the mathematical analysis we call Digital Morse Theory (DMT), so that perturbation of the data set is not required. We present the results of our initial experiments with three dimensional volume data (CT) and describe future extensions of our DMT organizing technology.

20.
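To overcome the low storage efficiency and poor retrieval performance of traditional file-system storage schemes in network video surveillance systems, this paper proposes a recording storage scheme based on raw disk devices. Based on the data storage characteristics of surveillance systems, it uses a B+ tree to manage the index information of recording segments, designs a logical on-disk storage structure, and presents a data caching mechanism based on groups of pictures (GOPs). System tests show that, compared with the traditional file-system scheme, at the typical surveillance storage bit rates of 512 Kb/s and 1 Mb/s, recording storage efficiency improves by 43.6% and 30.3% respectively, and recording retrieval time falls below 35 ms.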
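
The role of the recording-segment index can be illustrated with a small sketch that maps a timestamp to the disk offset of the segment containing it. This is not the paper's B+ tree: as a stand-in it uses a sorted in-memory list with binary search, which gives the same ordered-lookup behavior at small scale. The class name, the fields, and the assumption that segments do not overlap are all hypothetical.

```python
import bisect


class SegmentIndex:
    """Ordered index of recording segments keyed by start time (a stand-in for a B+ tree)."""

    def __init__(self):
        self._starts = []   # sorted segment start times, in seconds
        self._entries = []  # parallel list of (end_time, disk_offset)

    def add(self, start, end, disk_offset):
        """Insert a segment covering [start, end); segments are assumed not to overlap."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._entries.insert(i, (end, disk_offset))

    def find(self, timestamp):
        """Return the disk offset of the segment containing `timestamp`, or None."""
        i = bisect.bisect_right(self._starts, timestamp) - 1
        if i < 0:
            return None
        end, offset = self._entries[i]
        return offset if timestamp < end else None


index = SegmentIndex()
index.add(0, 60, disk_offset=0)
index.add(60, 120, disk_offset=4096)
index.add(180, 240, disk_offset=8192)
print(index.find(75), index.find(150))
```

A lookup that lands in a gap between segments returns None, which is how a playback request for a period with no recording would be detected in this sketch.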
