Similar Documents
Found 20 similar documents (search time: 260 ms)
1.
Existing data-management practice in the financial industry relies mainly on traditional relational databases, but that architecture is limited in scalability and storage performance and struggles to keep up with the rapidly growing data volumes of the big-data era. Targeting the characteristics of financial data (large scale; storage spread across regions and systems; diverse formats), this paper proposes HiETL, a big-data migration management platform that migrates heterogeneous relational-database business systems onto a Hadoop big-data platform in a unified way and provides one-stop centralized integration, scalable storage, and efficient analytical querying of massive data. While guaranteeing migration correctness, it sustains a migration rate of 3 MB/s.

2.
GUO Dong, WANG Wei, ZENG Guosun. Journal of Computer Applications, 2013, 33(12): 3432-3436
With the development of cloud computing and big-data technology, storing data on a single storage medium can no longer meet the needs of big-data processing, and distributed data storage has therefore come into wide use. However, none of the existing distributed storage schemes fully meets the requirements of distributed systems. To implement distributed storage and redundant backup of data more effectively, this paper adopts a new distributed storage method based on Consistent Tree Distribution (CTD), proposes a backup strategy built on it, and implements the mapping from data index to storage location. The scheme offers load balancing, freedom from single points of failure, high scalability, and ease of implementation. An application scheme based on Consistent Binary Tree Distribution (CBTD) is also proposed. Analysis of an application-system instance verifies that the method satisfies the data-balance, load-balance, and scalability requirements of distributed systems.
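The abstract does not spell out the CTD algorithm itself. As a rough illustration of the general idea (a consistent structure maps each data key to a primary node and a backup, so that adding or removing nodes relocates few keys), here is a minimal Python sketch; the hash ring and the successor-as-backup rule are assumptions made for the sketch, not the paper's exact scheme.

```python
import bisect
import hashlib

def _h(key):
    # Stable 64-bit hash for both data keys and node names.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentStore:
    """Maps a data key to a primary node plus a backup on the successor."""
    def __init__(self, nodes):
        self._ring = sorted((_h(n), n) for n in nodes)

    def locate(self, key):
        hashes = [h for h, _ in self._ring]
        i = bisect.bisect_right(hashes, _h(key)) % len(self._ring)
        primary = self._ring[i][1]
        backup = self._ring[(i + 1) % len(self._ring)][1]  # redundant copy
        return primary, backup

store = ConsistentStore(["node-a", "node-b", "node-c"])
print(store.locate("order:20130001"))
```

Because node positions are fixed by hashing, the index-to-location mapping needs no central lookup table, which is what gives this family of schemes its no-single-point-of-failure property.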

3.
With the rapid development of water-resources informatization, big-data applications in the water sector are becoming widespread, ordinary storage techniques can no longer meet the need, and storage capacity and data read/write rates have become the bottleneck of big-data processing. Drawing on the construction of the tiered-storage data cluster of the national mountain-torrent disaster prevention and management platform, and guided by information life-cycle theory, tiered-storage techniques and methods are applied, and migration policies are formulated around the characteristics of hydrological data, to build a tiered storage cluster that effectively meets the demands of massive data storage and processing speed and removes storage as a constraint on big-data processing. The resulting tiered-storage cluster increases storage capacity and read/write rates while saving storage cost, bringing the cluster's overall performance into balance.

4.
An Energy-Efficient Big Data Processing Algorithm for Heterogeneous Clusters   Cited by: 2 (self: 0, others: 2)
The energy consumed by a cluster over its lifetime now exceeds its hardware purchase cost, and big-data processing keeps large clusters busy for long periods, so energy-efficient big-data processing is an urgent problem for data owners and users and a major challenge for energy and the environment. Existing work generally either powers down some nodes to save energy or designs new data-placement strategies that enable energy-efficient processing. Analysis shows that even with the minimum number of nodes considerable energy is still wasted, while a new placement strategy forces large-scale data migration on an already deployed cluster and so consumes extra energy. For I/O-intensive big-data workloads on heterogeneous clusters, this paper proposes a new energy-efficient algorithm, MinBalance, which splits the problem into node selection and load balancing. The node-selection phase applies four different greedy strategies that account for node heterogeneity and try to pick the most suitable nodes for the job; the load-balancing phase then balances the load across the selected nodes to cut the energy nodes waste while waiting. The method is general and independent of the data-placement strategy. Experiments show that on large datasets MinBalance cuts energy consumption by more than 60% compared with the traditional approach of powering down some nodes.
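The abstract names MinBalance's two phases but not its four greedy rules. The sketch below shows one plausible reading: rank nodes by work per joule, stop adding nodes once the estimated energy stops improving, then hand out tasks longest-first to the node with the earliest current finish time. The efficiency metric, energy estimate, and stopping rule are all assumptions for illustration, not the published algorithm.

```python
import heapq

def min_balance(nodes, tasks):
    """Illustrative two-phase schedule in the spirit of MinBalance.
    nodes: (name, speed, power_watts) triples; tasks: work units per task."""
    total = float(sum(tasks))

    # Phase 1, node selection: add nodes in order of work-per-joule, stopping
    # once another node no longer lowers the estimated energy, taken here as
    # (total power) * (ideal makespan = total work / total speed).
    ranked = sorted(nodes, key=lambda n: n[1] / n[2], reverse=True)
    chosen, speed_sum, power_sum, best = [], 0.0, 0.0, float("inf")
    for name, speed, power in ranked:
        est = (power_sum + power) * total / (speed_sum + speed)
        if est >= best:
            break
        chosen.append((name, speed))
        speed_sum, power_sum, best = speed_sum + speed, power_sum + power, est

    # Phase 2, load balancing: assign tasks longest-first, each to the node
    # with the earliest current finish time, so no node sits idle waiting.
    heap = [(0.0, name, speed) for name, speed in chosen]
    heapq.heapify(heap)
    plan = {name: [] for _, name, _ in heap}
    for work in sorted(tasks, reverse=True):
        finish, name, speed = heapq.heappop(heap)
        plan[name].append(work)
        heapq.heappush(heap, (finish + work / speed, name, speed))
    return plan
```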

5.
TH-TS: A Tiered Storage System for Massive Data   Cited by: 5 (self: 0, others: 5)
With the explosive growth of stored data, lowering a storage system's total cost of ownership and improving data-access performance have become key to building mass-storage systems. This paper designs and implements TH-TS (Tsinghua Tiered Storage), a tiered mass-data storage system in which multiple tiers of storage devices form a single integrated storage environment. The system introduces the Cute Mig data-migration method: a promotion policy based on promotion cost and promotion benefit, plus an adaptive file-demotion selection policy based on remaining space, which together solve the large migration volumes and poor access performance of the traditional on-demand migration approach. Evaluation shows that with Cute Mig the system's average I/O response time is about 10% and 39% lower than with the traditional LRU and GreedyDualSize methods respectively, promotion traffic is about 32% and 59% lower, and demotion traffic is about 47% and 66% lower.
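The abstract describes Cute Mig's promotion policy only as weighing promotion cost against promotion benefit. A toy version of such a test might compare the latency a file would save on the fast tier against the one-off cost of copying it up; the model below is an assumed stand-in, not TH-TS's actual formula.

```python
def should_promote(daily_accesses, fast_ms, slow_ms, size_mb, bw_mb_per_s):
    """Promote only if a day's latency savings outweigh the copy cost;
    an assumed stand-in for Cute Mig's cost/benefit promotion test."""
    benefit_ms = daily_accesses * (slow_ms - fast_ms)  # latency saved per day
    cost_ms = size_mb / bw_mb_per_s * 1000.0           # one-off copy time
    return benefit_ms > cost_ms
```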

6.
HUANG Danqun. Industry and Mine Automation, 2014(11): 103-105
To address the low efficiency of data storage and processing in current coal-mine production caused by the sheer number of business subsystems and measuring points, an integrated automation platform based on KingHistorian was designed. The platform handles data processing and storage through a KingSCADA plus KingHistorian architecture, integrates and shares production-safety and operating-condition information across heterogeneous systems, and establishes data links between the different systems. Results from actual deployment show that the platform speeds up data processing, reduces the data-storage footprint, and provides solid data support for enterprise decision-making.

7.
High-energy-physics experiments produce ever larger volumes of data. When the Hadoop big-data platform is used for high-energy-physics data processing, data migration becomes a practical necessity, yet existing migration tools neither support transfers between HDFS and other file systems nor perform well. Starting from the synchronization and archiving needs of high-energy-physics data, a general mass-data migration system was designed and implemented: it extends HDFS data access so that MapReduce jobs move data directly between HDFS data nodes and other storage systems or media. The system also implements a dynamic-priority scheduling model that rates and selects among multiple concurrent tasks. It is already used for data migration in physics experiments such as the cosmic-ray studies of the Large High Altitude Air Shower Observatory (LHAASO); operational results show good performance that satisfies each experiment's migration needs.

8.
With the development of computer science and the arrival of the big-data era, application systems now face massive data volumes and heavy user-access loads, so the relational databases (RDBMS) underlying enterprise applications are under growing pressure and can no longer meet high-performance requirements; HBase, the distributed database of the Hadoop platform, can effectively solve the problems relational databases face. Taking MySQL and the Hadoop distributed database HBase as its research basis, and responding to the massive growth of enterprise application data, this paper proposes a method for migrating data from a relational database (MySQL) to a distributed database (HBase); by studying HBase's storage model, it derives MySQL-to-HBase table-schema conversion rules that give the migrated data efficient query performance. Finally, the method is compared with the comparable migration tool Sqoop, demonstrating both the convenience of migrating data this way and the efficiency of join queries in the migrated database.
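The paper's table-schema conversion rules are not reproduced in the abstract. A common pattern, sketched hypothetically below, folds the join key into the HBase row key and gives each source table its own column family, so that a frequent MySQL join becomes a single-row HBase read; the table and column names are invented for illustration.

```python
def to_hbase_cells(order):
    """Hypothetical conversion of one MySQL `orders` row (pre-joined with
    its customer) into an HBase row: the join key leads the row key and
    each source table gets its own column family."""
    row_key = f"{order['customer_id']:010d}-{order['order_id']:010d}"
    cells = {
        "order:date":   order["order_date"],
        "order:amount": str(order["amount"]),
        "cust:name":    order["customer_name"],
    }
    return row_key, cells

print(to_hbase_cells({"customer_id": 42, "order_id": 7,
                      "order_date": "2016-05-01", "amount": 99.5,
                      "customer_name": "ACME"}))
```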

9.
To make a distributed, tiered hybrid storage system operate efficiently and respond quickly, and so optimize system performance while reducing resource consumption, this paper studies workload patterns and, taking both data-access locality and system response time into account, proposes workload-aware tiered-storage migration algorithms based on a frequency policy and a bandwidth policy, together with a new evaluation criterion whose objective function is the ratio of bandwidth-savings rate to hit rate. The frequency policy migrates data between tiers according to the periodic frequency characteristics of accesses; the bandwidth policy migrates data according to the migration bandwidth consumed during accesses. Simulation experiments on concrete examples show that both policies reach the objective effectively: the frequency policy yields more served accesses and a higher hit rate, while the bandwidth policy reduces the number of concurrency bottlenecks in the storage tiers.

10.
This paper analyses the problems in deciding when to migrate data. A mathematical formula is introduced to judge the trend of storage usage, and the migration time is determined from that trend together with the storage system's transfer rate and capacity, giving migration-time design a theoretical basis. A data-migration monitor was also developed, and migration policies were formulated around the computed migration time. A two-tier storage test system was built in a high-performance-computing environment and the migration policy deployed on it. Experimental results show that the designed migration timing lets tiered storage migrate data promptly, effectively relieves migration congestion, and raises the utilization of the storage tiers.
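The trend formula itself is not quoted in the abstract. One plausible form, used in the sketch below, is a least-squares estimate of the storage-usage growth rate, from which one can work out how soon migration must start given the tier's capacity and the system's transfer rate; the function names and the 10% reserve margin are assumptions.

```python
def migration_deadline(samples, capacity_gb, rate_gb_per_s, reserve=0.10):
    """Seconds until migration must start so the fast tier never fills.
    samples: [(t_seconds, used_gb), ...], at least two distinct times."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    slope = (sum((t - mean_t) * (u - mean_u) for t, u in samples) /
             sum((t - mean_t) ** 2 for t, _ in samples))  # GB per second
    if slope <= 0:
        return None  # usage flat or shrinking: no migration pressure
    head_gb = capacity_gb * (1 - reserve) - samples[-1][1]  # room left
    time_to_full = head_gb / slope
    drain_time = head_gb / rate_gb_per_s  # time to move that much data out
    return max(0.0, time_to_full - drain_time)
```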

11.
We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure, termed SAGE (Percipient StorAGe for Exascale Data Centric Computing), as we head towards the era of Exascale computing. The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and will provide the capability for Exascale-class applications to use such a storage infrastructure. SAGE addresses the increasing overlap between Big Data analysis and HPC in an era of next-generation data-centric computing, which has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors whose data needs to be processed, analysed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, a problem that has not been sufficiently dealt with for simulation codes, is directly addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and to look at early results obtained with some of its key methodologies as the system continues to evolve.

12.
A Storage and Retrieval System for Massive Structured Data   Cited by: 4 (self: 0, others: 4)
Big Data is a new kind of data that has emerged in cloud computing in recent years, and traditional relational database systems are no longer adequate for it in storage scale or retrieval efficiency. Current distributed NoSQL databases provide a distributed storage environment but cannot support multi-column queries. This paper designs and implements MDSS, a distributed storage and retrieval system for massive structured data. The system uses column-oriented storage and combines a central distributed B+Tree index with local indexes to improve retrieval efficiency. On this basis it discusses a decomposition mechanism for complex query conditions that supports multi-attribute retrieval, fuzzy retrieval, and statistical-analysis queries over big data. Experimental results show that the proposed distributed structured-data management techniques and query-task decomposition mechanism significantly improve the query efficiency of large datasets in distributed settings, making the system well suited to storing massive structured data such as logs and stream records.
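As a toy version of the "central distributed B+Tree plus local indexes" combination (the abstract gives no internals), the sketch below routes a key through a sorted table of shard boundaries, standing in for the global B+Tree, and then consults that shard's local index; all structure beyond the abstract is an illustrative assumption.

```python
import bisect
from collections import defaultdict

class TwoLevelIndex:
    """Global sorted shard boundaries (standing in for the distributed
    B+Tree) route each key to a shard; a per-shard local index then
    answers the lookup."""
    def __init__(self, shard_upper_bounds):
        self.bounds = sorted(shard_upper_bounds)
        self.local = [defaultdict(list) for _ in self.bounds]

    def _shard(self, key):
        i = bisect.bisect_left(self.bounds, key)
        return min(i, len(self.bounds) - 1)  # clamp keys past the last bound

    def insert(self, key, row):
        self.local[self._shard(key)][key].append(row)

    def query(self, key):
        return self.local[self._shard(key)].get(key, [])
```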

13.
Beyond the hype of Big Data, something within business intelligence projects is indeed changing. This is mainly because Big Data is not only about data, but about a complete conceptual and technological stack, including raw and processed data, storage, ways of managing data, processing, and analytics. A challenge that becomes even trickier is managing the quality of the data in Big Data environments. More than ever before, the need to assess Quality-in-Use gains importance, since the real contribution of data (its business value) can only be estimated in its context of use. Although different Data Quality models exist for assessing the quality of regular data, none of them has been adapted to Big Data. To fill this gap, we propose the “3As Data Quality-in-Use model”, composed of three Data Quality characteristics for assessing the levels of Data Quality-in-Use in Big Data projects: Contextual Adequacy, Operational Adequacy and Temporal Adequacy. The model can be integrated into any sort of Big Data project, as it is independent of any preconditions or technologies. The paper shows how to use the model with a working example. The model meets every challenge of a Data Quality programme aimed at Big Data. The main conclusion is that the model is an appropriate way to obtain the Quality-in-Use levels of the input data of a Big Data analysis, and those levels can be understood as indicators of the trustworthiness and soundness of the results of that analysis.

14.
Query optimization in Big Data has become a promising research direction owing to the popularity of massive data-analysis systems such as Hadoop, where it is hard to execute JOIN queries efficiently in Hive, the Hadoop query language, over limited Big Data storage. In our previous work, the HiveQL Optimization for JOIN query over Multi-session Environment (HOME) system was introduced on top of Hadoop to improve performance by storing intermediate results and thereby avoiding repeated computation. Time overheads and Big Data storage limits are the main drawbacks of the HOME system, especially when additional physical storage must be used or extra virtualized storage rented. In this paper, an index-based data-reuse system called indexing HiveQL Optimization for JOIN over Multi-session Big Data Environment (iHOME) is proposed to overcome HOME's overheads by storing only the indexes of the joined rows instead of the full intermediate results. The proposed iHOME system addresses eight cases of JOIN queries, classified into three groups: Similar-to-iHOME, Compute-on-iHOME, and Filter-of-iHOME. Experimental results on the TPC-H benchmark show that iHOME reduces the execution time of all eight JOIN queries on Hive, and that it stores less data than the HOME system, saving Big Data storage; as stored data grows, iHOME thus guarantees space scalability and overcomes the storage limitation.
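To make the core iHOME idea concrete (cache the indexes of the joined rows rather than the materialized join), here is a small Python analogy; the real system operates inside Hive on HDFS, not on in-memory lists, so this only illustrates the space saved by keeping index pairs.

```python
def join_as_index_pairs(left, right, lkey, rkey):
    """Build a join once, but keep only (left_row, right_row) index pairs,
    mirroring iHOME's saving over storing full intermediate rows."""
    probe = {}
    for i, row in enumerate(left):
        probe.setdefault(row[lkey], []).append(i)
    return [(i, j) for j, row in enumerate(right)
            for i in probe.get(row[rkey], [])]

def materialize(pairs, left, right):
    """Rebuild full joined rows on demand from the cached index pairs."""
    return [{**left[i], **right[j]} for i, j in pairs]
```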

15.
With the increasing adoption of Big Data technologies as basic tools for the ongoing Digital Transformation, there is a high demand for data-intensive applications. In order to execute such applications efficiently, it is vital that cloud providers change the way hardware infrastructure resources are managed so as to improve their performance. However, the increasing use of virtualization technologies to achieve efficient usage of infrastructure resources continuously widens the gap between applications and the underlying hardware, decreasing resource efficiency for the end user. This scenario is especially troublesome for Big Data applications, as storage resources are among the most heavily virtualized, imposing a significant overhead on large-scale data processing. This paper proposes a novel PaaS architecture specifically oriented to Big Data, in which the scheduler offers disks as resources alongside the more common CPU and memory resources, with the aim of providing a better storage solution for the user. Furthermore, virtualization overheads are reduced to the bare minimum by replacing heavy hypervisor-based technologies with operating-system-level virtualization based on light software containers. This architecture has been deployed on a Big Data infrastructure at the CESGA supercomputing center, used as a testbed to compare its performance with OpenStack, a popular private cloud platform. Results have shown significant performance improvements, reducing the execution time of representative Big Data workloads by up to 4.5×.

16.
The volume of face images that public-security agencies can collect is growing rapidly, and traditional manual identification is labor-intensive, slow, and inaccurate. This paper therefore designs a high-capacity real-time face-retrieval system: the Storm distributed platform performs real-time storage and retrieval of captured face images, while the HBase distributed storage system stores and maintains the large volume of unstructured face data. Several groups of experiments show that the system achieves good speedup and offers good scalability and real-time behavior in high-capacity face-image retrieval scenarios.

17.
Research on Data Migration Strategies for Tiered Storage of Massive Information   Cited by: 3 (self: 0, others: 3)
The data-centric computing model places new and higher demands on the performance and reliability of storage systems. A petabyte-scale storage system currently needs thousands or even tens of thousands of disks, and high parallelism, high reliability, and a high performance-to-price ratio are the three key requirements for a massive disk storage system. This paper proposes a two-level mass-storage system composed of two disk-array tiers that differ in performance and reliability; through automatic data migration it obtains higher parallel access rates and reliability while keeping the storage system cost-effective. Based on the idea of hierarchical storage management, the paper presents a two-tier FC-SAS and SATA II storage model, designs an FV data-value assessment model and a migration process-control policy that judge the value of data precisely, and achieves efficient data migration between the two tiers with transparent user access while minimizing the impact on system access performance.
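The FV value model is not detailed in the abstract. A plausible stand-in, sketched below, scores each file from its access frequency and recency, discounted by its size, and demotes files whose score falls below a threshold from the FC-SAS tier to SATA; the weights, decay, and threshold are all assumptions.

```python
import math
import time

def file_value(freq, last_access_ts, size_mb,
               w_freq=0.6, w_rec=0.4, half_life_days=7.0):
    """Assumed FV-style score: frequent, recently touched files score high
    and stay on the fast tier; cold or bulky files score low."""
    age_days = (time.time() - last_access_ts) / 86400.0
    recency = 0.5 ** (age_days / half_life_days)  # exponential decay
    return (w_freq * math.log1p(freq) + w_rec * recency) / math.log1p(size_mb + 1)

def demotion_candidates(files, threshold=0.2):
    """files: (name, freq, last_access_ts, size_mb) tuples."""
    return [name for name, *rest in files if file_value(*rest) < threshold]
```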

18.
With the arrival of the big-data era, more powerful computers and more mature big-data platform tools have made it possible for enterprises to mine value from massive data; Hadoop-based platforms in particular can process terabyte- and petabyte-scale data even on inexpensive commodity hardware. In the early days of Hadoop deployments, functionality usually came first and security controls were neglected; only after the Yahoo team proposed a Kerberos-based authentication scheme in 2009 did security hardening of Hadoop big-data platforms begin in earnest. This paper reviews the history of the Hadoop big-data platform, describes the traditional security problems that existed before 2009, and attempts a systematic survey of the security of today's Hadoop ecosystem components and of each component's security solutions, in the hope of providing a reference for building security-control schemes for Hadoop big-data platforms, so that sound, up-to-date controls can protect the private data of enterprises and users.

19.
Forensic examiners are locked in a continuing battle with criminals over the use of Big Data technology, and the underlying storage system is the main scene in which criminal activity can be traced. The Big Data storage system has been identified as an emerging challenge for digital forensics, and a sound methodology for investigating it therefore needs to be developed. Since the use of Hadoop as a Big Data storage system continues to grow rapidly, an investigation process model for forensic analysis of Hadoop storage and attached client devices is essential; moreover, forensic analysis of a Hadoop Big Data storage system can take extra time when it is not known where data remnants may reside. In this paper, a new forensic investigation process model for the Hadoop Big Data storage system is proposed, and the data remnants it discovers are presented. These data remnants, the outcome of forensic research on the Hadoop Big Data storage system, help forensic examiners and practitioners generate evidence.

20.
Building on a summary of the current state of, and bottlenecks in, decision support systems, this paper analyses the changes brought about by "big data" and their positive influence on decision support systems, and surveys the development trends of decision support systems in the "big data" era from five perspectives: system positioning, decision modes, data processing, information retrieval, and system security.
