Similar Documents
Found 20 similar documents (search time: 265 ms)
1.
As more and more organizations adopt cluster computing for high-performance computing (HPC), in scientific research such as earth, ocean, and atmospheric science and seismic data analysis, and in commercial applications such as drug research, automotive design modeling, and business risk analysis, cluster computing has evolved into the dominant approach to building HPC systems. The computations in all of these applications are acknowledged to be complex, and the need to manage the data sets associated with them effectively has driven the development of today's cluster computing technology. Adopting intelligent object-based storage devices (OSDs) greatly reduces the metadata server's workload and markedly improves system manageability and efficiency. This paper focuses on a new storage architecture: a shared-storage cluster computing system based on intelligent OSDs.

2.
In recent years, as the need to store and process large-scale, massive data has grown across many fields, clusters, an inexpensive parallel computing technology that delivers powerful computing capability, have been adopted ever more widely: they offer the supercomputing power of mainframes at much lower cost, and have become the mainstream approach to high-performance computing, such as scientific computing and other services requiring large-scale parallelism. Building on an analysis of key technologies including distributed storage and computing, and combining a study of Hadoop cluster technology with our own business requirements and actual hardware and software resources, this paper proposes a Hadoop-based model for processing massive data.
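The abstract does not give the model's internals, but the MapReduce pattern that Hadoop implements can be illustrated with a minimal in-process sketch (function and variable names here are illustrative, not from the paper):

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user map function to every input record."""
    for record in records:
        yield from map_fn(record)

def reduce_phase(pairs, reduce_fn):
    """Group intermediate pairs by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Word count, the canonical MapReduce example.
def tokenize(line):
    return [(word, 1) for word in line.split()]

def total(word, counts):
    return sum(counts)

lines = ["big data on hadoop", "hadoop stores big data"]
counts = reduce_phase(map_phase(lines, tokenize), total)
```

A real Hadoop job distributes the map, shuffle, and reduce phases across cluster nodes and HDFS blocks; this sketch only shows the data flow.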

3.
High-performance computing clusters are mainly used to solve large-scale scientific computing problems and to store and process massive data, and are widely used across industries. The advent of Linux propelled the development of cluster systems; at present, the OpenMOSIX cluster is popular with enterprise users.

4.
This paper surveys the current state of high-performance computing cluster management and, in view of its present shortcomings, designs an overall framework for a cluster management system. Using the Linux /proc file system, a MySQL database, and Web technology, it designs the functions for collecting management data, storing the data, and exposing application interfaces, and finally implements, in a unified Web interface, management of the cluster's real-time running state together with statistics and analysis of historical data.
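As a sketch of the kind of /proc-based collection the paper describes, the aggregate CPU line of Linux's /proc/stat can be turned into a utilization figure by differencing two samples (the field layout follows the proc(5) man page; the sample strings below are synthetic):

```python
def cpu_fields(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    return [int(v) for v in stat_line.split()[1:]]

def cpu_utilization(sample_a, sample_b):
    """CPU utilization between two samples: 1 - delta_idle / delta_total."""
    a, b = cpu_fields(sample_a), cpu_fields(sample_b)
    idle_a, idle_b = a[3] + a[4], b[3] + b[4]   # idle + iowait ticks
    return 1.0 - (idle_b - idle_a) / (sum(b) - sum(a))

# Two synthetic samples (fields: user nice system idle iowait irq softirq steal).
s1 = "cpu  100 0 50 800 50 0 0 0"
s2 = "cpu  200 0 100 850 50 0 0 0"
util = cpu_utilization(s1, s2)   # 150 busy ticks out of 200 elapsed
```

A collector would read /proc/stat on each node periodically and write the derived figures to the MySQL store the paper mentions.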

5.
At present, managing a high-performance computing cluster and judging its performance require administrators to inspect data and events continuously, which imposes a large management overhead. To improve automated management of HPC clusters, this paper analyzes the strengths and weaknesses of cluster management systems, studies how HPC cluster systems are built and managed, analyzes the performance of running parallel programs, and discusses how to reduce the complexity of cluster management.

6.
When some nodes in a cluster are low-end commodity machines, HDFS's random placement policy may put frequently accessed data on those cheap nodes; constrained by their performance, access times grow and cluster efficiency drops. To address this, an improved tiered replica placement and scheduling strategy is proposed. To reduce the number of replica migrations, node performance is first scored from CPU, memory, network, and storage load together with network distance, and high-performance nodes are selected for placement. Replica scheduling is then driven by each replica's access frequency: combined with hardware configuration, frequently accessed replicas are stored on high-performance, well-provisioned nodes as far as possible, speeding up cluster response. Experimental results show that the improved strategy raises replica access efficiency and improves load balancing in heterogeneous clusters.
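A minimal sketch of the scoring idea described above, assuming each load metric and the network distance are already normalized to [0, 1]; the weights and the round-robin assignment are illustrative simplifications, not the paper's algorithm:

```python
def node_score(node, weights=(0.3, 0.2, 0.2, 0.2, 0.1)):
    """Score a node: lower load and shorter network distance give a higher score."""
    w_cpu, w_mem, w_net, w_disk, w_dist = weights
    return (w_cpu * (1 - node["cpu_load"]) +
            w_mem * (1 - node["mem_load"]) +
            w_net * (1 - node["net_load"]) +
            w_disk * (1 - node["disk_load"]) +
            w_dist * (1 - node["distance"]))

def place_hot_replicas(replicas, nodes):
    """Assign the most frequently accessed replicas to the best-scoring nodes."""
    ranked_nodes = sorted(nodes, key=node_score, reverse=True)
    hot_first = sorted(replicas, key=lambda r: r["accesses"], reverse=True)
    return {r["id"]: ranked_nodes[i % len(ranked_nodes)]["name"]
            for i, r in enumerate(hot_first)}

nodes = [
    {"name": "fast",  "cpu_load": 0.1, "mem_load": 0.1, "net_load": 0.1,
     "disk_load": 0.1, "distance": 0.1},
    {"name": "cheap", "cpu_load": 0.8, "mem_load": 0.7, "net_load": 0.6,
     "disk_load": 0.9, "distance": 0.5},
]
replicas = [{"id": "hot-block", "accesses": 900}, {"id": "cold-block", "accesses": 10}]
placement = place_hot_replicas(replicas, nodes)
```

A production policy would also respect per-node capacity and HDFS rack-awareness rather than cycling through the ranked list.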

7.
Application of a high-performance meteorological data storage cluster and online expansion technology   Cited by: 1 (0 self-citations, 1 by others)
To meet operational demands for rapidly growing meteorological data and its efficient use, a high-performance storage cluster based on SAN and GPFS is designed for data storage, with flexible online expansion as data volumes grow. Drawing on the storage designs of the national meteorological data storage and retrieval system, the provincial wind-energy resource database sharing service system, and the wind-energy resource numerical simulation system, the paper describes the SAN/GPFS-based storage cluster architecture and focuses on how its online expansion is implemented. Online expansion of a SAN/GPFS storage cluster covers three aspects (adding server nodes online, expanding storage capacity online, and expanding file system capacity online), allowing the meteorological data storage system to grow flexibly with operational data and giving it excellent scalability and adaptability.

8.
Ocean data are distributed over wide regions and massive in volume; existing digital-ocean systems are relatively isolated, and their low degree of data sharing constrains both high-performance computing over large-scale ocean data and the ability to provide strong data support and computing services for a "digital ocean". This paper proposes a cloud-based distributed digital-ocean system that connects heterogeneous ocean data over a high-speed network into a storage cloud and uses a dedicated network service protocol for high-performance computation over large distributed data sets by computer clusters linked through high-performance wide-area networks. Experimental results show that, compared with an existing Hadoop-based distributed digital-ocean system, the proposed system performs significantly better and achieves a much higher degree of data sharing.

9.
The growing complexity of application systems and the shift to microservices have driven wide use of containers, and enterprises often use Kubernetes to build multiple clusters for container orchestration and resource allocation. To monitor the working state and resource usage of multiple clusters in real time, this paper proposes a multi-cluster resource monitoring scheme for Kubernetes: it collects the CPU, memory, network, and storage metrics Kubernetes provides, derives more intuitive indicators by computing over some of the raw data according to its type, implements multi-level, multi-type storage, and exposes the monitoring data through a REST interface. Experiments verify that the design consumes few cluster resources and performs well.
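The derivation step, computing more intuitive indicators from raw samples, might look like the following sketch (the field names are invented for illustration; Kubernetes' actual metrics APIs expose differently shaped objects):

```python
def derive_metrics(raw):
    """Turn raw resource samples into intuitive indicators:
    percentages for gauges, per-second rates for monotonic counters."""
    derived = {}
    derived["cpu_percent"] = 100.0 * raw["cpu_usage_cores"] / raw["cpu_capacity_cores"]
    derived["mem_percent"] = 100.0 * raw["mem_usage_bytes"] / raw["mem_capacity_bytes"]
    # Network bytes are cumulative counters, so difference two samples.
    interval = raw["window_seconds"]
    derived["net_rx_bps"] = (raw["net_rx_bytes"] - raw["prev_net_rx_bytes"]) / interval
    return derived

sample = {
    "cpu_usage_cores": 2.0, "cpu_capacity_cores": 8.0,
    "mem_usage_bytes": 4 << 30, "mem_capacity_bytes": 16 << 30,
    "net_rx_bytes": 1_500_000, "prev_net_rx_bytes": 500_000,
    "window_seconds": 10,
}
metrics = derive_metrics(sample)
```

The derived dictionary is what a REST endpoint of the kind the paper mentions would serialize for dashboards.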

10.
Storing and managing massive terrain data is key to a large-scale real-time terrain roaming system. This paper proposes a distributed parallel terrain data service system (DPTSS) based on object storage: autonomous storage objects store and manage terrain tiles, separating the control path from the data path. A metadata cluster provides efficient, highly available metadata service, while an object-based storage cluster delivers parallel terrain-tile transfer, yielding a high-throughput, high-bandwidth terrain data service. Comparative experiments show that DPTSS provides high-performance terrain data service at a relatively low TCO.

11.
A hybrid clustering procedure for concentric and chain-like clusters   Cited by: 1 (0 self-citations, 1 by others)
K-means algorithm is a well known nonhierarchical method for clustering data. The most important limitations of this algorithm are that: (1) it gives final clusters on the basis of the cluster centroids or the seed points chosen initially, and (2) it is appropriate for data sets having fairly isotropic clusters. But this algorithm has the advantage of low computation and storage requirements. On the other hand, hierarchical agglomerative clustering, which can cluster nonisotropic (chain-like and concentric) clusters, has high storage and computation requirements. This paper suggests a new method for selecting the initial seed points, so that the K-means algorithm gives the same results for any input data order. This paper also describes a hybrid clustering algorithm, based on the concepts of multilevel theory, which is nonhierarchical at the first level and hierarchical from the second level onwards, to cluster data sets having (i) chain-like clusters and (ii) concentric clusters. It is observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm, with lower computation and storage requirements.
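The paper's exact seed-selection rule is not given in this abstract, but the goal, making K-means independent of input order, can be sketched by choosing seeds from a sorted copy of the data with a deterministic farthest-point rule:

```python
def order_invariant_seeds(points, k):
    """Pick k initial seeds deterministically: sort the data (so input order
    is irrelevant), start from the smallest point, then repeatedly take the
    point farthest from the seeds chosen so far."""
    data = sorted(points)
    seeds = [data[0]]
    while len(seeds) < k:
        def dist_to_seeds(p):
            return min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
        seeds.append(max(data, key=dist_to_seeds))
    return seeds
```

Because both the sort and the farthest-point choice depend only on the point values, any permutation of the input yields the same seeds, and hence the same K-means result.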

12.
Non-volatile memory (NVM) is a new storage medium that has emerged in recent years. Like conventional volatile memory, it offers low access latency and byte addressability; unlike volatile memory, it retains data across power loss, and it also offers higher density and lower energy consumption. These properties make NVM a promising component of future computer systems and open new avenues for building efficient persistent indexes. Because NVM hardware was long at the research stage, most NVM-oriented index studies were conducted in simulated environments. In April 2019, Intel released Apache Pass (AEP), an NVM device based on 3D XPoint technology, enabling research on real hardware. This paper first benchmarks the real device: AEP's write latency approaches DRAM's, while its read latency is 3 to 4 times that of DRAM. These measurements show that many earlier performance assumptions about NVM were biased, so much prior work optimized only writes and neglected reads. The paper therefore revisits earlier work, applies read optimizations to previous hybrid indexes, and further proposes an asynchronous caching method based on hybrid memory. Experiments show that the read performance of the hybrid index optimized with asynchronous caching is 1.8 times that of the unoptimized version, and asynchronous caching reduces the read latency of persistent indexes by up to 50%.

13.
崔玉龙, 付国, 张岩峰, 于戈. 《软件学报》, 2023, 34(5): 2427-2445
Key-value stores such as Redis, MongoDB, and Cassandra have been widely adopted in recent years as high-performance, highly scalable distributed storage solutions. The multi-replica mechanism widely used in distributed storage improves throughput and reliability, but it also adds coordination overhead for keeping replicas consistent; in geo-distributed systems, long-distance replica coordination can even become the performance bottleneck, reducing availability and throughput. This paper presents Elsa, a coordination-free distributed key-value store for cross-region architectures. While preserving high performance and scalability, Elsa uses conflict-free replicated data types (CRDTs) to guarantee strong eventual consistency among replicas without coordination, cutting inter-node coordination overhead. A geo-distributed environment spanning 4 data centers and 8 nodes was built on Alibaba Cloud for large-scale comparative experiments. The results show that under highly concurrent, contended workloads in this geo-distributed setting, Elsa holds a clear performance advantage: up to 7.37 times the throughput of a MongoDB cluster and 1.62 times that of a Cassandra cluster.
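The abstract does not specify which CRDTs Elsa uses; a grow-only counter, one of the simplest CRDTs, shows how replicas can accept writes independently and still converge without any coordination:

```python
class GCounter:
    """A grow-only counter CRDT: each node increments its own slot, and
    replicas converge by taking the element-wise maximum of their slots."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.slots = {}

    def increment(self, amount=1):
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.slots.values())

    def merge(self, other):
        # Merge is commutative, associative, and idempotent, so replicas
        # can exchange state in any order and still agree.
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

# Two replicas accept writes independently, then exchange state.
a, b = GCounter("dc-east"), GCounter("dc-west")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
```

The replica names are invented for illustration; the same merge discipline underlies richer CRDTs (sets, maps, registers) used by coordination-free stores.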

14.
杨术, 陈子腾, 崔来中, 明中行, 程路, 唐小林, 萧伟. 《软件学报》, 2021, 32(12): 3945-3959
With the development of big data and machine learning, network traffic and the computational demands of tasks are growing rapidly. Researchers have proposed platform technologies such as content delivery networks (CDNs) and edge computing, but CDNs only address data storage, and edge computing is hard to manage and cannot schedule resources across clusters. Containerization is widely used in edge computing, yet current container orchestration strategies at the edge are generally inefficient, leaving task latency too high. This paper proposes the function delivery network (FDN). On the one hand, FDN gives users a unified interface to edge computing resources and a containerized computing platform, removing the need for tedious resource configuration; on the other hand, it optimizes resource utilization and task latency by orchestrating the containers a task needs onto a suitable edge cluster. A heuristic container orchestration strategy is developed that schedules containers across clusters and further reduces execution latency. FDN was implemented on OpenWhisk and deployed in China Mobile's network, and both FDN and the orchestration strategy were evaluated. Experiments show that the FDN platform reduces task latency, and that the heuristic orchestration strategy substantially outperforms traditional algorithms.
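A sketch of what a heuristic cross-cluster placement rule of this kind could look like; the latency model and all field names are assumptions for illustration, not the paper's algorithm:

```python
def pick_cluster(task_cpu, clusters):
    """Greedy heuristic: among the clusters with enough free CPU, pick the
    one with the lowest estimated latency, modeled here as network RTT plus
    a load-proportional queueing penalty."""
    def estimated_latency(c):
        utilization = c["used_cpu"] / c["total_cpu"]
        return c["rtt_ms"] + 100.0 * utilization
    feasible = [c for c in clusters if c["total_cpu"] - c["used_cpu"] >= task_cpu]
    if not feasible:
        raise RuntimeError("no cluster has enough free CPU")
    return min(feasible, key=estimated_latency)["name"]

clusters = [
    {"name": "edge-a", "rtt_ms": 5,  "used_cpu": 9, "total_cpu": 10},
    {"name": "edge-b", "rtt_ms": 20, "used_cpu": 2, "total_cpu": 10},
]
choice = pick_cluster(1, clusters)
```

Note the trade-off the heuristic captures: the nearby cluster loses here because it is almost saturated, so the farther but idle cluster gives lower end-to-end latency.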

15.
Within the past few years, workstation clusters have gained increasing importance as platforms for parallel high-performance simulation problems. In contrast to the specialized and cost-intensive interconnection network of distributed memory multiprocessor systems, workstation clusters utilize local area networks (LANs) and common communication protocols. Therefore, the cost-efficiency of workstation clusters for parallel tasks is high while the communication performance is limited compared to parallel computer systems. To improve the communication performance of clusters, new protocols can be applied as well as specialized interconnection networks. On the one hand, these solutions decrease the cost-efficiency of clusters, while on the other hand the performance of local area networks is increasing because of new technologies such as FastEthernet, GigaEthernet or ATM. In this contribution, we propose increasing the communication performance of clusters through the concurrent network architecture (CNA) with multi-channel communication systems. Through the use of parallel and independent LANs, the communication performance of a cluster can be improved while maintaining the cost-efficiency of the wide-spread LAN technology and protocols. This paper gives an overview of the CNA, the requirements of an implementation, and a performance evaluation of a CNA workstation cluster.

16.
Possibilistic clustering has two major flaws: coincident cluster centers and the failure of validity indices. For the first, adding a cluster-center repulsion term to the objective function has been proposed, but this introduces additional parameters; this paper proposes an improved possibilistic clustering algorithm that handles the problem without them. For the second, a suitable transformation of the membership degrees makes the modified validity index applicable to possibilistic clustering. Experimental results show the algorithm's clear advantages and more accurate validity-index estimates.

17.
The efficiency of the k-Nearest Neighbour classifier depends on the size of the training set as well as the level of noise in it. Large datasets with a high level of noise lead to less accurate classifiers with high computational cost and storage requirements. The goal of editing is to improve accuracy by improving the quality of the training datasets. To obtain such datasets, editing removes noise and mislabeled data and smooths the decision boundaries between the discrete classes. Prototype abstraction, on the other hand, aims to reduce the computational cost and the storage requirements of classifiers by condensing the training data. This paper proposes an editing algorithm called Editing through Homogeneous Clusters (EHC). It then extends the idea by introducing a prototype abstraction algorithm that integrates the EHC mechanism and is capable of creating a small noise-free representative set of the initial training data, called Editing and Reduction through Homogeneous Clusters (ERHC). Both are based on a fast, parameter-free iterative execution of k-means clustering that forms homogeneous clusters, and both treat clusters consisting of a single item as noise and remove them. In addition, ERHC summarizes the items of the remaining clusters by storing the mean item of each in the representative set. EHC and ERHC are tested on several datasets. The results show that both run very fast and achieve high accuracy; in addition, ERHC achieves high reduction rates.
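Based on the description above, the ERHC mechanism can be sketched roughly as follows; the k-means details are simplified and this is not the authors' implementation:

```python
def kmeans(points, k, iters=20):
    """Plain k-means on 2-D points; returns nonempty clusters as index lists."""
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                                  (p[1] - centers[c][1]) ** 2)
            clusters[nearest].append(i)
        for c, members in enumerate(clusters):
            if members:
                centers[c] = (sum(points[i][0] for i in members) / len(members),
                              sum(points[i][1] for i in members) / len(members))
    return [c for c in clusters if c]

def erhc(points, labels):
    """ERHC sketch: split until clusters are homogeneous, drop singleton
    clusters as noise, keep each surviving cluster's mean as a prototype."""
    prototypes = []
    queue = [list(range(len(points)))]
    while queue:
        members = queue.pop()
        classes = {labels[i] for i in members}
        if len(classes) == 1:                      # homogeneous cluster
            if len(members) > 1:                   # singletons are noise
                mean = (sum(points[i][0] for i in members) / len(members),
                        sum(points[i][1] for i in members) / len(members))
                prototypes.append((mean, classes.pop()))
        else:                                      # mixed: split again
            sub = kmeans([points[i] for i in members], len(classes))
            if len(sub) <= 1:
                continue                           # cannot split further
            queue.extend([[members[j] for j in c] for c in sub])
    return prototypes

# Two labeled blobs plus one mislabeled outlier, which ERHC discards.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = ["a", "a", "a", "b", "b", "b", "a"]
prototypes = erhc(points, labels)
```

EHC corresponds to stopping after the noise removal step; ERHC additionally condenses each homogeneous cluster to its mean.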

18.
Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS or using parallel file systems available on HPC clusters to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former renders memory-speed high I/O performance and the latter renders consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both read and write. We expect this two-level storage approach to provide insights on system design for big data analytics on HPC clusters.
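The two-level idea can be caricatured in a few lines: an in-memory tier absorbing I/O in front of a persistent tier. Dictionaries stand in for Tachyon and OrangeFS here; a real system would add eviction, striping, and fault tolerance:

```python
class TwoLevelStore:
    """Sketch of a two-level store: writes land in a small, fast in-memory
    tier and are flushed in bulk; reads hit memory first and fall back to
    the large, consistent persistent tier."""
    def __init__(self, mem_capacity):
        self.mem_capacity = mem_capacity
        self.memory = {}        # upper level: fast, small
        self.persistent = {}    # lower level: consistent, large

    def write(self, key, value):
        self.memory[key] = value
        if len(self.memory) > self.mem_capacity:
            self.flush()

    def flush(self):
        """Move the buffered writes down in one batch."""
        self.persistent.update(self.memory)
        self.memory.clear()

    def read(self, key):
        if key in self.memory:
            return self.memory[key]
        return self.persistent[key]

store = TwoLevelStore(mem_capacity=2)
store.write("a", 1)
store.write("b", 2)
store.write("c", 3)   # exceeds the memory tier, triggers a flush
```

Batching writes through the memory tier is what lets the upper level deliver memory-speed I/O while the lower level still sees large sequential updates.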

19.
王文奎, 吴国新. 《微机发展》, 2008, 18(4): 236-238
On the one hand, the explosive growth of information poses a severe challenge to traditional storage models; on the other, broadband is widely available and computer performance has risen sharply, making it feasible to manage storage resources in a peer-to-peer fashion. This paper presents the system's architecture and module organization, focusing on the publication and discovery of storage resources and on data access and maintenance: the overlay network's routing and lookup mechanisms balance the storage management load, while data sharding and replication increase system reliability. Key points of the design and implementation are analyzed, and the results show that a peer-to-peer storage system is feasible.
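The overlay lookup and replica placement described above are commonly realized with consistent hashing; a minimal sketch (peer names and the replica count are illustrative):

```python
import hashlib

def ring_pos(key):
    """Stable hash of a string onto a 32-bit ring position."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % (2 ** 32)

def locate(key, nodes, replicas=2):
    """Place a data shard and its replicas: walk the hash ring clockwise
    from the key's position and take the next `replicas` distinct peers."""
    ring = sorted((ring_pos(n), n) for n in nodes)
    start = ring_pos(key)
    ordered = ([n for pos, n in ring if pos >= start] +
               [n for pos, n in ring if pos < start])
    return ordered[:replicas]

peers = ["peer-1", "peer-2", "peer-3", "peer-4"]
owners = locate("block-42", peers)
```

Because placement depends only on hashes, any peer can locate a shard without central coordination, which is what balances the management load across the overlay.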

20.
Powerful storage, high performance, and scalability are the most important issues for analytical databases, and the three interact: powerful storage needs less scalability but higher performance; high performance means less reliance on indexes and other materializations in storage and fewer processing nodes; larger scale relieves the stress on powerful storage and on the high-performance processing engine. Some analytical databases (ParAccel, Teradata) tie their performance to advanced hardware support, some (Asterdata, Greenplum) rely on the highly scalable MapReduce framework, and some (MonetDB, Sybase IQ, Vertica) emphasize the performance of the processing and storage engines. All these approaches can be integrated into a storage-performance-scalability (S-P-S) model, and future large-scale analytical processing can be built on moderate clusters to minimize dependence on expensive hardware; above all, a simple software framework is fundamental to keeping pace with the development of hardware technology. In this paper, we propose a schema-aware on-line analytical processing (OLAP) model deeply optimized for the native features of the star or snowflake schema. The OLAP model divides the whole process into several stages, each stage piping its output to the next, and minimizes the size of the output data at each stage, whether processing is centralized or clustered. We extend this mechanism to cluster processing with two major techniques: one uses NetMemory as a broadcast-protocol-based buffer for synchronizing dimension mirrors; the other is a predicate-vector-based DDTA-OLAP cluster model that minimizes the data dependency of star joins using bitmap vectors. Our OLAP model aims to minimize network transmission cost (MiNT for short) for OLAP clusters and to support a scalable yet simple distributed storage model for large-scale cluster processing. Finally, the experimental results show the speedup and scalability of the approach.
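The predicate-vector idea, evaluating predicates once per dimension table and carrying only bitmaps into the star join, can be sketched as follows (the tiny schema and all names are invented for illustration):

```python
def predicate_vector(dim_table, predicate):
    """Evaluate a predicate over a dimension table once, yielding a bitmap
    indexed by the dimension's surrogate key."""
    return [predicate(row) for row in dim_table]

def star_join_filter(fact_rows, vectors):
    """Keep a fact row only if every referenced dimension key passes its
    bitmap; only bit tests are performed, no join payload is shipped."""
    return [row for row in fact_rows
            if all(vec[row[col]] for col, vec in vectors.items())]

# Tiny star schema: a date dimension and a product dimension.
dates = [{"year": 2022}, {"year": 2023}, {"year": 2023}]
products = [{"cat": "cpu"}, {"cat": "ram"}]
facts = [{"date_id": 0, "prod_id": 0, "amount": 10},
         {"date_id": 1, "prod_id": 0, "amount": 20},
         {"date_id": 2, "prod_id": 1, "amount": 30}]
vectors = {"date_id": predicate_vector(dates, lambda r: r["year"] == 2023),
           "prod_id": predicate_vector(products, lambda r: r["cat"] == "cpu")}
result = star_join_filter(facts, vectors)
```

In a cluster, only the compact bitmaps need to be broadcast to the nodes holding fact partitions, which is what minimizes network transmission in a MiNT-style design.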


