Similar Literature
19 similar records found.
1.
A Grid Scheduling Algorithm for Meta-Tasks Mixing Data-Intensive and Compute-Intensive Tasks   Total citations: 4 (self: 0, by others: 4)
Grid computing is an emerging research field that follows Internet computing. A grid system is composed of heterogeneous resources, and a good task scheduling method can fully exploit the system's processing capacity and reduce task completion time. Based on current usage patterns of grid systems, a realistic form of user task is proposed: a task consists of a data transfer part and a computation part, and computation starts only after all inputs have been received. Several such independent tasks form a meta-task, which serves as the smallest execution unit handled by the scheduler. In practice, a meta-task is a mixture of data-intensive and compute-intensive tasks. Considering the effect of the ratio between data transfer and computation on meta-task completion, a new scheduling algorithm, TCR, is proposed; it shortens meta-task completion time by raising the utilization of computing resources and the degree of parallelism among tasks. The algorithm is described in detail, and comparative simulation results verify its good performance.
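To make the transfer/compute overlap idea concrete, here is a minimal Python sketch. The greedy rule, field names, and cost model are illustrative assumptions, not the paper's actual TCR algorithm: data-heavy tasks are placed first so their transfers overlap with the computation of compute-heavy tasks placed later.

```python
# Illustrative sketch only: greedily assign meta-task members to the
# resource minimizing their finish time, modeling each task as a
# transfer followed by a computation (a transfer may overlap another
# task's computation on the same resource).

def schedule_meta_task(tasks, resources):
    """tasks: list of (data_mb, flops); resources: list of dicts with
    'bw' (MB/s), 'speed' (flops/s), 'link_free' and 'cpu_free' times."""
    finish_times = []
    # Heuristic: dispatch data-heavy tasks first so their transfers
    # overlap with computation of compute-heavy tasks dispatched later.
    for data_mb, flops in sorted(tasks, key=lambda t: t[0] / max(t[1], 1),
                                 reverse=True):
        best = None
        for r in resources:
            arrive = r['link_free'] + data_mb / r['bw']  # input fully received
            finish = max(arrive, r['cpu_free']) + flops / r['speed']
            if best is None or finish < best[0]:
                best = (finish, r)
        finish, r = best
        r['link_free'] += data_mb / r['bw']   # link busy during transfer
        r['cpu_free'] = finish                # CPU busy during computation
        finish_times.append(finish)
    return max(finish_times)                  # makespan of the meta-task

resources = [{'bw': 100.0, 'speed': 2e9, 'link_free': 0.0, 'cpu_free': 0.0},
             {'bw': 50.0, 'speed': 4e9, 'link_free': 0.0, 'cpu_free': 0.0}]
print(schedule_meta_task([(500, 1e9), (10, 8e9), (300, 2e9)], resources))
```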

2.
To address the problems that time- and budget-constrained resource scheduling algorithms encounter when scheduling data-intensive applications, a new grid resource scheduling algorithm based on communication cost is proposed. It jointly considers the user's time limits and budget requirements and, according to the computation and communication volumes of the user's job, selects as the target a resource node that has sufficient computing capability and a relatively small communication cost. By reducing the communication cost of submitting such applications to the target resource node, the completion time of the whole application is reduced. Experimental results show that the algorithm achieves good performance.
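A minimal sketch of this selection rule, assuming a simple linear cost model (all field names are illustrative, not the paper's notation): filter nodes by deadline and budget, then prefer the one with the lowest communication time.

```python
# Hypothetical sketch: choose the resource with the smallest
# communication cost among those satisfying deadline and budget.

def select_node(job, nodes, deadline, budget):
    """job: dict with 'compute' (flops) and 'data_mb'; nodes: list of
    dicts with 'speed' (flops/s), 'bw' (MB/s), 'price' (cost/second)."""
    feasible = []
    for n in nodes:
        comm_time = job['data_mb'] / n['bw']
        run_time = job['compute'] / n['speed']
        total = comm_time + run_time
        cost = total * n['price']
        if total <= deadline and cost <= budget:
            feasible.append((comm_time, total, n))
    if not feasible:
        return None                       # no node meets the constraints
    # Lowest communication cost first; break ties by total time.
    return min(feasible, key=lambda f: (f[0], f[1]))[2]
```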

3.
Resource management and scheduling in grid environments is a complex and challenging problem. In data-intensive applications, the read latency of data files is critical. A clustering-based data file replication algorithm (CBR) is proposed: grid nodes whose interconnect bandwidth satisfies a given condition are grouped by clustering into a 'logical region'. An improved LRU algorithm is also introduced; it takes into account data file requests from other computing tasks and avoids deleting files that will be used in the future. Experiments show that the task completion time achieved by this algorithm is better than that of two other algorithms.
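The clustering step is not reproduced here, but the "improved LRU" idea lends itself to a short sketch. A minimal version, with assumed data structures: evict in LRU order, but never evict a file that queued tasks have already requested.

```python
# Illustrative sketch of a pending-request-aware LRU cache.

from collections import OrderedDict

class PendingAwareLRU:
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.used = 0.0
        self.files = OrderedDict()          # name -> size, in LRU order

    def access(self, name, size, pending_requests):
        """pending_requests: set of file names that queued tasks need."""
        if name in self.files:
            self.files.move_to_end(name)    # refresh recency
            return
        while self.used + size > self.capacity:
            victim = next((f for f in self.files
                           if f not in pending_requests), None)
            if victim is None:              # every cached file is still needed
                return                      # skip caching rather than evict
            self.used -= self.files.pop(victim)
        self.files[name] = size
        self.used += size
```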

4.
褚瑞  卢锡城  肖侬 《软件学报》2006,17(11):2234-2244
A RAM (random access memory) grid is a new type of grid system oriented toward sharing memory resources over wide-area networks. Its main goal is to improve the performance of memory-intensive or I/O-intensive applications when physical memory is insufficient. The effectiveness of a RAM grid depends on network communication overhead, and its performance can be further improved by reducing or hiding that overhead. Based on an analysis of the RAM grid, a push-based prefetching mechanism is designed, and a corresponding prefetching algorithm is proposed that borrows sequential pattern mining methods from the data mining field. The prefetching algorithm is evaluated and validated through simulations driven by real execution traces.
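A minimal sketch of the push-based prediction step, assuming sequential patterns have already been mined offline (the matching rule below is an illustration, not the paper's algorithm): when the tail of the recent access stream matches a pattern prefix, the memory provider pushes the predicted next pages.

```python
# Illustrative sketch: predict pages to push from mined sequences.

def predict_pushes(recent, patterns, max_push=4):
    """recent: recently accessed page IDs (oldest first).
    patterns: frequent sequences mined offline, e.g. [[1, 2, 3, 4]]."""
    pushes = []
    for pat in patterns:
        for k in range(min(len(pat) - 1, len(recent)), 0, -1):
            if recent[-k:] == pat[:k]:          # stream tail matches prefix
                pushes.extend(pat[k:k + max_push])
                break
    return list(dict.fromkeys(pushes))[:max_push]  # dedupe, keep order

print(predict_pushes([7, 1, 2], [[1, 2, 3, 4], [2, 9, 9]]))  # -> [3, 4, 9]
```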

5.
For the scheduling problem of data-intensive applications, a new scheduling algorithm is proposed that considers both network bandwidth and node trust when selecting file transfer nodes. To counter the load imbalance among transfer nodes caused by file transfers, a sufferage-based algorithm is used to balance the load. Experiments show that the algorithm outperforms the traditional Min-Min algorithm.
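The paper's bandwidth and trust weighting is not reproduced here, but the sufferage core it builds on is standard and sketches cleanly: schedule first the task that would suffer most if denied its best resource, i.e. the one with the largest gap between best and second-best completion times.

```python
# Classic sufferage heuristic (the load-balancing core, without the
# paper's bandwidth/trust terms).

def sufferage_schedule(ect):
    """ect[t][r] = estimated completion time of task t on resource r.
    Returns a list of (task, resource) assignments."""
    ready = {r: 0.0 for r in range(len(ect[0]))}
    unscheduled = set(range(len(ect)))
    plan = []
    while unscheduled:
        best_task, best_res, best_suff = None, None, -1.0
        for t in unscheduled:
            times = sorted((ready[r] + ect[t][r], r) for r in ready)
            suff = (times[1][0] - times[0][0]) if len(times) > 1 else times[0][0]
            if suff > best_suff:
                best_task, best_res, best_suff = t, times[0][1], suff
        ready[best_res] += ect[best_task][best_res]
        plan.append((best_task, best_res))
        unscheduled.remove(best_task)
    return plan

print(sufferage_schedule([[4, 9], [3, 3], [8, 2]]))
```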

6.
胡明  曾联明 《现代计算机》2010,(7):16-19,23
To address the low classification efficiency that SVM classification suffers when processing large-scale training sample sets, caused by high sample dimensionality and heavy memory consumption, a grid-based computing strategy is proposed. For this compute-intensive problem, three task decomposition schemes are offered: decomposition by step, by function, and by data; users choose among them according to the actual SVM training and classification workload. Comparative experiments on remote sensing data in single-machine and grid environments show that the strategy speeds up training and classification, compensating at the level of the computing environment for the high performance demands of large-scale data processing.
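As one possible reading of the "decompose by data" scheme, here is a hypothetical sketch using scikit-learn: shard the training set, train one SVM per shard (on a grid, each fit would run on a different node), and combine predictions by majority vote. The voting ensemble is my illustration of the idea, not the paper's exact scheme.

```python
# Hypothetical data-decomposition sketch for large-scale SVM training.

from sklearn.svm import SVC
import numpy as np

def train_sharded(X, y, n_shards):
    shards = np.array_split(np.arange(len(X)), n_shards)
    # In a grid, each fit() below would execute on a separate node.
    return [SVC(kernel='rbf').fit(X[idx], y[idx]) for idx in shards]

def predict_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])
    # Majority vote across shard models, column by column.
    return np.apply_along_axis(
        lambda c: np.bincount(c.astype(int)).argmax(), 0, votes)
```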

7.
罗泽  崔辰州  南凯  阎保平 《软件学报》2005,16(8):1465-1473
Many compute- or data-intensive scientific applications require a mechanism for sharing and cooperatively using distributed heterogeneous resources as part of their complex problem-solving processes. The study of galactic chemical evolution in a grid environment is an important part of the China Virtual Observatory application system. The design and implementation of this study are described in detail, and through this example a mechanism for sharing and cooperative use of resources is proposed. By integrating grid services and grid service workflows in the grid environment, the mechanism can effectively support e-Science applications, and it is beneficial for modeling and managing research processes such as experimental studies, data accumulation, and the assimilation of research results.

8.
Considering the utilization of instruments in an instrument grid and the QoS requirements of submitted tasks, and building on the Min-min task scheduling algorithm, a QoS-Balance task scheduling algorithm for instrument grids is proposed. The algorithm both guarantees load balance and satisfies the QoS requirements of submitted tasks. Experimental results, together with their analysis, show that it is a feasible task scheduling algorithm for instrument grids.
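A minimal sketch of a QoS-constrained Min-min variant, under assumed structures (the paper's balancing refinements are not reproduced): restrict each task to instruments meeting its QoS class, then repeatedly schedule the task with the smallest minimum completion time.

```python
# Illustrative QoS-filtered Min-min scheduling loop.

def qos_min_min(tasks, instruments):
    """tasks: list of {'len': seconds_of_work, 'qos': level}.
    instruments: list of {'ready': t, 'speed': x, 'qos': level}."""
    plan, pending = [], list(range(len(tasks)))
    while pending:
        choice = None
        for t in pending:
            for idx, ins in enumerate(instruments):
                if ins['qos'] < tasks[t]['qos']:
                    continue              # instrument below task's QoS class
                ct = ins['ready'] + tasks[t]['len'] / ins['speed']
                if choice is None or ct < choice[0]:
                    choice = (ct, t, idx)
        if choice is None:
            raise ValueError('no instrument satisfies remaining QoS needs')
        ct, t, idx = choice
        instruments[idx]['ready'] = ct    # commit the earliest completion
        plan.append((t, idx))
        pending.remove(t)
    return plan
```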

9.
In the big data era, data-intensive computing has become a research hotspot at home and abroad. Remote sensing data are multi-source and massive, making them big data in the true sense. Studying data-intensive computing methods suitable for automated, operational processing of remote sensing imagery is a challenge currently facing remote sensing application technology. This paper proposes a remote sensing image processing method based on data-intensive computing. Focusing on the automated, operational preprocessing of remote sensing data, it first surveys and analyzes the state of research at home and abroad, then introduces the system architecture: multiple algorithm models are flexibly organized to work together through workflows, and a computing scheme of 'five kinds of parallelism plus one kind of acceleration' is designed to handle data-intensive remote sensing image preprocessing. Performance is tested on production examples. The results show that, while maintaining processing accuracy, the system greatly improves the efficiency of preprocessing remote sensing big data.

10.
Providing services that satisfy SLAs (service level agreements) in a service grid is an important research problem in realizing the grid's 'exceptional quality of service'. This paper proposes a task SLA-based scheduling algorithm for local grid resources and gives its mathematical model and description. The algorithm is validated in a Java-based grid scheduling simulator: it achieves scheduling that satisfies users' SLA constraints and provides local scheduling support for meeting global quality-of-service levels, which is of practical significance for improving grid service quality.

11.
Efficient data-aware methods in job scheduling, distributed storage management and data management platforms are necessary for the successful execution of data-intensive applications. However, research on methods for data-intensive scientific applications is insufficient in large-scale distributed cloud and cluster computing environments, and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data-locality and data transfer time based on network bandwidth to scientific workflow task scheduling and balances resource utilization and parallelism of tasks at the node level. Our method consolidates VMs and considers task parallelism by data flow during the planning of task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that the proposed methods can improve the performance and data-locality of data-intensive workflows in cloud environments.
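A minimal sketch of the locality idea, assuming a simple cost model (names and units are mine, not the paper's): a task's estimated finish time on a node is its queued work plus transfer time (size divided by bandwidth) for input files not already resident, and the task goes to the node minimizing that estimate.

```python
# Illustrative data-locality-aware placement of one workflow task.

def place_task(task, nodes, bandwidth_mb_s):
    """task: {'inputs': {file: size_mb}, 'work': seconds}.
    nodes: list of {'ready': t, 'resident': set_of_file_names}."""
    def finish(n):
        missing = sum(sz for f, sz in task['inputs'].items()
                      if f not in n['resident'])
        return n['ready'] + missing / bandwidth_mb_s + task['work']
    best = min(nodes, key=finish)
    best['ready'] = finish(best)
    best['resident'] |= set(task['inputs'])   # inputs now cached locally
    return best
```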

12.
The recent deluge of data needing to be processed represents one of the major challenges in the computational field. This fact has led to the growth of specially designed applications known as data-intensive applications. In general, to ease the parallel execution of data-intensive applications, input data is divided into smaller data chunks that can be processed separately. However, in many cases these applications show severe performance problems, mainly due to load imbalance, inefficient use of available resources, and improper data partition policies. In addition, the impact of these performance problems can depend on the dynamic behavior of the application.

This work proposes a methodology to dynamically improve the performance of data-intensive applications based on: (i) adapting the size and the number of data partitions to reduce the overall execution time; and (ii) adapting the number of processing nodes to achieve an efficient execution. We propose to monitor the application behavior for each exploration (query) and use the gathered data to dynamically tune the performance of the application. The methodology assumes that a single execution includes multiple related queries on the same partitioned workload.

The adaptation of the workload partition factor is addressed through the definition of the initial size for the data chunks; the modification of the scheduling policy to send first the data chunks with large processing times; the division of data chunks with the largest associated computation times; and the joining of data chunks with small computation times. The criteria for dividing or joining chunks are based on the chunks' associated execution time (average and standard deviation) and the number of processing elements being used. Additionally, resource utilization is addressed through dynamic evaluation of the application performance and the estimation and modification of the number of processing nodes that can be used efficiently.

We have evaluated our strategy using a real and a synthetic data-intensive application as case studies. Analytical expressions have been analyzed through simulation. Applying our methodology, we have obtained encouraging results, reducing total execution times and using resources efficiently.
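A minimal sketch of the split/join rules between queries, assuming the mean-plus/minus-one-standard-deviation thresholds are as described (the pairing rule for cheap chunks is my simplification): chunks far above the mean time are split in two, and cheap chunks are joined pairwise.

```python
# Illustrative re-tuning of partition sizes from observed chunk times.

from statistics import mean, stdev

def retune_chunks(chunks):
    """chunks: list of (size_mb, observed_seconds) from the last query.
    Returns new chunk sizes for the next query."""
    times = [t for _, t in chunks]
    mu = mean(times)
    sd = stdev(times) if len(times) > 1 else 0.0
    sizes, cheap_carry = [], None
    # Expensive chunks first (they are also dispatched first).
    for size, t in sorted(chunks, key=lambda c: c[1], reverse=True):
        if t > mu + sd:                   # too slow: split into two chunks
            sizes += [size / 2, size / 2]
        elif t < mu - sd:                 # too fast: join cheap chunks pairwise
            if cheap_carry is not None:
                sizes.append(cheap_carry + size)
                cheap_carry = None
            else:
                cheap_carry = size
        else:
            sizes.append(size)
    if cheap_carry is not None:
        sizes.append(cheap_carry)
    return sizes
```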

13.
Providing QoS for big data applications requires a way to reserve computing and networking resources in advance. Within the advance reservation framework, a multi-domain scheduling process is carried out in a top-down hierarchical way across multiple hierarchical levels. This ensures that each domain executes an intra-domain scheduling algorithm to co-schedule its own computing and networking resources while coordinating the scheduling at the inter-domain level. Within this process, we introduce two algorithms: an iterative scheduling algorithm and a K-shortest-paths algorithm. We conducted a comprehensive performance evaluation study considering several metrics that reflect both grid-system and grid-user goals. The results demonstrate the advantages of the proposed scheduling process. Moreover, they highlight the importance of using the iterative scheduling and K-shortest-paths algorithms, especially for data-intensive applications.
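The K-shortest-paths step can be illustrated with networkx (a library choice of mine; the paper does not prescribe one): candidate paths between domains are enumerated in increasing length, and the scheduler would try to reserve bandwidth along each until one admits the request.

```python
# Sketch: enumerate the K shortest candidate paths between two nodes.

import networkx as nx
from itertools import islice

def k_shortest_paths(graph, src, dst, k=3, weight='delay'):
    return list(islice(
        nx.shortest_simple_paths(graph, src, dst, weight=weight), k))

g = nx.Graph()
g.add_weighted_edges_from([('A', 'B', 1), ('B', 'C', 1), ('A', 'C', 3),
                           ('B', 'D', 2), ('C', 'D', 1)], weight='delay')
for path in k_shortest_paths(g, 'A', 'D'):
    print(path)      # paths in order of increasing total delay
```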

14.
In grid networks, distributed resources (computing or storage elements as well as scientific instruments) are interconnected to support compute-intensive and data-intensive applications. To facilitate the efficient scheduling of these resources, we propose to manage the movement of massive data sets between them. This paper formulates the bulk data transfer scheduling problem and presents an optimal solution that minimizes the network congestion factor of a dedicated network or an isolated traffic class. A solution satisfying individual flows' time and volume constraints can be found in polynomial time and expressed as a set of multi-interval bandwidth allocation profiles. To enable large-scale deployment of this approach, we propose, for the data plane, a combination of a bandwidth-profile enforcement mechanism with traditional transport protocols. The paper examines several solutions for implementing such a mechanism in the Linux kernel. The experimental evaluation shows that packet pacing performed at the IP level offers a simple yet valuable and TCP-compatible solution for accurate bandwidth-profile enforcement at very high speed.
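To show what a multi-interval bandwidth allocation profile looks like in miniature, here is a toy earliest-slot-first sketch. The paper's actual solution is an optimal polynomial-time algorithm minimizing the congestion factor; this only illustrates the profile data structure under assumed units.

```python
# Toy sketch: fill earliest slots first until the flow's volume fits
# before its deadline; returns a multi-interval (slot, rate) profile.

def build_profile(volume_gb, deadline_slot, residual_gbps, slot_seconds=60):
    """residual_gbps[i] = spare capacity in slot i. Returns a list of
    (slot, rate_gbps) pairs, or None if the constraints cannot be met."""
    profile, left = [], volume_gb * 8          # gigabits still to move
    for slot in range(deadline_slot):
        rate = min(residual_gbps[slot], left / slot_seconds)
        if rate > 0:
            profile.append((slot, rate))
            left -= rate * slot_seconds
        if left <= 1e-9:
            return profile
    return None                                # infeasible before deadline

print(build_profile(15, 5, [0.5, 1.0, 1.0, 0.2, 1.0]))
# -> [(0, 0.5), (1, 1.0), (2, 0.5)]
```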

15.
The Grid is an infrastructure for resource sharing and the coordinated use of those resources in dynamic, heterogeneous, distributed environments. Effective use of a Grid requires the definition of metadata for managing the heterogeneity of the involved resources, which include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how this metadata is used for the design and execution of distributed knowledge discovery applications on Grids.
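As a flavor of what XML metadata for a data mining tool might look like, here is a hypothetical descriptor built with Python's ElementTree. The element and attribute names are invented for illustration and are not the Knowledge Grid's actual schema.

```python
# Hypothetical XML metadata for a classification tool.

import xml.etree.ElementTree as ET

tool = ET.Element('DataMiningSoftware', name='J48', kind='classifier')
ET.SubElement(tool, 'Description').text = 'Decision-tree classification tool'
ET.SubElement(tool, 'Invocation', executable='/usr/local/bin/j48',
              arguments='-t $trainingSet -o $model')
io = ET.SubElement(tool, 'DataFormats')
ET.SubElement(io, 'Input').text = 'ARFF'
ET.SubElement(io, 'Output').text = 'model'
print(ET.tostring(tool, encoding='unicode'))
```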

16.
A PTS-PGATS based approach for data-intensive scheduling in data grids   Total citations: 1 (self: 0, by others: 1)
Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, where large-scale data-intensive applications produce and consume huge amounts of data distributed across a large number of machines. Data grid computing composes sets of independent tasks, each of which requires massive distributed data sets that may each be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks, and appropriate storage resources should be selected to serve the files required by the tasks. The problem can thus be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler that is broken into three parts that can run in parallel, using both parallel tabu search and a parallel genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms that target minimizing makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.
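A minimal sketch of the makespan objective such a hybrid scheduler could evolve against, under assumed structures (not the paper's encoding): a candidate solution assigns each task a computing node and, for every required file, a storage replica to fetch from.

```python
# Illustrative makespan evaluation for one candidate assignment.

def makespan(assignment, tasks, node_speed, bw):
    """assignment[t] = (node, {file: storage_site});
    tasks[t] = {'work': flops, 'files': {name: size_mb}};
    node_speed[n] = flops/s; bw[(site, node)] = MB/s."""
    ready = {n: 0.0 for n in node_speed}
    for t, (n, picks) in enumerate(assignment):
        stage_in = sum(tasks[t]['files'][f] / bw[(site, n)]
                       for f, site in picks.items())
        ready[n] += stage_in + tasks[t]['work'] / node_speed[n]
    return max(ready.values())

# A genetic algorithm would evolve `assignment` vectors with fitness
# -makespan, while tabu search locally refines the best individuals;
# the scheduler's three parts can evaluate populations in parallel.
```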

17.
The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Due to the limitation of available resources in the grid and the production of large volumes of data, efficient use of Grid resources becomes an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data sets by storing data in a wise manner. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper two strategies are proposed. The first is a novel job scheduling strategy called the Weighted Scheduling Strategy (WSS), which uses hierarchical scheduling to reduce the search time for an appropriate computing node; it considers the number of jobs waiting in a queue, the location of the data required by each job, and the computing capacity of the sites. The second is a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR), which improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy; it uses an economic model for file deletion when there is not enough space for a new replica. The economic model is based on the future value of a data file. Best replica placement plays an important role in obtaining the maximum benefit from replication as well as in reducing storage cost and mean job execution time, so it is also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing data access time and avoiding unnecessary replication.
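A minimal sketch of the economic deletion step: when a new replica does not fit, delete existing replicas in increasing order of predicted future value. The exponential-decay valuation below is an assumed stand-in for the paper's economic model.

```python
# Illustrative future-value-based replica eviction.

def future_value(accesses, now, half_life=3600.0):
    """accesses: list of past access timestamps for one file."""
    return sum(0.5 ** ((now - ts) / half_life) for ts in accesses)

def make_room(store, capacity_mb, incoming_mb, now):
    """store: {file: {'size': mb, 'accesses': [timestamps]}}."""
    used = sum(f['size'] for f in store.values())
    victims = sorted(store,
                     key=lambda f: future_value(store[f]['accesses'], now))
    for victim in victims:                 # least valuable replicas first
        if used + incoming_mb <= capacity_mb:
            break
        used -= store[victim]['size']
        del store[victim]
    return used + incoming_mb <= capacity_mb   # True if replica now fits
```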

18.
How to schedule data-intensive workflow applications efficiently and sensibly has become one of the key open problems in cloud computing. To address it, a directed hypergraph model of data-intensive workflows is first constructed. The concept of data support capability is then proposed, and the model is reduced through merge operations based on data support. Finally, the multilevel hypergraph partitioning algorithm is optimized, yielding HEFT-P, a data-reduction scheduling strategy for data-intensive workflows. The results show that, compared with the typical workflow scheduling strategies HEFT, CPOP, and MCP, HEFT-P reduces and optimizes data-intensive workflows well and achieves shorter schedule times.
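One plausible reading of the data-support merge step, sketched with assumed structures (the measure and threshold are mine, not the paper's definition): if two tasks share a large fraction of their input data, merge them so the shared files are transferred once.

```python
# Hypothetical greedy merge of workflow tasks by shared input data.

def data_support(a, b):
    """a, b: input maps {file_name: size_mb}; returns overlap in [0, 1]."""
    shared = sum(min(a[f], b[f]) for f in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values()) - shared
    return shared / total if total else 0.0

def reduce_workflow(tasks, threshold=0.5):
    """tasks: list of (id_set, inputs). Merges pairs whose data support
    exceeds the threshold; returns the reduced task list."""
    merged = True
    while merged:
        merged = False
        for i in range(len(tasks)):
            for j in range(i + 1, len(tasks)):
                if data_support(tasks[i][1], tasks[j][1]) >= threshold:
                    ids = tasks[i][0] | tasks[j][0]
                    inputs = {**tasks[i][1], **tasks[j][1]}
                    tasks = (tasks[:i] + tasks[i + 1:j] + tasks[j + 1:]
                             + [(ids, inputs)])
                    merged = True
                    break
            if merged:
                break
    return tasks

print(reduce_workflow([({'t1'}, {'a': 80, 'b': 20}),
                       ({'t2'}, {'a': 80, 'c': 10}),
                       ({'t3'}, {'d': 50})]))
```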

19.
Stream computing applications require minimum latency and high throughput to process real-time data efficiently. Typically, data-intensive applications, where large datasets must be moved across execution nodes, have low-latency requirements. In this paper, a stream-based data processing model is adopted to develop an algorithm for optimally partitioning the input data such that the inter-partition data flow remains minimal. The proposed algorithm improves the execution of data-intensive workflows in heterogeneous computing environments by partitioning the data-intensive workflow and mapping each partition onto the available heterogeneous resources that offer the minimum execution time. Minimal data movement between the partitions reduces latency, which can be reduced further by applying advanced data-parallelism techniques. In this paper, we apply a data-parallelism technique to the bottleneck (most compute-intensive) task in each partition, which significantly reduces latency. We study the effectiveness and performance of the proposed approach using synthesized workflows and real-world applications such as Montage and CyberShake. Our evaluation shows that the proposed algorithm provides schedules with approximately 12% lower latency and nearly 17% higher throughput compared with existing state-of-the-art algorithms.
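The objective being minimized sketches in a few lines: given a workflow graph with data volumes on edges and a mapping of tasks to partitions, the inter-partition data flow is the total volume crossing the cut. A real implementation would search over mappings (and then data-parallelize the bottleneck task of each partition); this only scores one candidate mapping.

```python
# Minimal sketch: score a candidate task-to-partition mapping.

def inter_partition_flow(edges, part):
    """edges: list of (src_task, dst_task, mb); part: {task: partition}."""
    return sum(mb for u, v, mb in edges if part[u] != part[v])

edges = [('read', 'filter', 800), ('filter', 'fft', 120), ('fft', 'sink', 5)]
print(inter_partition_flow(edges,
                           {'read': 0, 'filter': 0, 'fft': 1, 'sink': 1}))
# -> 120: only the filter->fft edge crosses the partition cut
```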
