Similar Literature
1.
Massive computation power and storage capacity of cloud computing systems allow scientists to deploy data-intensive applications without infrastructure investment, where large application datasets can be stored in the cloud. Based on the pay-as-you-go model, data placement strategies have been developed to cost-effectively store large volumes of generated datasets in scientific cloud workflows. As promising as it is, this paradigm also introduces many new challenges for data security when users outsource sensitive data for sharing on cloud servers, which are not within the same trusted domain as the data owners. This challenge is further complicated by the security constraints on the potentially sensitive data of scientific workflows in the cloud. To effectively address this problem, we propose a security-aware intermediate data placement strategy. First, we build a security overhead model to reasonably measure the security overheads incurred by sensitive data. Second, we develop a data placement strategy to dynamically place the intermediate data of scientific workflows. Finally, our experimental results show that our strategy can effectively improve intermediate data security while keeping the data transfer time acceptable during the execution of scientific workflows.
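Such a security overhead model is typically additive over the security services applied to each sensitive dataset. The following Python sketch only illustrates that idea under assumed per-service throughputs; the service names and numbers are placeholders, not the model from the paper.

```python
# Minimal sketch of an additive security overhead model for intermediate
# datasets. The service throughputs (MB/s) are hypothetical placeholders.

SERVICE_THROUGHPUT_MBPS = {
    "confidentiality": 120.0,  # e.g. symmetric encryption
    "integrity": 300.0,        # e.g. cryptographic hashing
}

def security_overhead(datasets):
    """Total time (s) to apply the required services to each sensitive dataset.

    `datasets` is an iterable of (size_mb, required_services) pairs, where
    required_services is a subset of SERVICE_THROUGHPUT_MBPS keys.
    """
    total = 0.0
    for size_mb, services in datasets:
        for svc in services:
            total += size_mb / SERVICE_THROUGHPUT_MBPS[svc]
    return total

# Example: two intermediate datasets, one needing both services.
print(security_overhead([
    (500.0, {"confidentiality", "integrity"}),
    (1200.0, {"integrity"}),
]))
```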

2.
Security is increasingly critical for various scientific workflows that are big data applications and typically take a considerable amount of time to execute on large-scale distributed infrastructures. The cloud computing platform is such an infrastructure, one that can enable dynamic resource scaling on demand. Nevertheless, under the pay-per-use, hourly-based pricing model, users should pay attention to the cost incurred by renting virtual machines (VMs) from cloud data centers. Meanwhile, workflow tasks are generally heterogeneous and require different instance series (i.e., compute optimized, memory optimized, storage optimized, etc.). In this paper, we propose a security and cost aware scheduling (SCAS) algorithm for heterogeneous tasks of scientific workflows in clouds. Our proposed algorithm is based on a meta-heuristic optimization technique, particle swarm optimization (PSO), whose coding strategy is devised to minimize the total workflow execution cost while meeting the deadline and risk rate constraints. Extensive experiments using three real-world scientific workflow applications, as well as the CloudSim simulation framework, demonstrate the effectiveness and practicality of our algorithm.
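The PSO coding strategy described above amounts to evaluating each particle, a task-to-VM-type assignment, against cost, deadline and risk. The sketch below shows one plausible constrained fitness function; the data shapes, penalty scheme and numbers are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a constrained fitness evaluation a SCAS-style PSO could use:
# infeasible particles (deadline or risk-rate violations) are penalized.

PENALTY = 1e9

def fitness(particle, tasks, vm_types, deadline, max_risk):
    """particle[i] = index of the VM type assigned to task i."""
    cost = makespan = risk = 0.0
    for task, vm_idx in zip(tasks, particle):
        vm = vm_types[vm_idx]
        runtime = task["workload"] / vm["speed"]
        makespan += runtime                      # crude serial estimate
        cost += runtime / 3600.0 * vm["hourly_price"]
        risk += task["risk_per_hour"] * runtime / 3600.0
    if makespan > deadline or risk > max_risk:
        return cost + PENALTY                    # infeasible: push away
    return cost

tasks = [{"workload": 7200, "risk_per_hour": 0.01},
         {"workload": 3600, "risk_per_hour": 0.02}]
vm_types = [{"speed": 1.0, "hourly_price": 0.10},
            {"speed": 2.0, "hourly_price": 0.25}]
print(fitness([1, 0], tasks, vm_types, deadline=9000, max_risk=0.05))
```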

3.
Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.

4.
An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes, aiming to optimize the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80% improvement for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling obtained by our approach with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10% improvement in runtime over existing schedulers, caused by an 80% reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has greater impact as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows.
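The dual-chromosome encoding can be made concrete with a small sketch. The operators below are illustrative stand-ins for the tailored ones in the paper: one-point crossover on the allocation chromosome, and an ordering mutation that only swaps adjacent tasks not linked by a precedence edge, so the ordering stays valid.

```python
import random

# Sketch of the dual-chromosome encoding: an allocation chromosome
# (task -> node) and an ordering chromosome (a precedence-valid
# permutation of task ids). Operators are illustrative stand-ins.

def crossover_allocation(a, b):
    """One-point crossover on the allocation chromosome."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate_allocation(alloc, n_nodes):
    """Reassign one random task to a random node."""
    alloc = alloc[:]
    alloc[random.randrange(len(alloc))] = random.randrange(n_nodes)
    return alloc

def mutate_ordering(order, deps):
    """Swap two adjacent tasks if no precedence edge connects them.

    `deps` is a set of (before, after) task-id pairs."""
    order = order[:]
    i = random.randrange(len(order) - 1)
    u, v = order[i], order[i + 1]
    if (u, v) not in deps:          # swap keeps the ordering valid
        order[i], order[i + 1] = v, u
    return order

deps = {(0, 2), (1, 2)}
print(crossover_allocation([0, 1, 0], [1, 1, 1]))
print(mutate_ordering([0, 1, 2], deps))
```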

5.
袁友伟, 鲍泽前, 俞东进, 李万清. 《软件学报》, 2018, 29(11): 3326-3339
To address the lack of security considerations in existing multi-scientific-workflow scheduling algorithms for cloud environments, this paper proposes MSW-SDCOA (Multi-Scientific Workflow Security-Deadline Constraint Cost Optimization Algorithm), a cost optimization algorithm for multiple scientific workflows under security and deadline constraints. First, MSW-SDCOA compresses the scientific workflows based on data dependencies, reducing the number of task nodes and thereby saving scheduling overhead; it then forms the scheduling sequence by improving the HEFT (Heterogeneous Earliest-Finish-Time) algorithm, achieving globally multi-objective optimized scheduling; finally, it further improves the cost optimization effect by refining the pheromone update strategy and heuristic information of ACO (Ant Colony Optimization). Simulation experiments show that MSW-SDCOA improves cost optimization by about 14% compared with the MW-DBS algorithm.
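The ACO pheromone update that MSW-SDCOA refines follows the standard evaporate-then-deposit pattern. The sketch below shows that baseline rule with a cost-proportional deposit on the best ant's path; the parameter values are illustrative, not those of the paper.

```python
# Standard ACO pheromone update: evaporation everywhere, plus a deposit
# on the best ant's path inversely proportional to its cost.

RHO = 0.1   # evaporation rate (illustrative)
Q = 100.0   # deposit constant (illustrative)

def update_pheromone(pheromone, best_path, best_cost):
    """pheromone: dict over (task, vm) edges; best_path: list of edges."""
    for edge in pheromone:
        pheromone[edge] *= (1.0 - RHO)          # evaporate everywhere
    for edge in best_path:
        pheromone[edge] += Q / best_cost        # reinforce the best solution

tau = {("t0", "vm0"): 1.0, ("t0", "vm1"): 1.0, ("t1", "vm0"): 1.0}
update_pheromone(tau, [("t0", "vm1"), ("t1", "vm0")], best_cost=42.0)
print(tau)
```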

6.
To address the situation in which data confidentiality cannot be guaranteed in untrusted cloud storage, a new proxy re-encryption (PRE) algorithm is proposed and applied to a cloud storage access control scheme, in which one part of the ciphertext is stored in the cloud for sharing while the other part is sent directly to the user. The scheme is proven to preserve the confidentiality of sensitive data in cloud storage in an open environment with untrusted third parties. Analysis and comparison show that the sender keeps control over the delivery of the ciphertext and that, by exploiting the properties of proxy re-encryption in a one-to-many cloud storage access control scheme, the ciphertext computation and storage do not grow linearly with the number of users, which significantly reduces the amount of computation and interaction during communication and effectively reduces the storage space required. The scheme achieves secure and efficient sharing of sensitive data in cloud storage.

7.
Scientific workflow applications are complex, data-intensive applications, commonly used in disciplines involving distributed data sources such as structural biology, high-energy physics and neuroscience. Because the data are stored dispersedly on Internet-based cloud computing platforms, the execution of scientific workflows is accompanied by massive data transfers. Cloud computing is a pay-per-use model, so data transfers incur transfer fees; in particular, when multiple workflows cooperate with each other, even higher transfer costs arise. This paper builds, from a global perspective, a transfer cost model based on a multi-workflow data dependency graph, and studies a data placement optimization strategy based on the binary particle swarm optimization (BPSO) algorithm, thereby reducing the rental fees for cloud transfer resources.
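The defining step of BPSO is that velocities follow the usual PSO update while each position bit, here read as "dataset d is placed in data center c", is re-sampled through a sigmoid of its velocity. A minimal sketch with textbook coefficient defaults:

```python
import math, random

# Binary PSO step: continuous velocity update, sigmoid transfer function,
# stochastic re-sampling of each position bit.

W, C1, C2 = 0.7, 1.5, 1.5   # inertia and acceleration coefficients

def bpso_step(pos, vel, pbest, gbest):
    for i in range(len(pos)):
        r1, r2 = random.random(), random.random()
        vel[i] = (W * vel[i]
                  + C1 * r1 * (pbest[i] - pos[i])
                  + C2 * r2 * (gbest[i] - pos[i]))
        prob = 1.0 / (1.0 + math.exp(-vel[i]))   # sigmoid transfer function
        pos[i] = 1 if random.random() < prob else 0
    return pos, vel

pos, vel = [0, 1, 0, 1], [0.0, 0.0, 0.0, 0.0]
print(bpso_step(pos, vel, pbest=[1, 1, 0, 0], gbest=[1, 0, 0, 1]))
```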

8.
In recent years, scientific workflows have emerged as a fundamental abstraction for structuring and executing scientific experiments in computational environments. Scientific workflows are becoming increasingly complex and more demanding in terms of computational resources, thus requiring the usage of parallel techniques and high performance computing (HPC) environments. Meanwhile, clouds have emerged as a new paradigm where resources are virtualized and provided on demand. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines. Although the initial focus of clouds was to provide high throughput computing, clouds are already being used to provide an HPC environment where elastic resources can be instantiated on demand during the course of a scientific workflow. However, this model also raises many open, yet important, challenges such as scheduling workflow activities. Scheduling parallel scientific workflows in the cloud is a very complex task, since we have to take many different criteria into account and exploit the elasticity characteristic to optimize workflow execution. In this paper, we introduce an adaptive scheduling heuristic for parallel execution of scientific workflows in the cloud that is based on three criteria: total execution time (makespan), reliability and financial cost. Besides scheduling workflow activities based on a 3-objective cost model, this approach also scales resources up and down according to the restrictions imposed by scientists before workflow execution. This tuning is based on provenance data captured and queried at runtime. We conducted a thorough validation of our approach using a real bioinformatics workflow. The experiments were performed in SciCumulus, a cloud workflow engine for managing scientific workflow execution.
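One common way to act on such a 3-objective cost model is to scalarize makespan, financial cost and reliability into a single score using scientist-chosen weights. The sketch below is only an assumed illustration of that idea; the weights and normalization baselines are not from the paper.

```python
# Assumed scalarization of a 3-objective schedule cost model:
# normalized makespan, normalized cost, and (un)reliability, weighted.

def schedule_score(makespan, cost, reliability,
                   w_time=0.4, w_cost=0.4, w_rel=0.2,
                   max_time=3600.0, max_cost=10.0):
    """Lower is better; reliability in (0, 1]."""
    return (w_time * makespan / max_time
            + w_cost * cost / max_cost
            + w_rel * (1.0 - reliability))

# Candidate A: fast but pricey; candidate B: slow, cheap, less reliable.
print(schedule_score(1200.0, 6.0, 0.99))
print(schedule_score(3000.0, 2.0, 0.95))
```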

9.
Bag-of-Tasks (BoT) workflows are widespread in many big data analysis fields. However, there are very few cloud resource provisioning and scheduling algorithms tailored for BoT workflows. Furthermore, existing algorithms fail to consider the stochastic task execution times of BoT workflows, which leads to deadline violations and increased resource renting costs. In this paper, we propose a dynamic cloud resource provisioning and scheduling algorithm which aims to fulfill the workflow deadline by using the sum of task execution time expectation and standard deviation to estimate real task execution times. A bag-based delay scheduling strategy and a single-type based virtual machine interval renting method are presented to decrease the resource renting cost. The proposed algorithm is evaluated using the cloud simulator ElasticSim, which is extended from CloudSim. The results show that, compared with existing algorithms, the dynamic algorithm decreases the resource renting cost while guaranteeing the workflow deadline.
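The estimator at the heart of the algorithm, expectation plus standard deviation of a task's execution time, is a one-liner; a small sketch with made-up runtime samples:

```python
import statistics

# Pessimistic runtime estimate: mean plus one standard deviation of
# observed runtimes, buffering against stochastic variation.

def estimated_runtime(samples):
    """Expectation + standard deviation of observed runtimes (seconds)."""
    return statistics.mean(samples) + statistics.stdev(samples)

runtimes = [118.0, 131.0, 104.0, 126.0]   # hypothetical history for one task
print(estimated_runtime(runtimes))
```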

10.
尚蕾, 刘茜萍. 《计算机工程》, 2020, 46(5): 122-130, 138
Data placement for scientific workflows in cloud environments has become a hot issue in current workflow research. Analyzing the many-to-many relationships between tasks and datasets in a scientific workflow reveals that different data placement schemes incur different data transfer fees, which to a large extent affect the workflow's operating cost. To reduce the dataset transfer fees of scientific workflows, a data placement method based on task assignment and dataset replicas is proposed. The method starts with task assignment, assigning tasks on the basis of quantitatively computed task dependency degrees; according to the assignment results, a two-stage data placement method based on dataset replicas is then given to optimize the transfer fees incurred during workflow execution. Case study results show that, compared with a workflow-level method, this method can effectively reduce the operating cost of scientific workflows.
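A quantitative task dependency degree can, for example, be measured as the total size of the datasets two tasks share, which then guides co-locating the most dependent tasks. A minimal sketch under that assumption; the data-structure shapes are invented for clarity.

```python
# Task dependency degree as the total size of shared datasets.

def dependency_degree(task_a, task_b, dataset_sizes):
    """Sum of sizes (MB) of datasets used by both tasks."""
    shared = task_a["datasets"] & task_b["datasets"]
    return sum(dataset_sizes[d] for d in shared)

sizes = {"d1": 800, "d2": 150, "d3": 400}
t1 = {"datasets": {"d1", "d2"}}
t2 = {"datasets": {"d2", "d3"}}
print(dependency_degree(t1, t2, sizes))   # -> 150
```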

11.
Cloud computing has established itself as an interesting computational model that provides a wide range of resources such as storage, databases and computing power for several types of users. Recently, the concept of cloud computing was extended with the concept of federated clouds, where several resources from different cloud providers are inter-connected to perform a common action (e.g. execute a scientific workflow). Users can benefit from both single-provider and federated cloud environments to execute their scientific workflows, since they can get the necessary amount of resources on demand. In several of these workflows, there is a demand for high performance and parallelism techniques, since many activities are data and computing intensive and can execute for hours, days or even weeks. There are some Scientific Workflow Management Systems (SWfMS) that already provide parallelism capabilities for scientific workflows in single-provider clouds. Most of them rely on creating a virtual cluster to execute the workflow in parallel. However, they also rely on the user to estimate the number of virtual machines to be allocated to create this virtual cluster. Most SWfMS use this initial virtual cluster configuration, made by the user, for the entire workflow execution. Dimensioning the virtual cluster to execute the workflow in parallel is then a top-priority task, since an under- or over-dimensioned virtual cluster can hurt workflow performance or (unnecessarily) increase financial costs. This dimensioning is far from trivial in a single-provider cloud, and especially in federated clouds, due to the huge number of virtual machine types to choose from in each location and provider. In this article, we propose an approach named GraspCC-fed to produce the optimal (or near-optimal) estimation of the number of virtual machines to allocate for each workflow. GraspCC-fed extends a previously proposed GRASP-based heuristic for executing standalone applications to consider scientific workflows executed in both single-provider and federated clouds. For the experiments, GraspCC-fed was coupled to an adapted version of the SciCumulus workflow engine for federated clouds. We believe that GraspCC-fed can be an important decision-support tool for users, helping them determine an optimal virtual cluster configuration for parallel cloud-based scientific workflows.

12.
Efficient data-aware methods in job scheduling, distributed storage management and data management platforms are necessary for the successful execution of data-intensive applications. However, research on methods for data-intensive scientific applications in large-scale distributed cloud and cluster computing environments is insufficient, and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data locality and data transfer time based on network bandwidth to scientific workflow task scheduling, and balances resource utilization and task parallelism at the node level. Our method consolidates VMs and considers task parallelism by data flow when planning the task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that the proposed methods can improve the performance and data locality of data-intensive workflows in cloud environments.

13.
The Ceph distributed storage system is becoming a widely used open-source storage solution for cloud environments. Heterogeneous storage, if managed with effective data management strategies, can provide large capacity and high performance while keeping costs low. Simply using heterogeneous storage devices in Ceph does not exploit their performance effectively: because the multiple replicas of a data object can be placed on different storage media, different replica combinations differ in both performance and cost. This paper proposes a data placement method for heterogeneous storage in Ceph: it defines several different replica combinations and places different data on different combinations according to data heat and read/write ratio, improving system performance while effectively controlling the capacity cost of the system.
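The heat- and read-ratio-driven selection among replica combinations could look like the following sketch. The tiers and thresholds are invented for illustration and are not the paper's policy or Ceph's CRUSH rules.

```python
# Heat-based placement over replica combinations: hot, read-heavy objects
# go to an SSD-leaning combination, cold ones to an HDD-leaning one.

REPLICA_COMBOS = {
    "hot":  ("ssd", "ssd", "hdd"),   # fast reads, moderate cost
    "warm": ("ssd", "hdd", "hdd"),
    "cold": ("hdd", "hdd", "hdd"),   # cheapest capacity
}

def choose_combo(heat, read_ratio, hot_heat=100.0, read_heavy=0.7):
    """Pick a replica combination from access heat and read/write ratio."""
    if heat >= hot_heat and read_ratio >= read_heavy:
        return REPLICA_COMBOS["hot"]
    if heat >= hot_heat / 2:
        return REPLICA_COMBOS["warm"]
    return REPLICA_COMBOS["cold"]

print(choose_combo(heat=240.0, read_ratio=0.9))   # -> ('ssd', 'ssd', 'hdd')
```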

14.
Cloud computing, an important source of computing power for the scientific community, requires enhanced tools for efficient use of its resources. Current solutions for workflow execution lack frameworks that deeply analyze applications and consider realistic execution times as well as computation costs. In this study, we propose cloud user–provider affiliation (CUPA) to guide workflow owners in identifying the tools required to get their applications running. Additionally, we develop PSO-DS, a specialized scheduling algorithm based on particle swarm optimization. CUPA encompasses the interaction of cloud resources, the workflow management system and the scheduling algorithm. Its featured scheduler, PSO-DS, converges on a strategic distribution of tasks among resources that efficiently optimizes makespan and monetary cost. We compared PSO-DS performance against four well-known scientific workflow schedulers. In a test bed based on VMware vSphere, the schedulers mapped five up-to-date benchmarks representing different scientific areas. PSO-DS proved its efficiency by reducing the makespan and monetary cost of the tested workflows by 75 and 78%, respectively, when compared with the other algorithms. CUPA, with its featured PSO-DS scheduler, opens the path toward a full system in which scientific cloud users can run their computationally expensive experiments.

15.
Unstructured data are growing explosively, and traditional storage technologies urgently need improvement in throughput, scalability and manageability. By analyzing the problems of storing security surveillance video data, a cloud-based security video surveillance storage system is designed: a peer-to-peer cloud computing environment is built on framework technology, and the cloud storage strategy within it is designed and modeled. The system stores massive private, read-only video data on cheap, untrusted nodes while providing efficient and reliable access. Simulation results show that the system's storage performance is highly reliable and easy to scale, providing cost-effective video cloud storage services.

16.
A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in the last decade. Meanwhile, clouds have emerged as a prominent environment to execute this type of workflow. In this scenario, the investigation of workflow scheduling strategies aiming at reducing execution times became a top priority and a very popular research field. However, few works consider the problem of data file assignment when solving the task scheduling problem. Usually, a workflow is represented by a graph where nodes represent tasks, and the scheduling problem consists in allocating tasks to machines to be executed at a predefined time, aiming to reduce the makespan of the whole workflow. In this article, we show that the scheduling of scientific workflows can be improved when task scheduling and the data file assignment problem are treated together. Thus, we propose a new workflow representation, where nodes of the workflow graph represent either tasks or data files, and define the Task Scheduling and Data Assignment Problem (TaSDAP) considering this new model. We formulate this problem as an integer programming problem. Moreover, a hybrid evolutionary algorithm for solving it, named HEA-TaSDAP, is also introduced. To evaluate our approach we conducted two types of experiments: theoretical and practical ones. First, we compared HEA-TaSDAP with the solutions produced by the mathematical formulation and by other works from the related literature. Then, we considered real executions on the Amazon EC2 cloud using a real scientific workflow use case (SciPhy, for phylogenetic analyses). In all experiments, HEA-TaSDAP outperformed the classical approaches from the related literature, such as Min–Min and HEFT.

17.
唐兵, 张黎. 《计算机应用》, 2014, 34(11): 3109-3111
To increase the access speed of cloud storage and reduce its cost, a cost-optimization-oriented caching strategy for cloud storage is proposed. Using multiple desktop computers in an almost-free local area network environment, a distributed file system is built locally and used as a cache for the remote cloud storage. When a file is read, the cache is checked first: if the file is in the cache it is read directly from there; otherwise it is fetched from the remote cloud storage. The least recently used (LRU) algorithm is adopted for cache replacement, evicting unpopular data from the cache. With Amazon Simple Storage Service (S3) as the remote cloud storage service, a simple performance test of the prototype system was conducted. The results show that the proposed caching strategy significantly speeds up file reads while reducing cost.
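A minimal sketch of this caching layer, assuming a capacity-bounded LRU dictionary in front of a stubbed remote fetch; in the paper the cache is a LAN-based distributed file system and the backend is Amazon S3.

```python
from collections import OrderedDict

# Capacity-bounded LRU cache in front of a remote object store.
# `fetch_remote` stands in for an S3 GET.

class LRUCache:
    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote
        self.store = OrderedDict()

    def read(self, key):
        if key in self.store:
            self.store.move_to_end(key)         # mark as recently used
            return self.store[key]
        data = self.fetch_remote(key)           # cache miss: go to the cloud
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used
        return data

cache = LRUCache(capacity=2, fetch_remote=lambda k: f"<contents of {k}>")
cache.read("a.dat"); cache.read("b.dat"); cache.read("a.dat")
cache.read("c.dat")                             # evicts b.dat
print(list(cache.store))                        # ['a.dat', 'c.dat']
```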

18.
李学俊, 吴洋, 刘晓, 程慧敏, 朱二周, 杨耘. 《软件学报》, 2016, 27(7): 1861-1875
Scientific workflows are complex, data-intensive applications. How to place their data effectively in a hybrid cloud environment is an important problem they face; in particular, the security requirements of hybrid clouds bring new challenges to research on data placement for scientific cloud workflows. Traditional data placement methods mostly place datasets using load-balancing-based partitioning models; such methods achieve well load-balanced placements, but the transfer time is not optimal. Addressing the shortcomings of traditional data placement methods and considering the characteristics of data placement in hybrid clouds, this paper first designs a matrix partitioning model based on the degree of data-dependency destruction, generating partitions that destroy data dependencies the least; it then proposes a data-center-oriented data placement method which, following the partitioning model, places highly dependent datasets in the same data center as far as possible, thereby reducing the transfer time of datasets across data centers. Experimental results show that the method can effectively shorten the cross-data-center data transfer time of scientific workflows at runtime.

19.
陈亮, 杨庚, 屠袁飞. 《计算机应用》, 2016, 36(7): 1822-1827
To counter the theft of users' stored sensitive information caused by the weak security of data and access control in existing cloud storage, an efficient data privacy protection model based on hybrid clouds is proposed, combining existing ciphertext-policy attribute-based encryption (CP-ABE) schemes with the idea of data splitting. First, user data are split, according to their sensitivity, into data blocks of different sensitivity levels, and the resulting blocks are stored on different cloud platforms; the blocks are then encrypted with encryption of different strengths according to their security levels. In the decryption phase for sensitive information, a "match first, then decrypt" approach is adopted and the algorithm is optimized, so that in the end the user obtains the plaintext with a single multiplication. Symmetrically encrypting 1 Gb of data in the public cloud more than doubled efficiency compared with a single node. Experimental results show that the scheme can effectively protect the privacy data of cloud storage users while reducing system overhead and improving flexibility.
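The splitting-and-routing step might be sketched as below: blocks are graded by sensitivity, routed to different clouds, and encrypted with different strengths. The encrypt_* functions are stubs standing in for the paper's CP-ABE and symmetric schemes, not real cryptography.

```python
# Sensitivity-graded splitting and routing of data blocks across a hybrid
# cloud. Encryption functions are placeholder stubs, not real crypto.

def encrypt_strong(block):  return b"<CP-ABE>" + block   # stub
def encrypt_light(block):   return b"<AES>" + block      # stub

def place_blocks(blocks):
    """blocks: list of (data, sensitivity) with sensitivity in {0, 1, 2}."""
    placement = []
    for data, level in blocks:
        if level >= 2:      # highly sensitive: private cloud, strong crypto
            placement.append(("private_cloud", encrypt_strong(data)))
        elif level == 1:    # moderately sensitive: public cloud, light crypto
            placement.append(("public_cloud", encrypt_light(data)))
        else:               # non-sensitive: public cloud, plaintext
            placement.append(("public_cloud", data))
    return placement

for target, blob in place_blocks([(b"salary", 2), (b"logs", 0)]):
    print(target, blob)
```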

20.
Scientific workflows have become a standardized way for scientists to represent a set of tasks to solve a certain scientific problem. Usually these workflows consist of numerous CPU- and I/O-intensive jobs that are executed using workflow management systems (WfMS) on clouds, grids, supercomputers, etc. Previously, it was shown that using k-way partitioning to distribute a workflow's tasks between multiple machines in the cloud reduces the overall data communication and therefore lowers the cost of bandwidth usage. A framework was built to automate this process of partitioning and executing, in the cloud, any workflow submitted by a scientist that is meant to run on the Pegasus WfMS. The framework provisions the instances in the cloud using CloudML, configures and installs all the software needed for the execution, and partitions and runs the provided scientific workflow, also showing the estimated makespan and cost.
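A toy version of the cut-reducing idea behind k-way partitioning: assign each task to the machine already holding the neighbors it communicates with most, under a simple balance cap. Real deployments would use a dedicated partitioner (e.g. METIS); everything here is an illustrative assumption.

```python
# Greedy k-way partitioning of a workflow task graph to reduce the
# communication cut, with a ceil(n/k) balance cap per machine.

def kway_partition(tasks, edges, k):
    """edges: dict mapping (u, v) task pairs to communicated bytes."""
    cap = -(-len(tasks) // k)                  # ceil: max tasks per machine
    part, load = {}, [0] * k
    for t in tasks:
        best, best_gain = None, -1
        for m in range(k):
            if load[m] >= cap:                 # machine is full
                continue
            gain = sum(w for (u, v), w in edges.items()
                       if (u == t and part.get(v) == m)
                       or (v == t and part.get(u) == m))
            if gain > best_gain:
                best, best_gain = m, gain
        part[t] = best
        load[best] += 1
    return part

edges = {("a", "b"): 900, ("b", "c"): 50, ("c", "d"): 700}
print(kway_partition(["a", "b", "c", "d"], edges, k=2))
# -> {'a': 0, 'b': 0, 'c': 1, 'd': 1}: only the 50-byte edge is cut
```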
