1.

Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments

Najme MANSOURI 《Frontiers of Computer Science》2014,8(3):391-408

Data Grid integrates geographically distributed resources for solving data intensive scientific applications. Effective scheduling in Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize the above two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve the data access efficiencies in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, number of jobs waiting in queue, file locations, and disk read speed of storage drive at data sources. Moreover, due to the limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy which deletes files in two steps when free space is not enough for the new replica: first, it deletes those files with minimum time for transferring. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica and the file transfer time. The simulation results show that our proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage and storage usage.

2.

QoS guided Min-Min heuristic for grid task scheduling (total citations: 75; self-citations: 1; cited by others: 74)

He Xiaoshan, Sun Xianhe, Gregor von Laszewski 《Journal of Computer Science and Technology》2003,18(4):0-0

Task scheduling is an integral component of computing. With the emergence of Grid and ubiquitous computing, new challenges appear in task scheduling based on properties such as security, quality of service, and lack of central control within distributed administrative domains. A Grid task scheduling framework must be able to deal with these issues. One of the goals of Grid task scheduling is to achieve high system throughput while matching applications with the available computing resources. This matching of resources in a non-deterministically shared heterogeneous environment leads to concerns over Quality of Service (QoS). In this paper a novel QoS guided task scheduling algorithm for Grid computing is introduced. The proposed algorithm is based on a general adaptive scheduling heuristic that includes QoS guidance. The algorithm is evaluated within a simulated Grid environment. The experimental results show that the new QoS guided Min-Min heuristic can lead to significant performance gain for a variety of applications. The approach is compared with others based on the quality of the prediction formulated by inaccurate information.
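As a rough illustration of the idea in this abstract, the sketch below runs the classic Min-Min heuristic in two passes: tasks that demand high QoS are placed first on QoS-capable hosts, and the remaining tasks are then placed on any host. The abstract does not give the algorithm's details, so the two-pass structure, the function names, the `etc` table (expected time to compute per task and host) and the toy numbers are all assumptions made for illustration, not the authors' implementation.

```python
def _min_min(tasks, hosts, etc, ready, schedule):
    """Plain Min-Min over the given tasks and hosts; updates ready and schedule in place."""
    pending = set(tasks)
    while pending:
        best = None  # (finish_time, task, host)
        for task in sorted(pending):          # sorted for deterministic tie-breaking
            for host in hosts:
                finish = ready[host] + etc[task][host]
                if best is None or finish < best[0]:
                    best = (finish, task, host)
        if best is None:
            raise ValueError("no eligible host for the remaining tasks")
        finish, task, host = best
        schedule[task] = host                 # commit the task with the smallest completion time
        ready[host] = finish
        pending.remove(task)


def qos_guided_min_min(etc, hosts, qos_tasks, qos_hosts):
    """Hypothetical two-pass variant: high-QoS tasks first, restricted to QoS-capable hosts."""
    ready = {h: 0.0 for h in hosts}
    schedule = {}
    high = [t for t in etc if t in qos_tasks]
    low = [t for t in etc if t not in qos_tasks]
    _min_min(high, [h for h in hosts if h in qos_hosts], etc, ready, schedule)
    _min_min(low, hosts, etc, ready, schedule)
    return schedule, max(ready.values())


# Toy instance (made-up numbers): only host B offers high QoS, and only t1 needs it.
etc = {
    "t1": {"A": 4, "B": 6},
    "t2": {"A": 3, "B": 5},
    "t3": {"A": 2, "B": 9},
}
schedule, makespan = qos_guided_min_min(etc, ["A", "B"], {"t1"}, {"B"})
print(schedule, makespan)
```

In this toy instance t1 is pinned to host B even though host A would run it faster, which is the kind of constraint plain Min-Min ignores; the resulting makespan is 6.0.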

3.

A Grid‐enabled problem‐solving environment for advanced reservoir uncertainty analysis

Zhou Lei Gabrielle Allen Promita Chakraborty Dayong Huang John Lewis Xin Li Christopher D. White 《Concurrency and Computation》2008,20(18):2123-2140

Uncertainty analysis is critical for conducting reservoir performance prediction. However, it is challenging because it involves (1) massive, modeling‐related, geographically distributed data sets at terabyte or even petabyte scale (geoscience and engineering data), (2) the need to rapidly perform hundreds or thousands of flow simulations, which are identical runs with different models that calculate the impacts of various uncertainty factors, and (3) the need for an integrated, secure, and easy‐to‐use problem‐solving toolkit to assist uncertainty analysis. We leverage Grid computing technologies to address these challenges. We design and implement an integrated problem‐solving environment, ResGrid, to effectively improve reservoir uncertainty analysis. ResGrid consists of data management, execution management, and a Grid portal. Data Grid tools, such as metadata, replica, and transfer services, are used to handle the massive size and geographic distribution of the data sets. Workflow, task farming, and resource allocation are used to support large‐scale computation. A Grid portal integrates the data management and the computation solution into a unified easy‐to‐use interface, enabling reservoir engineers to specify uncertainty factors of interest and perform large‐scale reservoir studies through a web browser. ResGrid has been used in petroleum engineering. Copyright © 2008 John Wiley & Sons, Ltd.

4.

Analysis of Scheduling and Replica Optimisation Strategies for Data Grids Using OptorSim (total citations: 1; self-citations: 0; cited by others: 1)

D. G. Cameron A. P. Millar C. Nicholson R. Carvajal-Schiaffino K. Stockinger F. Zini 《Journal of Grid Computing》2004,2(1):57-69

Many current international scientific projects are based on large scale applications that are both computationally complex and require the management of large amounts of distributed data. Grid computing is fast emerging as the solution to the problems posed by these applications. To evaluate the impact of resource optimisation algorithms, simulation of the Grid environment can be used to achieve important performance results before any algorithms are deployed on the Grid. In this paper, we study the effects of various job scheduling and data replication strategies and compare them in a variety of Grid scenarios using several performance metrics. We use the Grid simulator OptorSim and base our simulations on a world-wide Grid testbed for data intensive high energy physics experiments. Our results show that scheduling algorithms which take into account both the file access cost of jobs and the workload of computing resources are the most effective at optimising computing and storage resources as well as improving the job throughput. The results also show that, in most cases, the economy-based replication strategies which we have developed improve the Grid performance under changing network loads.

5.

GridBLAST: a Globus‐based high‐throughput implementation of BLAST in a Grid computing framework

Arun Krishnan 《Concurrency and Computation》2005,17(13):1607-1623

Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid‐enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini‐Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid‐enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.

6.

Using Grid services to parallelize IBM's Generic Log Adapter

Fatos Xhafa Claudi Paniagua Leonard Barolli Santi Caballé 《Journal of Systems and Software》2011,84(1):55-62

Since their definition in the Open Grid Services Architecture, Grid services have been used in many Grid-enabled applications to leverage the computational power offered by Grid systems. An important research issue addressed in this regard is how to increase the efficiency of Grid services for the massive processing and scientific computations arising in data-intensive applications, for example the processing of large log data files arising in “problem determination” in today's IT computing environments. In this paper we present an approach that uses Grid services to efficiently parallelize IBM's Generic Log Adapter (GLA). GLA is a generic parsing engine shipped with IBM's Autonomic Computing Toolkit that has been conceived to convert proprietary log data into a standard event-based log data format in real time. However, in order to provide generic support for parsing the majority of today's unstructured log data formats, the GLA makes heavy use of regular expressions, which incur performance limitations. Until now, all the approaches that have been proposed to increase GLA's performance have revolved around fine-tuning the set of regular expressions used to configure the GLA for a particular log data format or writing specific parsing code. In this work we propose a new approach consisting of transparently parallelizing the GLA by taking advantage of its internal architecture and the fact that structuring log data is a task that lends itself very well to parallelization. We present a Master-Worker strategy that uses Grid services to parallelize GLA efficiently and in a way that is completely transparent to the user.

7.

On Fairness, Optimizing Replica Selection in Data Grids

AL-Mistarihi Husni Hamad E. Yong Chan Huah 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(8):1102-1111

The emergence of scientific applications that produce huge volumes of data files requires special attention, and leads to the problem of how to manage and share such data files properly over a wide area. In large-scale Grids, data replication provides a suitable solution for managing data files, enhancing data reliability and data availability. Replica selection is one of the major functions of data replication: it decides which replica location is the best for Grid users. In this paper, we address the replica selection problem in a Grid environment where the users are competing for limited data resources. Thus, our aim is to establish fairness among the users in the selection decisions. Since the criteria that play a role in the selection process conflict with each other and produce heterogeneous values, the Analytical Hierarchy Process is used to solve this optimization problem. The validity and performance of the proposed system are evaluated through simulation, and the simulation results are discussed.
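To make the role of the Analytical Hierarchy Process more concrete, the sketch below derives criterion weights from a pairwise comparison matrix using the row geometric-mean approximation, then ranks candidate replicas by a weighted sum. The abstract does not list the criteria or the scoring formula, so the three criteria, the comparison values, the site names and the per-site scores are all hypothetical, and the fairness mechanism of the paper is not modelled here.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via normalised row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_replicas(weights, replicas):
    """Score each replica as the weighted sum of its (already normalised) criterion values."""
    scores = {
        site: sum(w * v for w, v in zip(weights, values))
        for site, values in replicas.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical criteria, in order: bandwidth, availability, security.
# pairwise[i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Made-up normalised scores in [0, 1] for two candidate replica sites.
replicas = {"siteA": [0.9, 0.5, 0.2], "siteB": [0.4, 0.9, 0.9]}
order, scores = rank_replicas(weights, replicas)
print(weights, order)
```

Because this comparison matrix is perfectly consistent (4 : 2 : 1), the geometric-mean weights coincide with the exact principal-eigenvector weights 4/7, 2/7 and 1/7; for inconsistent judgments the two methods can differ slightly, and a consistency check would normally be added.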

8.

Entropic Grid Scheduling (total citations: 1; self-citations: 0; cited by others: 1)

Youcef Derbal 《Journal of Grid Computing》2006,4(4):373-394

Computational Grids (CGs) are large scale dynamical networks of geographically distributed peer resource clusters. These clusters are independent but cooperating computing systems bound by a management framework for the provision of computing services, called Grid Services. In its basic form, the Grid scheduling problem consists in finding at least one cluster that has the capacity to handle, within the constraints of a specified quality of service, a user service request submitted to the CG. Since CGs span distinct management domains, the scheduling process has to be decentralized. Furthermore, it has to account for the ubiquitous uncertainty on the state of the CG. In this paper, we propose a scalable distributed Entropy-based scheduling approach that utilizes a Markov chain model to capture the dynamics of the service capacity state. An entropy-based quantification of the uncertainty on the service capacity information is developed and explicitly integrated within the proposed Grid scheduling approach. The performance of the proposed scheduling strategy is validated, through simulation, against a random delegation scheme and a load balancing-based scheduling strategy with respect to throughput, exploitation and convergence speed, respectively.
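One way to picture the entropy-based uncertainty measure mentioned here is to let a cluster's last known capacity state age through a Markov chain and compute the Shannon entropy of the resulting state distribution. The sketch below does only that; the two-state model, the transition probabilities and the function names are assumptions for illustration, and the paper's actual formulation may differ.

```python
import math


def step(dist, transition):
    """Advance a state distribution by one step of the Markov chain (row-stochastic matrix)."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def entropy(dist):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Hypothetical two-state capacity model: state 0 = "has spare capacity", state 1 = "saturated".
transition = [
    [0.9, 0.1],
    [0.5, 0.5],
]

dist = [1.0, 0.0]          # a fresh report: the cluster is known to have spare capacity
for age in range(4):
    print(age, [round(p, 4) for p in dist], round(entropy(dist), 4))
    dist = step(dist, transition)
```

The entropy starts at zero for a freshly reported state and grows as the information ages, so a scheduler could, under these assumptions, prefer clusters whose capacity information is both favourable and still low in entropy.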

9.

Research and Implementation of Simulation Grid Architecture and Key Technologies

Li Ni, Xiao Zhen, Peng Xiaoyuan 《Computer Engineering》2007,33(21):262-264

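The grid is an integrated resource and computing environment and a current research hotspot in the information field. Building collaborative modeling/simulation environments and simulation grids on grid technology can support the sharing of all kinds of simulation software and hardware resources to a greater extent, making it an important technology and tool for modeling and simulating complex systems. Based on grid and Web technologies, this paper constructs a layered architecture for a simulation grid prototype system, and studies and implements the technologies involved at each layer of the prototype, together with their implementation approaches: the simulation grid supporting platform, grid-enabling of simulation resources, simulation grid service management, and the simulation grid application portal.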

10.

Provide Virtual Distributed Environments for Grid computing on demand

《Advances in Engineering Software》2010,41(2):213-219

Grid users often face challenges when employing Grid resources, such as the need for customized computing environments and QoS support. In this paper, we propose a new methodology for Grid computing: using virtual machines as computing resources to provide Virtual Distributed Environments (VDE) for Grid users. Employing virtual environments for Grid computing can bring various advantages, for instance, computing environment customization, QoS guarantees and easy management. A lightweight Grid middleware, Grid Virtualization Engine, is developed accordingly to provide functions for building virtual environments for Grids. We also present a typical use case, building a virtual e-Science infrastructure on demand, to justify the methodology.

11.

Research on Indexing Techniques for Cloud Data Management (total citations: 7; self-citations: 3; cited by others: 4)

Ma Youzhong, Meng Xiaofeng 《Journal of Software》2015,26(1):145-166

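The explosive growth of data poses great challenges to traditional relational databases, which have run into bottlenecks in scalability, fault tolerance, and other respects. Cloud computing, with its high scalability, high availability, and fault tolerance, has become an effective solution for large-scale data management. However, existing cloud data management systems also have shortcomings: they support only fast queries on the primary key and, lacking mechanisms such as indexes and views, cannot provide efficient multi-dimensional queries, joins, and similar operations, which limits the application of cloud computing in many areas. This paper surveys in depth the related work on indexing techniques in cloud data management, compares and analyzes it, and points out the respective strengths and weaknesses of each approach; it briefly introduces research on multi-dimensional indexing for massive Internet of Things data in cloud computing environments; and finally it identifies several challenging problems for big data indexing in cloud computing environments.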

12.

Self‐regulation during e‐learning: using behavioural evidence from navigation log files

D. Jeske J. Backhaus C. Stamov Roßnagel 《Journal of Computer Assisted Learning》2014,30(3):272-284

The current paper examined the relationship between perceived characteristics of the learning environment in an e‐module and test performance among a group of e‐learners. Using structural equation modelling, the relationship between these variables is further explored in terms of the proposed double mediation as outlined by Ning and Downing. These authors initially proposed that motivation and self‐regulation strategies are mediators between the perception of the learning environment and performance. In our replication and extension study, we substituted self‐reported self‐regulation with behavioural indicators of self‐regulation using navigation log files and focused on test‐taking rather than general motivation. We proposed that navigational patterns captured using log files can also help deduce self‐regulation in e‐modules and provide information in the absence of self‐reports. Path analyses provide partial support for our navigational hypotheses and the model. Implications of our results for the use of e‐module data and conclusions based on navigation are discussed.

13.

Data Management Strategies for Data Grids (total citations: 6; self-citations: 0; cited by others: 6)

Wu Xiuchuan, Hu Liang, Ju Jiubin 《Journal of Chinese Computer Systems》2004,25(1):98-102

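The goal of the data grid is to make possible data-intensive high-performance computing as well as data-intensive data-sharing transaction processing and scientific research. A data grid consists mainly of two parts: the data storage system and the data management system. The data management system manages the stored data, chiefly through operations such as data transfer and replication. This article gives a detailed, classified review of data management strategies and discusses some limitations of current data management systems and further work.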

14.

Integration of control system design and implementation over the internet using the Jini technology

S. H. Yang X. Chen L. Yang 《Software》2003,33(12):1151-1175

This paper describes an approach for the integration of control system software design, testing, and implementation over the Internet using the Java and Jini technologies. Process models and control systems are remotely designed and tested in a virtual laboratory (also called the virtual world), and then implemented in a physical plant (also called the real world) through an integrated environment. Although control system and process model designers and real‐site operators are geographically dispersed, they work together as a team over the Internet to provide maintenance support to all the authorized industrial processes. As a consequence, both time and money can be saved because there is no need for an expert from the control software supplier to travel to the site of the real plant and conduct on‐site implementation. A generic control system life cycle model is presented first in this paper. Then three enabling technologies, Java, Jini and WWW, are briefly introduced. Taking advantage of the Java, Jini and WWW technologies, an Internet‐based general infrastructure is proposed to remotely facilitate process modelling, control system design, simulation, validation and on‐site implementation. An integrated environment is established to implement the infrastructure. A water tank with a liquid level control system is used as a case study to illustrate how the prototype of the integrated environment works over the Internet. Further work and the conclusions are given at the end. Copyright © 2003 John Wiley & Sons, Ltd.

15.

Visual Grid Workflow in Triana (total citations: 1; self-citations: 0; cited by others: 1)

Ian Taylor Matthew Shields Ian Wang Andrew Harrison 《Journal of Grid Computing》2005,3(3-4):153-169

In this paper, we describe the graphical abstractions for Grids and services that have been implemented within the Triana problem solving environment. We provide an overview of the ways in which Triana interacts with services (e.g., Web and P2P services) and then how we interact with core Grid components, such as resource managers and data management systems through the extensive use of the GridLab GAT interface. We describe in detail the GAT philosophy and implementation and then show how the various GAT primitives can be represented in an intuitive fashion within a Triana workflow. This approach, which we refer to as the Visual GAT, differs substantially from other approaches because we do not tie our implementation to any specific underlying Grid middleware technologies; rather, we base our implementation on application level requirements and model such primitives from a user’s perspective by hiding as much complexity as possible without undermining the core capabilities required. We provide a use case to demonstrate the Visual GAT implementation and show how legacy applications can seamlessly be distributed and integrated in a dynamic fashion within complex data-driven workflow scenarios.

16.

The Emerging Governance of E‐Infrastructure

Franz Barjak Kathryn Eccles Eric T. Meyer Simon Robinson Ralph Schroeder 《Journal of Computer-Mediated Communication》2013,18(2):1-24

The paper studies the transition to ICT‐based support systems for scientific research. These systems currently attempt the transition from the project stage to the more permanent stage of an infrastructure. The transition leads to several challenges, including in the area of establishing adequate governance regimes, which not all projects master successfully. Studying a set of cases from Europe and America, we look at patterns in the size and scope of the undertakings, embeddedness in user communities, aims and responsibilities, mechanisms of coordination, forms of governance, and time horizon and funding. We find that, though configurations and landscapes are somewhat diverse, successful projects typically follow distinctive paths, either large‐scale or small‐scale, and become what we term ‘stable metaorganizations’ or ‘established communities.’

17.

An Efficient Design and Implementation for Grid Advanced Information Service

Minyeol Lim Eui-Nam Huh 《The Journal of supercomputing》2005,33(1):53-63

18.

Grid organizational memory—provision of a high-level Grid abstraction layer supported by ontology alignment

《Future Generation Computer Systems》2007,23(3):348-358

19.

Big Data Stream Computing: Key Technologies and System Instances (total citations: 5; self-citations: 0; cited by others: 5)

Sun Dawei, Zhang Guangyan, Zheng Weimin 《Journal of Software》2014,25(4):839-862

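Big data computing takes two main forms, batch computing and stream computing. Batch computing systems for big data have been studied and discussed fairly thoroughly, whereas how to build stream computing systems for big data that offer low latency and high throughput and run continuously and reliably is a problem in urgent need of a solution, with comparatively few research results and little practical experience. This paper summarizes the characteristics that streaming big data exhibits in typical application domains, namely real-time arrival, volatility, burstiness, disorder, and unboundedness; sets out the key technical features that an ideal big data stream computing system should have in terms of system architecture, data transmission, application interfaces, and high-availability techniques; discusses and compares typical examples of existing big data stream computing systems; and finally describes the technical challenges these systems face in scalability, fault tolerance, state consistency, load balancing, data throughput, and other respects.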

20.

Securing next-generation grids

Ramakrishnan L. 《IT Professional》2004,6(2):34-39

Grid computing poses tough security challenges. What do we have - and what do we still need - to make grids safe for tomorrow? Grid computing harnesses existing self-contained systems, from personal computers to supercomputers, to let users share processing cycles and data across geographical and organizational boundaries. This emerging technology can transform the computational infrastructure into an integrated, pervasive virtual environment. However, although commercial and research organizations might have collaborative or monetary reasons to share resources, they are unlikely to adopt such a distributed infrastructure until they can rely on the confidentiality of the communication, the integrity of their data and resources, and the privacy of the user information. In other words, large-scale deployment of grids will occur when users can count on their security.