
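Building a computing resource sharing environment that supports low-latency communication and fault tolerance
Cite this article: XU Ai-jun, ZHANG Yue. Building a computing resource sharing environment that supports low-latency communication and fault tolerance [J]. Computer Engineering and Design (计算机工程与设计), 2012, 33(4): 1352-1356.
Authors: XU Ai-jun (许爱军), ZHANG Yue (张岳)
Affiliations: 1. Department of Information Engineering, Guangzhou Institute of Railway Technology, Guangzhou, Guangdong 510430
2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100120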
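Abstract: This paper presents and describes LF-CRSE (low latency and fault tolerance CRSE), a computing resource sharing environment that supports low-latency communication and fault tolerance. LF-CRSE introduces the notion of node functional roles: client nodes, task servers, worker service providers, and worker nodes together form a scalable distributed network architecture. Task caching, task pre-fetching, and computation on the task server keep the latency overhead of communication low. At the application level, branch-and-bound task partitioning lets LF-CRSE support a flexible programming model that covers both the master-slave and the divide-and-conquer patterns. Heartbeat messages sent by the workers, together with subtask-oriented fault tolerance, ensure the correctness of LF-CRSE. The tests use a distributed travelling salesman problem with data dependencies. The experimental results show that the speedup of LF-CRSE rises steadily as workers are added, and that LF-CRSE also performs well in low-latency communication and fault tolerance.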

Keywords: distributed computing; computing resource sharing; low latency; fault tolerance; branch and bound
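
The abstract names worker heartbeat messages, subtask-oriented fault tolerance, and task pre-fetching, but gives no implementation details. The following is a minimal sketch of how these ideas could fit together, not the authors' implementation. The class `TaskServer`, its methods, and the `timeout` and `count` parameters are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class TaskServer:
    """Hypothetical task server (illustration only, not the LF-CRSE code)."""

    def __init__(self, subtasks, timeout):
        self.pending = deque(subtasks)  # subtasks not yet handed out
        self.assigned = {}              # worker id -> subtasks in flight
        self.last_seen = {}             # worker id -> time of last heartbeat
        self.timeout = timeout          # assumed: longer silence means failure
        self.results = {}               # subtask -> result

    def heartbeat(self, worker, now):
        """Record that `worker` was alive at time `now`."""
        self.last_seen[worker] = now

    def fetch(self, worker, now, count=1):
        """Hand out up to `count` subtasks; count > 1 models pre-fetching."""
        self.heartbeat(worker, now)
        take = min(count, len(self.pending))
        batch = [self.pending.popleft() for _ in range(take)]
        self.assigned.setdefault(worker, []).extend(batch)
        return batch

    def complete(self, worker, subtask, result, now):
        """Store a result, ignoring subtasks no longer assigned to `worker`."""
        self.heartbeat(worker, now)
        in_flight = self.assigned.get(worker, [])
        if subtask in in_flight:
            in_flight.remove(subtask)
            self.results[subtask] = result

    def reap(self, now):
        """Requeue the subtasks of every worker whose heartbeat has expired."""
        dead = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        for w in dead:
            self.pending.extend(self.assigned.pop(w, []))
            del self.last_seen[w]
        return dead
```

In this sketch recovery works at the level of individual subtasks: when a worker falls silent, only the subtasks it still held return to the queue, and results that were already reported are kept.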

Building computing resource sharing environment with low latency and fault tolerance
XU Ai-jun, ZHANG Yue. Building computing resource sharing environment with low latency and fault tolerance [J]. Computer Engineering and Design, 2012, 33(4): 1352-1356.
Authors:XU Ai-jun  ZHANG Yue
Affiliation: 1. Department of Information Engineering, Guangzhou Institute of Railway Technology, Guangzhou 510430, China; 2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100120, China
Abstract: A computing resource sharing environment with low latency and fault tolerance, called LF-CRSE, is presented and described. Every node in LF-CRSE is assigned one of four roles (client, task server, worker service provider, or worker), and together the nodes form a scalable network topology. For a parallel application, LF-CRSE hides communication latency through task caching, task pre-fetching, and a task server computation policy. These features also enable an elegant expression of branch-and-bound optimization, which is used for divide-and-conquer computations. LF-CRSE manages a set of worker processors that can change during program execution, for reasons that include faulty workers. LF-CRSE has been deployed as an experimental platform, on which we achieved a computation record by solving the travelling salesman problem (TSP). The performance analysis shows that the speedup of LF-CRSE increases steadily as workers are added. LF-CRSE also performs well in the low-latency and fault-tolerance tests.
Keywords:distributed computing  computing resource sharing  low latency  fault tolerance  branch and bound
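
To illustrate how branch and bound can be expressed as divide-and-conquer subtasks for the TSP, here is a small sequential sketch. It is an assumption about the general technique, not the authors' algorithm. The function names `solve_subtask` and `solve_tsp` are hypothetical, distances are assumed non-negative, and each subtask is identified by a partial tour prefix. The incumbent bound passed from one subtask to the next is the kind of data dependency a distributed version would have to share between workers.

```python
import math


def solve_subtask(dist, prefix, best):
    """Depth-first branch and bound below the partial tour `prefix`.

    `best` is the incumbent tour length. Returns (length, tour); tour is
    None when this subtask finds nothing shorter than `best`.
    """
    n = len(dist)
    best_len, best_tour = best, None

    def extend(tour, length, visited):
        nonlocal best_len, best_tour
        if length >= best_len:  # bound: valid because distances are >= 0
            return
        if len(tour) == n:      # complete tour: add the edge back to the start
            total = length + dist[tour[-1]][tour[0]]
            if total < best_len:
                best_len, best_tour = total, tour[:]
            return
        for city in range(n):   # branch on every unvisited city
            if city not in visited:
                step = dist[tour[-1]][city]
                tour.append(city)
                visited.add(city)
                extend(tour, length + step, visited)
                visited.remove(city)
                tour.pop()

    start_len = sum(dist[a][b] for a, b in zip(prefix, prefix[1:]))
    extend(list(prefix), start_len, set(prefix))
    return best_len, best_tour


def solve_tsp(dist):
    """Split the search into one subtask per second city, tours start at 0."""
    best_len, best_tour = math.inf, None
    for second in range(1, len(dist)):
        length, tour = solve_subtask(dist, (0, second), best_len)
        if tour is not None:
            best_len, best_tour = length, tour
    return best_len, best_tour


dist = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
print(solve_tsp(dist))  # prints (21, [0, 2, 3, 1])
```

Each call to `solve_subtask` is independent apart from the incumbent bound, so in a master-slave setting the prefixes could be handed to different workers, with a tighter bound pruning more of each worker's search.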
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.