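Tuning Experiments for Meteorological Data Processing Based on the MapReduce Computing Model
Cite this article: Yang Runzhi, Shen Wenhai, Xiao Weiqing, Hu Kaixi, Yang Xin, Wang Ying, Tian Wei. Tuning Experiments for Meteorological Data Processing Based on the MapReduce Computing Model[J]. 应用气象学报 (Quarterly Journal of Applied Meteorology), 2014, 25(5): 618-628.
Authors: Yang Runzhi  Shen Wenhai  Xiao Weiqing  Hu Kaixi  Yang Xin  Wang Ying  Tian Wei
Affiliation: 1. National Meteorological Information Center, Beijing 100081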
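Abstract: Cloud computing uses distributed computing to deliver the capacity and efficiency of parallel computation, addressing the limited computing power of a single server. Climate normals computed from long-sequence historical data are important for real-time operations, near-real-time operations, and scientific research in meteorology. Because long-sequence historical data are large in volume and the computation logic is relatively complex, compiling them on a traditional single-node platform takes a very long time. This paper builds a cluster-mode cloud computing platform on the Hadoop distributed computing framework, takes long-sequence historical data as the source data, and implements some of the compilation algorithms with the MapReduce computing model to shorten computation time. In addition, because the data source consists of many files that are each small, the storage form of the data source and the size of the data files are restructured: a SequenceFile approach and a text-file merging approach are each timed on the same scenario, with merges of 10 files and of 100 files tested, which shortens computation time further.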
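The MapReduce pattern the abstract refers to can be illustrated with a small, framework-free sketch. It computes a mean per (station, month) key, which is one plausible form of a climate normal; the paper's actual compilation algorithms are not given here. The record layout `station,year,month,value`, the function names, and the sample values are all assumptions made for illustration, and the code runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict


def map_record(line):
    """Map step: parse one observation and emit ((station, month), value).

    The comma-separated layout 'station,year,month,value' is a hypothetical
    stand-in for the real data format, which the abstract does not describe.
    """
    station, _year, month, value = line.strip().split(",")
    yield (station, int(month)), float(value)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_mean(key, values):
    """Reduce step: average all values that share one (station, month) key."""
    return key, sum(values) / len(values)


def climate_normals(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in map_record(line))
    return dict(reduce_mean(key, values) for key, values in shuffle(mapped).items())


# Made-up sample observations, not data from the paper.
records = [
    "54511,1981,1,-4.0",
    "54511,1982,1,-2.0",
    "54511,1981,7,26.0",
    "58362,1981,1,4.5",
]
normals = climate_normals(records)
```

Because every record with the same key reaches the same reduce call, the map step can be spread over many nodes without changing the result, which is the property the cluster relies on.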

Keywords: MapReduce    cloud computing    Hadoop    historical data compilation
Received: 2013-10-08
Revised: 2014-06-03

A Set of MapReduce Tuning Experiments Based on Meteorological Operations
Yang Runzhi, Shen Wenhai, Xiao Weiqing, Hu Kaixi, Yang Xin, Wang Ying and Tian Wei. A Set of MapReduce Tuning Experiments Based on Meteorological Operations[J]. Quarterly Journal of Applied Meteorology, 2014, 25(5): 618-628.
Authors: Yang Runzhi  Shen Wenhai  Xiao Weiqing  Hu Kaixi  Yang Xin  Wang Ying and Tian Wei
Affiliation: 1. National Meteorological Information Center, Beijing 100081; 2. Nanjing University of Information Science & Technology, Nanjing 210044
Abstract: Cloud computing uses distributed computing to achieve the power and efficiency of parallel computation, which addresses the limited computing power of a standalone server. It is a new application model for decentralized computing that can provide reliable, customized services to the largest number of users with the fewest resources, and, combined with other theories and techniques, it is an important vehicle for both theoretical research and practical application. Cloud computing is now widely applied across many industries and fields, and its flexibility, ease of use, and stability are increasingly recognized. In the meteorological department, development of scientific computing on cloud-based platforms is still very limited, although some attempts have been made as cloud computing has matured. In meteorological operations, large-scale scientific computing and other general computing models run on high-performance server clusters. Because resources and the number of HPC nodes are limited, scientific computing still relies on traditional standalone or clustered modes. Exploring a platform within the meteorological department that combines conventional general-purpose computing with cloud computing is therefore worthwhile. The National Meteorological Information Center stores a valuable 60-year sequence of historical data that serves real-time operations, near-real-time operations, and research. Processing these historical data is time-consuming, so new methods are applied. A cluster is built on the Hadoop cloud computing platform, and a variety of statistical methods are implemented with the MapReduce computation model.
The storage format of the source data is changed to SequenceFile, a format made up of serialized <Key, Value> pairs, so that multiple Format-A files are merged into one large SequenceFile and the change in computational efficiency can be measured. In addition, many small files are merged into a larger file. The configuration of the Hadoop cluster is tuned experimentally, and runs with different numbers of task nodes are used to record the resulting computational efficiency.
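The small-file merging described above can be sketched as follows. Each input file becomes one `name<TAB>content` line, a plain-text analogue of a <Key, Value> record; this is not the Hadoop SequenceFile binary format, and the function names, file names, and escaping scheme are assumptions made for illustration. The batch size corresponds to the number of files merged per output, such as the 10-file and 100-file cases reported for the paper.

```python
import tempfile
from pathlib import Path


def merge_in_batches(paths, out_dir, batch_size):
    """Merge small files into larger ones, `batch_size` inputs per output file.

    Key = original file name, value = file content with newlines escaped.
    The escaping is deliberately simplified (backslashes are not escaped).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = sorted(paths)
    merged = []
    for start in range(0, len(paths), batch_size):
        target = out_dir / f"merged_{start // batch_size:05d}.txt"
        with target.open("w", encoding="utf-8") as out:
            for path in paths[start:start + batch_size]:
                value = Path(path).read_text(encoding="utf-8").replace("\n", "\\n")
                out.write(f"{Path(path).name}\t{value}\n")
        merged.append(target)
    return merged


def demo(n_files=25, batch_size=10):
    """Create n_files tiny inputs, merge them, and return records per output file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        for i in range(n_files):
            (src / f"obs_{i:03d}.txt").write_text(f"record {i}\n", encoding="utf-8")
        merged = merge_in_batches(src.glob("*.txt"), Path(tmp) / "out", batch_size)
        return [len(p.read_text(encoding="utf-8").splitlines()) for p in merged]


counts = demo()  # 25 inputs in batches of 10 give three output files
```

Fewer, larger inputs generally mean fewer map tasks and less per-file overhead, which is the motivation for comparing batch sizes; the actual speedup depends on the cluster and is not reproduced by this sketch.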
Keywords: MapReduce  cloud computing  Hadoop  meteorological data processing
This article is indexed in CNKI and other databases.