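Distributed Big Data Analysis Algorithm Based on Spark
Cite this article: Song Bodong, Zhang Lichen, Jiang Qizhou. Distributed Big Data Analysis Algorithm Based on Spark [J]. Computer Applications and Software (计算机应用与软件), 2019, 36(1): 39-44.
Authors: Song Bodong (宋泊东), Zhang Lichen (张立臣), Jiang Qizhou (江其洲)
Affiliation (all three authors): School of Computers, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
Abstract: With the arrival of the big data era, data computation faces many challenges in both real-time performance and data volume. To meet the demands of massive data volumes and high-speed big data processing, this study uses Apache as an integrated resource management system. Apache Storm, Apache Spice, and Spark RDD are used to process large-scale distributed real-time data streams, and Apache Kafka serves as the message middleware that supports asynchronous message communication. A distributed big data analysis and processing algorithm that supports parallel operation rules is designed. Experimental results show that the algorithm effectively reduces the time needed to analyze massive data, supports heterogeneous information exchange and data storage among the system's subsystems, and is sufficient to meet the short-term trend forecasting needs of high-frequency trading markets. It has high application value in high-frequency, big data processing systems.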

Keywords: Apache Kafka; Distributed; Spark RDD; N-layer; Real-time data stream

DISTRIBUTED BIG DATA ANALYSIS ALGORITHM BASED ON SPARK
Song Bodong, Zhang Lichen, Jiang Qizhou. DISTRIBUTED BIG DATA ANALYSIS ALGORITHM BASED ON SPARK [J]. Computer Applications and Software, 2019, 36(1): 39-44.
Authors: Song Bodong, Zhang Lichen, Jiang Qizhou
Affiliation: School of Computers, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
Abstract: With the arrival of the big data era, data computation faces many challenges in both real-time performance and data volume. To meet the requirements of large data volumes and high-speed big data processing, we used Apache as an integrated resource management system. We adopted Apache Storm, Apache Spice, and Spark RDD to process large-scale distributed real-time data streams, and used Apache Kafka as the message middleware to support asynchronous message communication. We designed a distributed big data analysis and processing algorithm that supports parallel operation rules. Experimental results show that the algorithm effectively reduces the time needed to analyze massive data and supports heterogeneous information exchange and data storage among subsystems. It is sufficient to meet the demands of short-term trend forecasting in high-frequency trading markets, and it has high application value in high-frequency, big data processing systems.
Keywords: Apache Kafka; Distributed; Spark RDD; N-layer; Real-time data stream
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).