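Fast tomographic inversion of the near-surface velocity model based on Spark
Cite this article: CHEN Jinhuan. Fast tomographic inversion of the near-surface velocity model based on Spark[J]. Geophysical Prospecting for Petroleum, 2022(1): 146-155.
Author: CHEN Jinhuan
Affiliation: Sinopec Geophysical Research Institute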
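Abstract: Tomographic inversion of near-surface velocity models mostly relies on iterative inversion driven by first-arrival traveltime ray tracing. Computational efficiency is usually improved with MPI parallelism over shared storage, but once the number of compute nodes grows beyond a certain scale, excessive network I/O pressure becomes a computational bottleneck. To address this, a fast and robust Spark-based method for tomographic inversion of the near-surface velocity model is proposed. It uses distributed memory management to persist in memory the data that would otherwise be recomputed in every iteration, which improves run-time efficiency. In addition, to remove the network I/O congestion that shared storage suffers as the node count grows, resilient distributed datasets (RDDs) are organized in a distributed storage environment, the basic merge unit is defined as the one-dimensional inversion data along the depth direction, the reduction is carried out in a distributed, parallel fashion on top of Spark Shuffle, and the Spark scheduler assigns tasks to the individual processes to achieve parallel computation. Results on real data show that, with the accuracy of the inversion result unchanged, the implementation substantially reduces the network I/O generated during iteration compared with conventional MPI parallelism; that computational efficiency improves by more than a factor of four when many compute nodes are used; and that the parallel speedup grows approximately linearly.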

Keywords: near-surface tomographic inversion; iterative computation; Spark parallelism; resilient distributed datasets; basic merge unit
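The in-memory persistence described in the abstract can be illustrated with a minimal plain-Python sketch. It does not use Spark and is not the author's implementation: `PersistedDataset`, `load_picks`, and `iterate` are hypothetical names, the two-cell model and its numbers are made up for illustration, and the SIRT-style update is an assumption, since the abstract does not state the update formula. In Spark itself, the analogous mechanism would be keeping an RDD in memory with `persist` or `cache` so that later iterations reuse it.

```python
class PersistedDataset:
    """Loads data on first access and keeps it in memory afterwards."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self.load_count = 0  # how many times the expensive load actually ran

    def get(self):
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


def load_picks():
    # Hypothetical stand-in for an expensive read of first-arrival picks.
    # Each entry is (ray path length in each cell, observed traveltime).
    return [([1.0, 1.0], 3.0), ([2.0, 0.0], 2.0)]


def iterate(picks, slowness):
    """One SIRT-style update of the cell slowness values (assumed scheme)."""
    num = [0.0] * len(slowness)
    den = [0.0] * len(slowness)
    for lengths, t_obs in picks:
        t_calc = sum(length * s for length, s in zip(lengths, slowness))
        total = sum(lengths)
        for i, length in enumerate(lengths):
            # Spread the traveltime residual along the ray, weighted by length.
            num[i] += length * (t_obs - t_calc) / total
            den[i] += length
    return [s + (n / d if d else 0.0) for s, n, d in zip(slowness, num, den)]


picks = PersistedDataset(load_picks)
slowness = [1.5, 1.5]  # initial model
for _ in range(50):
    # Every iteration reuses the in-memory copy instead of reloading it.
    slowness = iterate(picks.get(), slowness)
```

The loader runs only once regardless of the number of iterations, which is the property the abstract attributes to persisting iteration-invariant data in memory.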

Fast tomographic inversion of the near-surface velocity model based on the Spark technology
CHEN Jinhuan. Fast tomographic inversion of the near-surface velocity model based on the Spark technology[J]. Geophysical Prospecting For Petroleum, 2022(1): 146-155.
Author: CHEN Jinhuan
Affiliation: Sinopec Geophysical Research Institute, Nanjing 211103, China
Abstract: Tomographic inversion of the near-surface velocity model is generally performed with ray tracing based on first-arrival traveltimes. The method requires multiple iterations to converge gradually on the true near-surface velocity. Traditionally, MPI parallelism over shared storage has been used to improve its computational efficiency. However, when the number of computing nodes exceeds a certain value, excessive network I/O pressure becomes a computing bottleneck. Therefore, in this study, a fast and robust tomographic inversion of the near-surface velocity model based on Spark was implemented. The method adopts distributed memory management to keep shared intermediate data in memory and improve the run-time efficiency of the program. The JNI programming framework was used to call complex and robust C++ library functions. To remove the network I/O bottleneck, a basic merge unit was designed on top of Spark Shuffle. The basic merge unit contains the one-dimensional inversion data in the depth direction. Following the tomographic inversion algorithm, resilient distributed datasets (RDDs) are organized under distributed storage. In particular, the inversion data are merged in parallel, one one-dimensional unit at a time, during the reduction process, which is the main innovation. Then, based on the directed acyclic graph (DAG), the Spark scheduler distributes tasks to the processes to achieve parallel computing. Results of an application to real data showed that the method leaves the accuracy of the inversion result unchanged and greatly reduces the network I/O generated in the iterative process. When there are many cluster nodes, the computing efficiency is more than four times that of traditional MPI parallelism. Notably, the parallel speedup exhibits an approximately linear growth trend.
Keywords: near-surface tomographic inversion; iterative calculation; Spark parallel; resilient distributed datasets; basic merge unit
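The basic merge unit described in the abstract (one-dimensional inversion data along depth, merged in parallel during reduction) can be sketched in plain Python as a keyed reduction. This is only a model of the idea under stated assumptions, not the author's code and not Spark itself: the key is assumed to be the surface position `(ix, iy)` of a depth column, the merge is assumed to be element-wise addition of back-projected contributions, and `NZ`, `map_partition`, `reduce_by_key`, and the sample rays are hypothetical. In Spark, a pair-RDD operator such as `reduceByKey` would be one way to express the same step; the abstract does not name the operator used.

```python
from collections import defaultdict

NZ = 4  # number of depth samples per column (hypothetical)


def add_columns(a, b):
    """Merge two depth columns element-wise (associative and commutative)."""
    return [x + y for x, y in zip(a, b)]


def map_partition(rays):
    """Emit ((ix, iy), depth_column) pairs for one partition of rays.

    Each ray is a list of (ix, iy, iz, contribution) cell hits.
    Contributions are combined locally first, like a map-side combiner,
    so each partition ships at most one column per surface position.
    """
    local = defaultdict(lambda: [0.0] * NZ)
    for ray in rays:
        for ix, iy, iz, contribution in ray:
            local[(ix, iy)][iz] += contribution
    return list(local.items())


def reduce_by_key(pairs, func):
    """Plain-Python stand-in for a keyed shuffle-and-reduce."""
    merged = {}
    for key, column in pairs:
        merged[key] = func(merged[key], column) if key in merged else column
    return merged


# Two partitions of rays, as two workers might hold them.
partitions = [
    [[(0, 0, 0, 1.0), (0, 0, 1, 2.0)], [(1, 0, 0, 0.5)]],
    [[(0, 0, 1, 3.0), (1, 0, 3, 4.0)]],
]
pairs = [kv for part in partitions for kv in map_partition(part)]
update = reduce_by_key(pairs, add_columns)
```

Because each key identifies an independent depth column, different columns can be merged on different workers without every worker exchanging the full model volume, which is consistent with the reduced network I/O the abstract reports.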
This article is indexed in VIP (维普) and other databases.