Optimization of memory access for the convolutional neural network training
Authors: WANG Jijun, HAO Ziyu, LI Hongliang
Affiliation: Jiangnan Institute of Computing Technology, Wuxi 214083, China
Abstract: Batch Normalization (BN) can effectively speed up deep neural network training, but its complex data dependence leads to a serious "memory wall" bottleneck. Aiming at this bottleneck in the training of convolutional neural networks (CNNs) with BN layers, an effective memory access optimization method is proposed through BN reconstruction and fused-layer computation. First, through a detailed analysis of BN's data dependence and memory access behavior during training, the key factors responsible for the large amount of memory access are identified. Second, the "Convolution + BN + ReLU (Rectified Linear Unit)" sequence is fused into a single computational block, and a re-computation strategy is used to reduce memory access during training. In addition, the BN layer is split into two sub-layers, each fused with its adjacent layer, which further reduces memory access during training and effectively improves the accelerator's computational efficiency. Experimental results show that, with the proposed optimization, the amount of memory access is reduced by 33%, 22% and 31%, and the actual computing efficiency of the V100 is improved by 20.5%, 18.5% and 18.1%, respectively, when ResNet-50, Inception V3 and DenseNet are trained on an NVIDIA TESLA V100 GPU. The proposed method exploits the characteristics of memory access during training and can be combined with other optimization methods to further reduce memory access.
Keywords: deep convolutional neural networks; model training; fused layers; batch normalization reconstruction; off-chip memory access optimization
Journal: 西安电子科技大学学报 (Journal of Xidian University)
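
The layer-fusion and BN-splitting ideas summarized in the abstract can be illustrated with a minimal PyTorch-style sketch. This is a hypothetical reconstruction, not the authors' accelerator implementation: the class FusedConvBNReLU is invented for illustration, torch.utils.checkpoint stands in for the paper's re-computation strategy, and the manual separation of BN into a normalization step (fused with the convolution output) and an affine scale/shift step (fused with the following ReLU) only mirrors the described sub-layer split conceptually; running-statistics bookkeeping is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class FusedConvBNReLU(nn.Module):
    """Conv + BN + ReLU treated as one block: only the block input is saved,
    and the intermediate activations are recomputed during the backward pass."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)  # supplies eps and the affine parameters

    def _block(self, x):
        y = self.conv(x)
        # BN sub-layer a: normalization with batch statistics,
        # applied directly to the convolution output.
        mean = y.mean(dim=(0, 2, 3), keepdim=True)
        var = y.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        y_hat = (y - mean) / torch.sqrt(var + self.bn.eps)
        # BN sub-layer b: affine scale/shift, fused with the ReLU that follows.
        w = self.bn.weight.view(1, -1, 1, 1)
        b = self.bn.bias.view(1, -1, 1, 1)
        return F.relu(y_hat * w + b)

    def forward(self, x):
        # Re-computation strategy: checkpointing discards the intermediates
        # of _block after the forward pass and recomputes them in backward.
        return checkpoint(self._block, x, use_reentrant=False)


if __name__ == "__main__":
    block = FusedConvBNReLU(3, 16)
    x = torch.randn(8, 3, 32, 32, requires_grad=True)
    out = block(x)
    out.sum().backward()  # _block is re-executed here to rebuild intermediates
    print(out.shape)      # torch.Size([8, 16, 32, 32])

On an accelerator the three operators would be executed in one pass over on-chip memory; the checkpointing above only reproduces the memory-traffic effect, namely that the intermediate activations of the fused block are not written back to off-chip memory between the forward and backward passes.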