Optimization of memory access for the convolutional neural network training
Authors: WANG Jijun, HAO Ziyu, LI Hongliang
Affiliation: Jiangnan Institute of Computing Technology, Wuxi 214083, China
Abstract: Batch Normalization (BN) can effectively speed up deep neural network training, but its complex data dependence leads to a serious "memory wall" bottleneck. Aiming at this bottleneck in the training of convolutional neural networks (CNNs) with BN layers, an effective memory access optimization method is proposed through BN reconstruction and fused-layer computation. First, through a detailed analysis of BN's data dependence and memory access behavior during training, the key factors responsible for the large amount of memory access are identified. Second, the "Convolution + BN + ReLU (Rectified Linear Unit)" sequence is fused into a single computational block, and a re-computation strategy is used to reduce memory access during training. In addition, the BN layer is split into two sub-layers, each fused with its adjacent layer, which further reduces memory access during training and effectively improves the accelerator's computational efficiency. Experimental results show that, with the proposed optimization, the amount of memory access is reduced by 33%, 22% and 31%, and the actual computing efficiency of the V100 is improved by 20.5%, 18.5% and 18.1%, respectively, when ResNet-50, Inception V3 and DenseNet are trained on an NVIDIA TESLA V100 GPU. The proposed method exploits the characteristics of memory access during training and can be combined with other optimization methods to further reduce memory access.
Keywords: deep convolutional neural networks; model training; fused layers; batch normalization reconstruction; off-chip memory access optimization
Journal: 西安电子科技大学学报 (Journal of Xidian University)
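
The layer-fusion and BN-splitting ideas summarized in the abstract can be illustrated with a minimal PyTorch-style sketch. This is a hypothetical reconstruction, not the authors' accelerator implementation: the class FusedConvBNReLU is invented for illustration, torch.utils.checkpoint stands in for the paper's re-computation strategy, and the manual separation of BN into a normalization step (fused with the convolution output) and an affine scale/shift step (fused with the following ReLU) only mirrors the described sub-layer split conceptually; running-statistics bookkeeping is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class FusedConvBNReLU(nn.Module):
    """Conv + BN + ReLU treated as one block: only the block input is saved,
    and the intermediate activations are recomputed during the backward pass."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)  # supplies eps and the affine parameters

    def _block(self, x):
        y = self.conv(x)
        # BN sub-layer a: normalization with batch statistics,
        # applied directly to the convolution output.
        mean = y.mean(dim=(0, 2, 3), keepdim=True)
        var = y.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        y_hat = (y - mean) / torch.sqrt(var + self.bn.eps)
        # BN sub-layer b: affine scale/shift, fused with the ReLU that follows.
        w = self.bn.weight.view(1, -1, 1, 1)
        b = self.bn.bias.view(1, -1, 1, 1)
        return F.relu(y_hat * w + b)

    def forward(self, x):
        # Re-computation strategy: checkpointing discards the intermediates
        # of _block after the forward pass and recomputes them in backward.
        return checkpoint(self._block, x, use_reentrant=False)


if __name__ == "__main__":
    block = FusedConvBNReLU(3, 16)
    x = torch.randn(8, 3, 32, 32, requires_grad=True)
    out = block(x)
    out.sum().backward()  # _block is re-executed here to rebuild intermediates
    print(out.shape)      # torch.Size([8, 16, 32, 32])

On an accelerator the three operators would be executed in one pass over on-chip memory; the checkpointing above only reproduces the memory-traffic effect, namely that the intermediate activations of the fused block are not written back to off-chip memory between the forward and backward passes.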