
Mixed precision neural network quantization method based on Octave convolution
Citation: ZHANG Wenye, SHANG Fangxin, GUO Hao. Mixed precision neural network quantization method based on Octave convolution[J]. Journal of Computer Applications, 2021, 41(5): 1299-1304.
Authors: ZHANG Wenye  SHANG Fangxin  GUO Hao
Affiliation: 1. School of Information, Renmin University of China, Beijing 100872, China; 2. Shanxi Extended Reality Industrial Technology Research Institute Company Limited, Taiyuan, Shanxi 030024, China; 3. College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China
Foundation item: National Natural Science Foundation of China (61672374)
Abstract: Deep neural networks with floating-point weights require large amounts of computing resources, which makes large deep neural networks difficult to deploy in scenarios with limited computing power, such as edge computing. To solve this problem, a plug-and-play neural network quantization method was proposed to reduce the computational cost of large neural networks while keeping model performance from degrading significantly. Firstly, the high-frequency and low-frequency components of the input feature map were separated based on Octave convolution. Secondly, convolution kernels of different bit widths were applied to the high- and low-frequency components respectively. Thirdly, the high- and low-frequency convolution results were quantized to the corresponding bit widths by different activation functions. Finally, the feature maps of different precisions were mixed to obtain the output of the layer. Experimental results verify the effectiveness of the proposed method for model compression: on the CIFAR-10/100 datasets, when the models were compressed to a 1+8 bit width, the accuracy drop was kept below 3 percentage points; on the ImageNet dataset, when the ResNet50-based model was compressed to a 1+4 bit width, its accuracy remained above 70%.

Keywords: deep neural network  model quantization  model compression  convolutional neural network  deep learning
Received: 2020-07-27
Revised: 2020-09-18
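The four-step pipeline described in the abstract (frequency separation via Octave convolution, convolution with kernels of different bit widths, per-branch activation quantization, and mixing of the results) can be illustrated with a short PyTorch sketch. This is a minimal sketch rather than the authors' implementation: the bit-width assignment (8-bit high-frequency branch, 1-bit low-frequency branch), the uniform fake-quantizer with a straight-through estimator, the channel split ratio alpha, and all names such as uniform_quantize and MixedPrecisionOctaveConv are illustrative assumptions, and the cross-frequency paths of the full Octave convolution are omitted for brevity.

# Minimal sketch (assumptions noted above), not the implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Fake-quantize x to `bits` bits in [0, 1] with a straight-through estimator."""
    if bits >= 32:                      # treat 32 bits as full precision
        return x
    levels = 2 ** bits - 1
    x = x.clamp(0.0, 1.0)
    q = torch.round(x * levels) / levels
    return x + (q - x).detach()         # straight-through gradient


class MixedPrecisionOctaveConv(nn.Module):
    def __init__(self, in_ch, out_ch, alpha=0.5, high_bits=8, low_bits=1):
        super().__init__()
        self.low_in = int(alpha * in_ch)        # channels routed to the low-frequency branch
        self.high_in = in_ch - self.low_in
        low_out = int(alpha * out_ch)
        high_out = out_ch - low_out
        self.high_bits, self.low_bits = high_bits, low_bits
        self.conv_high = nn.Conv2d(self.high_in, high_out, 3, padding=1)
        self.conv_low = nn.Conv2d(self.low_in, low_out, 3, padding=1)

    def forward(self, x):
        # (1) frequency separation: the low-frequency branch works at half resolution
        x_high = x[:, :self.high_in]
        x_low = F.avg_pool2d(x[:, self.high_in:], kernel_size=2)

        # (2) convolve each branch with fake-quantized kernels of different bit widths
        w_high = uniform_quantize(torch.sigmoid(self.conv_high.weight), self.high_bits)
        w_low = uniform_quantize(torch.sigmoid(self.conv_low.weight), self.low_bits)
        y_high = F.conv2d(x_high, w_high, self.conv_high.bias, padding=1)
        y_low = F.conv2d(x_low, w_low, self.conv_low.bias, padding=1)

        # (3) quantize the activations of each branch to its own bit width
        y_high = uniform_quantize(torch.sigmoid(y_high), self.high_bits)
        y_low = uniform_quantize(torch.sigmoid(y_low), self.low_bits)

        # (4) mix: upsample the low-frequency branch and concatenate with the high one
        y_low = F.interpolate(y_low, scale_factor=2, mode="nearest")
        return torch.cat([y_high, y_low], dim=1)


if __name__ == "__main__":
    layer = MixedPrecisionOctaveConv(in_ch=16, out_ch=32)
    out = layer(torch.rand(1, 16, 32, 32))
    print(out.shape)   # torch.Size([1, 32, 32, 32])

Under these assumptions, the spatially downsampled low-frequency branch carries the more aggressively quantized (1-bit) weights and activations, while the high-frequency branch keeps a higher bit width, which is one plausible reading of the "1+8" and "1+4" bit-width settings reported in the abstract.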
