
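Detection model for wine grapes using MobileNetV2 lightweight network
Citation: Li Guojin, Huang Xiaojie, Li Xiuhua, Ai Jiaoyan. Detection model for wine grapes using MobileNetV2 lightweight network[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(17): 168-176.
Authors: Li Guojin  Huang Xiaojie  Li Xiuhua  Ai Jiaoyan
Affiliation: School of Electrical Engineering, Guangxi University, Nanning 530004, China
Funding: National Natural Science Foundation of China (31760342); Guangxi Innovation-Driven Development Special Program (Guike AA17202032-2)
Abstract: To improve the speed and accuracy of detecting small grape targets in high-resolution field grape images, this study proposed a wine grape detection model (WGDM) based on a lightweight network. First, the lightweight MobileNetV2 network replaced DarkNet53, the backbone of the YOLOv3 algorithm, for feature extraction, which speeds up object detection. Second, an M-Res2Net module was introduced into the multi-scale detection module to improve detection accuracy. Finally, the balanced loss function and the intersection over union (IoU) loss function were adopted as an improved localization loss to increase the accuracy of object localization. The experimental results showed that, on the test set of a public wine grape image dataset, the proposed WGDM achieved an average precision of 81.2%, a model size of 44 MB, and an average detection time of 6.29 ms per image. Compared with four mainstream detectors, namely the Single Shot Detector (SSD), YOLOv3, YOLOv4, and Faster Regions with Convolutional Neural Network (Faster R-CNN), its average precision was higher by 8.15%, 1.10%, 3.33%, and 6.52%, its model size was smaller by 50, 191, 191, and 83 MB, and its average detection time was shorter by 4.91, 7.75, 14.84, and 158.20 ms, respectively. Therefore, the proposed WGDM recognizes and locates grape fruits in the field faster and more accurately, providing a feasible method for efficient visual detection in grape-picking robots.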
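MobileNetV2 owes much of its small size to depthwise separable convolutions, which it arranges into inverted residual blocks. As a rough illustration of why swapping DarkNet53 for MobileNetV2 shrinks the network, the sketch below counts the weights (biases and batch-norm parameters ignored) of a standard k×k convolution versus a depthwise k×k convolution followed by a 1×1 pointwise convolution. The 64-channel width is an arbitrary assumption, not a layer taken from WGDM; for stride-1 layers the multiply-accumulate counts scale the same way, multiplied by the output height × width.

```python
def standard_conv_weights(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a dense k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_weights(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in = c_out = 64  # illustrative width, not taken from WGDM
    dense = standard_conv_weights(c_in, c_out)
    separable = depthwise_separable_weights(c_in, c_out)
    # separable / dense == 1/c_out + 1/k**2, so roughly 7.9x fewer weights here.
    print(dense, separable, round(dense / separable, 2))  # 36864 4672 7.89
```

The saving grows with the output width, since the ratio approaches 1/k² (9× for 3×3 kernels) as c_out becomes large.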

Keywords: machine vision  image processing  models  grape  detection  YOLO  Res2Net
Received: 2021/1/15
Revised: 2021/3/19

Detection model for wine grapes using MobileNetV2 lightweight network
Li Guojin, Huang Xiaojie, Li Xiuhua, Ai Jiaoyan. Detection model for wine grapes using MobileNetV2 lightweight network[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(17): 168-176.
Authors: Li Guojin  Huang Xiaojie  Li Xiuhua  Ai Jiaoyan
Affiliation:School of Electrical Engineering, Guangxi University, Nanning 530004, China
Abstract: Efficient grape image detection is one of the key technologies for automatic grape-harvesting robots. In this study, a wine grape detection model (WGDM) based on a lightweight network was proposed to improve the speed and accuracy of grape detection in the field. First, the DarkNet53 backbone of the original YOLOv3 was replaced with the lightweight MobileNetV2 network, because, compared with DarkNet53, it has a smaller size, faster speed, and higher image recognition accuracy; this substantially increases the speed of real-time object detection. Second, an M-Res2Net module was introduced into the multi-scale detection part of YOLOv3, and some standard convolutional layers with 1×1 and 3×3 kernels were removed, giving the improved model better multi-scale feature extraction and higher detection accuracy. Finally, a new localization loss function was constructed from the balanced loss and the intersection over union (IoU) loss, while the classification and objectness losses were kept the same as in YOLOv3. This achieves a better balance among the objectness, classification, and localization terms during training and improves the precision of object localization. The proposed WGDM and four other detectors, the Single Shot Detector (SSD), the original YOLOv3, YOLOv4, and Faster Regions with Convolutional Neural Network (Faster R-CNN), were trained under the same experimental conditions on the public Wine Grape Instance Segmentation Dataset (WGISD), which contains 300 wine grape images and 300 annotation files with 4432 labeled objects. Input images were resized from their original resolution of 2048×1365 or 2048×1536 pixels to 608×608 pixels.
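The M-Res2Net module is the authors' modification of Res2Net, and its internals are not described in this abstract. As a hedged illustration of the plain Res2Net idea it builds on, the sketch below splits a feature map into channel groups, passes the first group through unchanged, and sends each later group, added to the previous group's output, through a 3-tap convolution: y1 = x1, y2 = K(x2), y_i = K(x_i + y_(i-1)). To stay dependency-free it uses 1-D lists instead of tensors and omits the 1×1 convolutions that surround the split in Res2Net; the all-ones kernel, four groups, and signal length are arbitrary assumptions.

```python
from typing import List, Sequence

Signal = List[float]


def conv3(signal: Sequence[float], kernel=(1.0, 1.0, 1.0)) -> Signal:
    """1-D 3-tap convolution with zero padding (output length == input length)."""
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t + k - 1  # taps at t-1, t, t+1
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def res2net_hierarchy(groups: List[Signal]) -> List[Signal]:
    """Res2Net-style split: y1 = x1, y2 = K(x2), y_i = K(x_i + y_{i-1})."""
    outputs = [list(groups[0])]  # first group is passed through unchanged
    prev = None
    for x in groups[1:]:
        inp = x if prev is None else [a + b for a, b in zip(x, prev)]
        prev = conv3(inp)
        outputs.append(prev)
    return outputs


def receptive_fields(num_groups: int = 4, length: int = 15) -> List[int]:
    """Width of each group's response when every group receives a centred impulse."""
    impulse = [0.0] * length
    impulse[length // 2] = 1.0
    outs = res2net_hierarchy([list(impulse) for _ in range(num_groups)])
    return [sum(1 for v in y if v != 0.0) for y in outs]


if __name__ == "__main__":
    print(receptive_fields())  # [1, 3, 5, 7]
```

The receptive field widens by two positions per group, so one block already mixes several scales; this is the kind of multi-scale feature extraction the abstract attributes to the module.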
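The abstract does not spell out how the balanced loss and the IoU loss are combined. A minimal sketch, assuming that "balanced loss" means the Balanced L1 loss from Libra R-CNN (with its usual defaults α = 0.5, γ = 1.5) and that the localization loss is a weighted sum of Balanced L1 over box offsets and 1 − IoU over boxes, could look like this. The box format (x1, y1, x2, y2), the weights `w_bl1`/`w_iou`, and the function names are hypothetical choices for illustration only.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): assumed box format


def balanced_l1(diff: float, alpha: float = 0.5, gamma: float = 1.5) -> float:
    """Balanced L1 loss (Libra R-CNN form) for a single regression residual."""
    x = abs(diff)
    b = math.exp(gamma / alpha) - 1.0  # chosen so that alpha * ln(b + 1) == gamma
    if x < 1.0:
        return alpha / b * (b * x + 1.0) * math.log(b * x + 1.0) - alpha * x
    # Linear branch; the constant keeps the loss continuous at |x| = 1.
    return gamma * x + gamma / b - alpha


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def localization_loss(pred_offsets: Sequence[float], target_offsets: Sequence[float],
                      pred_box: Box, target_box: Box,
                      w_bl1: float = 1.0, w_iou: float = 1.0) -> float:
    """Hypothetical combination: weighted Balanced L1 on offsets + (1 - IoU) on boxes."""
    bl1 = sum(balanced_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return w_bl1 * bl1 + w_iou * (1.0 - iou(pred_box, target_box))


if __name__ == "__main__":
    print(localization_loss([0.1, 0.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0],
                            (0, 0, 2, 2), (1, 1, 3, 3)))
```

In the Balanced L1 form, the gradient inside |x| < 1 is α·ln(b|x| + 1), which reaches γ at |x| = 1 and so matches the linear branch; small residuals (accurate boxes) therefore contribute relatively more gradient than under smooth L1, which is the balancing effect the loss is named for.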
The experimental results showed that, on the test set of the wine grape image dataset, the proposed WGDM model achieved an average precision (AP) of 81.20%, which was 8.15%, 1.10%, 3.33%, and 6.52% higher than that of SSD, the original YOLOv3, YOLOv4, and Faster R-CNN, respectively. Its F1-score (the harmonic mean of precision and recall) reached 0.8563, which was 0.0563, 0.0054, 0.0417, and 0.0125 higher than that of SSD, the original YOLOv3, YOLOv4, and Faster R-CNN, respectively. The model size of WGDM was 44 MB, which was 50 MB smaller than SSD, 191 MB smaller than both the original YOLOv3 and YOLOv4, and 83 MB smaller than Faster R-CNN. The average detection time per grape image was 6.29 ms, which was 4.91, 7.75, 14.84, and 158.20 ms shorter than that of SSD, the original YOLOv3, YOLOv4, and Faster R-CNN, respectively. Moreover, the number of floating-point operations (the total number of multiplications and additions) of the proposed model was only 10.14×10⁹, more than 7.6 times fewer than SSD, almost 5.9 times fewer than the original YOLOv3, more than 5.2 times fewer than YOLOv4, and at least 5.5 times fewer than Faster R-CNN. Therefore, the proposed WGDM model recognizes and locates grape fruits in the field faster and more accurately, providing a feasible approach to efficient visual detection for grape-picking robots.
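The comparisons are reported as savings relative to WGDM rather than as the baseline values themselves. The short calculation below is plain arithmetic on the reported numbers (WGDM: 6.29 ms and 44 MB; savings of 4.91, 7.75, 14.84, and 158.20 ms and of 50, 191, 191, and 83 MB against SSD, YOLOv3, YOLOv4, and Faster R-CNN). It reconstructs the implied baseline per-image time and model size, the speedup, and the saving expressed as a multiple of the WGDM value (saving ÷ WGDM value), which is one common reading of phrases such as "N times less". The derived baselines are not figures quoted from the paper.

```python
WGDM_TIME_MS = 6.29
WGDM_SIZE_MB = 44

# (time saving in ms, size saving in MB) of WGDM relative to each baseline, as reported.
SAVINGS = {
    "SSD": (4.91, 50),
    "YOLOv3": (7.75, 191),
    "YOLOv4": (14.84, 191),
    "Faster R-CNN": (158.20, 83),
}


def implied_baselines() -> dict:
    """Baseline values implied by WGDM's figures plus the reported savings."""
    rows = {}
    for name, (dt, ds) in SAVINGS.items():
        rows[name] = {
            "time_ms": round(WGDM_TIME_MS + dt, 2),
            "size_mb": WGDM_SIZE_MB + ds,
            "time_saving_ratio": round(dt / WGDM_TIME_MS, 2),  # saving / WGDM time
            "size_saving_ratio": round(ds / WGDM_SIZE_MB, 2),  # saving / WGDM size
            "speedup": round((WGDM_TIME_MS + dt) / WGDM_TIME_MS, 2),  # baseline / WGDM
        }
    return rows


if __name__ == "__main__":
    for name, row in implied_baselines().items():
        print(f"{name:13s} {row}")
```

For example, YOLOv3 implies 6.29 + 7.75 = 14.04 ms per image and 44 + 191 = 235 MB, a saving of about 1.23 times the WGDM time and about 4.34 times the WGDM size.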
Keywords:machine vision  image processing  models  grape  detection  YOLO  Res2Net