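Partial Multimodal Hashing based on Fine-grained Feature Fusion
Cite this article:YIN Zhan-Zuo, LI Bo-Han, WANG Meng, HUANG Rui-Long, WU Wen-Long, WANG Hao-Feng. Partial Multimodal Hashing based on Fine-grained Feature Fusion[J]. Journal of Software, 2024, 35(3): 1074-1089
Authors:YIN Zhan-Zuo  LI Bo-Han  WANG Meng  HUANG Rui-Long  WU Wen-Long  WANG Hao-Feng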
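Affiliation:College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China;College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China;Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, Jiangsu, China;National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an 710119, Shaanxi, China;College of Design and Innovation, Tongji University, Shanghai 200092, China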
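Funding:National Key Research and Development Program of China (2020YFB1708100); "14th Five-Year Plan" Civil Aerospace Technology Pre-research Project (D020101); National Natural Science Foundation of China (62172351); Ministry of Industry and Information Technology Key Laboratory of Software Development and Verification Technology for Safety-Critical Systems (NJ2018014); Hebei Provincial Key Laboratory of Software Engineering Project (22567637H); Forward-looking Layout Special Research Fund of Nanjing University of Aeronautics and Astronautics.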
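Abstract:The exponential growth of multimodal data poses challenges for traditional databases in storage and retrieval. Multimodal hashing fuses multimodal features and maps them into binary hash codes, which effectively reduces database storage overhead and improves retrieval efficiency. Although many existing works on multimodal hashing achieve good results, three important problems remain: (1) existing methods tend to assume that all samples are modality-complete, whereas in practical retrieval scenarios samples with some modalities missing still occur; (2) most methods are based on shallow learning models, which inevitably limits their learning ability and thus affects the final retrieval performance; (3) methods based on deep learning frameworks have been proposed to address the weak learning ability, but after extracting the features of each modality they directly apply coarse-grained feature fusion such as vector concatenation, which fails to capture deep semantic information effectively, weakening the representation ability of the hash codes and affecting the final retrieval performance. To address these problems, the PMH-F3 model is proposed. The model implements partial multimodal hashing for samples that are missing some modalities. Built on a deep network architecture, it uses a Transformer encoder to capture deep semantic information through self-attention and achieves fine-grained multimodal feature fusion. Extensive experiments on the MIRFlickr and MSCOCO datasets achieve the best retrieval performance. The experimental results show that the proposed PMH-F3 model can effectively implement partial multimodal hashing and can be applied to large-scale multimodal data retrieval.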

Keywords:partial multimodal hashing  multimodal data retrieval  fine-grained feature fusion
Received:2023-07-17
Revised:2023-09-05

Partial Multimodal Hashing based on Fine-grained Feature Fusion
YIN Zhan-Zuo,LI Bo-Han,WANG Meng,HUANG Rui-Long,WU Wen-Long,WANG Hao-Feng. Partial Multimodal Hashing based on Fine-grained Feature Fusion[J]. Journal of Software, 2024, 35(3): 1074-1089
Authors:YIN Zhan-Zuo  LI Bo-Han  WANG Meng  HUANG Rui-Long  WU Wen-Long  WANG Hao-Feng
Affiliation:College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China;National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an 710119, China;College of Design and Innovation, Tongji University, Shanghai 200092, China
Abstract:Due to the exponential growth of multimodal data, traditional databases face challenges in storage and retrieval. Multimodal hashing can effectively reduce the storage cost of databases and improve retrieval efficiency by fusing multimodal features and mapping them into binary hash codes. Although many works on multimodal hashing perform well, three important problems remain to be solved: (1) Existing methods tend to assume that all samples are modality-complete. However, in practical retrieval scenarios, samples with some modalities missing still occur; (2) Most methods are based on shallow learning models, which inevitably limits the models' learning ability and affects the final retrieval performance; (3) Some methods based on deep learning frameworks have been proposed to address the issue of weak learning ability, but after extracting features from the different modalities they directly use coarse-grained feature fusion methods, such as vector concatenation, which fail to capture deep semantic information effectively, thereby weakening the representation ability of the hash codes and affecting the final retrieval performance. In response to the above problems, we propose the PMH-F3 model. This model implements partial multimodal hashing for the case of samples missing some modalities. The model is based on a deep network architecture and uses a Transformer encoder to capture deep semantic information through self-attention, achieving fine-grained multimodal feature fusion. We conduct extensive experiments on the MIR Flickr and MS COCO datasets and achieve the best retrieval performance. The experimental results show that the PMH-F3 model can effectively implement partial multimodal hashing and can be applied to large-scale multimodal data retrieval.
Keywords:Partial multimodal hashing  multimodal data retrieval  fine-grained feature fusion
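
As a rough illustration of the ideas summarized in the abstract, the NumPy sketch below fuses the tokens of two modalities with one self-attention step, masks out a missing modality, and binarizes the fused vector into a hash code that can be compared by Hamming distance. It is not the PMH-F3 architecture: the single attention head, the random untrained weights, the token counts, and the names `fuse_and_hash` and `hamming` are all assumptions made for this example, and at least one modality is assumed to be present.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_hash(tokens, present, Wq, Wk, Wv, Wh):
    """Hypothetical helper: one self-attention step, then sign binarization.

    tokens : (n_tokens, d) features from all modalities, stacked
    present: (n_tokens,) bool, False for tokens of a missing modality
    returns: (n_bits,) uint8 array of 0/1 hash bits
    """
    d = tokens.shape[1]
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = (q @ k.T) / np.sqrt(d)
    # Tokens of a missing modality are excluded as attention keys.
    scores = np.where(present[None, :], scores, -1e9)
    attended = softmax(scores, axis=-1) @ v
    # Pool only over the tokens that are actually present.
    fused = attended[present].mean(axis=0)
    # Threshold the projected vector to get binary hash bits.
    return (fused @ Wh > 0).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two hash codes.
    return int(np.count_nonzero(a != b))

# Toy setup (assumed sizes): 3 image tokens + 2 text tokens, 16-bit codes.
rng = np.random.default_rng(0)
d, n_bits = 8, 16
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wh = rng.normal(size=(d, n_bits))

sample = rng.normal(size=(5, d))
all_present = np.ones(5, dtype=bool)
text_missing = np.array([True, True, True, False, False])

full_code = fuse_and_hash(sample, all_present, Wq, Wk, Wv, Wh)
partial_code = fuse_and_hash(sample, text_missing, Wq, Wk, Wv, Wh)

print("full code    :", full_code)
print("partial code :", partial_code)
print("Hamming dist :", hamming(full_code, partial_code))
print("stored bytes :", np.packbits(full_code).nbytes)
```

Because missing tokens receive zero attention weight and are left out of the pooling, the resulting code does not depend on whatever placeholder values fill the missing rows. Packing the 16 bits with `np.packbits` stores each item in 2 bytes, which is the kind of storage saving the abstract attributes to binary hash codes.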