
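A text-to-image model based on a two-stage stacked generative adversarial network with spectral normalization
Cite this article:WANG Xia,XU Hui-ying,ZHU Xin-zhong.A text-to-image model based on a two-stage stacked generative adversarial network with spectral normalization[J].Computer Engineering & Science,2022,44(6):1083-1089.
Authors:WANG Xia  XU Hui-ying  ZHU Xin-zhong
Affiliation:(College of Mathematics and Computer Science,Zhejiang Normal University,Jinhua 321004,Zhejiang,China)
Funding:National Natural Science Foundation of China (61976196); Zhejiang Provincial Ten Thousand Talents Program "Outstanding Talent" project (2018R51001); Zhejiang Provincial Natural Science Foundation (LZ22F030003)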
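Abstract:Text-to-image generation is a highly challenging task in machine learning. Despite major breakthroughs, problems such as unstable model training and vanishing gradients remain. To address these shortcomings, a text-to-image model that combines spectral normalization with a perceptual loss function is proposed on the basis of the stacked generative adversarial network (StackGAN). First, the model applies spectral normalization to the discriminator network, restricting the gradient of each layer to a fixed range and slowing the convergence of the discriminator, which improves the stability of training. Second, a perceptual loss function is added to the generator network to strengthen the consistency between the text semantics and the image content. The Inception score is used to evaluate the quality of the images generated by the proposed model. Experimental results show that, compared with the original StackGAN, the model is more stable and generates more realistic images.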

Keywords:deep learning  generative adversarial network  text-to-image generation  spectral normalization  perceptual loss function
Received:2021-06-07
Revised:2021-07-13
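
The abstract above states that spectral normalization is applied to the discriminator to keep each layer's gradient within a fixed range. This page does not show the authors' implementation, so the following is only a minimal sketch of the general technique: a weight matrix is divided by an estimate of its largest singular value, obtained by power iteration. The function names (`spectral_norm`, `spectrally_normalize`), the iteration count, and the use of plain Python lists are assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def normalize(vector, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / (norm + eps) for x in vector]


def spectral_norm(weight, n_iters=50, seed=0):
    """Estimate the largest singular value of `weight` by power iteration."""
    rng = random.Random(seed)
    weight_t = transpose(weight)
    # Hypothetical initialization: a random vector with one entry per row.
    u = normalize([rng.gauss(0.0, 1.0) for _ in weight])
    for _ in range(n_iters):
        v = normalize(matvec(weight_t, u))
        u = normalize(matvec(weight, v))
    # sigma is approximately u^T W v once u and v have converged.
    return sum(ui * wi for ui, wi in zip(u, matvec(weight, v)))


def spectrally_normalize(weight, n_iters=50):
    """Divide `weight` by its estimated spectral norm."""
    sigma = spectral_norm(weight, n_iters)
    return [[w / sigma for w in row] for row in weight]


if __name__ == "__main__":
    layer_weight = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(layer_weight))
    print(spectrally_normalize(layer_weight))
```

After normalization the largest singular value of the layer is about 1, which bounds how much that layer can amplify its input. Training frameworks usually run a single power-iteration step per update and reuse the vector `u` between steps rather than iterating to convergence as this sketch does.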

A text-to-image model based on the two-phase stacked generative confrontation network with spectral normalization
WANG Xia,XU Hui-ying,ZHU Xin-zhong.A text-to-image model based on the two-phase stacked generative confrontation network with spectral normalization[J].Computer Engineering & Science,2022,44(6):1083-1089.
Authors:WANG Xia  XU Hui-ying  ZHU Xin-zhong
Affiliation:(College of Mathematics and Computer Science,Zhejiang Normal University,Jinhua 321004,China)
Abstract:Generating images from text is a challenging task in the machine learning community. Although significant progress has been made, problems such as unstable network training and vanishing gradients remain. To address these shortcomings, this paper builds on the stacked generative adversarial network (StackGAN) and proposes a text-to-image generation model that combines spectral normalization with a perceptual loss function. First, the model applies spectral normalization to the discriminator, which restricts the gradient of each layer to a fixed range and slows the convergence of the discriminator, thereby improving training stability. Second, a perceptual loss function is added to the generator network to strengthen the consistency between the text content and the generated image. The Inception score is used to evaluate the quality of the generated images. Experimental results show that, compared with the original StackGAN, the proposed model trains more stably and generates clearer images.
Keywords:deep learning  generative adversarial network  text-to-image generation  spectral normalization  perceptual loss function  
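
The abstract names the Inception score as the evaluation metric but, on this page, gives no formula or reported values. As a hedged illustration only, the score is commonly defined as the exponential of the mean KL divergence between each image's predicted class distribution p(y|x) and the marginal p(y) over all generated images. The sketch below assumes the class probabilities are already available (in practice they come from a pretrained Inception classifier, which is not included here); the function name `inception_score` and the toy inputs are hypothetical.

```python
import math


def inception_score(class_probs):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from per-image class probabilities.

    `class_probs` is a list with one probability vector per generated image.
    """
    n_images = len(class_probs)
    n_classes = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [
        sum(probs[k] for probs in class_probs) / n_images
        for k in range(n_classes)
    ]
    mean_kl = 0.0
    for probs in class_probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
        mean_kl += sum(
            pk * math.log(pk / marginal[k])
            for k, pk in enumerate(probs)
            if pk > 0.0
        )
    mean_kl /= n_images
    return math.exp(mean_kl)


if __name__ == "__main__":
    # Toy inputs: confident and diverse predictions versus uninformative ones.
    print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
    print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```

A higher score means the predictions are both confident for each image and varied across images. The score is bounded above by the number of classes and equals 1 when every image receives the same prediction.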