A text-to-image model based on the two-phase stacked generative confrontation network with spectral normalization

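Cite this article:	WANG Xia, XU Hui-ying, ZHU Xin-zhong. A text-to-image model based on the two-phase stacked generative confrontation network with spectral normalization[J]. Computer Engineering & Science, 2022, 44(6): 1083-1089.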

Authors:	WANG Xia, XU Hui-ying, ZHU Xin-zhong

Affiliation:	(College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang 321004, China)

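Funding:	National Natural Science Foundation of China (61976196); Zhejiang Provincial Ten Thousand Talents Program "Distinguished Talent" Project (2018R51001); Natural Science Foundation of Zhejiang Province (LZ22F030003)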

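Abstract:	Text-to-image generation is a highly challenging task in machine learning. Although major breakthroughs have been made, problems such as unstable model training and vanishing gradients remain. To address these shortcomings, this paper builds on the stacked generative adversarial network (StackGAN) and proposes a text-to-image model that combines spectral normalization with a perceptual loss function. First, the model applies spectral normalization to the discriminator network, limiting the gradient of each layer to a fixed range and relatively slowing the discriminator's convergence, which improves training stability. Second, a perceptual loss function is added to the generator network to strengthen the consistency between the text semantics and the image content. The quality of the generated images is evaluated with the Inception score. Experimental results show that, compared with the original StackGAN, the proposed model trains more stably and generates more realistic images.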
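Keywords:	deep learning; generative adversarial network; text-to-image generation; spectral normalization; perceptual loss function
Received:	2021-06-07
Revised:	2021-07-13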
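The abstract says spectral normalization limits each discriminator layer to a fixed range. More precisely, spectral normalization divides each weight matrix W by its largest singular value σ(W), so every linear layer has spectral norm 1 and is therefore 1-Lipschitz. The following is a minimal sketch in plain Python that estimates σ(W) by power iteration; the function names are illustrative and not taken from the paper. For clarity it runs many iterations from a fresh random vector, whereas the SN-GAN formulation performs one power-iteration step per training update and carries the vector u across updates. The paper's own settings are not stated here. In PyTorch, `torch.nn.utils.spectral_norm` wraps a layer with the same normalization.

```python
import math
import random

def matvec(w, v):
    # W v, with W stored as a list of rows
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def matvec_t(w, u):
    # W^T u
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]

def normalize(x, eps=1e-12):
    # scale x to unit L2 norm (eps avoids division by zero)
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm + eps) for xi in x]

def spectral_norm(w, n_iters=50, seed=0):
    """Estimate the largest singular value sigma(W) by power iteration."""
    if n_iters < 1:
        raise ValueError("n_iters must be >= 1")
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w))])
    for _ in range(n_iters):
        v = normalize(matvec_t(w, u))  # right singular vector estimate
        u = normalize(matvec(w, v))    # left singular vector estimate
    # sigma(W) is approximately u^T W v
    return sum(ui * wvi for ui, wvi in zip(u, matvec(w, v)))

def spectrally_normalize(w, n_iters=50):
    """Return W / sigma(W), whose spectral norm is (approximately) 1."""
    sigma = spectral_norm(w, n_iters)
    return [[wij / sigma for wij in row] for row in w]

if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0]]
    print(spectral_norm(w))                        # about 5.465
    print(spectral_norm(spectrally_normalize(w)))  # about 1.0
```

Because ReLU and LeakyReLU are themselves 1-Lipschitz, normalizing every layer this way bounds the Lipschitz constant of the whole discriminator. This is what keeps its gradients from growing without limit during training.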
WANG Xia, XU Hui-ying, ZHU Xin-zhong. A text-to-image model based on the two-phase stacked generative confrontation network with spectral normalization[J]. Computer Engineering & Science, 2022, 44(6): 1083-1089.

Authors:	WANG Xia, XU Hui-ying, ZHU Xin-zhong

Affiliation:	(College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China)

Abstract:	Generating images from text is a challenging task in the machine learning community. Although significant progress has been made, problems such as unstable network training and vanishing gradients remain. To address these shortcomings, this paper builds on the stacked generative adversarial network (StackGAN) and proposes a text-to-image generation method that combines spectral normalization with a perceptual loss function. First, spectral normalization is applied to the discriminator; it restricts the gradient of each network layer to a fixed range and slows the discriminator's convergence, thereby improving the stability of network training. Second, a perceptual loss function is added to the generator network to enhance the consistency between the text content and the generated image. The quality of the generated images is evaluated with the Inception score. Experimental results show that, compared with the original StackGAN, the proposed model trains more stably and generates clearer images.

Keywords:	deep learning; generative adversarial network; text-to-image generation; spectral normalization; perceptual loss function
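The Inception score used for evaluation is defined as IS = exp(E_x[KL(p(y|x) || p(y))]). Here p(y|x) is the class distribution an Inception-v3 classifier assigns to a generated image x, and p(y) is that distribution averaged over all generated images. The score is high when each image is classified confidently and the images as a whole cover many classes. The following sketch assumes the classifier's softmax outputs are already available as lists of probabilities; running Inception-v3 is out of scope, and the helper names are hypothetical. Splitting the samples into 10 groups and reporting mean and standard deviation follows common practice in the GAN literature. The paper's exact protocol is not given here.

```python
import math

def inception_score(probs):
    """IS = exp(mean over x of KL(p(y|x) || p(y))) for rows of class probabilities."""
    n, k = len(probs), len(probs[0])
    # marginal class distribution p(y), averaged over all samples
    p_y = [sum(row[j] for row in probs) / n for j in range(k)]
    total_kl = 0.0
    for row in probs:
        # KL(p(y|x) || p(y)); terms with p(y|x) = 0 contribute nothing,
        # and p(y) > 0 wherever p(y|x) > 0, so both logs are finite
        total_kl += sum(p * (math.log(p) - math.log(p_y[j]))
                        for j, p in enumerate(row) if p > 0)
    return math.exp(total_kl / n)

def inception_score_splits(probs, n_splits=10):
    """Mean and standard deviation of IS over equal-sized splits."""
    size = len(probs) // n_splits
    if size == 0:
        raise ValueError("need at least n_splits samples")
    scores = [inception_score(probs[i * size:(i + 1) * size]) for i in range(n_splits)]
    mean = sum(scores) / n_splits
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n_splits)
    return mean, std

if __name__ == "__main__":
    # hypothetical softmax outputs: confident and spread evenly over 4 classes
    one_hot = [[1.0 if j == i % 4 else 0.0 for j in range(4)] for i in range(40)]
    print(inception_score_splits(one_hot))  # about (4.0, 0.0)
    # identical, uninformative predictions give the minimum score of 1
    print(inception_score([[0.25] * 4] * 8))
```

The score ranges from 1 (every image receives the same prediction) up to the number of classes (each image is classified with certainty and all classes are used equally). This is why a higher value is read as sharper and more varied output.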
