Paper Notes: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

NIPS 2015

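　　Abstract: This paper proposes a generative parametric model capable of producing high-quality natural images. Our method uses a cascade of CNNs within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. A separate GAN is used at each level of the pyramid, which lets our method generate higher-resolution images.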

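　　Introduction: Building a good generative model of natural images is a fundamental problem in computer vision, yet high-resolution images remain hard to generate. We propose an approach that generates plausible-looking scenes at 32×32 and 64×64 resolution. To do so, we exploit the multi-scale structure of natural images and build a series of generative models, each a GAN that captures the image structure at one particular level of the pyramid. This strategy breaks the original problem into a sequence of more manageable stages. At each scale, we train a CNN-based generative model using the GAN approach. Samples are drawn in a coarse-to-fine fashion, commencing with a low-frequency residual image. The second stage samples the band-pass structure at the next level, conditioned on the sampled residual. Each subsequent level repeats this process, always conditioning on the output of the previous scale, until the final level is reached. Drawing samples is therefore an efficient and straightforward feed-forward procedure: random vectors are taken as input, passed forward through deep convolutional networks, and an image comes out.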

　　Approach:

　　The method builds on the GAN from NIPS 2014 and proposes the LAPGAN model, which integrates a conditional form of the GAN model into the framework of a Laplacian pyramid.

　　1. Generative Adversarial Networks

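　　This subsection briefly reviews generative adversarial networks (GAN). With $h$ a sample from the data distribution and $z$ a noise vector, the objective to optimize is:

　　$$\min_G \max_D \; \mathbb{E}_{h \sim p_{\text{data}}(h)}[\log D(h)] + \mathbb{E}_{z \sim p_{\text{noise}}(z)}[\log(1 - D(G(z)))] \tag{1}$$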

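　　The conditional generative adversarial network (CGAN) is an extension of the GAN in which both networks, $G$ and $D$, receive an additional information vector $l$ as input. This vector could, for example, carry the class of the training sample, so the loss function becomes:

　　$$\min_G \max_D \; \mathbb{E}_{h, l \sim p_{\text{data}}(h, l)}[\log D(h, l)] + \mathbb{E}_{z \sim p_{\text{noise}}(z),\, l \sim p_l(l)}[\log(1 - D(G(z, l), l))] \tag{2}$$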

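　　Here, $p_l(l)$ is the prior distribution over classes. This model allows the output of the generator to be controlled through the conditioning variable $l$. In our approach, $l$ will be another image, generated by another CGAN model.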
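
　　To make the objective concrete, the sketch below estimates the GAN value function (the average of $\log D$ on real samples plus the average of $\log(1 - D)$ on generated samples) from a handful of discriminator outputs. It is only an illustration: the function name `gan_value` and the probabilities fed to it are made up for this example, and no network is trained. In the conditional case the arithmetic is the same; the only change is that $D$ and $G$ also receive $l$ as input.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(h)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real samples, each in (0, 1).
    d_fake: discriminator outputs on generated samples, each in (0, 1).
    Both lists are hypothetical inputs; a real setup would obtain them
    from a trained (or partially trained) discriminator.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from generated outputs 0.5 everywhere.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes the value up towards 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

　　$D$ tries to raise this value while $G$ tries to lower it; the undecided case evaluates to $-2\log 2$.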

　　For more on CGAN, see: Conditional Generative Adversarial Nets.

　　2. Laplacian Pyramid

　　The Laplacian pyramid is a linear, invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual.

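　　Let $d(\cdot)$ be a downsampling operation that maps a $j \times j$ image $I$ to an image of size $j/2 \times j/2$. Correspondingly, let $u(\cdot)$ be an upsampling operation that maps it to an image of size $2j \times 2j$.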

　　We first build a Gaussian image pyramid, $g(I) = [I_0, I_1, \dots, I_K]$, where $I_0 = I$ and $I_k$ is the result of applying $d(\cdot)$ to $I$ repeatedly, $k$ times. $K$ is the number of levels in the pyramid.

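　　The coefficients $h_k$ at each level $k$ of the Laplacian pyramid are constructed by taking the difference between two adjacent levels of the Gaussian pyramid, upsampling the smaller one with $u(\cdot)$ so that the sizes are compatible:

　　$$h_k = I_k - u(I_{k+1}) \tag{3}$$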

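　　Intuitively, each level captures the image structure present at a particular scale. The final level of the Laplacian pyramid, $h_K$, is not a difference image but a low-frequency residual equal to the final Gaussian pyramid level, i.e. $h_K = I_K$. Reconstruction from the Laplacian pyramid coefficients $[h_0, \dots, h_K]$ is performed with the backward recurrence:

　　$$I_k = u(I_{k+1}) + h_k \tag{4}$$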

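　　Here, reconstruction starts from the coarsest level, $I_K = h_K$, and repeatedly upsamples and adds the difference image at the next finer level until the image is back at its original resolution.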
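
　　The construction and the backward recurrence can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: $d(\cdot)$ is replaced by 2×2 block averaging and $u(\cdot)$ by pixel repetition, images are nested lists with even side lengths, and all function names are invented for this example. Because each $h_k$ stores exactly what upsampling loses, reconstruction is exact whichever $d$ and $u$ are chosen.

```python
def downsample(img):
    """d(.): halve each side by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)] for r in range(rows)]


def upsample(img):
    """u(.): double each side by repeating every pixel in a 2x2 block."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


def combine(a, b, sign):
    """Element-wise a + sign * b for two images of equal size."""
    return [[x + sign * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def laplacian_pyramid(image, num_levels):
    """Return [h_0, ..., h_K] with K = num_levels."""
    gaussian = [image]                                  # I_0 = I
    for _ in range(num_levels):
        gaussian.append(downsample(gaussian[-1]))       # I_{k+1} = d(I_k)
    coeffs = [combine(gaussian[k], upsample(gaussian[k + 1]), -1)
              for k in range(num_levels)]               # h_k = I_k - u(I_{k+1})
    coeffs.append(gaussian[-1])                         # h_K = I_K
    return coeffs


def reconstruct(coeffs):
    """Backward recurrence: I_k = u(I_{k+1}) + h_k, starting from I_K = h_K."""
    image = coeffs[-1]
    for h_k in reversed(coeffs[:-1]):
        image = combine(upsample(image), h_k, +1)
    return image


# An 8x8 ramp image and a pyramid with K = 2 (levels of side 8, 4, 2).
image = [[float(8 * r + c) for c in range(8)] for r in range(8)]
coeffs = laplacian_pyramid(image, num_levels=2)
rebuilt = reconstruct(coeffs)
max_error = max(abs(x - y)
                for row_a, row_b in zip(rebuilt, image)
                for x, y in zip(row_a, row_b))
```

　　Here `max_error` measures how far the rebuilt image is from the original, which is a quick check that the representation is invertible.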

　　3. Laplacian Generative Adversarial Networks (LAPGAN)

　　The proposed approach combines the two models described above: the conditional GAN and the Laplacian pyramid.

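　　Consider the sampling procedure first. We have a set of generative models $\{G_0, \dots, G_K\}$, each of which captures the distribution of the coefficients $h_k$ at a different level of the pyramid. Sampling an image is akin to the reconstruction procedure in Eq. (4), except that the generative models are used to produce the $h_k$:

　　$$\tilde{I}_k = u(\tilde{I}_{k+1}) + \tilde{h}_k = u(\tilde{I}_{k+1}) + G_k(z_k, u(\tilde{I}_{k+1})) \tag{5}$$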

　　Figure 1 illustrates this procedure for a pyramid with $K = 3$, using 4 generative models to sample a 64×64 image.
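
　　The coarse-to-fine sampling loop can be sketched as follows. The generators here are hypothetical stubs, not trained convnets: `make_stub_generator` simply tiles the noise vector into a coefficient map of the right size, so only the control flow is meaningful. Pixel-repetition upsampling, the 16-dimensional noise vectors, and the scale factor 0.1 are likewise assumptions made for this example. The coarsest generator $G_K$ is called without a conditioning image, and every finer level adds generated coefficients to the upsampled output of the level before it.

```python
import random


def upsample(img):
    """u(.): double each side by repeating every pixel in a 2x2 block."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


def make_stub_generator(size, scale):
    """Hypothetical stand-in for a trained G_k (no learning involved)."""
    def generator(z, low_pass=None):
        # A real G_k would be a convnet conditioned on low_pass;
        # this stub just tiles the noise vector z over a size x size map.
        return [[scale * z[(r * size + c) % len(z)] for c in range(size)]
                for r in range(size)]
    return generator


def sample_image(generators, noises):
    """generators = [G_0, ..., G_K], noises = [z_0, ..., z_K]."""
    K = len(generators) - 1
    image = generators[K](noises[K])                 # coarsest level, unconditional
    for k in range(K - 1, -1, -1):
        low_pass = upsample(image)                   # l_k = u(I~_{k+1})
        h_k = generators[k](noises[k], low_pass)     # h~_k = G_k(z_k, l_k)
        image = [[a + b for a, b in zip(row_l, row_h)]
                 for row_l, row_h in zip(low_pass, h_k)]  # I~_k = l_k + h~_k
    return image


# K = 3: four generators producing levels of side 64, 32, 16 and 8.
rng = random.Random(0)
sizes = [64, 32, 16, 8]
generators = [make_stub_generator(size, 0.1) for size in sizes]
noises = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in sizes]
sample = sample_image(generators, noises)
```

　　With trained networks in place of the stubs, the same loop is all that sampling requires: one forward pass per level.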

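　　The generative models $\{G_0, \dots, G_K\}$ are trained with the CGAN approach at each level of the pyramid. Specifically, we construct a Laplacian pyramid from each training image $I$. At each level we then make a random choice to either (i) construct the coefficients $h_k$ using the standard procedure from Eq. (3), or (ii) generate them using $G_k$:

　　$$\tilde{h}_k = G_k(z_k, u(I_{k+1})) \tag{6}$$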

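　　Note that $G_k$ is a convnet that takes a coarse-scale version of the image, $l_k = u(I_{k+1})$, as input, together with a noise vector $z_k$. $D_k$ then judges whether the current image is generated or real. At the final scale of the pyramid, the low-frequency residual is small enough to be modeled directly with a standard GAN, and $D_K$ takes only $h_K$ or $\tilde{h}_K$ as input. The framework is shown in Figure 2.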
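
　　The way one training example for $D_k$ is assembled can be sketched as below. This is an illustration only: `stub_generator` is a hypothetical, untrained placeholder for $G_k$, the real/generated choice is assumed to be made with probability 0.5, $d(\cdot)$ and $u(\cdot)$ are block averaging and pixel repetition, and the 4-dimensional noise vector is arbitrary. No gradient step is shown.

```python
import random


def downsample(img):
    """d(.): halve each side by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)] for r in range(rows)]


def upsample(img):
    """u(.): double each side by repeating every pixel in a 2x2 block."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


def stub_generator(z, low_pass):
    """Hypothetical untrained G_k: a constant map derived from z."""
    value = sum(z) / len(z)
    return [[value for _ in row] for row in low_pass]


def make_training_example(image_k, generator_k, rng, noise_dim=4):
    """Build one (coefficients, l_k, label) triple from the level image I_k.

    label 1 marks real coefficients, label 0 marks generated ones.
    """
    low_pass = upsample(downsample(image_k))           # l_k = u(I_{k+1})
    if rng.random() < 0.5:                             # assumed 50/50 choice
        # (i) real coefficients: h_k = I_k - u(I_{k+1})
        h = [[a - b for a, b in zip(row_i, row_l)]
             for row_i, row_l in zip(image_k, low_pass)]
        label = 1
    else:
        # (ii) generated coefficients: h~_k = G_k(z_k, l_k)
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        h = generator_k(z, low_pass)
        label = 0
    # l_k is returned too, since in a CGAN it conditions D_k as well as G_k.
    return h, low_pass, label


rng = random.Random(0)
image_k = [[float((r * c) % 7) for c in range(8)] for r in range(8)]
examples = [make_training_example(image_k, stub_generator, rng)
            for _ in range(20)]
```

　　Each triple would then be fed to $D_k$, which is trained to predict the label, while $G_k$ is trained to make its generated coefficients pass as real.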

　　Breaking the generation process into a sequence of successive stages is one of the key contributions of this paper.

　　Model Architecture & Training

　　The method is evaluated on three datasets: (1) CIFAR-10, (2) STL10, and (3) LSUN.

　　The authors' open-source code: http://soumith.ch/eyescream/

　　Experiments and Discussion:

Time: 2024-11-10 13:36:15
