Adversarial Generation of Continuous Images

CVPR 2021

Ivan Skorokhodov, Savva Ignatyev, Mohamed Elhoseiny

Abstract

In most existing learning systems, images are viewed as 2D pixel arrays. However, in another paradigm that is gaining popularity, a 2D image is represented as an implicit neural representation (INR): an MLP that predicts an RGB pixel value given its \((x,y)\) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders, factorized multiplicative modulation and multi-scale INRs, and use them to build a state-of-the-art continuous image GAN. Previous attempts to adapt INRs for image generation were limited to MNIST-like datasets and did not scale to complex real-world data. Our proposed INR-GAN architecture improves the performance of continuous image generators several-fold, greatly reducing the gap between continuous image GANs and pixel-based ones. We also explore several exciting properties of INR-based decoders, such as out-of-the-box superresolution, meaningful image-space interpolation, accelerated inference of low-resolution images, the ability to extrapolate outside of image boundaries, and a strong geometric prior.


Main idea

INR-based decoders (right) are structured differently from convolutional ones (left). They consist of a hypernetwork (a neural network that generates the parameters of another neural network) and an MLP that maps a pixel coordinate to an RGB value. In our work, we introduce two techniques that make this parametrization much more efficient: factorized multiplicative modulation and multi-scale INRs.
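To make the hypernetwork/MLP split concrete, here is a minimal plain-Python sketch of one INR layer whose weights are produced per image. It assumes that factorized multiplicative modulation takes the form W = W_shared ⊙ σ(A·B), where a shared weight matrix is modulated elementwise by a low-rank matrix whose factors A (out × r) and B (r × in) are predicted from the latent code. The sigmoid as σ, the linear hypernetwork, the toy sizes, and every function name (`hypernet`, `fmm_weight`, `inr_layer`) are illustrative assumptions, not the released INR-GAN code. A real INR stacks several such layers with nonlinearities.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def hypernet(z, proj_a, proj_b, out_dim, in_dim, rank):
    """Hypothetical linear hypernetwork: maps latent z to low-rank factors A and B."""
    flat_a = [sum(p * zi for p, zi in zip(row, z)) for row in proj_a]  # out_dim * rank values
    flat_b = [sum(p * zi for p, zi in zip(row, z)) for row in proj_b]  # rank * in_dim values
    a = [flat_a[i * rank:(i + 1) * rank] for i in range(out_dim)]
    b = [flat_b[i * in_dim:(i + 1) * in_dim] for i in range(rank)]
    return a, b

def fmm_weight(w_shared, a, b):
    """Modulate the shared weight matrix elementwise by sigmoid(A @ B)."""
    mod = matmul(a, b)
    return [[ws * sigmoid(m) for ws, m in zip(w_row, m_row)]
            for w_row, m_row in zip(w_shared, mod)]

def inr_layer(weight, bias, x):
    """One fully connected INR layer applied to a single coordinate/feature vector."""
    return [sum(wk * xi for wk, xi in zip(row, x)) + bk for row, bk in zip(weight, bias)]

# Toy sizes: 2D coordinate in, 3 outputs (RGB), rank-2 modulation, 4-dim latent code.
random.seed(0)
IN, OUT, RANK, ZDIM = 2, 3, 2, 4
w_shared = [[random.gauss(0, 1) for _ in range(IN)] for _ in range(OUT)]
bias = [0.0] * OUT
proj_a = [[random.gauss(0, 1) for _ in range(ZDIM)] for _ in range(OUT * RANK)]
proj_b = [[random.gauss(0, 1) for _ in range(ZDIM)] for _ in range(RANK * IN)]

z = [random.gauss(0, 1) for _ in range(ZDIM)]          # per-image latent code
a, b = hypernet(z, proj_a, proj_b, OUT, IN, RANK)        # per-image low-rank factors
w = fmm_weight(w_shared, a, b)                           # per-image layer weights
rgb = inr_layer(w, bias, [0.25, -0.5])                   # evaluate at pixel (x, y)
print(rgb)
```

The point of the factorization is that the hypernetwork only has to emit (out + in) · r numbers per layer instead of out · in; for a 512 × 512 layer with r = 10 that is 10,240 values instead of 262,144.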

Properties

The main appeal of INR-based decoders lies in their properties. In our paper, we explore several of them: image extrapolation, superresolution, meaningful interpolation, a strong geometric prior, and others.

Our INR-based decoder is capable of extrapolating outside of image boundaries without being trained to do so. We originally thought we were the first to show this, but after the submission we found that the authors of the COCO-GAN paper had demonstrated the same property.
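Because the decoder is a function of continuous coordinates, extrapolation amounts to querying it at coordinates outside the range seen in training. A minimal sketch, assuming training pixels lie on a uniform grid over [-1, 1] with both endpoints included; `coord_axis`, `extended_axis`, `render`, and `toy_inr` are hypothetical helpers, and `toy_inr` stands in for a trained generator. Widening the range while keeping the same step leaves the original coordinates in the middle of the extended grid, so the central crop of the extrapolated render is the original image.

```python
def coord_axis(n, lo=-1.0, hi=1.0):
    """n evenly spaced coordinates from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def extended_axis(n, pad):
    """The n-point axis on [-1, 1], extended by `pad` extra points on each side
    with the same spacing, so it reaches beyond the image boundary."""
    step = 2.0 / (n - 1)
    return [-1.0 + (i - pad) * step for i in range(n + 2 * pad)]

def render(inr, xs, ys):
    """Evaluate an INR (any callable (x, y) -> value) on the grid ys x xs, row-major."""
    return [[inr(x, y) for x in xs] for y in ys]

def toy_inr(x, y):
    """Stand-in for a trained INR: any function of the coordinate works here."""
    return round(x * x + 0.5 * y, 6)

n, pad = 5, 2
base = render(toy_inr, coord_axis(n), coord_axis(n))
wide = render(toy_inr, extended_axis(n, pad), extended_axis(n, pad))
crop = [row[pad:pad + n] for row in wide[pad:pad + n]]  # central n x n region
```

Whether the pixels outside the crop look plausible depends entirely on the trained network; the sketch only shows that the query itself requires no change to the decoder.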

INR-GAN produces meaningful interpolations in image space (i.e., in the parameter space of the INRs).
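Interpolating in the parameter space of the INRs means blending the weights generated for two images and rendering the blended network. A minimal sketch, assuming the parameters are stored as a flat list and blended linearly; the single-layer sine "INR" and the names `lerp_params`, `tiny_inr`, and `render` are hypothetical stand-ins rather than the INR-GAN decoder.

```python
import math

def lerp_params(theta_a, theta_b, t):
    """Linear interpolation of two flat parameter vectors: (1 - t) * a + t * b."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]

def tiny_inr(theta, x, y):
    """Hypothetical one-layer INR: theta holds [w_x, w_y, bias] for each of 3 channels."""
    return [math.sin(theta[3 * c] * x + theta[3 * c + 1] * y + theta[3 * c + 2])
            for c in range(3)]

def render(theta, size=4):
    """Evaluate the INR on a size x size grid over [-1, 1]."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[tiny_inr(theta, x, y) for x in coords] for y in coords]

theta_a = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.5, 0.5, 0.0]    # parameters of "image A"
theta_b = [0.0, 2.0, 1.0, 2.0, 0.0, -1.0, -0.5, 0.5, 0.3]  # parameters of "image B"
frames = [render(lerp_params(theta_a, theta_b, t)) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The endpoints reproduce the two source images exactly; the intermediate frames are whatever the blended networks output, which is where the "meaningful" part of the claim depends on the trained model.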

An INR-based decoder can perform superresolution out of the box: we simply evaluate it on a denser coordinate grid.
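Superresolution therefore reduces to choosing the grid density at inference time. A minimal sketch, assuming an align-corners style grid on [-1, 1] (both endpoints included); under that assumption a grid with 2n - 1 points per side contains every coordinate of the n-point grid, so the denser render agrees with the original on those pixels and fills in new samples between them. `make_grid`, `toy_inr`, and `render` are hypothetical names.

```python
def make_grid(n):
    """n x n grid of (x, y) coordinates on [-1, 1], endpoints included (align-corners style)."""
    axis = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [[(x, y) for x in axis] for y in axis]

def toy_inr(x, y):
    """Stand-in for a trained INR: any function of the coordinate."""
    return round(x * y + 0.25 * x, 6)

def render(n):
    """Evaluate the INR on an n x n grid."""
    return [[toy_inr(x, y) for (x, y) in row] for row in make_grid(n)]

low = render(5)           # 5 x 5 "training" resolution
high = render(2 * 5 - 1)  # 9 x 9: one new sample between every pair of old ones
```

With a pixel-center convention (no endpoints) the coarse and fine grids would not share coordinates, but the principle of querying the same continuous function more densely is unchanged.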

We fitted a linear model to predict face keypoints from latent codes and observed that it achieves much better performance for INR-GAN than for StyleGAN2. This shows that keypoints (and hence other geometric information) are encoded in a less entangled form in INR-GAN.
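The keypoint experiment is a linear probe: regress a keypoint coordinate on the latent code and compare how well the fit works across generators. A minimal plain-Python sketch, assuming ordinary least squares with a tiny ridge term, solved through the normal equations (ZᵀZ + λI)w = Zᵀy by Gaussian elimination, and scored with R². The synthetic data, λ, and all function names are illustrative assumptions; this does not reproduce the paper's data, split, or metric.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear_probe(zs, ys, ridge=1e-6):
    """Least-squares weights w (last entry is a bias) so that w . [z, 1] ~ y."""
    feats = [z + [1.0] for z in zs]  # append a constant bias feature
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0) for j in range(d)]
            for i in range(d)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(d)]
    return solve(gram, rhs)

def r_squared(w, zs, ys):
    """Coefficient of determination of the probe's predictions."""
    preds = [sum(wi * fi for wi, fi in zip(w, z + [1.0])) for z in zs]
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic sanity check: a keypoint coordinate that is exactly linear in a 3-D latent.
random.seed(0)
true_w = [0.5, -1.0, 2.0, 0.1]  # last entry is the bias
zs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(50)]
ys = [sum(wi * fi for wi, fi in zip(true_w, z + [1.0])) for z in zs]
w = fit_linear_probe(zs, ys)
print(r_squared(w, zs, ys))
```

In a real comparison the probe would be scored on held-out latent codes, so that a higher score reflects how linearly decodable the geometry is rather than overfitting.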

Related work

CIPS (Anokhin et al., CVPR 2021) is concurrent work that also builds a large-scale INR-based GAN for image generation.

BibTeX

@InProceedings{inr-gan,
    author    = {Skorokhodov, Ivan and Ignatyev, Savva and Elhoseiny, Mohamed},
    title     = {Adversarial Generation of Continuous Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {10753--10764}
}

@InProceedings{cips,
    author    = {Anokhin, Ivan and Demochkin, Kirill and Khakhulin, Taras and Sterkin, Gleb and Lempitsky, Victor and Korzhenkov, Denis},
    title     = {Image generators with conditionally-independent pixel synthesis},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2021},
    pages     = {14278--14287}
}