# Ivan Skorokhodov

I am a CS PhD student at the Visual Computing Center at KAUST, supervised by Prof. Peter Wonka. I do deep learning, and my research interests include generative models, neural rendering and hypernetworks (I also have a small weakness for adversarial robustness). Before that, I was a deep learning researcher at MIPT for 2 years, working first on NLP and then on loss landscape analysis. Before MIPT, I was a software engineer at Yandex for 1.5 years.

## Research projects

• ### 3D generation on ImageNet

ICLR 2023 (Oral)

Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location and orientation. This makes them inapplicable to diverse, in-the-wild datasets of non-alignable scenes rendered from arbitrary camera poses. In this work, we develop a 3D generator with Generic Priors (3DGP): a 3D synthesis framework with more general assumptions about the training data, and show that it scales to very challenging datasets, like ImageNet. Our model is based on three new ideas: 1) using an off-the-shelf depth estimator to guide the learning of 3D geometry; 2) a flexible learnable camera generator and a regularization strategy for it; and 3) knowledge distillation into the discriminator to transfer the external knowledge from a pre-trained feature extractor. We explore our model on four datasets and demonstrate that 3DGP outperforms the recent state-of-the-art in terms of both texture and geometry quality.

• ### EpiGRAF: Rethinking training of 3D GANs

NeurIPS 2022

Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka

In the past several months, more than 10 works have appeared that speed up NeRF-based GANs by training a separate 2D decoder to upsample a low-resolution 3D representation produced from the NeRF generator. This solution comes at a cost: it breaks multi-view consistency and learns the geometry at a low resolution. Instead, we show that it is possible to obtain a high-resolution 3D generator with SotA image quality by simply training the model patch-wise. We revisit and improve this optimization scheme in two ways: 1) by designing a location- and scale-aware discriminator to work on patches of different proportions and spatial positions; and 2) by modifying the patch sampling strategy based on an annealed beta distribution to stabilize training and accelerate convergence. The resulting model, named EpiGRAF, is an efficient, high-resolution, pure 3D generator, and we test it on four datasets (two introduced in this work) at $$256^2$$ and $$512^2$$ resolutions. It obtains state-of-the-art image quality, high-fidelity geometry and trains $${\approx} 2.5 \times$$ faster than the upsampler-based counterparts.

• ### StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

CVPR 2022

Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

We build a non-autoregressive video generator which is continuous in time. It is based on StyleGAN2 and we rethink fundamental components of video synthesis models. First, we redesign the motion codes to be continuous by structuring them as acyclic positional embeddings. Then, we drop the usage of expensive Conv3d layers and aggregate the temporal information across frames by simple concatenation. Finally, we demonstrate that a state-of-the-art video generator could be trained with a very sparse sampling scheme, using just 2-3 frames per clip. Our modifications greatly improve the training efficiency of our model and we achieve strong state-of-the-art results on FaceForensics $$256^2$$, Sky Timelapse $$256^2$$, UCF-101 $$256^2$$, Rainbow Jelly $$256^2$$ and MEAD $$1024^2$$. We also demonstrate the video manipulation properties of our generator, like projecting a video into its latent space using just a single frame and CLIP-based editing.

• ### Aligning Latent and Image Spaces to Connect the Unconnectable

ICCV 2021

Ivan Skorokhodov, Grigory Sotnikov, Mohamed Elhoseiny

We proposed the idea of positioning a GAN's latent codes on the coordinate plane. This means that each latent code, when sampled, is associated with an $$(x,y)$$-position on the 2D image plane, and our generator computes the color of a pixel from an interpolation of the neighboring latent codes (instead of just a single global one). This allows us 1) to generate images of infinite size (by generating infinitely many latent codes and positioning them on the grid); and 2) to connect unrelated frames into a single, arbitrarily large panorama.
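
As a rough illustration of the spatial-latent idea (a simplified sketch, not the code from the paper), the snippet below works in 1D instead of on the 2D image plane: latent codes are anchored at evenly spaced positions and the code for a given pixel is a linear blend of its two neighboring anchors. The names `anchor_code` and `code_at`, the anchor spacing, the latent dimension and the linear blending rule are all assumptions made for this example.

```python
import math
import random


def anchor_code(index, dim=4, seed=0):
    """Latent code of the anchor at integer grid position `index` (hypothetical helper).

    The code is generated on demand from a string seed, so anchors are
    reproducible and the canvas can be extended in either direction
    without storing anything.
    """
    rng = random.Random(f"{seed}:{index}")
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def code_at(x, spacing=8.0, dim=4, seed=0):
    """Per-pixel latent code at position x: a linear blend of the two nearest anchors."""
    left = math.floor(x / spacing)   # index of the anchor to the left of x
    t = x / spacing - left           # blend factor in [0, 1)
    a = anchor_code(left, dim, seed)
    b = anchor_code(left + 1, dim, seed)
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]
```

Because every pixel's code depends only on its local anchors, calling `code_at` for ever larger (or negative) `x` keeps producing valid codes, which is what makes an unbounded canvas possible in this toy setting.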

• ### Adversarial Generation of Continuous Images

CVPR 2021

Ivan Skorokhodov, Savva Ignatyev, Mohamed Elhoseiny

We built a GAN model that generates images in the implicit neural representation (INR) form. An INR is a function $$F(c)$$ which takes coordinates $$c = (x, y)$$ as input and predicts a pixel value $$v = (r, g, b)$$. In this way, our generator is a hypernetwork that generates parameters for $$F(c)$$. We proposed two techniques to scale such a model to real-world datasets: factorized multiplicative modulation (FMM) and multi-scale INRs. We achieved decent (for INR-based models) generative quality on LSUN Churches $$256^2$$, LSUN Bedrooms $$256^2$$, and FFHQ $$1024^2$$ and showed a lot of interesting properties of INR-based decoders. At the end of the day, our approach turned out to be very similar to StyleGAN2 with 1x1 convolutions, coordinate embeddings, and nearest neighbor upsampling.
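
To make the INR formulation concrete, here is a minimal plain-Python sketch (not the model from the paper): a tiny two-layer MLP plays the role of $$F(c)$$, and a stand-in "hypernetwork" produces its parameters from a latent code. The function names, layer sizes, sine activation and the use of a seeded random draw in place of a learned hypernetwork are all assumptions made for this example.

```python
import math
import random


def generate_inr_params(latent_seed, hidden=8):
    """Stand-in for a hypernetwork: maps a latent code (here just a seed)
    to the weights of a tiny 2-layer MLP. A real hypernetwork is learned."""
    rng = random.Random(latent_seed)
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(3)]
    b2 = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return w1, b1, w2, b2


def inr(params, x, y):
    """F(c): takes coordinates c = (x, y) and returns a pixel v = (r, g, b),
    with each channel squashed to the 0..1 range by a sigmoid."""
    w1, b1, w2, b2 = params
    hidden = [math.sin(w[0] * x + w[1] * y + b) for w, b in zip(w1, b1)]
    return tuple(
        1.0 / (1.0 + math.exp(-(sum(wi * hi for wi, hi in zip(w, hidden)) + b)))
        for w, b in zip(w2, b2)
    )


def render(params, size):
    """Evaluate the INR on a size x size grid over [0, 1]^2 (size >= 2)."""
    coords = [i / (size - 1) for i in range(size)]
    return [[inr(params, x, y) for x in coords] for y in coords]
```

Since the image is a function of continuous coordinates, the same `params` can be rendered at any resolution, for example `render(params, 4)` and `render(params, 16)` sample the same underlying image on coarser and finer grids.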

• ### Class Normalization for (Continual?) Generalized Zero-Shot Learning

ICLR 2021

Ivan Skorokhodov, Mohamed Elhoseiny

In this paper, we dived into the normalization techniques used in zero-shot learning (ZSL). We showed how scaled cosine similarity and attribute normalization influence the variance of the signal inside a model. We showed that deeper models need other normalization procedures and developed class normalization, which is similar to batch normalization but applied across the class dimension. Using class normalization, we built an MLP model that achieves state-of-the-art performance and trains 50-200 times faster than the current SotA. We also formulated a novel continual zero-shot learning problem and tested our approach in that setup.
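
Going only by the description above ("batch normalization, but across the class dimension"), the core operation can be sketched as follows. This is an illustrative plain-Python version, not the implementation from the paper: the function name `class_normalize`, the `eps` value and the absence of learned scale and shift parameters are assumptions.

```python
import math


def class_normalize(class_features, eps=1e-5):
    """Standardize each feature dimension across classes.

    class_features: list of per-class feature vectors, shape [num_classes][dim].
    Returns a new list of the same shape where every feature column has
    (approximately) zero mean and unit variance over the classes.
    """
    num_classes = len(class_features)
    dim = len(class_features[0])
    out = [[0.0] * dim for _ in range(num_classes)]
    for j in range(dim):
        column = [row[j] for row in class_features]
        mean = sum(column) / num_classes
        var = sum((v - mean) ** 2 for v in column) / num_classes
        scale = math.sqrt(var + eps)
        for i in range(num_classes):
            out[i][j] = (class_features[i][j] - mean) / scale
    return out
```

The statistics here are computed over the set of classes rather than over a batch of samples, so the result does not depend on which images happen to be in a mini-batch.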

• ### Loss Landscape Sightseeing with Multi-Point Optimization

Beyond First Order Methods in ML workshop, NeurIPS 2019

Ivan Skorokhodov, Mikhail Burtsev

Using mode connectivity ideas, we searched loss landscapes of different neural networks for different visual patterns. Due to the extreme overparametrization, it turned out that any pattern can be found inside the surface. This indicates that the loss landscapes of deep models are very complex and contain many irregularities.

## Programming projects

• ### Non-Uniform Interpolation

• CUDA

Existing interpolation techniques (nearest neighbour, bilinear, Lanczos, Hamming, etc.) assume that the known point positions form a uniform grid, but this is not always the case. Moreover, one may want to backpropagate through these point positions. In this project, I implemented a CUDA kernel for point interpolation on a non-uniform grid based on the Gaussian Mixture Model.
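
One way to picture Gaussian-based interpolation on scattered points is the plain-Python sketch below. It is a guess at the general idea, not the CUDA kernel from this project: each known point is treated as an isotropic Gaussian component, and the function name `gaussian_interpolate`, the shared `sigma` and the normalized-weight formula are assumptions.

```python
import math


def gaussian_interpolate(points, values, query, sigma=0.5):
    """Interpolate a scalar value at `query` from scattered 2D points.

    Each known point acts as an isotropic Gaussian centred on it; the result
    is the average of `values` weighted by the normalized Gaussian responses
    at the query position. No uniform grid is required.
    """
    weights = []
    for px, py in points:
        sq_dist = (query[0] - px) ** 2 + (query[1] - py) ** 2
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all points for this sigma")
    return sum(w * v for w, v in zip(weights, values)) / total
```

The weights are smooth functions of the point coordinates, so in an autodiff framework gradients could flow back to the positions themselves, which a nearest-neighbour lookup does not allow.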

• ### RtRs

• rust

RtRs is a small ray-tracing/rasterization engine written in Rust. It works on both meshes and traditional quadrics and has some cool features, like distributed RT/BVHs/arcball rotations/etc.

• ### Omniplan Web App

• javascript
• react

Omniplan was used extensively at my previous job but didn't have a web interface, which annoyed everyone. So I built one using their official API.

• ### Firelab

• python
• pytorch

Over the past 3 years, I have been building a framework for running deep learning experiments in pytorch and using it in my research projects. It is very similar to pytorch-lightning + hydra, but without proper documentation and testing ¯\_(ツ)_/¯

• ### DL reasoner

• rust

An ALCQ description logic reasoner based on the tableau algorithm.