resort
yzy1996 committed Oct 25, 2022
1 parent 1648ae4 commit 3827cec
Showing 33 changed files with 84 additions and 188 deletions.
File renamed without changes.
File renamed without changes.
33 changes: 33 additions & 0 deletions 1-Variational AutoEncoder (VAE)/README.md
@@ -1,5 +1,29 @@
# Variational Auto-Encoder (VAE)

The majority of research efforts on improving VAEs are dedicated to statistical challenges, such as:

- reducing the gap between approximate and true posterior distribution
- formulating tighter bounds
- reducing the gradient noise
- extending VAEs to discrete variables
- tackling posterior collapse
- designing special network architectures
- previous work largely borrows architectures from classification tasks



VAEs maximize the mutual information between the input and latent variables, requiring the networks to retain the information content of the input data as much as possible.

Information maximization in noisy channels: A variational approach
**[`NeurIPS 2017`]**

Deep variational information bottleneck
**[`ICLR 2017`]**
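
As a concrete reference point, here is a minimal PyTorch sketch of the standard VAE objective (a reconstruction term plus the KL between the approximate posterior and a unit-Gaussian prior). The network sizes and variable names are illustrative assumptions, not taken from any paper listed above.

```python
# Minimal VAE sketch (illustrative sizes; Bernoulli decoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=32, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # posterior mean
        self.logvar = nn.Linear(h_dim, z_dim)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def neg_elbo(recon_logits, x, mu, logvar):
    # Reconstruction term + KL(q(z|x) || N(0, I)), averaged over the batch.
    recon = F.binary_cross_entropy_with_logits(recon_logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / x.size(0)

# Usage (illustrative): x is a batch of flattened inputs in [0, 1].
model = VAE()
x = torch.rand(8, 784)
logits, mu, logvar = model(x)
neg_elbo(logits, x, mu, logvar).backward()
```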





Learning resources

https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
@@ -78,3 +102,12 @@ VQVAE learns an intermediate code through the encoder and then, via nearest-neighbor search, maps the intermediate

[This effectively achieves a form of compression.]





References:

https://www.jeremyjordan.me/variational-autoencoders/

https://www.jeremyjordan.me/autoencoders/
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
217 changes: 29 additions & 188 deletions README.md
@@ -4,14 +4,10 @@
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity)
[![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com)

A collection of resources on 2D Generative Model.
A collection of resources on 2D generative models, which utilize generator functions that map low-dimensional latent codes to high-dimensional data outputs.

A collection of resources on generative models which utilize generator functions that map low-dimensional latent codes to high-dimensional data outputs.



We would define a prior distribution over the latent space; however, this prior may not match the true, unknown data manifold, which is an obstacle to accurate generation.

## Contributing

Feedback and contributions are welcome! If you think I have missed something or have any suggestions (papers, implementations, and other resources), feel free to open a pull request or leave an issue. I will release the [latex-pdf version]() in the future. :arrow_down: markdown format:
@@ -24,46 +20,58 @@ Feedback and contributions are welcome! If you think I have missed out on someth

:smile: Now you can use this [script](https://github.com/yzy1996/Python-Code/tree/master/Python%2BarXiv) to automatically generate the above text.

## Category

**3D-Aware Generation** has been moved to **[Learn 3D from 2D](https://github.com/yzy1996/Awesome-Learn-3D-From-2D)**

## Contents

**GAN-related sources** have been moved to **[GAN](https://github.com/yzy1996/Awesome-GANs)**

**3D-Aware Generation** has been moved to **[Learn 3D from 2D](https://github.com/yzy1996/Awesome-Learn-3D-From-2D)**



1. [Variational AutoEncoder (VAE)](./1-Variational-AutoEncoder-(VAE))
2. [Diffusion Model](./2-Diffusion-Model)
3. [Energy-Based Model (EBM)](./3-Energy-Based-Model-(EBM))
4. [Flow](./4-Flow)
5. [Representation Learning](./5-Representation-Learning)
6. [Disentangled Representation](./6-Disentangled-Representation)
7. [Text-to-Image](./7-Text-to-Image)
8. [Evaluation & Loss](./8-Evaluation-&-Loss)
9. [Others](./Others)



## Introduction

photorealistic image synthesis
![img](https://raw.githubusercontent.com/yzy1996/Image-Hosting/master/generative-overview.png)

- high resolution
- content controllable


<details><summary>Introduction (translated from Chinese)</summary><p>

compositional nature of scenes
Representation and reconstruction have always been two inseparable research topics.

- individual objects' shapes
- appearances
- background
The core goal is reconstruction. It is like seeing a picture and wanting to describe it to someone else so that they can imagine the same scene: a person abstracts the picture into a few features, for example that it shows natural scenery with many trees and lots of green, and the listener, drawing on their own prior experience, can then reconstruct the picture from these descriptions. Or it is like the police building a portrait of a suspect from a witness's description: the person is characterized through a set of features.

Machines need a similar paradigm, though perhaps without human-like semantic understanding. For interpretability and controllability, we want machines to work with a set of features that humans can understand.

</p>
</details>

Modern computer graphics (CG) techniques have achieved impressive results and are the industry standard in gaming and movie production. However, they are expensive in hardware and computation and require substantial repetitive labor.

Therefore, the ability to generate and manipulate photorealistic image content is a long-standing goal of computer vision and graphics.

These models try to model the real world by generating realistic samples from latent representations.
The ability to generate and manipulate photorealistic image content (**high resolution** & **content controllable**) is a long-standing goal of computer vision and graphics. We try to model the real world by generating realistic samples from latent representations.



<Generating images with sparse representations> divides deep generative models broadly into three categories:
Deep generative models can be divided broadly into three categories:

- Generative Adversarial Networks
- **Generative Adversarial Networks**

> use discriminator networks trained to distinguish samples produced by the generator network from real examples
- Likelihood-based Model
- **Likelihood-based Model**

> directly optimize the model log-likelihood or the evidence lower bound.
@@ -75,178 +83,11 @@ There models try to model the real world by generating realistic samples from la

- autoregressive models

- Energy-based Models
- **Energy-based Models**

> estimate a scalar energy for each example that corresponds to an unnormalized log-probability


### VAE

The majority of research efforts on improving VAEs are dedicated to statistical challenges, such as:

- reducing the gap between approximate and true posterior distribution
- formulating tighter bounds
- reducing the gradient noise
- extending VAEs to discrete variables
- tackling posterior collapse
- designing special network architectures
- previous work largely borrows architectures from classification tasks



VAEs maximize the mutual information between the input and latent variables, requiring the networks to retain the information content of the input data as much as possible.

Information maximization in noisy channels: A variational approach
**[`NeurIPS 2017`]**

Deep variational information bottleneck
**[`ICLR 2017`]**





Representation and reconstruction have always been two inseparable research topics.

The core goal is reconstruction. It is like seeing a picture and wanting to describe it to someone else so that they can imagine the same scene: a person abstracts the picture into a few features, for example that it shows natural scenery with many trees and lots of green, and the listener, drawing on their own prior experience, can then reconstruct the picture from these descriptions.

Or it is like the police building a portrait of a suspect from a witness's description: the person is characterized through a set of features.

Machines need a similar paradigm, though perhaps without human-like semantic understanding.

For interpretability and controllability, we want machines to work with a set of features that humans can understand.

![image-20220612154943172](https://raw.githubusercontent.com/yzy1996/Image-Hosting/master/image-20220612154943172.png)



AutoDecoder





Here we also need to mention the reconstruction loss.



## Introduction

Generative models can be divided into two classes:

- implicit generative models (IGMs)
- explicit generative models (EGMs)



Our goal is to train a model $\mathbb{Q}_{\theta}$ which aims to approximate a target distribution $\mathbb{P}$ over a space $\mathcal{X} \subseteq \mathbb{R}^{d}$.

Normally we define $\mathbb{Q}_{\theta}$ by a generator function $G_{\theta}: \mathcal{Z} \rightarrow \mathcal{X}$, implemented as a deep network with parameters $\theta$, where $\mathcal{Z}$ is a space of latent vectors, say $\mathbb{R}^{128}$. We assume a fixed Gaussian distribution on $\mathcal{Z}$, and call $\mathbb{Q}_{\theta}$ the distribution of $G_{\theta}(Z)$.

The model is learned by minimizing a discrepancy $\mathcal{D}$ between distributions, with the properties $\mathcal{D}(\mathbb{P}, \mathbb{Q}_{\theta}) \geq 0$ and $\mathcal{D}(\mathbb{P}, \mathbb{P})=0$.



We can build the loss $\mathcal{D}$ based on the Maximum Mean Discrepancy (MMD),
$$
\operatorname{MMD}_{k}(\mathbb{P}, \mathbb{Q})=\sup _{f:\|f\|_{\mathcal{H}_{k}} \leq 1} \mathbb{E}_{X \sim \mathbb{P}}[f(X)]-\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)]
$$
where $\mathcal{H}_k$ is the reproducing kernel Hilbert space with a kernel $k$.
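
For illustration, here is a minimal NumPy sketch of the (biased) empirical estimate of $\operatorname{MMD}^2_k$ with an RBF kernel; the bandwidth $\sigma$ and the sample shapes are assumptions made for the example, not prescribed by the formula above.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), computed pairwise.
    sq_dists = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-sq_dists / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased empirical estimate of MMD^2_k(P, Q) from samples x ~ P, y ~ Q.
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy

# Example: samples from two Gaussians with different means.
x = np.random.randn(500, 2)
y = np.random.randn(500, 2) + 1.0
print(mmd2(x, y))
```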





Alternatively, we can use the Wasserstein distance,
$$
\mathcal{W}(\mathbb{P}, \mathbb{Q})=\sup _{f:\|f\|_{\text {Lip }} \leq 1} \mathbb{E}_{X \sim \mathbb{P}}[f(X)]-\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)]
$$
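
In one dimension, the Wasserstein-1 distance between two equal-size empirical samples has a closed form: sort both samples and average the absolute differences. A small NumPy sketch (the sample sizes and distributions are illustrative):

```python
import numpy as np

def wasserstein1_1d(x, y):
    # For equal-size 1-D empirical samples, W1 is the mean absolute
    # difference between the sorted samples (optimal transport is monotone).
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

x = np.random.randn(1000)
y = np.random.randn(1000) + 2.0
print(wasserstein1_1d(x, y))  # roughly 2.0
```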





There are three main methods:

- VAE

- GAN
- Flow

They all learn from the training data and use the learned model to generate or predict new instances.



Similarity: all of them use random noise and measure the discrepancy between the noise-induced distribution and the real data distribution.

Differences: GANs aim to fit the data distribution, VAEs aim to find a latent representation of the data, and flows establish a relation between training data and generated data.

For GANs and flows, inputs and outputs are in one-to-one correspondence, whereas for VAEs they are not.



In terms of training loss functions:

VAEs maximize the ELBO, whose purpose is maximum likelihood estimation. Maximum likelihood is equivalent to minimizing a KL divergence, but this KL is not between the data and the noise; it is the KL between the $p(x)$ given by the model and the $p(x)$ exhibited by the data (see the derivation sketch below).

GANs minimize the JS divergence, which is likewise between the $p(x)$ given by the model and the $p(x)$ exhibited by the data.

Training flow models is also very direct and is likewise maximum likelihood estimation. However, because flow models use invertible neural networks, learning inference, i.e. learning the latent representation, is much easier than for the other two.
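
A short derivation sketch of the VAE claim above, using standard identities written in this document's notation:
$$
\log p_\theta(x)=\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]}_{\operatorname{ELBO}(x)}+\operatorname{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right) \geq \operatorname{ELBO}(x)
$$

$$
\mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_\theta(x)\right]=-\operatorname{KL}\left(p_{\text{data}} \,\|\, p_\theta\right)-H\left(p_{\text{data}}\right)
$$

so maximizing the ELBO maximizes a lower bound on the data log-likelihood, which, up to the constant entropy $H(p_{\text{data}})$, is the same as minimizing $\operatorname{KL}(p_{\text{data}} \,\|\, p_\theta)$.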




## GAN 2014

Generative Adversarial Networks (GANs) have emerged as a powerful class of generative models. In particular, they are able to synthesize photorealistic images at high resolutions ($1024 \times 1024$ pixels) that can hardly be distinguished from real ones.



GANs and their variants



Trained with adversarial methods, they bypass the need to compute densities, at the expense of good density estimation.

Generative adversarial networks (GANs) represent a zero-sum game between two machine players, a generator and a discriminator, designed to learn the distribution of data.



> The generator only needs to fool the discriminator.
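
A minimal PyTorch sketch of one round of this two-player game on toy 2-D data, using the standard non-saturating generator loss; the network sizes, learning rates, and synthetic "real" data are illustrative assumptions.

```python
# Minimal GAN training loop sketch (toy 2-D data, illustrative sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, x_dim = 16, 2
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for step in range(200):
    x_real = torch.randn(128, x_dim) + 3.0   # stand-in "real" data
    z = torch.randn(128, z_dim)

    # Discriminator step: distinguish real samples from generated ones.
    x_fake = G(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real), torch.ones(128, 1))
              + F.binary_cross_entropy_with_logits(D(x_fake), torch.zeros(128, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator (non-saturating loss).
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(128, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```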


## VAE 2013

at the cost of learning two neural networks (an encoder and a decoder)





## VAE-GAN

combines a VAE with a GAN



## Bijective GNN



## Flow



## Inverse Rendering / Graphics

Given 2D image observations, these approaches aim to infer a 3D-structure-aware representation of the underlying scene that enables prior-based predictions about occluded parts.



References:

https://www.jeremyjordan.me/variational-autoencoders/

https://www.jeremyjordan.me/autoencoders/
22 changes: 22 additions & 0 deletions 结构.md
@@ -261,3 +261,25 @@ Neural Radiance Field (NeRF)







Our goal is to train a model $\mathbb{Q}_{\theta}$ which aims to approximate a target distribution $\mathbb{P}$ over a space $\mathcal{X} \subseteq \mathbb{R}^{d}$.

Normally we define $\mathbb{Q}_{\theta}$ by a generator function $G_{\theta}: \mathcal{Z} \rightarrow \mathcal{X}$, implemented as a deep network with parameters $\theta$, where $\mathcal{Z}$ is a space of latent vectors, say $\mathbb{R}^{128}$. We assume a fixed Gaussian distribution on $\mathcal{Z}$, and call $\mathbb{Q}_{\theta}$ the distribution of $G_{\theta}(Z)$.

The model is learned by minimizing a discrepancy $\mathcal{D}$ between distributions, with the properties $\mathcal{D}(\mathbb{P}, \mathbb{Q}_{\theta}) \geq 0$ and $\mathcal{D}(\mathbb{P}, \mathbb{P})=0$.



We can build the loss $\mathcal{D}$ based on the Maximum Mean Discrepancy (MMD),
$$
\operatorname{MMD}_{k}(\mathbb{P}, \mathbb{Q})=\sup _{f:\|f\|_{\mathcal{H}_{k}} \leq 1} \mathbb{E}_{X \sim \mathbb{P}}[f(X)]-\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)]
$$
where $\mathcal{H}_k$ is the reproducing kernel Hilbert space with a kernel $k$.

Wasserstein distance
$$
\mathcal{W}(\mathbb{P}, \mathbb{Q})=\sup _{f:\|f\|_{\text {Lip }} \leq 1} \mathbb{E}_{X \sim \mathbb{P}}[f(X)]-\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)]
$$
