save

BeomseoChoi · Nov 21, 2024 · 7c99556
1 parent 1aabc89
Showing 1 changed file with 33 additions and 38 deletions.
diff --git a/_posts/2024-11-20-GAN.md b/_posts/2024-11-20-GAN.md
@@ -78,6 +78,8 @@ $$
 V(\mathcal{G}^{*}_{\theta}, \mathcal{D}^{*}_{\mathcal{G}_{\theta}}(x)) = -\log{4}.
 $$
 
+is obtained.
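As a quick numerical sanity check, the sketch below assumes the standard GAN value function $$V(\mathcal{G}, \mathcal{D}) = \mathbb{E}_{x \sim p_{data}}[\log \mathcal{D}(x)] + \mathbb{E}_{x \sim p_{g}}[\log(1 - \mathcal{D}(x))]$$ on a small finite support, so the expectations become sums. The helper names `value` and `optimal_discriminator` are hypothetical. When $$p_g = p_{data}$$, the optimal discriminator outputs 1/2 everywhere and the value reduces to $$2\log\frac{1}{2} = -\log 4$$.

```python
import math

def value(p_data, p_g, d):
    """V(G, D) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] on a finite support."""
    real_term = sum(pd * math.log(d[x]) for x, pd in enumerate(p_data) if pd > 0)
    fake_term = sum(pg * math.log(1.0 - d[x]) for x, pg in enumerate(p_g) if pg > 0)
    return real_term + fake_term

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)); points with no mass under either get 0.5."""
    return [pd / (pd + pg) if pd + pg > 0 else 0.5 for pd, pg in zip(p_data, p_g)]

p_data = [0.1, 0.4, 0.5]
d_star = optimal_discriminator(p_data, p_data)  # the generator matches the data exactly
print(value(p_data, p_data, d_star), -math.log(4))
```

If the generator does not match the data, the value at the optimal discriminator is strictly larger than $$-\log 4$$.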
+
 ## Training GAN
 
 GANs are trained as follows.
@@ -87,51 +89,55 @@ GAN을 학습하는 방식은 다음과 같다.
 4. Update $$\theta$$ with gradient descent.
 5. Repeat.
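The alternating updates can be sketched on a 1D toy problem. Everything here is an assumption for illustration: data drawn from $$\mathcal{N}(3, 1)$$, a generator $$\mathcal{G}_{\theta}(z) = \theta + z$$ with $$z \sim \mathcal{N}(0, 1)$$, a logistic discriminator $$\mathcal{D}_{\phi}(x) = \sigma(ax + b)$$, and hand-derived gradients in pure Python. The function `train_toy_gan` is hypothetical; it only shows the structure of one discriminator ascent step followed by one generator descent step, repeated.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, batch=64, lr_d=0.05, lr_g=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # generator: G_theta(z) = theta + z
    a, b = 0.0, 0.0  # discriminator: D_phi(x) = sigmoid(a * x + b), phi = (a, b)
    history = []
    for _ in range(steps):  # step 5: repeat
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]
        fake = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Update phi by gradient ascent on mean log D(x_real) + mean log(1 - D(x_fake)).
        grad_a = grad_b = 0.0
        for x in real:
            d = sigmoid(a * x + b)
            grad_a += (1.0 - d) * x
            grad_b += 1.0 - d
        for x in fake:
            d = sigmoid(a * x + b)
            grad_a -= d * x
            grad_b -= d
        a += lr_d * grad_a / batch
        b += lr_d * grad_b / batch

        # Step 4: update theta by gradient descent on mean log(1 - D(G_theta(z))).
        grad_theta = 0.0
        for _ in range(batch):
            x = theta + rng.gauss(0.0, 1.0)
            grad_theta -= sigmoid(a * x + b) * a  # d/dtheta log(1 - D(theta + z))
        theta -= lr_g * grad_theta / batch
        history.append(theta)
    return theta, history

theta, history = train_toy_gan()
print(theta)
```

Because the discriminator is only linear in $$x$$ and the loss is the original minimax one, $$\theta$$ may move toward 3 while oscillating rather than settling exactly; no particular final value is claimed here.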
 
-Pros.
+## Pros, Cons, and Limitations
+
+### Pros.
 1. No likelihood: the density does not need to be known.
+   - The density does not need to be known explicitly.
 2. Flexibility in neural network architecture: no constraint is imposed on the architecture of G.
+   - There is no constraint on the neural network architecture of $$\mathcal{G}$$.
 3. Fast sampling: a single forward pass through G.
+   - Samples can be generated quickly using $$\mathcal{G}$$ alone.
 
-Cons.
+### Cons.
 1. Very difficult to train in practice.
 
-There are three challenges in training GANs:
-1. Unstable optimization
-2. Mode collapse
-3. Evaluation
-
-
+### Unstable optimization
 {% include figure.liquid path="assets/img/2024-11-20-GAN/loss.jpg" class="img-fluid rounded z-depth-1" zoomable=true %}
-The first problem is severe oscillation. Looking at the graph above, it is hard to tell where to stop.
+Because $$\mathcal{D}$$ and $$\mathcal{G}$$ compete with each other, optimization is very unstable, and the loss oscillates severely.
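The oscillation can be reproduced without any neural network. A common toy example (not a GAN itself) is the bilinear game $$\min_x \max_y xy$$, whose equilibrium is $$(0, 0)$$. With simultaneous gradient descent on $$x$$ and ascent on $$y$$ at step size $$\eta$$, each step multiplies the distance from the equilibrium by $$\sqrt{1 + \eta^2}$$, so the iterates spiral outward instead of converging. The function name `simultaneous_gda` is hypothetical.

```python
import math

def simultaneous_gda(x=1.0, y=1.0, lr=0.1, steps=100):
    """Simultaneous gradient descent on x and ascent on y for f(x, y) = x * y."""
    radii = [math.hypot(x, y)]  # distance from the equilibrium (0, 0)
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(math.hypot(x, y))
    return radii

radii = simultaneous_gda()
print(radii[0], radii[-1])  # the distance grows instead of shrinking
```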
 
-The second problem is that G collapses to one or a few samples. D learns too well, so G fails to learn.
+### Mode collapse
+$$\mathcal{G}$$ may generate only a small subset of samples, so the generated samples lack diversity.
 
-The third problem is 
 
+### Evaluation
+It is difficult to define an objective metric for evaluating GAN performance.
 
 ---
 
 Since training is likelihood-free, there is no need to choose an architecture with a particular (tractable) density.
 
-f-divergence.
+## f-divergence.
 So far, two divergences have appeared: KLD and JSD.
 
-The f-divergence is a generalization of divergence.
+The f-divergence, a generalized form of divergence, is defined as follows.
 
 $$
-D_{f}(p, q) = \mathbb{E}_{x \sim q}\left[f\left(\frac{p(x)}{q(x)}\right)\right] \text{, where f is convex and lower-semicontinuous with f(1) = 0.}
+D_{f}(p, q) = \mathbb{E}_{x \sim q}\left[f\left(\frac{p(x)}{q(x)}\right)\right] \text{, where }
 $$
 
+- $$f$$ is convex.
+- $$f$$ is lower-semicontinuous.
+- $$f(1) = 0$$.
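For discrete distributions, the definition can be evaluated directly. The sketch assumes $$q(x) > 0$$ on every point and uses two standard generator functions: $$f(t) = t\log t$$, which gives $$\mathrm{KL}(p \,\|\, q)$$, and $$f(t) = \frac{1}{2}\left[t\log t - (1 + t)\log\frac{1 + t}{2}\right]$$, which gives $$\mathrm{JSD}(p, q)$$. Both are convex with $$f(1) = 0$$. The function names are hypothetical.

```python
import math

def f_divergence(p, q, f):
    """D_f(p, q) = E_{x~q}[f(p(x) / q(x))] for discrete p, q; assumes q(x) > 0 everywhere."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

def f_kl(t):
    # t log t, with the convention 0 log 0 = 0; yields KL(p || q).
    return t * math.log(t) if t > 0 else 0.0

def f_js(t):
    # 1/2 [t log t - (1 + t) log((1 + t) / 2)]; yields the Jensen-Shannon divergence.
    return 0.5 * (f_kl(t) - (1.0 + t) * math.log((1.0 + t) / 2.0))

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
print(f_divergence(p, q, f_kl))  # equals sum of p log(p / q)
print(f_divergence(p, q, f_js))
print(f_divergence(p, p, f_kl))  # D_f(p, p) = f(1) = 0
```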
+
 {% include figure.liquid path="assets/img/2024-11-20-GAN/f-divergence.jpg" class="img-fluid rounded z-depth-1" zoomable=true %}
 
-Nowozin et al., 2016.
+$$
+\min_{\mathcal{G}_{\theta}}\max_{\mathcal{D}_{\phi}} F(\theta, \phi) = \mathbb{E}_{x \sim P_{data}}\left[\mathcal{T}_{\phi}(x)\right] - \mathbb{E}_{x \sim P_{\theta}}\left[f^{*}(\mathcal{T}_{\phi}(x))\right] \text{ (Nowozin et al., 2016)}
+$$
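For any $$\mathcal{T}_{\phi}$$, the inner objective is a lower bound on $$D_f(P_{data}, P_{\theta})$$, where $$f^{*}$$ is the convex conjugate of $$f$$, and the bound is tight at $$\mathcal{T}^{*}(x) = f'\left(\frac{p_{data}(x)}{p_{\theta}(x)}\right)$$. A small check for the KL case, where $$f(u) = u\log u$$ and $$f^{*}(s) = e^{s - 1}$$, on discrete distributions. The function names are hypothetical, and $$\mathcal{T}$$ is given directly as a table of values rather than a network.

```python
import math

def kl(p, q):
    # KL(p || q) for discrete distributions.
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def fgan_objective_kl(p_data, p_model, t):
    """F = E_{x~p_data}[T(x)] - E_{x~p_model}[f*(T(x))] with f(u) = u log u, so f*(s) = exp(s - 1)."""
    return sum(pd * tx for pd, tx in zip(p_data, t)) - \
           sum(pm * math.exp(tx - 1.0) for pm, tx in zip(p_model, t))

p_data = [0.1, 0.4, 0.5]
p_model = [0.3, 0.3, 0.4]
t_star = [1.0 + math.log(pd / pm) for pd, pm in zip(p_data, p_model)]  # T* = f'(p / q)
t_zero = [0.0, 0.0, 0.0]                                               # an arbitrary suboptimal T
print(fgan_objective_kl(p_data, p_model, t_star), kl(p_data, p_model))  # equal: the bound is tight
print(fgan_objective_kl(p_data, p_model, t_zero))                       # smaller than the KL
```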
 
 
-Carefully deriving from the f-divergence yields the f-GAN objective.
 
-$$
-\min_{\mathcal{G}_{\theta}}\max_{\mathcal{D}_{\phi}} F(\theta, \phi) = \mathbb{E}_{x \sim P_{data}}\left[\mathcal{T}_{\phi}(x)\right] - \mathbb{E}_{x \sim P_{\theta}}\left[f^{*}(\mathcal{T}_{\phi}(x))\right]
-$$
 
 Interlude: support
 
@@ -147,32 +153,21 @@ $$
 
 p and q do not share support. This happens because, early in training, the samples generated by G are very different from the training data.
 
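-needed a "smoother" distance D(p, q) that is defined when p and q have disjoint supports.
-How do we handle the case where the supports do not overlap, i.e., when a discontinuity problem arises? -> Wasserstein distance.
+Early in training, the samples generated by $$\mathcal{G}$$ differ from the training data, so $$p$$ and $$q$$ do not share support and a discontinuity problem arises. A "smoother" distance, one that is still defined when the supports are disjoint, is therefore needed, and the Wasserstein distance plays this role.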
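A common toy example makes the support problem concrete: let the data be a point mass at 0 and the model a point mass at $$\theta$$. For any $$\theta \ne 0$$ the supports are disjoint, so the JSD stays at $$\log 2$$ no matter how close $$\theta$$ gets and gives no useful gradient, whereas the Wasserstein distance equals $$|\theta|$$ because the only coupling moves all mass from 0 to $$\theta$$. The `jsd` helper is a hypothetical sketch for discrete distributions stored as `{point: probability}` dicts.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence for discrete distributions given as {point: probability} dicts."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_m(r):
        return sum(rx * math.log(rx / m[x]) for x, rx in r.items() if rx > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

for theta in [0.5, 1.0, 5.0]:
    p = {0.0: 1.0}    # "data": point mass at 0
    q = {theta: 1.0}  # "model": point mass at theta
    # For two point masses the Wasserstein-1 distance is simply |0 - theta|.
    print(theta, jsd(p, q), abs(theta))
```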
 
-wasserstein distance
+## Wasserstein distance
 
 $$
 D_{w}(p, q) = \inf\limits_{\gamma \in \prod} \mathbb{E}_{(x, y) \sim \gamma}\left[ \left\| x - y \right\| \right]
 $$
 
-$$
-\text{,where } \prod \text{contains all possible joint distributions of } (x, y).
-$$
-
-$$
-\text{marginal of } x \text{ is } p(x) = \int \gamma(x, y)dy.
-$$
-
-$$
-\text{marginal of } y \text{ is } p(y) = \int \gamma(x, y)dx.
-$$
-
-$$
-\gamma(y \mid x) \text{ : a probabilistic earth moving plan that warps } p(x) \text{ to } q(y).
-$$
+- $$\prod$$ is the set of all possible joint distributions of $$x$$ and $$y$$.
+- The marginal distributions of $$x$$ and $$y$$ must be $$p(x)$$ and $$q(y)$$, respectively.
+  - $$\text{marginal of } x \text{ is } p(x) = \int \gamma(x, y)dy$$.
+  - $$\text{marginal of } y \text{ is } q(y) = \int \gamma(x, y)dx$$.
+- $$\gamma(y \mid x) \text{ : a probabilistic earth moving plan that warps } p(x) \text{ to } q(y)$$.
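In one dimension, with two empirical distributions of the same size and uniform weights, the infimum over couplings has a closed form: the optimal plan matches the sorted samples in order, so $$D_w$$ is the mean absolute difference of the sorted samples. This is a sketch under those assumptions only; `wasserstein_1d` is a hypothetical name, and higher dimensions generally need an optimal-transport solver.

```python
def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1D empirical distributions with uniform weights.

    In 1D the optimal coupling pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

print(wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0]))  # prints 5.0
```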
 
-Kantorovich-Rubinstein duality.
+## Kantorovich-Rubinstein duality.
 
 $$
 D_{w}(p, q) = \sup\limits_{\left\|f\right\|_{L} \le 1} \mathbb{E}_{x \sim p}\left[ f(x) \right] - \mathbb{E}_{x \sim q} \left[ f(x) \right]