Here, ll means the term $\mathbb{E}_{q(\mathbf{z}_1|\mathbf{x})}[\log p(\mathbf{x}|\mathbf{z}_1)]$.
We estimate it by sampling $\mathbf{z}_1 \sim \mathcal{N}(\alpha_1 \mathbf{x}, \sigma_1^2 \mathbf{I})$ and parameterizing $p(\mathbf{x}|\mathbf{z}_1)$ as $\mathcal{N}(\frac{\mathbf{z}_1}{\alpha_1}, \frac{\sigma_1^2}{\alpha_1^2} \mathbf{I})$.
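Thus, ll $= - \frac{N}{2} \log (\frac{\sigma_1^2}{\alpha_1^2}) - \frac{\alpha_1^2}{2 \sigma_1^2} (\mathbf{x} - \frac{\mathbf{z}_1}{\alpha_1})^T(\mathbf{x} - \frac{\mathbf{z}_1}{\alpha_1}) + \mathcal{C}$, where $N$ is the signal length and $\mathcal{C}$ is a constant.
Since $\frac{\mathbf{z}_1}{\alpha_1} \sim \mathcal{N}(\mathbf{x}, \frac{\sigma_1^2}{\alpha_1^2} \mathbf{I})$, the squared error $(\mathbf{x} - \frac{\mathbf{z}_1}{\alpha_1})^T(\mathbf{x} - \frac{\mathbf{z}_1}{\alpha_1})$ has expectation $N \frac{\sigma_1^2}{\alpha_1^2}$ and concentrates around that value when $N$ is large, so the quadratic term is approximately $-\frac{N}{2}$. We therefore approximate ll as $- \frac{N}{2} \log (\frac{\sigma_1^2}{\alpha_1^2}) - \frac{N}{2} + \mathcal{C} = \frac{N}{2} \delta_{max} - \frac{N}{2} + \mathcal{C}$, where the equality implies $\delta_{max} = \log (\frac{\alpha_1^2}{\sigma_1^2})$.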
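To see how close the approximation is, here is a minimal numerical sketch. It is not the code in `loss.py`; the function names and the values of `alpha1`, `sigma1`, and `n` are made up for illustration, and the constant $\mathcal{C}$ is dropped from both sides.

```python
import math
import random


def exact_ll(x, z1, alpha1, sigma1):
    """log N(x; z1/alpha1, (sigma1/alpha1)^2 I) without the constant C."""
    n = len(x)
    var = (sigma1 / alpha1) ** 2
    sq_err = sum((xi - zi / alpha1) ** 2 for xi, zi in zip(x, z1))
    return -0.5 * n * math.log(var) - 0.5 * sq_err / var


def approx_ll(n, alpha1, sigma1):
    """Large-N approximation N/2 * delta_max - N/2, also without C."""
    delta_max = math.log(alpha1 ** 2 / sigma1 ** 2)
    return 0.5 * n * delta_max - 0.5 * n


def demo(n=20000, alpha1=0.999, sigma1=0.05, seed=0):
    # Hypothetical values: a random "signal" in [-1, 1] and a small noise level.
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Sample z1 ~ N(alpha1 * x, sigma1^2 I).
    z1 = [alpha1 * xi + sigma1 * rng.gauss(0.0, 1.0) for xi in x]
    return exact_ll(x, z1, alpha1, sigma1), approx_ll(n, alpha1, sigma1)


exact, approx = demo()
rel_gap = abs(exact - approx) / abs(approx)
print(f"exact={exact:.1f} approx={approx:.1f} relative gap={rel_gap:.4f}")
```

The normalized squared error follows a chi-square distribution with $N$ degrees of freedom, so its fluctuation grows like $\sqrt{2N}$ while the approximated value grows like $N$; the relative gap therefore shrinks as the signal gets longer.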
In this research, audio data is used. Is it continuous?
Hello.
I have a question about the formula below. How did you derive it?
https://github.com/yoyololicon/diffwave-sr/blob/cab5c4e330c8b6d8b329a6c85812a7328fe3431c/loss.py#L20
Also, audio data is used in this research. Is it treated as continuous?
I would appreciate your help.