Table of Contents
1. Variational Autoencoders
Solving SLAM with variational inference 2017
Stochastic Backpropagation and Approximate Inference in Deep Generative Models 2014. Uses a recognition model to represent an approximate posterior distribution, and uses it to optimize a variational lower bound.
(1) Deep Latent Gaussian Models (generative model): each layer $l$ is generated from the layer above, $(l+1)$, by a model parameterized by $\theta^{g}$.
```mermaid
graph LR
    A["h2"] ==> B["..."]
    B ==> D["h1"]
    D ==> C["v"]
    T["θ network parameters"] --> A
    T --> D
    T --> B
    T --> C
```
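As a rough illustration, ancestral sampling from such a model can be sketched in NumPy as follows. The layer sizes, the tanh maps standing in for the deterministic layers $T_{l}$, and the linear noise maps $G_{l}$ are placeholder assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [8, 6, 4]        # dims of h_1, ..., h_L (placeholder sizes, L = 3)
data_dim = 16           # dimension of the visible layer v

# Placeholder parameterization: tanh maps T_l and random linear maps G_l.
def mlp(in_dim, out_dim):
    W, b = rng.normal(size=(out_dim, in_dim)), rng.normal(size=out_dim)
    return lambda h: np.tanh(W @ h + b)

L = len(dims)
T = [mlp(dims[0], data_dim)] + [mlp(dims[l], dims[l - 1]) for l in range(1, L)]
G = [rng.normal(size=(dims[l], dims[l])) for l in range(L)]

# Ancestral sampling: each layer receives fresh Gaussian noise xi_l.
xi = [rng.normal(size=dims[l]) for l in range(L)]
h = G[L - 1] @ xi[L - 1]                   # top layer h_L = G_L xi_L
for l in range(L - 2, -1, -1):             # h_l = T_l(h_{l+1}) + G_l xi_l
    h = T[l + 1](h) + G[l] @ xi[l]
v_mean = T[0](h)                           # parameters of p(v | h_1)
```

Inference then has to recover the injected noise variables $\xi_{l}$ from an observed $v$, which is what the recognition model below approximates.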
(2) Recognition Models optimize the lower bound of the marginal likelihood, using the recognition model $q(\xi)$. $V \in \mathbb{R}^{D\times N}$ is the dataset.
\[\begin{align*} \mathcal{L}(V) & = -\log p(V) = -\log \int p(V \vert \xi, \theta^{g})\, p(\xi, \theta^{g})\, d\xi \\ & = -\log \int \frac{q(\xi)}{q(\xi)}\, p(V \vert \xi, \theta^{g})\, p(\xi, \theta^{g})\, d\xi \\ & \le \mathcal{F}(V) = D_{KL} [q(\xi) \parallel p(\xi)] - \mathbb{E}_{q}[ \log p(V \vert \xi, \theta^{g})\, p(\theta^{g}) ] \end{align*}\]

The bound follows from Jensen's inequality applied to the log of the expectation under $q$. For simplicity, $q(\xi \vert v)$ can be a Gaussian distribution that factorizes across the $L$ layers, with recognition parameters $\theta^{r} = \{ \mu_{l}(v_{n}), C_{l}(v_{n})\}_{l,n}$.
\[q(\xi \vert V, \theta^{r}) = \prod_{n=1}^{N}\prod_{l=1}^{L} \mathcal{N}(\xi_{n, l}\vert \mu_{l}(v_{n}), C_{l}(v_{n}))\]

To further simplify, the precision of each factor is given the rank-one parameterization $C^{-1} = D + uu^{T}$, with $D$ diagonal.
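A minimal NumPy sketch of this parameterization for a single layer and data point; the dimension and the recognition-network outputs `mu`, `d`, `u` are placeholder values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 4

# Placeholder recognition-network outputs for one data point v_n
# (in the paper these are functions mu_l(v_n), D_l(v_n), u_l(v_n)).
mu = rng.normal(size=latent_dim)           # mean of q(xi_l | v_n)
d = np.exp(rng.normal(size=latent_dim))    # positive diagonal of D
u = rng.normal(size=latent_dim)            # rank-one factor

# Precision C^{-1} = D + u u^T; covariance C by inversion.
precision = np.diag(d) + np.outer(u, u)
C = np.linalg.inv(precision)

# Draw xi ~ N(mu, C) through a factor R with R R^T = C.
R = np.linalg.cholesky(C)
eps = rng.normal(size=latent_dim)
xi = mu + R @ eps
```

The point of the parameterization is that the recognition network only outputs a number of quantities linear in the latent dimension ($\mu$, the diagonal of $D$, and $u$) rather than a full covariance matrix.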
(3) Stochastic Backpropagation: in the stochastic case, a unit's value is drawn from a Gaussian, $\xi \sim \mathcal{N}(\xi \vert \mu, C)$ (in the deterministic case it would simply be $\xi = \mu$). We then need the derivatives of the expected loss $\mathbb{E}[f(\xi)]$ with respect to $\mu$ and $C$.
\[\begin{align*} \triangledown_{\mu_{i}} \mathbb{E}_{\mathcal{N}(\mu, C)}[f(\xi)] & = \mathbb{E}_{\mathcal{N}(\mu, C)}[\triangledown_{\xi_{i}}f(\xi)] \\ \triangledown_{C_{ij}} \mathbb{E}_{\mathcal{N}(\mu, C)}[f(\xi)] & = \frac{1}{2} \mathbb{E}_{\mathcal{N}(\mu, C)} [\triangledown_{\xi_{i},\xi_{j}}^{2}f(\xi)] \end{align*}\]Combining these derivatives with the chain rule, we can compute the derivative with respect to the model parameters $\theta$ ($g$ denotes the gradient and $H$ the Hessian of $f$):
\[\triangledown_{\theta} \mathbb{E}_{\mathcal{N}(\mu, C)}[f(\xi)] = \mathbb{E}_{\mathcal{N}(\mu, C)} [ g^{T}\frac{\partial \mu}{\partial \theta} + \frac{1}{2}\mathrm{Tr}(H \frac{\partial C}{\partial \theta} ) ]\]But computing the Hessian matrix is expensive, so there are two ways to reduce the complexity:
- Using the product rule for integrals.
- Using a suitable coordinate transformation $R$ to represent the Gaussian: write $\xi = \mu + R\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, so that $\xi \sim \mathcal{N}(\mu, RR^{T})$ (the approach used in the paper).
Then we could compute $\triangledown_{\theta^{g}}\mathcal{F}(V)$ and $\triangledown_{\theta^{r}}\mathcal{F}(V)$ to update the models.
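A quick Monte Carlo sanity check of the first gradient identity, using the coordinate transformation $\xi = \mu + R\epsilon$ to generate the samples; the quadratic test function $f(\xi) = \xi^{T}A\xi$ and the values of $A$, $\mu$, $R$ are made up for illustration and are not the paper's objective.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary quadratic test function f(xi) = xi^T A xi and its gradient.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad_f = lambda xi: 2.0 * A @ xi

mu = np.array([0.3, -0.7])
R = np.array([[1.0, 0.0], [0.4, 0.8]])     # so C = R R^T

# Coordinate transformation: xi = mu + R eps with eps ~ N(0, I).
eps = rng.normal(size=(100_000, 2))
xi = mu + eps @ R.T

# First identity: grad_mu E[f(xi)] = E[grad_xi f(xi)], estimated by Monte Carlo.
mc_grad_mu = grad_f(xi.T).mean(axis=1)

# Closed form for the quadratic: E[f] = mu^T A mu + Tr(A C), so grad_mu = 2 A mu.
exact_grad_mu = 2.0 * A @ mu
print(mc_grad_mu, exact_grad_mu)           # should agree up to Monte Carlo noise
```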
(4) Algorithm:
```
while hasNotConverged():
    V = getMiniBatch()
    # bottom-up pass: sample latents from the recognition model
    xi = {xi_n},  xi_n ~ q(xi | v_n)
    # top-down pass: compute layer activations from the sampled latents
    h = h(xi)
    # gradients of F(V) w.r.t. generative and recognition parameters
    delta_theta_g, delta_theta_r = updateGradients()
    theta_g = theta_g + delta_theta_g
    theta_r = theta_r + delta_theta_r
```
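The same loop for a single latent layer looks roughly like the following PyTorch sketch. It substitutes a diagonal-covariance recognition model and a standard Gaussian prior for the rank-one parameterization above, and the data, network sizes, and learning rate are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
data_dim, latent_dim = 20, 4

# Generative model p(v | xi), parameters theta_g.
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, data_dim))
# Recognition model q(xi | v), parameters theta_r (diagonal Gaussian).
encoder = nn.Sequential(nn.Linear(data_dim, 64), nn.Tanh(), nn.Linear(64, 2 * latent_dim))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
dataset = torch.randn(512, data_dim)                      # placeholder data V

for step in range(100):                                   # while hasNotConverged()
    v = dataset[torch.randint(0, len(dataset), (32,))]    # getMiniBatch()
    mu, log_var = encoder(v).chunk(2, dim=-1)             # bottom-up pass
    xi = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # xi ~ q(xi | v)
    v_hat = decoder(xi)                                   # top-down pass
    # F(V): reconstruction term plus KL[q(xi | v) || N(0, I)]
    recon = ((v_hat - v) ** 2).sum(dim=-1).mean()
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(dim=-1).mean()
    loss = recon + kl
    opt.zero_grad()
    loss.backward()
    opt.step()                                            # update theta_g, theta_r
```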
2. Deep Kalman Filter
Deep Kalman Filters Can Filter 2023
Structured Inference Networks for Nonlinear State Space Models 2016.
- implementations: github-structuredinference, github-dmm, github-deepHMM.
- Gaussian state space model.
- Optimizes a variational lower bound on the data log-likelihood.
Deep Kalman Filters 2015: in the following Kalman filter model, $G_{\alpha}$, $S_{\beta}$, and $F_{\kappa}$ are assumed to be parameterized by deep neural networks.
\[\begin{align*} &z_{1} \sim \mathcal{N} (\mu_{0}, \Sigma_{0}) \\ &z_{t} \sim \mathcal{N} (G_{\alpha}(z_{t-1}, u_{t-1}, \Delta_{t}), S_{\beta}(z_{t-1}, u_{t-1}, \Delta_{t})) \\ &x_{t} \sim \Pi(F_{\kappa}(z_{t})) \end{align*}\]

- $\hat{z} = q_{\phi}(z \vert x, u)$ for state estimation (Kalman filtering).
- $\hat{x} = p_{\theta}(x \vert \hat{z})$ for pattern reconstruction (denoising).
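A forward-sampling sketch of this generative model in NumPy; the small tanh maps standing in for $G_{\alpha}$ and $F_{\kappa}$, the fixed diagonal transition covariance standing in for $S_{\beta}$, and the Gaussian emission standing in for $\Pi$ are all simplifying assumptions, not the paper's learned networks.

```python
import numpy as np

rng = np.random.default_rng(2)
z_dim, u_dim, x_dim, T = 3, 2, 5, 10

# Placeholder maps standing in for the deep networks.
def net(in_dim, out_dim):
    W, b = rng.normal(size=(out_dim, in_dim)), rng.normal(size=out_dim)
    return lambda h: np.tanh(W @ h + b)

G_alpha = net(z_dim + u_dim + 1, z_dim)         # transition mean G_alpha(z, u, dt)
S_beta = lambda h: 0.1 * np.eye(z_dim)           # transition covariance (fixed diagonal)
F_kappa = net(z_dim, x_dim)                      # emission parameters F_kappa(z)

mu_0, Sigma_0 = np.zeros(z_dim), np.eye(z_dim)
u = rng.normal(size=(T, u_dim))                  # control inputs u_t
dt = 1.0                                         # time gap Delta_t (constant here)

zs = [rng.multivariate_normal(mu_0, Sigma_0)]    # z_1 ~ N(mu_0, Sigma_0)
xs = []
for t in range(T):
    if t > 0:                                    # z_t ~ N(G_alpha(...), S_beta(...))
        h = np.concatenate([zs[-1], u[t - 1], [dt]])
        zs.append(rng.multivariate_normal(G_alpha(h), S_beta(h)))
    # x_t ~ Pi(F_kappa(z_t)); a Gaussian emission stands in for Pi here.
    xs.append(F_kappa(zs[-1]) + 0.05 * rng.normal(size=x_dim))
x = np.stack(xs)                                 # sampled observations x_{1:T}
```

The recognition model $q_{\phi}(z \vert x, u)$ then runs in the opposite direction, inferring the latent states from the observed sequence.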
This could be used as a baseline for large models in image processing.