Table of Contents

  1. Variational Autoencoders
  2. Deep Kalman Filter

1. Variational Autoencoders

Solving SLAM with variational inference 2017

Stochastic Backpropagation and Approximate Inference in Deep Generative Models 2014. The paper uses a recognition model to represent an approximate posterior distribution and optimizes a variational lower bound with it.

(1) Deep Latent Gaussian Models - Generative Model: each layer $l$ is generated from the layer above ($l+1$) by the model parameterized by $\theta^{g}$.

graph LR
A["h2"] ==> B["..."]
B ==> D["h1"]
D ==> C["v"]
T["θ
network parameters"] --> A
T --> D
T --> B
T --> C
\[\begin{align*} \xi_{l} & \sim \mathcal{N}(\xi_{l} \vert 0, \mathbf{I}), \quad l=1,...,L \\ h_{L} &= G_{L}\xi_{L} \\ h_{l} &= T_{l}(h_{l+1}) + G_{l}\xi_{l}, \quad l=1,...,L-1 \\ v & \sim \pi (v \vert T_{0}(h_{1}) ) \end{align*}\]
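
A minimal numpy sketch of ancestral sampling from this generative model. The linear maps, the dimensions, and the Bernoulli likelihood below are assumptions for illustration (the paper uses MLPs for $T_{l}$ and learned matrices $G_{l}$):

import numpy as np

rng = np.random.default_rng(0)

L, dims = 3, [8, 8, 8]     # number of layers and assumed dimension of each h_l
D = 16                     # observation dimension (assumed)

# linear stand-ins for the maps T_l and G_l (MLPs and learned matrices in the paper)
T = [0.1 * rng.normal(size=(dims[l], dims[l + 1])) for l in range(L - 1)]
T0 = 0.1 * rng.normal(size=(D, dims[0]))      # maps h_1 to the observation parameters
G = [np.eye(dims[l]) for l in range(L)]

def sample_v():
    xi = rng.normal(size=dims[L - 1])
    h = G[L - 1] @ xi                          # h_L = G_L xi_L
    for l in range(L - 2, -1, -1):             # layers L-1, ..., 1
        xi = rng.normal(size=dims[l])
        h = T[l] @ h + G[l] @ xi               # h_l = T_l(h_{l+1}) + G_l xi_l
    p = 1.0 / (1.0 + np.exp(-(T0 @ h)))        # pi assumed Bernoulli with logits T_0(h_1)
    return (rng.uniform(size=D) < p).astype(float)

v = sample_v()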

(2) Recognition Models: optimize a lower bound on the marginal likelihood, using the recognition model $q(\xi)$. $V \in \mathbb{R}^{D\times N}$ is the dataset of $N$ observations.

\[\begin{align*} \mathcal{L}(V) & = - \log p(V) = -\log \int p(V \vert \xi, \theta^{g})p(\xi, \theta^{g}) d\xi \\ & = -\log \int \frac{q(\xi)}{q(\xi)} p(V \vert \xi, \theta^{g})p(\xi, \theta^{g}) d\xi \\ & \le \mathcal{F}(V) = D_{KL} [q(\xi) \parallel p(\xi)] - \mathbb{E}_{q}[ \log p(V \vert \xi, \theta^{g}) p(\theta^{g}) ] \end{align*}\]

For simplicity, $q(\xi\vert v)$ can be a Gaussian distribution that factorizes across the $L$ layers, with parameters \(\theta^{r} = \{ \mu_{l}(v_{n}), C_{l}(v_{n})\}_{l,n}\).

\[q(\xi|V, \theta^{r}) = \prod_{n=1}^{N}\prod_{l=1}^{L} \mathcal{N}(\xi_{n, l}\vert \mu_{l}(v_{n}), C_{l}(v_{n}))\]

To further simplify, parameterize the precision as $C^{-1} = D + uu^{T}$, i.e. a diagonal matrix $D$ plus a rank-one correction.
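
A small sketch, assuming $D$ is diagonal with positive entries, of recovering the covariance $C$ from the precision $D + uu^{T}$ via the Sherman-Morrison identity and drawing one sample (the dimension and the sampling route are illustrative, not the paper's exact procedure):

import numpy as np

rng = np.random.default_rng(0)
k = 4                                    # latent dimension of one layer (assumed)

d = np.exp(rng.normal(size=k))           # positive entries of the diagonal matrix D
u = rng.normal(size=k)                   # rank-one factor
mu = rng.normal(size=k)

# Sherman-Morrison: C = (D + u u^T)^{-1} = D^{-1} - D^{-1} u u^T D^{-1} / (1 + u^T D^{-1} u)
Dinv = np.diag(1.0 / d)
C = Dinv - (Dinv @ np.outer(u, u) @ Dinv) / (1.0 + u @ (u / d))

# draw xi ~ N(mu, C) using a Cholesky factor of C
xi = mu + np.linalg.cholesky(C) @ rng.normal(size=k)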

(3) Stochastic Backpropagation: in the stochastic case a layer's output is distributed as $\xi \sim \mathcal{N}(\xi \vert \mu, C)$ (in the deterministic case the output is simply $\mu$). We then need the derivatives of the expected loss $\mathbb{E}[f(\xi)]$ w.r.t. $\mu$ and $C$.

\[\begin{align*} \triangledown_{\mu_{i}} \mathbb{E}_{\mathcal{N}(\mu, C)}[f(\xi)] & = \mathbb{E}_{\mathcal{N}(\mu, C)}[\triangledown_{\xi_{i}}f(\xi)] \\ \triangledown_{C_{ij}} \mathbb{E}_{\mathcal{N}(\mu, C)}[f(\xi)] & = \frac{1}{2} \mathbb{E}_{\mathcal{N}(\mu, C)} [\triangledown_{\xi_{i},\xi_{j}}^{2}f(\xi)] \end{align*}\]
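
These are the Gaussian gradient identities (Bonnet's and Price's theorems). A quick Monte-Carlo sanity check of the first one, assuming a quadratic test function $f(\xi) = \xi^{T}A\xi$ for which $\triangledown_{\mu}\mathbb{E}[f] = (A + A^{T})\mu$ is known exactly:

import numpy as np

rng = np.random.default_rng(0)
k = 3
A = rng.normal(size=(k, k))
mu = rng.normal(size=k)
M = rng.normal(size=(k, k))
C = M @ M.T + np.eye(k)                          # a valid covariance matrix

grad_f = lambda xi: (A + A.T) @ xi               # gradient of f(xi) = xi^T A xi

# right-hand side: E_{N(mu, C)}[grad_xi f(xi)], estimated by sampling
samples = mu + rng.normal(size=(100000, k)) @ np.linalg.cholesky(C).T
mc_grad = grad_f(samples.T).mean(axis=1)

# left-hand side: for this quadratic f, grad_mu E[f] = (A + A^T) mu exactly
exact_grad = (A + A.T) @ mu
print(np.allclose(mc_grad, exact_grad, atol=0.1))   # True up to Monte-Carlo error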

Combining these identities with the chain rule, we can compute the derivative w.r.t. the model parameters $\theta$ ($g$ denotes the gradient and $H$ the Hessian of $f$ w.r.t. $\xi$).

\[\triangledown_{\theta} \mathbb{E}_{\mathcal{N}(\mu, C)}[f(\xi)] = \mathbb{E}_{\mathcal{N}(\mu, C)} [ g^{T}\frac{\partial \mu}{\partial \theta} + \frac{1}{2}Tr(H \frac{\partial C}{\partial \theta} ) ]\]

But computing the Hessian matrix is expensive, so there are two ways to reduce the complexity.

  1. Using the product rule for integrals.
  2. Using a suitable coordinate transformation $R$ to represent the Gaussian distribution as $\mathcal{N}(\mu + R\epsilon, RR^{T})$ with $\epsilon \sim \mathcal{N}(0, \mathbf{I})$ (the approach used in the paper); see the sketch below.

Then we could compute $\triangledown_{\theta^{g}}\mathcal{F}(V)$ and $\triangledown_{\theta^{r}}\mathcal{F}(V)$ to update the models.
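
A minimal sketch of option 2: writing $\xi = \mu + R\epsilon$ turns gradients of $\mathbb{E}[f(\xi)]$ w.r.t. $\mu$ and $R$ into plain Monte-Carlo averages, with no Hessian needed. The quadratic $f$ and the specific $R$ below are assumed stand-ins for the free-energy terms:

import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 100000

A = rng.normal(size=(k, k))
mu = rng.normal(size=k)
R = np.tril(rng.normal(size=(k, k))) + 2 * np.eye(k)   # factor of the covariance, C = R R^T

grad_f = lambda xi: (A + A.T) @ xi          # gradient of the stand-in loss f(xi) = xi^T A xi

eps = rng.normal(size=(n, k))
xi = mu + eps @ R.T                         # reparameterized samples xi = mu + R eps

g = grad_f(xi.T)                            # shape (k, n)
grad_mu = g.mean(axis=1)                    # estimate of grad_mu E[f(xi)]
grad_R = (g @ eps) / n                      # estimate of grad_R E[f(xi)] = E[grad_xi f(xi) eps^T]
# gradients w.r.t. theta follow by the chain rule through mu(theta) and R(theta)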

(4) Algorithm:

while hasNotConverged() do
   V = getMiniBatch()
   # bottom-up pass: sample latents from the recognition model
   xi = {xi_n},  xi_n ~ q(xi | v_n, theta_r)
   # top-down pass: reconstruct the layer activations h from xi
   h = h(xi)
   # gradients of F(V) w.r.t. the generative and recognition parameters
   delta_theta_g, delta_theta_r = updateGradients()
   theta_g = theta_g + delta_theta_g
   theta_r = theta_r + delta_theta_r
end while
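
A minimal PyTorch-style sketch of one such update, assuming a single Gaussian latent layer, a diagonal-covariance recognition model, and a Bernoulli likelihood (the names, dimensions, and optimizer here are assumptions, not the paper's code):

import torch

D, K = 16, 8                                     # observation / latent dims (assumed)
encoder = torch.nn.Linear(D, 2 * K)              # recognition model theta_r -> (mu, log sigma)
decoder = torch.nn.Linear(K, D)                  # generative model theta_g -> Bernoulli logits
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def step(V):                                     # V: mini-batch of shape (N, D), values in {0, 1}
    mu, log_sigma = encoder(V).chunk(2, dim=-1)
    # bottom-up pass: xi_n ~ q(xi | v_n) via reparameterization
    xi = mu + log_sigma.exp() * torch.randn_like(mu)
    # top-down pass: reconstruct v from xi
    logits = decoder(xi)
    # free energy F(V) = KL[q || p] - E_q[log p(V | xi)]
    kl = 0.5 * (mu ** 2 + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum(dim=-1)
    rec = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, V, reduction="none").sum(dim=-1)
    loss = (kl + rec).mean()
    opt.zero_grad()
    loss.backward()                              # gradients w.r.t. theta_g and theta_r
    opt.step()
    return loss.item()

loss = step(torch.bernoulli(torch.rand(32, D)))  # one update on a random mini-batch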

2. Deep Kalman Filter

Deep Kalman Filters Can Filter 2023

Structured Inference Networks for Nonlinear State Space Models 2016.

Deep Kalman Filters 2015. In the following Kalman-filter-style state-space model, $G_{\alpha}, S_{\beta}, F_{\kappa}$ are assumed to be parameterized by deep neural networks; a minimal sampling sketch follows the bullet points below.

\[\begin{align*} &z_{1} \sim \mathcal{N} (\mu_{0}; \Sigma_{0}) \\ &z_{t} \sim \mathcal{N} (G_{\alpha}(z_{t-1}, u_{t-1}, \Delta_{t}), S_{\beta}(z_{t-1}, u_{t-1}, \Delta_{t})) \\ &x_{t} \sim \Pi(F_{\kappa}(z_{t})) \end{align*}\]
  • $\hat{z} = q_{\phi}(z\vert x, u)$ for state estimation (the analogue of Kalman filtering).
  • $\hat{x} = p_{\theta}(x\vert\hat{z})$ for pattern reconstruction (denoising).
  • This could serve as a baseline for larger models in image processing.
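
A small sketch of rolling out this generative model, with $G_{\alpha}, S_{\beta}, F_{\kappa}$ replaced by simple linear layers and a Gaussian emission standing in for $\Pi$ (all dimensions, the prior $\mathcal{N}(0, \mathbf{I})$, and the noise scale are assumptions):

import torch

z_dim, u_dim, x_dim = 4, 2, 8                        # assumed dimensions

G_alpha = torch.nn.Linear(z_dim + u_dim + 1, z_dim)  # transition mean
S_beta = torch.nn.Linear(z_dim + u_dim + 1, z_dim)   # transition log-variance (diagonal)
F_kappa = torch.nn.Linear(z_dim, x_dim)              # emission parameters

def rollout(u, dt):
    # u: controls of shape (T, u_dim); dt: time gaps of shape (T,)
    z = torch.randn(z_dim)                           # z_1 ~ N(mu_0, Sigma_0); mu_0 = 0, Sigma_0 = I assumed
    zs = [z]
    for t in range(1, len(u)):
        inp = torch.cat([z, u[t - 1], dt[t].view(1)])
        mean, log_var = G_alpha(inp), S_beta(inp)
        z = mean + (0.5 * log_var).exp() * torch.randn(z_dim)       # z_t ~ N(G_alpha(.), S_beta(.))
        zs.append(z)
    xs = [F_kappa(z_t) + 0.1 * torch.randn(x_dim) for z_t in zs]    # x_t ~ Pi(F_kappa(z_t)), Gaussian assumed
    return torch.stack(zs), torch.stack(xs)

zs, xs = rollout(torch.randn(10, u_dim), torch.ones(10))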