Building custom bijectors with TensorFlow Probability
How to make your own bijectors and normalizing flows
Introduction
We often need to approximate distributions using models. If the target distribution has a known form, such as a Gaussian, then we can simply find the values of the mean and variance that best fit the data. What if the data has a more complex distribution? Chances are, if you try to fit a simple distribution to complex data, the result will be mediocre. Luckily, TensorFlow Probability has straightforward tools for modelling complex distributions, via bijectors. A bijector is a TensorFlow component representing a diffeomorphism — a bijective, differentiable function — that allows us to move freely between random variables. To understand how this works, let’s take a look at the change of variables formula. Let X and Z be random variables, and f a bijective, differentiable function such that

$$X = f(Z) \tag{1}$$
The probability distributions of X and Z are related by

$$p_X(x) = p_Z\left(f^{-1}(x)\right)\,\left|\det J_{f^{-1}}(x)\right| \tag{2}$$
Using eq. 1, we can sample the transformed variable, and with eq. 2 we can calculate its probability density. Additionally, since compositions of diffeomorphisms are themselves diffeomorphisms, bijectors are composable; we refer to a series of bijectors as a normalizing flow. With the right bijectors, normalizing flows can transform simple distributions into complex ones.
Making custom bijectors
TensorFlow Probability already implements many interesting bijectors, but we can build custom ones from scratch. Let’s make a bijector representing a rotation in two dimensions. We need to specify three things: the forward transformation, the inverse transformation, and the Jacobian determinant. Referring to the formulas above, these are all you need to sample the transformed variable and evaluate its density.
There’s a lot to unpack here, so let’s go through it step by step. The constructor takes in an argument theta representing the angle of the rotation. The parameters forward_min_event_ndims and inverse_min_event_ndims are the minimum numbers of event dimensions of the tensors on which the forward and inverse transformations operate, respectively. Since our bijector transforms two-dimensional vectors, these are both set to 1. The forward transformation is defined by the following matrix multiplication:

$$x = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} z$$
The inverse transformation is similar, replacing θ with its negative. Finally, the Jacobian determinant of the forward transformation is

$$\det J_f(z) = \cos^2\theta + \sin^2\theta = 1$$
Notice that the function we implemented actually returns the logarithm of the Jacobian determinant, since we usually prefer to work in log-space. Also note that we do not need to explicitly define the log-Jacobian determinant of the inverse transformation, since according to the inverse function theorem, the Jacobian determinant of the inverse transformation is the reciprocal of the Jacobian determinant of the forward transformation.
Great! Our rotation bijector is ready to go.
Bijectors in practice
Let’s use our new bijector to transform a multivariate normal distribution. A rotation won’t do much by itself, but we can compose it with other bijectors to produce a more intricate flow. We’ll create the transformed distribution using TensorFlow Probability’s TransformedDistribution class. The constructor takes two arguments: the base distribution, and the bijector to transform it with.
Notice that we have to reverse the order of the bijectors before chaining them, since Chain applies its bijectors from last to first. Let’s compare the contour plots of the base and transformed distributions.
What’s next?
Normalizing flows are a powerful tool for modelling distributions. Aside from creating our own transformed distributions, we can also fit normalizing flows to data. Stay tuned for future articles about training normalizing flows!