One available option for the MMM’s time-varying parameters is our so-called mean-zero Gaussian process.
Straight to the definition
We’ll explain the definition of a mean-zero Gaussian process (GP) by contrasting it with a typical GP.[1]
Typical Gaussian processes
A typical (non-mean-zero) GP can be defined as

$$f \sim \mathrm{MVN}(\mu \mathbf{1}, K),$$

where $\mu$ is a scalar mean, $\mathbf{1}$ is a length-$T$ vector of ones, and $K$ is a $T \times T$ covariance matrix built from a kernel function.

Instead of sampling from that multivariate normal directly, we can first calculate the Cholesky decomposition

$$K = L L^{\top},$$

where $L$ is lower triangular, and then set $f = \mu \mathbf{1} + L z$ with $z \sim \mathrm{Normal}(0, I)$, which has the same distribution as a direct draw.

Note that, even though the expected prior mean of each $f_t$ is $\mu$, the realized average $\frac{1}{T}\sum_t f_t$ of any particular draw will generally differ from $\mu$. This will matter later.
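To make the recipe concrete, here is a minimal numpy sketch of sampling a GP via the Cholesky decomposition. The squared-exponential kernel, the lengthscale, and all names are illustrative assumptions, not the MMM’s actual settings:

```python
import numpy as np

def se_kernel(t, lengthscale=10.0, amplitude=1.0):
    """Squared-exponential kernel matrix over time points t (illustrative)."""
    d = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
T = 100
t = np.arange(T, dtype=float)

mu = 0.5                                # scalar prior mean
K = se_kernel(t) + 1e-9 * np.eye(T)     # jitter keeps the Cholesky stable
L = np.linalg.cholesky(K)               # K = L @ L.T

z = rng.standard_normal(T)              # z ~ Normal(0, I)
f = mu + L @ z                          # one draw of the GP

print(f.mean())                         # generally not equal to mu
```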
Mean-zero Gaussian processes
We define a mean-zero GP starting from the Cholesky decomposition implementation above, dropping the mean term so that we work with $Lz$ alone.

First we place a sum-to-zero constraint on $Lz$:

$$\sum_{t=1}^{T} (Lz)_t = 0.$$

We then define

$$g = Lz.$$

This is the mean-zero GP: every draw of $g$ sums, and hence averages, to exactly zero.
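One way to impose the constraint in a sketch (an assumption on our part, not necessarily how the model implements it): since $\sum_t (Lz)_t = (L^\top \mathbf{1})^\top z$, conditioning $z$ on that inner product being zero amounts to a projection, as below with the same illustrative kernel:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
t = np.arange(T, dtype=float)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 10.0) ** 2) + 1e-9 * np.eye(T)
L = np.linalg.cholesky(K)

# sum(L @ z) equals (L.T @ ones) @ z, so constraining that inner product
# to zero is exactly the sum-to-zero constraint on L @ z.
v = L.T @ np.ones(T)

z = rng.standard_normal(T)
z = z - v * (v @ z) / (v @ v)   # project z onto the subspace where v @ z = 0
g = L @ z                       # one mean-zero GP draw

print(g.sum())                  # ~0 up to floating-point error
```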
Applications in the MMM
We use these mean-zero GPs to build the MMM’s time-varying parameters.
Suppose we want to build a channel’s beta parameter, $\beta_t$ for $t = 1, \dots, T$. We start with the latent series

$$\theta_t = \sqrt{1 - \lambda}\,\mu + \sqrt{\lambda}\,g_t,$$

where

- $\lambda$ is a scalar between 0 and 1,
- $\mu$ is a scalar with $\mu \sim \mathrm{Normal}(0, 1)$,
- and $g$ is a mean-zero GP. We scale it so that each entry of $g$ has a prior variance of 1.
We construct the final beta parameter using the inverse logit transform:

$$\beta_t = \operatorname{logit}^{-1}(\theta_t),$$

where $\operatorname{logit}^{-1}(x) = 1 / (1 + e^{-x})$ squashes $\theta_t$ into the interval $(0, 1)$.
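Putting the pieces together, a minimal end-to-end sketch of the construction might look like the following. The kernel, the lengthscale, and the scalar rescaling of $g$ are all illustrative assumptions:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_zero_gp_draw(T, lengthscale=10.0, rng=None):
    """One draw of a mean-zero GP, scaled so entries have prior variance ~1."""
    if rng is None:
        rng = np.random.default_rng()
    t = np.arange(T, dtype=float)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / lengthscale) ** 2)
    K += 1e-9 * np.eye(T)
    L = np.linalg.cholesky(K)
    v = L.T @ np.ones(T)
    z = rng.standard_normal(T)
    z = z - v * (v @ z) / (v @ v)        # sum-to-zero constraint on L @ z
    g = L @ z
    # A single scalar rescale keeps the sum exactly zero while bringing the
    # average per-entry prior variance to 1 (illustrative choice).
    var = np.diag(K) - (L @ v) ** 2 / (v @ v)
    return g / np.sqrt(var.mean())

rng = np.random.default_rng(2)
T = 100
lam = 0.3                                # lambda, between 0 and 1
mu = rng.standard_normal()               # mu ~ Normal(0, 1)
g = mean_zero_gp_draw(T, rng=rng)

theta = np.sqrt(1 - lam) * mu + np.sqrt(lam) * g
beta = inv_logit(theta)                  # time-varying beta in (0, 1)
```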
We use this construction for essentially every time-varying parameter in the MMM.
Why?
Constructing $\beta$ this way has several nice properties:

- The scalar $\lambda$ acts as a hyperparameter that controls how much of $\theta$ comes from $\mu$ or from $g$. If $\lambda$ is small then $\theta$ is approximately constant, and if $\lambda$ is large then $\theta$ is approximately a mean-zero GP.
- Putting $1 - \lambda$ and $\lambda$ in square roots makes it so that no matter what $\lambda$ is, each entry of $\theta$ always has a prior variance of approximately 1, since $(1 - \lambda) \cdot 1 + \lambda \cdot 1 = 1$ (see the numerical check after this list).
- We have $\frac{1}{T}\sum_t \theta_t = \sqrt{1 - \lambda}\,\mu$, giving us a simple parameter we can use to share information across channels. Indeed, when we link channels in the model, we are allowing their parameters to influence one another.
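Here is a quick Monte Carlo check of the variance claim, under the same illustrative kernel and constraint implementation as the sketches above:

```python
import numpy as np

# Monte Carlo check: entries of theta have prior variance ~1 for any lambda.
rng = np.random.default_rng(3)
T, n_draws = 50, 4000
t = np.arange(T, dtype=float)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 10.0) ** 2) + 1e-9 * np.eye(T)
L = np.linalg.cholesky(K)
v = L.T @ np.ones(T)
scale = np.sqrt((np.diag(K) - (L @ v) ** 2 / (v @ v)).mean())

for lam in (0.1, 0.5, 0.9):
    draws = []
    for _ in range(n_draws):
        mu = rng.standard_normal()
        z = rng.standard_normal(T)
        z -= v * (v @ z) / (v @ v)       # sum-to-zero constraint on L @ z
        draws.append(np.sqrt(1 - lam) * mu + np.sqrt(lam) * (L @ z) / scale)
    print(lam, np.var(np.stack(draws), axis=0).mean())   # all close to 1
```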
Appendix: Motivation for the construction
The original motivation for mean-zero GPs came from wanting a single parameter, $\mu$, that cleanly controls a time-varying parameter’s average level.

Naively we tried taking a typical GP,

$$f \sim \mathrm{MVN}(\mu \mathbf{1}, K),$$

but ran into a major issue. It turns out that when you use the Cholesky decomposition method to build the GP, $\mu$ is not the only thing contributing to the realized average of $f$.
If $f = \mu \mathbf{1} + L z$, then the realized average of a draw is

$$\frac{1}{T} \sum_{t=1}^{T} f_t = \mu + v^{\top} z,$$

where

$$v = \frac{1}{T} L^{\top} \mathbf{1},$$

and so every draw’s average is $\mu$ plus the extra term $v^{\top} z$. Since $v^{\top} z$ is generally nonzero, $\mu$ alone does not pin down the average level of $f$: the model can trade $\mu$ off against $z$, and $\mu$ loses its clean interpretation.
You might be interested to see what that vector $v$ looks like; the sketch below computes it.
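A small sketch that computes $v$ for an illustrative squared-exponential kernel:

```python
import numpy as np

# What v = (1/T) * L.T @ ones looks like for a squared-exponential kernel.
T = 50
t = np.arange(T, dtype=float)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 10.0) ** 2) + 1e-9 * np.eye(T)
L = np.linalg.cholesky(K)
v = L.T @ np.ones(T) / T

print(np.round(v, 3))   # entries are far from zero, so v @ z rarely vanishes
```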
What can we do to constrain the average of $f$ to be exactly $\mu$? If we force $v^{\top} z = 0$, the extra term vanishes and every draw of $f$ averages to exactly $\mu$. Note that $v^{\top} z = \frac{1}{T} \sum_t (Lz)_t$, so this is precisely the sum-to-zero constraint from the definition above.

The final step, going from

$$f = \mu \mathbf{1} + L z$$

to

$$\theta = \sqrt{1 - \lambda}\,\mu\,\mathbf{1} + \sqrt{\lambda}\,g,$$

gives us a new hyperparameter, $\lambda$, that lets us interpolate smoothly between a constant parameter and a fully time-varying one.
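To see the fix in action, here is a sketch under the same illustrative kernel and projection as above, showing that forcing $v^{\top} z = 0$ makes every draw average exactly to $\mu$:

```python
import numpy as np

# With v @ z forced to zero, every draw of f averages to exactly mu.
rng = np.random.default_rng(4)
T, mu = 50, 0.7
t = np.arange(T, dtype=float)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 10.0) ** 2) + 1e-9 * np.eye(T)
L = np.linalg.cholesky(K)
v = L.T @ np.ones(T) / T

z = rng.standard_normal(T)
print((mu + L @ z).mean())      # drifts away from mu

z -= v * (v @ z) / (v @ v)      # force v @ z = 0
print((mu + L @ z).mean())      # exactly mu, up to floating-point error
```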
Footnotes
[1] A great intro to Gaussian processes is Yuge Shi’s “Gaussian Processes, not quite for dummies”, The Gradient, 2019.