# Calculating Stability Metrics

**High Level Overview**

By default, stability compares a model trained on an extra week of data with the previous version of the model, but any two models can be compared. Stability is measured as the proportion of overlap between the confidence/credible regions for each estimate. The more stable a model’s estimates are, the more overlap there will tend to be, even for narrow confidence/credible regions.

**Technical Details**

*Overlap Function*

For a single timestep for a single parameter, overlap is measured as:

Where

- a_ub is the upper bound of **Model A’s** confidence/credible region
- b_ub is the upper bound of **Model B’s** confidence/credible region
- a_lb is the lower bound of **Model A’s** confidence/credible region
- b_lb is the lower bound of **Model B’s** confidence/credible region

The default confidence/credible region spans the 25th to the 75th percentiles (the middle 50% of our estimates). Each time the model samples a draw from the posterior, we record the parameter value. The percentiles are then taken from the distribution of those parameter values across all draws.
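As an illustration of how these bounds can be derived from posterior draws, here is a minimal sketch using NumPy. The function name `credible_region` and the assumption that draws are stacked along the first axis are hypothetical and not taken from the actual implementation.

```python
import numpy as np

def credible_region(draws, lower_pct=25.0, upper_pct=75.0):
    """Return (lower, upper) percentile bounds across posterior draws.

    Assumption: `draws` has shape (n_draws, ...), with one recorded
    parameter value per draw along axis 0.
    """
    draws = np.asarray(draws, dtype=float)
    lower, upper = np.percentile(draws, [lower_pct, upper_pct], axis=0)
    return lower, upper

# Hypothetical example: five recorded draws of one parameter at one timestep.
lb, ub = credible_region([1.0, 2.0, 3.0, 4.0, 5.0])
```

Passing different values for `lower_pct` and `upper_pct` would give a wider or narrower region than the default middle 50%.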

This overlap value is then turned into a proportion by dividing by the sum of the widths of the two confidence/credible regions.
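The calculation described above might be sketched as follows. This is one plausible reading, not the confirmed implementation: it assumes overlap is the length of the intersection of the two regions, clipped at 0 when they do not intersect, and the function names are hypothetical.

```python
def overlap(a_lb, a_ub, b_lb, b_ub):
    """Length of the intersection of [a_lb, a_ub] and [b_lb, b_ub].

    Assumption: disjoint regions are given an overlap of 0 rather than
    a negative value.
    """
    return max(0.0, min(a_ub, b_ub) - max(a_lb, b_lb))

def overlap_proportion(a_lb, a_ub, b_lb, b_ub):
    """Overlap divided by the sum of the widths of the two regions."""
    total_width = (a_ub - a_lb) + (b_ub - b_lb)
    if total_width == 0:
        # Assumption: two zero-width regions are treated as no overlap.
        return 0.0
    return overlap(a_lb, a_ub, b_lb, b_ub) / total_width

# Model A's region is [1, 3], Model B's region is [2, 4].
# They share [2, 3] (length 1), and the widths sum to 4.
proportion = overlap_proportion(1.0, 3.0, 2.0, 4.0)
```

Under this literal reading, two identical regions score 0.5 rather than 1, because the shared width is divided by both widths added together. The actual implementation may scale the result differently.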

*Weighted Overlap*

Next, we calculate the overall stability *across* all parameters and all timesteps, as well as the stability for intercept, spend, lower funnel (if `include_lower_funnel` is True), and spikes (if `include_spikes` is True) individually. When the parameters we’re estimating are *large in magnitude* (far from 0), they may have a bigger impact on our `depvar`. To account for that, we weight each estimate’s contribution to the overall weighted overlap more heavily if it has a larger magnitude. When weighted overlap is calculated, each **overlap** is weighted by the average value of the 4 bounds (b_ub, b_lb, a_ub, a_lb). Thus, for all parameters across all chains:

Note: While spend and intercept are *always* included in overall stability, spikes are only included if `include_spikes` is True. Lower funnel channels are never included in the overall stability, but stability for lower funnel channels is calculated separately if `include_lower_funnel` is True.
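To make the weighting and the inclusion rules concrete, here is a minimal sketch. It assumes the weighted overlap is a weighted mean (the sum of weight times overlap, divided by the sum of weights) and that the weights are positive. Neither detail is confirmed by the text above, and the function names and group labels are hypothetical.

```python
import numpy as np

def weighted_overlap(overlaps, a_lb, a_ub, b_lb, b_ub):
    """Weighted mean of overlap values.

    Each weight is the average of the four bounds for that estimate.
    Assumption: the result is normalized by the sum of the weights.
    """
    overlaps = np.asarray(overlaps, dtype=float)
    bounds = np.stack(
        [np.asarray(x, dtype=float) for x in (a_lb, a_ub, b_lb, b_ub)]
    )
    weights = bounds.mean(axis=0)
    total = weights.sum()
    if total == 0:
        raise ValueError("weights sum to zero; weighted overlap is undefined")
    return float((weights * overlaps).sum() / total)

def groups_in_overall(include_spikes):
    """Parameter groups that feed into overall stability.

    Intercept and spend are always included, spikes only when
    `include_spikes` is True, and lower funnel never.
    """
    groups = ["intercept", "spend"]
    if include_spikes:
        groups.append("spikes")
    return groups

# Two estimates: a small-magnitude one and a large-magnitude one.
result = weighted_overlap(
    overlaps=[0.5, 0.25],
    a_lb=[1.0, 10.0], a_ub=[3.0, 14.0],
    b_lb=[2.0, 12.0], b_ub=[4.0, 16.0],
)
```

In this example the second estimate has a weight of 13 against 2.5 for the first, so the result sits much closer to 0.25 than to 0.5.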
