🌎 [Beta] GeoLift by Recast

Why run a GeoLift test?

GeoLift is an alternative method of estimating the incremental effect of advertising on your business. It can be used to validate an MMM (build confidence that an MMM is providing true estimates) and calibrate an MMM, particularly for channels with large confidence intervals.

What is a GeoLift test?

A GeoLift test is an experiment conducted to find the true incremental value of your ad spend. During a lift test, you change the spend (increase or decrease) in selected geographies and compare the results with the outcome in the geographies where spend was held constant. By comparing the changes in the test and control groups, you can find the ‘lift’ or incremental change in revenue driven exclusively by the ad spend.

When to run a lift test?

GeoLift can be particularly helpful when you are seeing large confidence intervals in your model or when your in-platform estimates differ from your model’s estimates.

By conducting a lift test, you are able to provide your model with real information on the effect of your ad spend. This helps the model anchor its parameters to a ground truth. As a result you should see the confidence interval of your model’s estimates decrease following a GeoLift test.

Moreover, if your model’s estimates differ from the in-platform numbers, it can be difficult to act on either. Using GeoLift, you can provide your model with a snapshot of more precise incrementality data to improve its estimates. Through the cycle of learning more about the incrementality of each channel and improving your estimates, you can make more confident bets and improve your outcomes.

Limitations of GeoLift

Advertising buys must allow geographic targeting – if it’s hard to limit spend inside a geography and there’s spillage to surrounding geographies, the estimates won’t be as accurate.

GeoLift provides point in time estimates – unlike MMM, the performance measured from a GeoLift test applies to a particular point in time. Measuring different points in time would require running a new test. Our Bayesian MMM works well with the GeoLift output as we estimate incrementality for each channel every day. This means we can apply the knowledge gained from the GeoLift model accurately to the specific time when the test was run without making assumptions about marketing performance over time.

Heterogeneous geos – if different states have different demographics that respond differently to ads, the results will be less accurate.

Outliers – e.g. a major localized event during the test period could skew the results.


What can I do with GeoLift by Recast?

GeoLift by Recast allows you both to design optimal lift tests and to analyze their results.

How does GeoLift work statistically?

GeoLift uses a statistical method called augmented synthetic controls to help estimate the incremental effect of the experimental change. When running an experiment, you typically want your test and control groups to be as similar as possible (except for the treatment). Synthetic control models help with this by analyzing the pre-test period and assigning each control geography a weight such that the weighted sum of conversions in the control group is as close to the test group as possible. Then, these weights are used when analyzing the experiment to determine if the change in conversions in the test group was caused by the spend changes or by random fluctuations.
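The weighting idea can be illustrated with a minimal sketch. It fits plain (not augmented) synthetic control weights with a simple Frank-Wolfe loop in pure Python. The geography values are invented and the function names are hypothetical; this is not GeoLift's actual implementation.

```python
# Illustrative only: plain synthetic control weights via Frank-Wolfe.
# All KPI values below are invented for this sketch.

def synthetic_series(controls, weights):
    """Weighted sum of the control series, day by day."""
    n_days = len(controls[0])
    return [sum(w * c[t] for w, c in zip(weights, controls)) for t in range(n_days)]

def sse(controls, weights, test):
    """Sum of squared pre-period errors between synthetic control and test geo."""
    synth = synthetic_series(controls, weights)
    return sum((s - y) ** 2 for s, y in zip(synth, test))

def fit_weights(controls, test, n_iter=2000):
    """Find non-negative weights summing to 1 that reduce the pre-period error."""
    k = len(controls)
    weights = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        synth = synthetic_series(controls, weights)
        resid = [s - y for s, y in zip(synth, test)]
        # Gradient of the squared error per weight (up to a factor of 2).
        grad = [sum(r * c_t for r, c_t in zip(resid, c)) for c in controls]
        j = min(range(k), key=lambda i: grad[i])  # single geo to move toward
        direction = [c_t - s for c_t, s in zip(controls[j], synth)]
        denom = sum(d * d for d in direction)
        if denom == 0:
            break
        # Exact line search, clipped to [0, 1] so weights stay valid.
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = max(0.0, min(1.0, step))
        weights = [(1 - step) * w for w in weights]
        weights[j] += step
    return weights

# Pre-test KPI per control geography (hypothetical).
controls = [
    [100, 110, 120, 130, 140],  # geo A
    [50, 40, 60, 45, 55],       # geo B
    [200, 190, 210, 205, 195],  # geo C
]
# Test geo constructed as 0.5 * A + 0.5 * B so that a good fit exists.
test = [0.5 * a + 0.5 * b for a, b in zip(controls[0], controls[1])]

equal = [1 / 3] * 3
weights = fit_weights(controls, test)
print([round(w, 3) for w in weights])
print(sse(controls, equal, test), "->", sse(controls, weights, test))
```

The fitted weights give a much smaller pre-period error than equal weights, which is the property the experiment analysis relies on.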


How to design an optimal GeoLift test?

The main goals of the Design tool are to select geographies to include in your test and control groups and to get a spend recommendation for the test group.

Step 1: Ingest your data

First we need a dataset of the historical outcome variable for each day for each geography. Geographies can be at any level that you have the ability to target (e.g. states, DMAs, etc). If you use zip codes as the geography, GeoLift can automatically convert these to commuting zones. Your CSV should contain 3 columns:

  • Location (geography)
  • Date
  • KPI
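Before uploading, it can help to sanity-check the file locally. The sketch below assumes hypothetical column headers (`location`, `date`, `kpi`) and an ISO date format; your own headers and format may differ, since the tool lets you map them.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; GeoLift lets you map your own headers.
REQUIRED = ("location", "date", "kpi")

def load_rows(csv_text, date_format="%Y-%m-%d"):
    """Parse a location/date/KPI CSV and fail early on obvious problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    seen = set()
    for line in reader:
        day = datetime.strptime(line["date"], date_format).date()
        key = (line["location"], day)
        if key in seen:
            raise ValueError(f"duplicate row for {key}")
        seen.add(key)
        rows.append((line["location"], day, float(line["kpi"])))
    return rows

sample = """location,date,kpi
CA,2024-01-01,120
CA,2024-01-02,135
TX,2024-01-01,80
TX,2024-01-02,95
"""
rows = load_rows(sample)
print(len(rows), rows[0])
```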

You can use the tool to map the columns in your dataset to the date, outcome variable and location ID columns as well as specify the date format in your dataset.

Once you have mapped your columns, you can ingest your dataset. You can see your KPI over the time period for your top 8 geographies by volume as well as click through your dataset. Use these visualizations to check for any data problems.
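The same kind of check can be approximated offline. This sketch ranks locations by total KPI and lists calendar days missing for each location; the function names and sample rows are hypothetical and are not part of the tool.

```python
from collections import defaultdict
from datetime import date, timedelta

def top_locations(rows, n=8):
    """Locations ranked by total KPI, largest first."""
    totals = defaultdict(float)
    for location, _day, kpi in rows:
        totals[location] += kpi
    return sorted(totals, key=totals.get, reverse=True)[:n]

def missing_days(rows):
    """Map each location to the days it lacks within the overall date range."""
    days_by_location = defaultdict(set)
    for location, day, _kpi in rows:
        days_by_location[location].add(day)
    first = min(day for _loc, day, _kpi in rows)
    last = max(day for _loc, day, _kpi in rows)
    full_range = {first + timedelta(days=i) for i in range((last - first).days + 1)}
    return {
        loc: sorted(full_range - days)
        for loc, days in days_by_location.items()
        if full_range - days
    }

# Invented sample rows: (location, date, KPI).
rows = [
    ("CA", date(2024, 1, 1), 120.0),
    ("CA", date(2024, 1, 2), 135.0),
    ("CA", date(2024, 1, 3), 128.0),
    ("TX", date(2024, 1, 1), 80.0),
    ("TX", date(2024, 1, 3), 95.0),  # Jan 2 is missing for TX
]
print(top_locations(rows))
print(missing_days(rows))
```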


The historical dataset helps the experimentation tool analyze similarities and differences in geographical performance to select the best geographies to include in your test group. Comparison between geographies with similar historical performance results in more powerful statistical analysis as we can more accurately identify the differences as a result of the spend change.


Step 2: Configure your analysis

During the configuration phase, you will provide parameters for GeoLift to work with. GeoLift will use those parameters to run simulations in order to determine which geographies would make up the best test as well as a recommended spend amount to get usable results.

First, select your experiment type.


There are two types of experiments:

Spend increase: This type of experiment makes sense when you have extra money and want to increase your KPI, you think you might be underspending in the channel and want to test increased spend, or when you are adding a brand new channel to your mix.

Spend decrease: This type of experiment makes sense when you want to save money (at the cost of some of your KPI), or when you think you might be overspending in a channel.

If you’re not sure which to do, you can run the analysis twice and compare the recommended plans for each type.


Next, select the KPI type: revenue or conversions. If your KPI is not either, select conversions if you think in terms of CPA and revenue if you think in terms of ROI.

Then select your experiment parameters.


Effect size to simulate: This number will be used as a starting point in the simulation to determine the best geographies to explore. Ultimately GeoLift will recommend a spend amount and effect size to target, but this gives GeoLift a place to start. If you are doing a “decrease spend” test, a good way to set this number is to imagine turning off this channel completely. How much would you expect revenue or conversions to drop? If the answer is 10%, set this number to 10%. Your Recast MMM waterfall report may help give you a good starting point for that number. If you are increasing spend, try to get a rough idea of how much you could increase conversions/revenue by cranking up spend in certain regions. For example, if a channel that you are testing currently drives 3% of your conversions, and you expect that doubling your spend will double the conversions from that channel, set your effect size to 3%. This is just a starting point, so don’t worry about being exactly right here.
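As a rough aid, the starting effect size can be approximated as the channel's share of the KPI times the relative change you expect in that channel's contribution. This is a back-of-the-envelope sketch with a hypothetical helper name, not a calculation GeoLift performs for you.

```python
def starting_effect_size(channel_share, relative_change):
    """Approximate effect on the total KPI, as a positive fraction.

    channel_share: fraction of the KPI the channel currently drives.
    relative_change: expected relative change in the channel's contribution
        (1.0 = doubling it, -1.0 = switching the channel off).
    """
    return abs(channel_share * relative_change)

# Decrease test: the channel drives 10% of revenue and is switched off.
decrease = starting_effect_size(0.10, -1.0)
# Increase test: the channel drives 3% of conversions and is expected to double.
increase = starting_effect_size(0.03, 1.0)
print(f"{decrease:.0%} {increase:.0%}")
```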


Approximate Channel CPA/ROI: Your expected CPA helps Recast calculate the spend change required to drive the selected effect size. For example, suppose you want to drive an effect size of 5%, meaning you want to increase conversions by 5% of the previous baseline. This could be an increase from 500 conversions to 525 conversions. If your approximate expected CPA is $40, you will need to spend an additional $1,000 (25 extra conversions × $40) to drive the effect size. Your Recast MMM may help you determine a good approximate CPA/ROI to use. In general, using a conservative number (high CPA / low ROI) will result in a more conservative test, meaning GeoLift will recommend more dramatic changes and you’ll have more statistical power.
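The arithmetic in this example can be written out as a short sketch. The function names are hypothetical, and the revenue figures in the ROI case are invented for illustration; GeoLift's own spend recommendation may account for more than this.

```python
def required_spend_from_cpa(baseline_conversions, effect_size, cpa):
    """Extra spend needed to buy the incremental conversions at the given CPA."""
    incremental_conversions = baseline_conversions * effect_size
    return incremental_conversions * cpa

def required_spend_from_roi(baseline_revenue, effect_size, roi):
    """Extra spend needed to generate the incremental revenue at the given ROI."""
    incremental_revenue = baseline_revenue * effect_size
    return incremental_revenue / roi

# The example from the text: 500 conversions, 5% effect size, $40 CPA.
cpa_spend = required_spend_from_cpa(500, 0.05, 40)
# A made-up revenue case: $100,000 baseline, 5% effect size, ROI of 2.
roi_spend = required_spend_from_roi(100_000, 0.05, 2.0)
print(round(cpa_spend, 2), round(roi_spend, 2))
```

A higher CPA (or lower ROI) raises the required spend for the same effect size, which is why conservative inputs lead to larger recommended changes.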


Experiment length: Provide how many days you want to run the experiment. If you’re not getting good results with a smaller number of days, increasing the number of days may help.


Optionally, you can select certain geos to include in or exclude from your test geo. This can be useful if, for example, you cannot increase the spend in certain geos or you want to exclude certain geographies because other changes happening there may confound the experiment. If you leave this blank, Recast will select the optimal test geo for you.


Finally, you can enter the number of locations to include in your test group. If you would like GeoLift to choose the number of locations to include in your test geo for you, select “Choose for me”.

GeoLift will select the number of locations to include in your test geo based on the effect size and the length of the experiment you entered in the previous steps to give you the experiment with the highest statistical significance.

Click “Determine test markets” when you are ready. GeoLift will analyze your data and provide options for various experiment configurations which you will be able to select from. GeoLift ranks the experiment options in terms of estimated bias and shows how much additional investment is needed in those geographies to achieve the effect size we simulated.


Estimated bias is the difference between the simulated causal impact of the spend change and the causal impact estimated by GeoLift. In designing the experiment we want to minimize this difference so that we estimate the true causal effect as accurately as possible. The estimated bias is reported as a non-negative number because it is the absolute value of the difference between the simulated effect and the estimated effect.


Investment is the amount of money GeoLift calculated it will take to drive the effect size we specified, given the size of the geos and the assumptions about CPAs/ROIs.


For each of the experiment configurations, you will be able to see a graph of the expected conversions over time in the test and control geos for the period of the experiment.

You can use the information provided to select a set of test geographies that meets your investment constraints and minimizes bias.
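The selection logic described here can be sketched as a filter on investment followed by a minimum over bias. The option data, field names, and budget below are all hypothetical; in practice you would read these values off the options GeoLift presents.

```python
from dataclasses import dataclass

@dataclass
class Option:
    locations: tuple
    simulated_effect: float   # effect injected in the simulation
    estimated_effect: float   # effect recovered by the analysis
    investment: float         # spend change needed in the test geo

    @property
    def bias(self):
        """Absolute difference between simulated and estimated effect."""
        return abs(self.simulated_effect - self.estimated_effect)

def best_option(options, max_investment):
    """Lowest-bias option that fits within the investment constraint, or None."""
    affordable = [o for o in options if o.investment <= max_investment]
    if not affordable:
        return None
    return min(affordable, key=lambda o: o.bias)

# Invented candidate configurations.
options = [
    Option(("CA", "TX"), 0.05, 0.049, 40_000),
    Option(("OH", "GA"), 0.05, 0.046, 15_000),
    Option(("WA",), 0.05, 0.058, 9_000),
]
choice = best_option(options, max_investment=20_000)
print(choice.locations, round(choice.bias, 3))
```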

To get a final spend recommendation for the selected locations and a deep dive into the power at different effect sizes, click “Deep Dive with these locations.”


Step 3: Deep dive power analysis

The results of your power analysis are two testing plans at different effect levels (and different spend levels), as well as an analysis of the likelihood that your experiment results in statistically significant lift.

The two testing plans are a High confidence plan, which drives a larger effect size and is more statistically powerful, and a Baseline plan, which requires less intervention while still meeting the baseline criteria for statistical significance.

The power analysis graphs below will help you assess the recommended plans. The power analysis runs many simulations for your selected geos in order to help determine how statistically useful the results will be.


How to use the results of the power analysis?


Use the power analysis to:

  • Determine whether the experiment configuration is sufficient to detect a statistically significant incremental lift.
  • Refine your experiment design to arrive at an experiment that maximizes your chances of detecting statistically significant results.

The power curve shows the probability of detecting a statistically significant effect given the test geos, expected effect size and duration of the experiment. Statistical significance in this context means our ability to confidently conclude that ROI is not zero. It is not the same as the thing we are primarily interested in, narrowing the size of the confidence interval on the ROI, but the two are related and more power will also mean smaller confidence intervals. 80% power is the minimum we’d recommend for a baseline experiment; the smallest effect size that reaches this level of power is what we call the minimal detectable effect.
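Assuming the minimal detectable effect is read off the power curve as the smallest effect size that reaches 80% power, the lookup can be sketched as follows. The power values are invented and the function name is hypothetical.

```python
def minimal_detectable_effect(power_curve, target_power=0.8):
    """Smallest effect size whose power meets the target, or None if none does."""
    eligible = [
        effect
        for effect, power in sorted(power_curve.items())
        if power >= target_power
    ]
    return eligible[0] if eligible else None

# Invented power curve: effect size -> probability of a significant result.
power_curve = {0.02: 0.31, 0.04: 0.58, 0.06: 0.83, 0.08: 0.95}
mde = minimal_detectable_effect(power_curve)
print(mde)
```

If no effect size on the curve reaches the target, the function returns None, which corresponds to the case where the configuration needs more geos, more days, or a larger spend change.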

The next two graphs are helpful in understanding the kind of outcomes to expect in two different scenarios: (1) where the advertising channel has no true incrementality, and (2) where the advertising causes conversions to increase by whatever you specified as the effect size. In the example pictured below, when there is no true incrementality, GeoLift would estimate a small increase ($561) in the amount of revenue attributable to the advertising channel. The confidence interval would contain positive and negative values for the revenue, and the p-value would be 0.7, meaning we could not conclude that the experiment resulted in any meaningful change. In the example where there is true incrementality, we estimate $25k of additional revenue in the test group ($17.7k - $32.9k confidence interval), and the p-value is 0, meaning we could strongly conclude the experiment resulted in incremental lift. The graphs on the right show how we expect cumulative revenue to progress as the experiment progresses in each of these simulated scenarios.

If the results of the power analysis do not provide you with a feasible testing plan, you can go back to the experiment configuration and increase the number of geos, extend the duration of the study, or adjust your effect size and CPA/ROI assumptions.


How does power analysis work?

GeoLift uses historical data and statistical simulations to conduct power analysis. Here’s how it generally works:

  • Simulate the experiment: Based on the input parameters you provided while configuring your GeoLift experiment, Recast runs multiple simulations to estimate the probability of detecting a true causal effect of the spend change as well as the probability of drawing erroneous conclusions from your results.
  • Estimate power: By analyzing the simulation results, Recast estimates the statistical power of the study. It determines the likelihood that the experiment configuration, given your input parameters, will detect an effect in the test geo in comparison to the control geo.
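The simulate-then-count idea can be shown with a deliberately simplified Monte Carlo sketch. It assumes the daily gap between the test geo and its synthetic control is independent normal noise with a known spread and uses a two-sided z-test; GeoLift instead works from your historical data, so treat the model, names, and numbers here as illustrative assumptions only.

```python
import random
from statistics import NormalDist, mean

def simulate_power(daily_lift, noise_sd, n_days, n_sims=2000, alpha=0.05, seed=0):
    """Share of simulated experiments in which the lift is significant.

    Simplifying assumption: daily test-minus-control gaps are independent
    normal draws with known standard deviation noise_sd.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_error = noise_sd / n_days ** 0.5
    hits = 0
    for _ in range(n_sims):
        gaps = [daily_lift + rng.gauss(0, noise_sd) for _ in range(n_days)]
        z = mean(gaps) / std_error
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided z-test
        if p_value < alpha:
            hits += 1
    return hits / n_sims

# Power at several hypothetical daily lifts for a 30-day experiment.
for lift in (0, 10, 20, 40):
    print(lift, simulate_power(daily_lift=lift, noise_sd=50, n_days=30))
```

With zero lift, the share of significant results approximates the false positive rate; as the lift grows, the share approaches 1, tracing out a power curve.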



Analyze experiment

Once you have conducted a geographic experiment you can upload the results of your experiment and use the Analyze tool to estimate the incrementality of the channel you experimented with.


Step 1: Ingest your data
Simply upload your data in the same way as described in the “Ingest your data” section above. The data should be in the same format of date, location and KPI per date/location. The dates you include should cover the experimental window as well as at least three (and up to twelve) months prior to the experiment.


Step 2: Configure your analysis
To configure your analysis select the following inputs:

  • The start and end dates you’d like to analyze
  • The locations included in the test geo
  • The outcome variable type
  • The experiment type (spend increase or decrease)
  • The spend withheld or added to the test locations

The start and end dates do not have to directly align with when spend was turned off and on. Instead they should align with when you expect the effect to be observed. The analysis will provide an estimate of the amount of revenue (or number of conversions) attributable to the experiment during the time window. For example, if you think someone who is influenced by an ad in the channel might not convert for 2-3 weeks, you can try to extend the analysis window 2-3 weeks. If the analysis window is too short, you may end up undercounting the true incremental effect and making the channel look less effective. If the analysis window is too long, added noise will cause your confidence intervals to grow and reduce the statistical significance of the results. We recommend choosing an analysis window such that the majority (~95%) of conversions you expect to come from the increased spend (or conversely, not happen due to decreased spend) have been observed in the historical data. Your Recast MMM shift curves may be helpful in setting the analysis window.
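One way to reason about the window length is to take an assumed distribution of how long conversions lag behind ad exposure and find the first day by which about 95% of them have arrived. The lag weights below are hypothetical; in practice they might come from your Recast MMM shift curves.

```python
def days_to_cover(lag_weights, coverage=0.95):
    """First day count by which the cumulative share of conversions reaches coverage.

    lag_weights[i] is the relative share of conversions arriving i days
    after exposure (any positive scale; it is normalized here).
    """
    total = sum(lag_weights)
    running = 0.0
    for day, weight in enumerate(lag_weights, start=1):
        running += weight
        if running / total >= coverage:
            return day
    return len(lag_weights)

# Hypothetical lag profile: 50% same day, 30% next day, 15%, then 5%.
simple = days_to_cover([50, 30, 15, 5])
# Hypothetical slower profile: the effect decays by 20% per day over 60 days.
slow = days_to_cover([0.8 ** d for d in range(60)])
print(simple, slow)
```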


Step 3: Analyze your results

The analysis looks like the following:


You will be able to see Recast’s estimate of the incremental lift and whether or not your results are statistically significant. These results can be incorporated into your Recast MMM by providing your Recast team the point estimate and confidence interval on the CPA/ROI. Recast does not require “statistical significance” in order to incorporate the experiment results (insignificant results typically just mean larger confidence intervals).
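A lift estimate and its confidence interval can be turned into an ROI or CPA with simple division. The sketch below treats the spend change as known exactly and just rescales the interval endpoints; the function names and the $10,000 spend are assumptions, and Recast may derive the interval differently.

```python
def roi_estimate(spend, lift, lift_low, lift_high):
    """ROI point estimate and interval from incremental revenue and spend."""
    return lift / spend, lift_low / spend, lift_high / spend

def cpa_estimate(spend, lift, lift_low, lift_high):
    """CPA point estimate and interval from incremental conversions and spend.

    The interval flips: more conversions means a lower CPA. It is only
    defined when the lower bound of the lift is above zero.
    """
    if lift_low <= 0:
        raise ValueError("CPA interval is unbounded when the lift interval includes zero")
    return spend / lift, spend / lift_high, spend / lift_low

# Revenue lift of $25k ($17.7k to $32.9k) with a hypothetical $10,000 spend.
roi, roi_low, roi_high = roi_estimate(10_000, 25_000, 17_700, 32_900)
# Invented conversion lift of 250 (200 to 400) with the same spend.
cpa, cpa_low, cpa_high = cpa_estimate(10_000, 250, 200, 400)
print(roi, roi_low, roi_high)
print(cpa, cpa_low, cpa_high)
```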

The graphs show the difference in KPI output between the test and control geographies as well as the cumulative output over time in each. Using these graphs you can see the impact of your experimental spend change on the test geo compared to the control geo.
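The two series behind these graphs can be sketched as a daily difference and its running total. The numbers are invented and the function name is hypothetical; the control series here stands in for the weighted synthetic control.

```python
from itertools import accumulate

def lift_series(test, synthetic_control):
    """Daily test-minus-control difference and its cumulative sum."""
    daily = [t - c for t, c in zip(test, synthetic_control)]
    cumulative = list(accumulate(daily))
    return daily, cumulative

# Invented daily KPI values during the experiment window.
test = [120, 130, 125, 140]
synthetic_control = [110, 115, 120, 118]
daily, cumulative = lift_series(test, synthetic_control)
print(daily)
print(cumulative)
```

The last value of the cumulative series is the total incremental KPI over the analysis window.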


Glossary

Geo - A set of locations

Test Geo - The set of locations where we will implement a spend change for the duration of the experiment.

Synthetic Control Geo - These are weighted conversions in the control group. We use this to simulate the experiment and determine the probability of significant results.

Lift - The incremental effect of the spend change in the test geo.

Approximate CPA - A reasonable guess at the spend required to acquire a customer in the channel of interest. This is used to calculate the total spend required in the experiment to produce the effect size in the test geo. A higher CPA means that you will need to spend more to drive the effect size.

Experiment length - The number of days during which we will implement the spend change in the test geo.

Bias - The simulated difference between the actual incrementality and the effect estimated by the experiment analysis.