Modeling Best Practices

Exogenous Variables

Brands often raise the question “should we control for x?” where x is some exogenous variable that impacts their sales, such as:

  • Macroeconomic factors
  • Covid
  • Changes to their website
  • Competitors’ pricing
  • Etc.

The answer here generally is “maybe.” First things first, we want to think about what questions we want to be able to answer and how we want to be able to interpret the model results. In general, for most of these factors we believe that they will have their impact in two different ways:

  1. Impacts on “base sales” (i.e., the intercept)
  2. Impacts on marketing effectiveness (i.e., the channel effectiveness estimates)

Because of this, “controlling for” a feature like Covid by treating it as though it were any other marketing channel is going to be wrong.

For the most part, for backwards-looking inferences, all of these factors will be incorporated automatically into the results via the time-varying nature of both the intercept and the channel effectiveness estimates. That is, if Covid had an impact on a business, it probably did so by changing base sales (which will be picked up by the time-varying intercept) and by making marketing more or less effective (which will be picked up by the time-varying channel effectiveness estimates). This will happen without explicitly controlling for Covid in the model.
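As a rough illustration of why no separate Covid term is needed, here is a toy sketch in Python. It is not Recast’s actual model: the function `simulate_sales`, the simple additive form, and every number below are assumptions made up for this example. The point is only that a shock can be fully described by shifts in the two time-varying series.

```python
def simulate_sales(spend, intercept, roi):
    """Toy model: sales on each day = base sales (intercept) + ROI * spend."""
    return [b + r * s for s, b, r in zip(spend, intercept, roi)]

# Six days of hypothetical data; the shock hits from day 3 onward.
spend = [100.0, 120.0, 80.0, 100.0, 120.0, 80.0]

# The shock lowers base sales (the time-varying intercept)...
intercept = [500.0, 500.0, 500.0, 350.0, 350.0, 350.0]
# ...and makes marketing less effective (the time-varying ROI).
roi = [2.0, 2.0, 2.0, 1.5, 1.5, 1.5]

sales = simulate_sales(spend, intercept, roi)
print(sales)  # [700.0, 740.0, 660.0, 500.0, 530.0, 470.0]
```

Note that the same spend level (100) produces 700 in sales before the shock and 500 after it. A single additive “Covid” term could account for the drop in base sales but not for the change in what each marketing dollar returns.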

Since accurate inferences don’t require explicitly controlling for these variables, we generally recommend against including them, in the interest of model parsimony.

Contextual Variables

There are some cases where it might make sense to include these exogenous variables in the model. In particular, we think it makes the most sense to include a variable when we can predict its value into the future. So things like “the weather” often don’t make sense to include, but if your business is highly dependent on a variable like “new housing starts” and you feel your organization has the capacity to predict this variable into the future, then it might make sense to include that variable in the model.

The tradeoff, of course, is that if we include the variable in the model, then we need to be able to predict it in order to run forecasts and optimizations into the future, which, after all, is one of the most important uses of Recast.

One good use case for a contextual variable is something like a price or offer change. For example, if you increase the price of your product from $5 to $7, then we can include the price change as a contextual variable in order to estimate how that price increase impacted both your base sales (through the intercept) and your marketing performance (via the ROIs of your marketing channels). Since this variable is controlled internally, it’s very easy to forecast and will make our forward-looking forecasts and optimizations more accurate (since the change in performance due to the price change is important contextual information).
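To make the “easy to forecast” point concrete, here is a minimal Python sketch of a price series that covers both history and future dates. The helper `price_series` and the dates are hypothetical and are not part of the Recast platform; only the $5 and $7 prices come from the example above.

```python
from datetime import date, timedelta

def price_series(start, end, initial_price, changes):
    """Return {day: price} for every day from start to end inclusive.

    changes maps the date a new price takes effect to that price.
    """
    series = {}
    price = initial_price
    day = start
    while day <= end:
        price = changes.get(day, price)  # switch on the effective date
        series[day] = price
        day += timedelta(days=1)
    return series

# Hypothetical effective date for the $5 -> $7 increase.
changes = {date(2024, 3, 1): 7.0}
prices = price_series(date(2024, 2, 28), date(2024, 3, 3), 5.0, changes)

print(prices[date(2024, 2, 29)])  # 5.0
print(prices[date(2024, 3, 3)])   # 7.0
```

Because the business sets the price, extending `end` into the future requires no prediction beyond the planned price schedule, which is what makes this kind of variable practical for forecasting.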

Non-Paid Media Variables

There are other types of marketing activity that don’t exactly look like typical paid marketing activity. These types of activity might include:

  • Sales activity like phone calls or texts or site visits from field reps
  • Organic content like videos posted to YouTube
  • SEO content
  • etc.

Whether or not it makes sense to include these variables in the model will depend on a number of factors:

  1. Do you have accurate data on this phenomenon?
    1. If you don’t have accurate data that we trust, then we shouldn’t include it in the model.
  2. Can you convert the phenomenon into a currency?
    1. If we can’t convert the phenomenon into a currency, then making comparisons between activity types gets a lot more difficult and “optimizations” become impossible.
  3. Can you forecast the phenomenon?
    1. If you can’t forecast this phenomenon, then you won’t be able to use the forecaster / planning functionality in the Recast platform.
  4. Are these activities happening at the same “level” of the conversion funnel as marketing activity?
    1. If the activity is downstream of marketing (like cart-abandonment emails), then it probably doesn’t make sense to include it in the model, since it’s not operating at the same level as marketing spend.

One caution before applying these criteria: sometimes people want to include every possible business activity in the MMM, such as engineering headcount or website checkout changes. In general, we don’t think it’s wise to include things like this. It’s best to keep the MMM as focused as possible on measuring the performance of marketing activity, rather than trying to measure the performance of every possible business activity.

Otherwise, if you have variables that are part of your marketing mix and check the boxes above, then it could definitely make sense to include them in the model. Site visits by salespeople and handing out samples of pharmaceutical products fall into this bucket and are good candidates for inclusion.

Promotions

We generally handle promotions in our model using spikes.

See our help page Practical Considerations for Placing Spikes for advice on this.

Modeling Channels without Varying Spend Over Time

Some clients invest in tactics like sponsoring a sports team or having their brand featured on a radio show for a prolonged period of time. In a similar vein, they may pay a PR firm a fixed fee to generate organic press for them. These types of channels present unique challenges for Recast, given that we rely on variation in spend over time to understand the relationship between a channel and the client’s dependent variable.

Depending on the data available from clients for these channels, we have identified two options for modeling them, plus a third case where neither option applies:

Option 1: Non-Spend Channel Using Impressions

If impressions generated by the fixed spend are available historically by day or by week, we can model the channel as a non-spend channel. The benefit of this approach is that it captures the changing impressions over time and also allows the efficacy of the impressions to change over time, saturate, and have a time shift. The drawbacks are that the channel can’t be represented in the optimizer, so it will have to be factored in separately, and that some of the reports are not available for non-spend channels.

Option 2: Non-Spend Channel Using Binary Indicator Variable

If an impressions metric that varies over time is not available historically, use binary indicator variables as non-spend channels to mark when events that would drive impressions are happening. This requires that the channel is not always-on, so that there are some days we can observe with 0s.
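A minimal Python sketch of building such an indicator is shown below. The helper names `binary_indicator` and `has_on_and_off_days` and the dates are hypothetical, chosen for illustration rather than taken from the Recast platform.

```python
from datetime import date, timedelta

def binary_indicator(start, end, active_windows):
    """Return a daily list of 0/1 flags from start to end inclusive.

    active_windows is a list of (first_day, last_day) pairs, both inclusive,
    marking when the sponsorship or placement was live.
    """
    flags = []
    day = start
    while day <= end:
        live = any(first <= day <= last for first, last in active_windows)
        flags.append(1 if live else 0)
        day += timedelta(days=1)
    return flags

def has_on_and_off_days(flags):
    """The indicator is only informative if it contains both 0s and 1s."""
    return 0 in flags and 1 in flags

# Hypothetical radio feature that ran in two stretches of a ten-day window.
windows = [
    (date(2024, 1, 2), date(2024, 1, 4)),
    (date(2024, 1, 8), date(2024, 1, 9)),
]
flags = binary_indicator(date(2024, 1, 1), date(2024, 1, 10), windows)

print(flags)                       # [0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
print(has_on_and_off_days(flags))  # True
```

The `has_on_and_off_days` check mirrors the requirement above: an indicator that is 1 on every day carries no variation for the model to learn from.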

Option 3: 😢

If a historical impressions metric is not available and the channel is truly always-on, we won’t be able to estimate the efficacy of the fixed-spend channel separately from the intercept.
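The reason can be seen in a toy Python sketch. The additive form and the numbers are assumptions for illustration only and do not represent Recast’s actual model. When the indicator is 1 on every day, very different splits between “intercept” and “channel effect” produce identical predictions, so the data cannot tell them apart.

```python
def predict(intercept, channel_effect, indicator):
    """Toy additive model: sales = intercept + channel_effect * indicator."""
    return [intercept + channel_effect * x for x in indicator]

always_on = [1, 1, 1, 1, 1]
sometimes_off = [1, 1, 0, 0, 1]

# Two very different stories: (intercept, channel_effect).
story_a = (500.0, 100.0)  # the channel drives 100 in sales per day
story_b = (600.0, 0.0)    # the channel drives nothing

# With an always-on channel the two stories are indistinguishable.
same_when_always_on = predict(*story_a, always_on) == predict(*story_b, always_on)
# With some off days, the stories make different predictions.
same_with_off_days = predict(*story_a, sometimes_off) == predict(*story_b, sometimes_off)

print(same_when_always_on)  # True
print(same_with_off_days)   # False
```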

Lift Tests

See the Lift Tests page for more detail on lift test theory and configuration options.