📊 Data Guide

A model is only as useful as the data it is trained on. This guide explains how to make sure your model is fed the best data possible.

Data Transfer

Recast clients have a number of different options for exchange of data with Recast. The data exchange options listed below are already fully supported and can be implemented for new clients very quickly.

Recast Owned Data Sources

Client Owned Data Sources

Visit Recast Data Sources for more information on your options for data exchange with Recast.

Data format

For a Recast MMM, what we need to ingest is a pretty simple table of historical marketing activity – ideally going back about two and a half years.

It looks like this:

  • One row per day
  • One column for each marketing channel, containing the amount of spend in that channel on that day
  • One column for the business’s KPI on that day

Make sure every channel has its own column: branded search on Google, non-brand search on Google, Google Shopping, etc. You’ll need the daily spend for each channel.

The business KPI can be revenue by day, profit by day, conversions by day, marketing qualified leads by day, or whatever metric the business uses as the goal for its marketing.

Long format is also acceptable. The only requirement for sharing your data in long format is a single column, ‘Recast Category’, containing the desired channel names. The data should include a row for every unique combination of date and ‘Recast Category’ across the full history.
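To illustrate how the two layouts relate, here is a minimal sketch that pivots long-format rows into the wide layout described above. The field names `date`, `recast_category`, and `spend`, and the sample values, are assumptions made for illustration; they are not a Recast specification.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (date, Recast Category) pair.
long_rows = [
    {"date": "2024-01-01", "recast_category": "facebook", "spend": 30.0},
    {"date": "2024-01-01", "recast_category": "podcast", "spend": 1000.0},
    {"date": "2024-01-02", "recast_category": "facebook", "spend": 30.0},
    {"date": "2024-01-02", "recast_category": "podcast", "spend": 0.0},
]


def long_to_wide(rows):
    """Pivot long rows into one dict per day with one key per channel."""
    channels = sorted({r["recast_category"] for r in rows})
    by_date = defaultdict(dict)
    for r in rows:
        channel = r["recast_category"]
        if channel in by_date[r["date"]]:
            raise ValueError(f"duplicate row for {r['date']} / {channel}")
        by_date[r["date"]][channel] = r["spend"]

    wide = []
    for day in sorted(by_date):  # ISO date strings sort chronologically
        row = {"date": day}
        for channel in channels:
            # None marks a missing (date, channel) pair, as opposed to zero spend.
            row[channel] = by_date[day].get(channel)
        wide.append(row)
    return wide


wide_rows = long_to_wide(long_rows)
```

Raising on duplicate (date, category) pairs is a design choice in this sketch: it surfaces rows that would otherwise be silently overwritten.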

➡️ Template

Data Warehouses

We truly recommend that every brand has all its data in a marketing data warehouse. You’re going to need it for reporting and for multiple analyses that you’re going to want to do no matter what. We think it’s a worthwhile investment to get it set up – whether you work with Recast or even if you don’t do MMM.

Upload Cadence

We have a weekly upload schedule. We recommend sharing 27 months of data and re-uploading the entire dataset each week, so that any corrections to past marketing spend are captured.
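As a rough sketch, the first date of a 27-month window can be computed from the refresh date as follows. The function name `window_start` and the example refresh date are hypothetical, and clamping the day in shorter months is an assumption of this example rather than a Recast rule.

```python
import calendar
from datetime import date


def window_start(refresh_date, months=27):
    """Return the date `months` calendar months before `refresh_date`."""
    # Work in "months since year 0" so that borrowing across years is simple.
    total = refresh_date.year * 12 + (refresh_date.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day for shorter months (e.g. the 31st falls back to the 28th).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(refresh_date.day, last_day))


start = window_start(date(2024, 6, 30))  # hypothetical refresh date
```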

Different types of data Recast requires

Data Sets Overview

| Data Set | Required? | Includes |
|---|---|---|
| Daily marketing spend and business KPIs | Required | One row per day of your marketing activity in every channel, with channels as columns and rows as days, with an additional column for your dependent variable (typically sales or new customer acquisitions) |
| Promotional calendar | Required if you run promotions, have large product launches, or do other big non-spend marketing activity. This should also include product launches and other non-pricing promotions | A list of promotions you have run, their dates, and any helpful metadata (e.g. was it 25% off sale, a buy-one-share-one, elon musk flamethrower launch, went on shark tank, etc.) |
| Incrementality tests | Optional. Useful if you have run incrementality tests and would like to “pin” Recast to those estimates | A list of incrementality tests you have run, including the channel, the dates of the test, the type of test run, and the results |

Setting up the data feed

🚧

For the initial model run, it is okay to have an “ad hoc” dataset compiled manually. In order to set up subsequent refreshes of the model, Recast will need to have programmatic access to the daily marketing data.

Typically Recast clients transfer data to Recast in one of three ways:

  1. A shared S3 bucket. This is Recast’s preferred approach. We can set up a shared S3 bucket, and you can drop CSV files on a regular basis. To get started, we just need your AWS account number. Once we have that, we will set up the bucket and provide you with instructions to access the bucket.
  2. Direct access to your data store (e.g. BigQuery or Snowflake)
  3. A Google Sheet

In order to run regular refreshes, we require the following:
✅ The access point (e.g. the Google Sheet URL) does not change from week to week
✅ The schema of the data does not change from week to week
✅ We need to know (or be able to determine from the data) when the data is complete. For example, it is often the case that some channels are updated more quickly than others. In this case, either the data needs to distinguish missing values from zeroes, or we need to be given a rule (e.g. “always use the data up until three days before the refresh date”).
✅ We need a day-of-week and a time-of-day to refresh the data and to kick off the model
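The completeness requirement above can be sketched in code. This example assumes missing values are represented as `None` and confirmed zero spend as `0.0`, and it applies the "three days before the refresh date" rule from the example; the function names and sample rows are hypothetical.

```python
from datetime import date, timedelta


def complete_through(refresh_date, lag_days=3):
    """Last date treated as complete under a fixed-lag rule."""
    return refresh_date - timedelta(days=lag_days)


def incomplete_cells(rows, channels, cutoff):
    """Return (date, channel) pairs on or before the cutoff that are missing.

    None means "not reported yet"; 0.0 means a confirmed zero spend.
    """
    return [
        (row["date"], channel)
        for row in rows
        if row["date"] <= cutoff
        for channel in channels
        if row.get(channel) is None
    ]


rows = [
    {"date": date(2024, 1, 1), "facebook": 30.0, "podcast": 0.0},
    {"date": date(2024, 1, 2), "facebook": 30.0, "podcast": None},
    {"date": date(2024, 1, 5), "facebook": None, "podcast": None},
]
cutoff = complete_through(date(2024, 1, 6))
gaps = incomplete_cells(rows, ["facebook", "podcast"], cutoff)
```

Here the January 5 row is ignored because it falls after the cutoff, while the missing podcast value on January 2 is reported as a gap.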

Daily Marketing Spend

Recast requires a dataset in the format laid out below. It has:

  • One row per day
  • One column per marketing channel (below, Facebook and Podcast)
  • One column for the dependent variable (typically revenue or new customer acquisition)

👍

We are able to accept datasets in “long” or “wide” format, or to merge across multiple files. The key for Recast is to have a consistent data format that we can easily transform into the schema below.

Example Data

| Date | Revenue | Facebook | Podcast |
|---|---|---|---|
| 1/1/2024 | $1,000,000 | $30 | $1000 |
| 1/2/2024 | $500,000 | $30 | $0 |
| 1/3/2024 | $750,000 | $30 | $0 |
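As an illustration, the example data above could be parsed and checked for the one-row-per-day rule before upload. This is a sketch only: the CSV rendering of the table, the helper names, and the assumption that amounts arrive as currency-formatted strings are all choices made for this example.

```python
import csv
import io
from datetime import datetime, timedelta

# The example table written as CSV; amounts containing commas must be quoted.
raw = '''Date,Revenue,Facebook,Podcast
1/1/2024,"$1,000,000",$30,$1000
1/2/2024,"$500,000",$30,$0
1/3/2024,"$750,000",$30,$0
'''


def parse_money(text):
    """Convert a string such as '$1,000,000' to a float."""
    return float(text.replace("$", "").replace(",", ""))


def load_daily(csv_text):
    """Parse the CSV into dicts with a date and numeric columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {"Date": datetime.strptime(record["Date"], "%m/%d/%Y").date()}
        for name, value in record.items():
            if name != "Date":
                row[name] = parse_money(value)
        rows.append(row)
    return rows


def check_one_row_per_day(rows):
    """Raise if consecutive rows are not exactly one day apart."""
    dates = [r["Date"] for r in rows]
    for previous, current in zip(dates, dates[1:]):
        if current - previous != timedelta(days=1):
            raise ValueError(f"gap or duplicate between {previous} and {current}")


daily = load_daily(raw)
check_one_row_per_day(daily)
```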

Notes on different data types

Channels that have “flights” or “drop dates”, like podcast, direct mail, etc.

We would prefer to have the spend on the day the ads begin to be distributed, e.g. the date of the podcast, the first “in-home” date of the direct mail drop, and so on. This is in contrast to pre-spreading the spend out over the period of the campaign. Recast’s time-shift curve accounts for the delay between the spend on the advertising and when it is received by consumers.
For long-term contracts (e.g. with influencer partners), we’d like to have the spend tied to each specific promotion. E.g. if you’re doing 3 sponsored posts, we’d like to have three spend entries in the column representing influencer spend, one for each post.
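A minimal sketch of tying contract spend to individual promotions: the contract total, the post dates, and the equal split across posts are all assumptions made for illustration. If your contract prices each post differently, use the actual per-post amounts instead.

```python
from datetime import date


def spend_entries(contract_total, post_dates):
    """Return one (date, spend) entry per sponsored post.

    Assumes the contract value is split equally across posts.
    """
    share = contract_total / len(post_dates)
    return [(post_date, share) for post_date in sorted(post_dates)]


# Hypothetical contract: 3 sponsored posts for a total of 3,000.
entries = spend_entries(
    3000.0,
    [date(2024, 3, 4), date(2024, 3, 18), date(2024, 4, 1)],
)
```

Each entry lands on the day the post goes live, rather than being spread across the length of the contract.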

Affiliate spend

Depending on how your affiliate program is set up, Recast may be able to handle it like any other spend channel, or we may have to handle it in a different way. The crucial question is: do you pay your affiliate partner on a “commission basis” (a specified rate per conversion or sale), or do you pay upfront?
If you pay per conversion, then your spend in the affiliate channel is directly caused by conversions or revenue, rather than the other way around. To handle this, we have two options:

  1. Recast can subtract the affiliate spend from your target variable, to account for the fact that some channels will drive more conversions/revenue through affiliates than others
  2. We can include other variables (e.g. the number of active affiliates) to represent the size and intensity of the affiliate program
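The first option amounts to simple arithmetic on the target column. The sketch below uses hypothetical field names (`revenue`, `affiliate_spend`) and made-up values; it shows the idea only and is not a description of how Recast implements the adjustment internally.

```python
def subtract_affiliate(rows, target="revenue", affiliate="affiliate_spend"):
    """Return copies of the rows with affiliate spend removed from the target."""
    return [{**row, target: row[target] - row[affiliate]} for row in rows]


rows = [
    {"date": "2024-01-01", "revenue": 10000.0, "affiliate_spend": 400.0},
    {"date": "2024-01-02", "revenue": 8000.0, "affiliate_spend": 250.0},
]
adjusted = subtract_affiliate(rows)
```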

Non-spend channels (like email)

While Recast can handle these channels, our current recommendation is not to include them, as they are typically run very differently from the other channels.

Branded search

Recast currently handles branded search like any other channel and we control the impact that branded search channels have on the model by constraining their incrementality to reasonable levels. An in-development version of the Recast model will handle these channels more explicitly as “outcomes” resulting from spend in other channels.

Naming/schema changes

⚠️ When the names of channels change, e.g. from facebook_prospecting to facebook_prospecting_spend, our updating scripts will typically break. Please give us as much notice as possible for naming changes. These changes will typically require a week to incorporate and may delay model refreshes. When the schema of the input data changes it requires Recast to re-write the ingestion code; this will result in delays to the model refresh.
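One way to catch a rename before it reaches the refresh is to compare this week's column names against last week's. The sketch below is a hypothetical client-side check, not part of Recast's tooling; the column lists are made up to mirror the example above.

```python
def schema_diff(expected_columns, received_columns):
    """Report columns that disappeared or appeared between two uploads."""
    expected, received = set(expected_columns), set(received_columns)
    return {
        "missing": sorted(expected - received),
        "unexpected": sorted(received - expected),
    }


last_week = ["date", "revenue", "facebook_prospecting", "podcast"]
this_week = ["date", "revenue", "facebook_prospecting_spend", "podcast"]
diff = schema_diff(last_week, this_week)
```

A rename shows up as one entry in `missing` and one in `unexpected`, which is a useful prompt to notify Recast ahead of the next refresh.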

Changes to past data

⚠️ When Recast refreshes the model each week, it doesn’t just run the last week of data, but it re-estimates the entire model for all of history. This means that changes to past data can affect current estimates. This can result in larger-than-expected changes to your results.
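Because past data feeds into every refresh, it can help to check what changed between two uploads. The following sketch compares two snapshots keyed by (date, column); the data structure, the tolerance, and the sample values are assumptions for illustration.

```python
def revised_cells(previous, current, tolerance=0.005):
    """List cells present in both snapshots whose values differ.

    Both arguments map (date, column) -> numeric value.
    """
    changes = []
    for key in sorted(previous.keys() & current.keys()):
        if abs(previous[key] - current[key]) > tolerance:
            changes.append((key, previous[key], current[key]))
    return changes


previous_upload = {
    ("2024-01-01", "facebook"): 30.0,
    ("2024-01-02", "facebook"): 30.0,
}
current_upload = {
    ("2024-01-01", "facebook"): 45.0,  # corrected after the fact
    ("2024-01-02", "facebook"): 30.0,
    ("2024-01-03", "facebook"): 30.0,  # new day, not a revision
}
revisions = revised_cells(previous_upload, current_upload)
```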

Adding a channel

It is common to add new channels for testing purposes. When new channels are added, a few things need to happen on our end in order to ensure that the channel is included in the refresh run.

⚠️ Please give us one week’s notice for any channels that are being added to the model. We need the exact name of the channel ahead of time to ensure that the model refreshes correctly.

Promotional Calendar

📘

Since we include promotions in our forecasts, we would like to have your promotional calendar as far out as you can produce it (up to a year from the current date).

| Name | Start Date | End Date |
|---|---|---|
| Mother's Day Promotion | May 3 2023 | May 11 2023 |
| Spring Sale | March 2 2023 | March 18 2023 |
| New Years Eve | December 29 2023 | January 2 2024 |
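To show how a calendar in this shape lines up with daily data, here is a sketch that expands each promotion into the days it covers. Treating the end date as inclusive is an assumption of this example, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# The example calendar above as (name, start, end) tuples.
promos = [
    ("Mother's Day Promotion", "May 3 2023", "May 11 2023"),
    ("Spring Sale", "March 2 2023", "March 18 2023"),
    ("New Years Eve", "December 29 2023", "January 2 2024"),
]


def parse_day(text):
    """Parse dates written like 'May 3 2023'."""
    return datetime.strptime(text, "%B %d %Y").date()


def promo_days(calendar_rows):
    """Map each date to the promotions active on it (end date inclusive)."""
    active = {}
    for name, start_text, end_text in calendar_rows:
        start, end = parse_day(start_text), parse_day(end_text)
        if end < start:
            raise ValueError(f"{name}: end date is before start date")
        day = start
        while day <= end:
            active.setdefault(day, []).append(name)
            day += timedelta(days=1)
    return active


days = promo_days(promos)
```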

Incrementality Tests

📘

Recast is able to “pin” the estimates for the incrementality of any channel to the results of incrementality tests.

| Channel Name | Start Date | End Date | Experiment Type | Incrementality Estimate | Confidence Level |
|---|---|---|---|---|---|
| Facebook | May 3 2023 | May 11 2023 | In-Platform | 1.5x | +/-0.1 |
| Youtube | March 2 2023 | March 19 2023 | Geo Holdout | 2x | +/-0.3 |
| Google Branded Search | January 5 2023 | January 26 2023 | Blackout | 0.7x | +/-0.1 |
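For readers preparing this table programmatically, the sketch below converts the text values into numbers. It assumes the “Confidence Level” column is a symmetric margin around the estimate, which is an interpretation made for this example; check with Recast how the margin should be expressed for your tests.

```python
def parse_test_result(estimate_text, margin_text):
    """Turn strings like '1.5x' and '+/-0.1' into a numeric range."""
    estimate = float(estimate_text.rstrip("x"))
    margin = float(margin_text.replace("+/-", ""))
    return {
        "estimate": estimate,
        "low": round(estimate - margin, 6),
        "high": round(estimate + margin, 6),
    }


facebook = parse_test_result("1.5x", "+/-0.1")
youtube = parse_test_result("2x", "+/-0.3")
```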

What’s Next

To learn more about how the Recast model uses your data to provide you with actionable results, visit the FAQ articles below.