📊 Data Guide


A model is only as useful as the data it is trained on. Learn how to make sure we are feeding your model the best data possible.

Data Transfer

Recast clients have several options for exchanging data with Recast. The options listed below are fully supported and can be set up quickly for new clients.

Recast Owned Data Sources

AWS S3
SFTP

Client Owned Data Sources

AWS S3
SFTP
Google Sheets
Google Cloud Storage
Google BigQuery
Google Drive
Snowflake
Postgres (including Redshift)

Visit Recast Data Sources for more information on your options for data exchange with Recast.

Data format

For a Recast MMM, what we need to ingest is a fairly simple table of historical marketing activity, ideally going back about two and a half years.

It looks like this:

  • One row per day

  • One column for each marketing channel, containing the amount of spend in that channel on that day

  • One column for the business's KPI on that day

Make sure every channel gets its own column: branded search on Google, non-brand search on Google, Google Shopping, etc. You'll need the daily spend for each channel.

The business KPI can be revenue by day, profit by day, conversions by day, marketing qualified leads by day, or whatever metric the business sets its marketing goals against.

Long format is also acceptable. The only requirement for sharing your data in long format is a single column, 'Recast Category', that holds the desired channel names. The data should contain one row for every unique combination of date and 'Recast Category' across the full history.
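To illustrate how the long and wide layouts relate, here is a minimal sketch that pivots long rows into one record per day. The column names `date` and `spend`, the sample values, and the helper `long_to_wide` are assumptions made for this example; only the 'Recast Category' column name comes from the guide.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format extract: one row per (date, Recast Category).
LONG_CSV = """date,Recast Category,spend
2024-01-01,facebook,30
2024-01-01,podcast,1000
2024-01-02,facebook,30
2024-01-02,podcast,0
"""


def long_to_wide(long_csv_text):
    """Pivot long rows into one dict per day with one key per channel."""
    by_date = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(long_csv_text)):
        day, category = row["date"], row["Recast Category"]
        if category in by_date[day]:
            # Each date/category combination should appear exactly once.
            raise ValueError(f"duplicate combination: {day} / {category}")
        by_date[day][category] = float(row["spend"])
    # ISO-formatted dates sort correctly as plain strings.
    return [{"date": day, **by_date[day]} for day in sorted(by_date)]


wide_rows = long_to_wide(LONG_CSV)
```

Each entry in `wide_rows` is one day, with one key per channel, which matches the wide layout described above.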

āž”ļø Template

Data Warehouses

We strongly recommend that every brand keep all its data in a marketing data warehouse. You're going to need it for reporting and for multiple analyses that you'll want to do no matter what. We think it's a worthwhile investment to get it set up, whether or not you work with Recast and whether or not you do MMM.

Upload Cadence

We run on a weekly upload schedule. We recommend sharing 27 months of history and re-uploading the entire dataset each week, so that any corrections to past marketing spend are captured.
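As a rough sketch of how the start of a 27-month window could be computed for each weekly export, the helper below counts back whole calendar months. The function name `history_start` and the choice to snap to the first day of the month are assumptions for illustration, not a Recast requirement.

```python
from datetime import date


def history_start(refresh_date, months=27):
    """Return the first day of the month `months` months before refresh_date."""
    # Work in "months since year 0" so year boundaries need no special cases.
    total = refresh_date.year * 12 + (refresh_date.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)


start = history_start(date(2024, 4, 15))
print(start)
```

For a refresh on 15 April 2024, this gives a window starting in January 2022.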

Different types of data Recast requires

Data Sets Overview

| Data Set | Required? | Includes |
| --- | --- | --- |
| Daily marketing spend and business KPIs | Required | One row per day of your marketing activity in every channel, with channels as columns and days as rows, plus an additional column for your dependent variable (typically sales or new customer acquisitions) |
| Promotional calendar | Required if you run promotions, have large product launches, or do other big non-spend marketing activity. This should also include product launches and other non-pricing promotions | A list of promotions you have run, their dates, and any helpful metadata (e.g. whether it was a 25% off sale, a buy-one-share-one offer, an Elon Musk flamethrower launch, an appearance on Shark Tank, etc.) |
| Incrementality tests | Optional. Useful if you have run incrementality tests and would like to "pin" Recast to those estimates | A list of incrementality tests you have run, including the channel, the dates of the test, the type of test run, and the results |

Setting up the data feed

For the initial model run, it is okay to have an "ad hoc" dataset compiled manually. To set up subsequent refreshes of the model, Recast will need programmatic access to the daily marketing data.

Typically Recast clients transfer data to Recast in one of three ways:

  1. A shared S3 bucket. This is Recast's preferred approach. We can set up a shared S3 bucket, and you can drop CSV files there on a regular basis. To get started, we just need your AWS account number. Once we have that, we will set up the bucket and provide you with instructions for accessing it.

  2. Direct access to your data store (e.g. BigQuery or Snowflake)

  3. A Google Sheet

In order to run regular refreshes, we require the following:
✅ The access point (e.g. the Google Sheet URL) does not change from week to week
✅ The schema of the data does not change from week to week
✅ We need to know (or be able to determine from the data) when the data is complete. For example, it is often the case that some channels are updated more quickly than others. In this case, either the data needs to disambiguate missing values from zeroes, or we need to be given a rule (e.g. "always use the data up until three days before the refresh date")
✅ We need a day-of-week and a time-of-day to refresh the data and to kick off the model
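The completeness requirement can be sketched as follows, assuming an empty cell means "not reported yet" while a literal 0 means "no spend", and assuming the cutoff day itself counts as complete. The helper names `parse_spend` and `complete_rows` and the sample rows are hypothetical.

```python
from datetime import date, timedelta


def parse_spend(cell):
    """Map an empty cell to None (missing) and anything else to a float, so 0 stays 0.0."""
    cell = cell.strip()
    return None if cell == "" else float(cell)


def complete_rows(rows, refresh_date, lag_days=3):
    """Keep only rows dated on or before refresh_date minus lag_days."""
    cutoff = refresh_date - timedelta(days=lag_days)
    return [row for row in rows if row["date"] <= cutoff]


# Hypothetical rows: the last day has not been reported yet.
rows = [
    {"date": date(2024, 1, 1), "facebook": parse_spend("30")},
    {"date": date(2024, 1, 2), "facebook": parse_spend("0")},
    {"date": date(2024, 1, 3), "facebook": parse_spend("")},
]
kept = complete_rows(rows, refresh_date=date(2024, 1, 5))
```

Either approach on its own satisfies the requirement: a dataset that marks missing values explicitly does not need a cutoff rule, and vice versa.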

Daily Marketing Spend

Recast requires a dataset in the format laid out below. It has:

  • One row per day

  • One column per marketing channel (below, Facebook and Podcast)

  • One column for the dependent variable (typically revenue or new customer acquisition)

We are able to accept datasets in either "long" or "wide" format, or to merge across multiple files. The key for Recast is to have a consistent data format that we can easily transform into the schema below.

Example Data

| Date | Revenue | Facebook | Podcast |
| --- | --- | --- | --- |
| 1/1/2024 | $1,000,000 | $30 | $1,000 |
| 1/2/2024 | $500,000 | $30 | $0 |
| 1/3/2024 | $750,000 | $30 | $0 |
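Before sharing a wide-format file, it can help to check that it really has one row per day. The sketch below parses the example data and verifies that the dates are consecutive. It assumes the dates are in M/D/YYYY order and that amounts may carry `$` and thousands separators; the helper names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# The example data above, written as CSV (quoted where amounts contain commas).
WIDE_CSV = '''Date,Revenue,Facebook,Podcast
1/1/2024,"$1,000,000",$30,"$1,000"
1/2/2024,"$500,000",$30,$0
1/3/2024,"$750,000",$30,$0
'''


def parse_money(text):
    """Strip the currency symbol and thousands separators, then convert to float."""
    return float(text.replace("$", "").replace(",", ""))


def load_wide(csv_text):
    """Return one dict per row with a parsed date and numeric columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {"Date": datetime.strptime(raw["Date"], "%m/%d/%Y").date()}
        for name, value in raw.items():
            if name != "Date":
                row[name] = parse_money(value)
        rows.append(row)
    return rows


def check_daily(rows):
    """Raise ValueError if any two adjacent rows are not exactly one day apart."""
    for prev, cur in zip(rows, rows[1:]):
        if cur["Date"] - prev["Date"] != timedelta(days=1):
            raise ValueError(f"gap or duplicate between {prev['Date']} and {cur['Date']}")


rows = load_wide(WIDE_CSV)
check_daily(rows)
```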

Notes on different data types

Channels that have "flights" or "drop dates", like podcast, direct mail, etc.

We would prefer to have the spend on the day the ads begin to be distributed, e.g. the date of the podcast, the first "in-home" date of the direct mail drop, and so on. This is in contrast to pre-spreading the spend out over the period of the campaign. Recast's time-shift curve accounts for the delay between the spend on the advertising and when it is received by consumers.
For long-term contracts (e.g. with influencer partners), we'd like to have the spend tied to each specific promotion. For example, if you're doing three sponsored posts, we'd like to have three spend entries in the column representing influencer spend, one for each post.
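A minimal sketch of this convention: each placement contributes its full spend on its drop date, and nothing is spread over the rest of the flight. The record fields and the sample amounts are hypothetical.

```python
from datetime import date

# Hypothetical placements: one podcast flight and two sponsored influencer posts.
placements = [
    {"channel": "podcast", "drop_date": date(2024, 1, 1), "end_date": date(2024, 1, 14), "spend": 1000.0},
    {"channel": "influencer", "drop_date": date(2024, 1, 5), "end_date": date(2024, 1, 5), "spend": 400.0},
    {"channel": "influencer", "drop_date": date(2024, 1, 19), "end_date": date(2024, 1, 19), "spend": 400.0},
]


def spend_by_drop_date(placements):
    """Sum spend per (drop_date, channel); end_date is deliberately ignored."""
    daily = {}
    for placement in placements:
        key = (placement["drop_date"], placement["channel"])
        daily[key] = daily.get(key, 0.0) + placement["spend"]
    return daily


daily_spend = spend_by_drop_date(placements)
```

Days inside a flight that have no drop simply carry zero spend for that channel.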

Affiliate spend

Depending on how your affiliate program is set up, Recast may be able to handle it like any other spend channel, or we may have to handle it in a different way. The crucial question is: do you pay your affiliate partner on a "commission basis" (a specified rate per conversion or sale), or do you pay upfront?
If you pay per conversion, then your spend in the affiliate channel is directly caused by conversions or revenue, rather than the other way around. To handle this, we have two options:

  1. Recast can subtract the affiliate spend from your target variable, to account for the fact that some channels will drive more conversions/revenue through affiliates than others

  2. We can include other variables (e.g. the number of active affiliates) to represent the size and intensity of the affiliate program
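The first option can be sketched as a simple per-day subtraction, assuming the target variable is revenue and the commissions are expressed in the same currency. The column names, the sample values, and the helper `subtract_affiliate` are hypothetical; this is not a description of Recast's internal implementation.

```python
# Hypothetical daily rows; revenue and affiliate commissions share a currency.
rows = [
    {"date": "2024-01-01", "revenue": 1000.0, "affiliate_spend": 50.0},
    {"date": "2024-01-02", "revenue": 500.0, "affiliate_spend": 20.0},
]


def subtract_affiliate(rows, kpi="revenue", affiliate="affiliate_spend"):
    """Return copies of the rows with commission-based affiliate spend removed from the KPI."""
    adjusted = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        new_row[kpi] = row[kpi] - row[affiliate]
        adjusted.append(new_row)
    return adjusted


adjusted = subtract_affiliate(rows)
```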

Non-spend channels (like email)

While Recast can handle these channels, our current recommendation is not to include them, as they are typically run very differently from the other channels.

Branded search

Recast currently handles branded search like any other channel, and we control the impact that branded search channels have on the model by constraining their incrementality to reasonable levels. An in-development version of the Recast model will handle these channels more explicitly as "outcomes" resulting from spend in other channels.

Naming/schema changes

(warning) When the names of channels change, e.g. from facebook_prospecting to facebook_prospecting_spend, our updating scripts will typically break. Please give us as much notice as possible for naming changes. These changes will typically require a week to incorporate and may delay model refreshes. When the schema of the input data changes it requires Recast to re-write the ingestion code; this will result in delays to the model refresh.
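A quick way to catch a rename before uploading is to compare this week's column names with last week's. The sketch below is a hypothetical pre-upload check, not part of Recast's tooling; the column lists are examples.

```python
def schema_changes(previous_columns, current_columns):
    """Report columns that disappeared or appeared between two uploads."""
    previous, current = set(previous_columns), set(current_columns)
    return {
        "removed": sorted(previous - current),
        "added": sorted(current - previous),
    }


last_week = ["date", "revenue", "facebook_prospecting", "podcast"]
this_week = ["date", "revenue", "facebook_prospecting_spend", "podcast"]
changes = schema_changes(last_week, this_week)
```

A rename shows up as one removed column and one added column, which is a good prompt to notify Recast before the next refresh.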

Changes to past data

(warning) When Recast refreshes the model each week, it doesn't just run the last week of data; it re-estimates the entire model for all of history. This means that changes to past data can affect current estimates. This can result in larger-than-expected changes to your results.
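To see which historical values were revised between two weekly uploads, a comparison along these lines can help. It assumes each upload has been loaded into a dict keyed by (date, channel); the helper name, the tolerance, and the sample values are hypothetical.

```python
def changed_history(previous, current, tolerance=0.005):
    """List (key, old, new) for every shared (date, channel) whose value moved."""
    changes = []
    for key in sorted(previous.keys() & current.keys()):
        if abs(previous[key] - current[key]) > tolerance:
            changes.append((key, previous[key], current[key]))
    return changes


# Hypothetical uploads: the podcast spend for 1 January was corrected.
last_upload = {("2024-01-01", "podcast"): 1000.0, ("2024-01-02", "podcast"): 0.0}
this_upload = {
    ("2024-01-01", "podcast"): 1200.0,
    ("2024-01-02", "podcast"): 0.0,
    ("2024-01-03", "podcast"): 0.0,  # new day, not a revision
}
revisions = changed_history(last_upload, this_upload)
```

Reviewing this list before a refresh makes unexpected shifts in the results easier to trace back to a specific correction.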

Adding a channel

It is common to add new channels for testing purposes. When new channels are added, a few things need to happen on our end in order to ensure that the channel is included in the refresh run.

(warning) Please give us one week's notice for any channels that are being added to the model. We need the exact naming format for the channel ahead of time to ensure that the model refreshes correctly.

Promotional Calendar

Since we include promotions in our forecasts, we would like to have your promotional calendar as far out as you can produce it (up to a year from the current date).

| Name | Start Date | End Date |
| --- | --- | --- |
| Mother's Day Promotion | May 3 2023 | May 11 2023 |
| Spring Sale | March 2 2023 | March 18 2023 |
| New Year's Eve | December 29 2023 | January 2 2024 |
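One way to sanity-check a calendar like the one above is to expand it into the individual days each promotion covers. The sketch assumes start and end dates are both inclusive and written as "Month D YYYY" in English; how Recast represents promotions inside the model is not described here, so treat this purely as a data check.

```python
from datetime import datetime, timedelta

# The example calendar above as (name, start, end) tuples.
PROMOS = [
    ("Mother's Day Promotion", "May 3 2023", "May 11 2023"),
    ("Spring Sale", "March 2 2023", "March 18 2023"),
    ("New Year's Eve", "December 29 2023", "January 2 2024"),
]


def promo_days(promos):
    """Map each calendar day to the list of promotions active on it."""
    days = {}
    for name, start_text, end_text in promos:
        start = datetime.strptime(start_text, "%B %d %Y").date()
        end = datetime.strptime(end_text, "%B %d %Y").date()
        if end < start:
            raise ValueError(f"{name}: end date is before start date")
        day = start
        while day <= end:  # inclusive end date is an assumption
            days.setdefault(day, []).append(name)
            day += timedelta(days=1)
    return days


active = promo_days(PROMOS)
```

Days that map to more than one promotion indicate overlapping entries worth double-checking.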

Incrementality Tests

Recast is able to "pin" the estimates for the incrementality of any channel to the results of incrementality tests.

| Channel Name | Start Date | End Date | Experiment Type | Incrementality Estimate | Confidence Level |
| --- | --- | --- | --- | --- | --- |
| Facebook | May 3 2023 | May 11 2023 | In-Platform | 1.5x | +/-0.1 |
| YouTube | March 2 2023 | March 19 2023 | Geo Holdout | 2x | +/-0.3 |
| Google Branded Search | January 5 2023 | January 26 2023 | Blackout | 0.7x | |
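If test results are shared in the text form shown above, they can be turned into numbers with a small parser. The sketch assumes the "Confidence Level" value is a symmetric half-width around the estimate, which is an interpretation of the example rather than something the guide states; the helper `parse_test` is hypothetical.

```python
def parse_test(estimate_text, interval_text=""):
    """Convert '1.5x' and '+/-0.1' into (estimate, low, high); bounds are None if no interval."""
    estimate = float(estimate_text.strip().rstrip("x"))
    interval_text = interval_text.strip()
    if not interval_text:
        return (estimate, None, None)
    half_width = float(interval_text.replace("+/-", ""))
    return (estimate, estimate - half_width, estimate + half_width)


facebook = parse_test("1.5x", "+/-0.1")
branded_search = parse_test("0.7x")  # no confidence level supplied
```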
