🛫 Export Your Data

CPA Models

Purpose

This document provides a walkthrough of the different data outputs available from Recast.

Each of the available outputs is written to a csv file and stored in an S3 bucket set up by Recast and shared with the client. These csvs update each time the model is rerun (usually weekly). The available csvs are explained in detail below.

Certain outputs may not be relevant to all models. For example, if your model does not use holiday or sale spikes, the csvs relating to spikes will be empty.

A note about model estimates

In the following files, we document many different types of estimates from the statistical model. Whenever something is estimated, the column name will include a p, which stands for percentile. p50 is the median (50th percentile) estimate, or our best estimate of the true value. p25 and p75 represent the interquartile range and are what we use for our confidence intervals throughout the dashboard.
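
As a rough illustration of this naming convention, the sketch below reads a csv with Python's standard library and picks out the percentile columns. The column names (cpa_p25, cpa_p50, cpa_p75), the channel name, and the sample values are assumptions made for the example; check the header row of your own files for the exact names.

```python
import csv
import io
import re

# Hypothetical sample; your real files may use different column names.
SAMPLE = """date,channel,cpa_p25,cpa_p50,cpa_p75
2024-01-01,facebook,40.0,50.0,65.0
2024-01-02,facebook,42.0,55.0,70.0
"""

# Matches a percentile suffix such as "_p50" (or a column named just "p50").
PERCENTILE = re.compile(r"(?:^|_)p(\d+)$")

def percentile_columns(fieldnames):
    """Map percentile (as an int) to the column name carrying it."""
    found = {}
    for name in fieldnames:
        match = PERCENTILE.search(name)
        if match:
            found[int(match.group(1))] = name
    return found

reader = csv.DictReader(io.StringIO(SAMPLE))
cols = percentile_columns(reader.fieldnames)
for row in reader:
    low, mid, high = (float(row[cols[p]]) for p in (25, 50, 75))
    print(f"{row['date']} {row['channel']}: best estimate {mid}, interval [{low}, {high}]")
```

Files that carry more than one family of estimates (for example daily and cumulative columns) would need the prefix kept as part of the key.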

intercept.csv

The intercept is the non-marketing driven sales (sometimes called organic), or a prediction of "what would our volume be if we turned all marketing off." It models yearly seasonality to account for typical sales cycles in the business.

Use cases:

  • Compare marketing-driven vs. non-marketing-driven sales at different periods in time

Rows:

  • One row for each day in the model's historical dataset

Columns:

  • date: The date
  • channel: always "intercept"
  • p columns: median estimate and confidence interval for the intercept in the same units as your outcome variable (e.g. new subscribers)

cpa.csv

The CPA is the estimated average incremental cost per acquisition for each channel on each day. Days with no spend will have missing values. It will be impacted by how much money is spent on that channel (i.e., it incorporates the effect of saturation). If you have “demand capture” channels in your model (like affiliate or branded search), the CPA also reflects how much additional spend we expect into demand capture channels and how many additional conversions that spend will drive.

Use cases:

  • See historical performance of channels
  • Calculate the average cost per acquisition for a channel, both present day and historically

Rows:

  • One row for each day and each channel in the historical dataset

Columns:

  • date: The date
  • channel: The spend channel
  • cpa_p or total_effect_cpa_p columns: Median estimate and confidence interval for the incremental cost per acquisition in dollars.
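
One possible way to turn the daily values into an average CPA over a period is to weight by spend: divide total spend by the total implied acquisitions (spend divided by CPA on each day). The sketch below assumes you have already joined daily spend (for example from clean_data.csv) to the median CPA for one channel; the function name and sample numbers are hypothetical, and working from daily medians gives a point estimate only.

```python
def average_cpa(rows):
    """rows: iterable of (spend, cpa_p50) pairs for one channel; cpa may be None."""
    total_spend = 0.0
    total_acquisitions = 0.0
    for spend, cpa in rows:
        if cpa is None or cpa <= 0 or spend <= 0:
            continue  # days with no spend have missing CPA values
        total_spend += spend
        total_acquisitions += spend / cpa  # acquisitions implied by that day's CPA
    if total_acquisitions == 0:
        return None
    return total_spend / total_acquisitions

# Hypothetical days: (spend in dollars, median CPA in dollars)
days = [(1000.0, 50.0), (0.0, None), (3000.0, 75.0)]
print(average_cpa(days))
```

A simple mean of the daily CPA values would give every day equal weight regardless of how much was spent, which is why the sketch weights by spend instead.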

me.csv

The ME is the estimated marginal effectiveness, which for CPA models is the marginal cost per acquisition (MCPA): the cost to acquire the last (most expensive) customer from a given channel on a given day. Because of channel saturation, this cost will always be the same as or higher than the average CPA given in the cpa.csv file. If you have “demand capture” channels in your model (like affiliate or branded search), the MCPA also reflects how much additional spend we expect into demand capture channels and how many additional conversions that spend will drive.

Use cases:

  • Calculate the cost of obtaining one additional customer for each channel, at the level of spend on that day

Rows:

  • One row for each day and each channel in the historical dataset

Columns:

  • date: The date
  • channel: The spend channel
  • mcpa_p columns: Median estimate and confidence interval of the incremental marginal cost per acquisition in dollars.

in_period_effect.csv

The in-period effect is the estimated number of customer acquisitions attributable to a given channel on that day. It takes account of the time shift, so if you spent a lot of money in a channel yesterday, but none today, the in-period effect will be greater than zero to represent the conversions that were driven from spend yesterday but didn't happen until today.

Use cases:

  • Assign credit to a channel for driving a certain number of conversions in a time frame

Rows:

  • One row per day, per channel

Columns:

  • date: The date
  • channel: The marketing channel
  • in_period_effect_p: The median and confidence interval for the number of acquisitions attributable to that channel on that day
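
As a sketch of the use case above, the snippet below totals the median in-period effect per channel over a date window. The column name in_period_effect_p50, the ISO date format, and the sample rows are assumptions. Summing daily medians gives a point estimate only; the interval columns cannot be added up the same way.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; check your file for the exact column names.
SAMPLE = """date,channel,in_period_effect_p50
2024-01-01,facebook,10
2024-01-01,search,4
2024-01-02,facebook,12
2024-01-03,facebook,8
"""

def credit_by_channel(csv_text, start, end):
    """Sum the median in-period effect per channel for start <= date <= end."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ISO-formatted dates (YYYY-MM-DD) compare correctly as plain strings.
        if start <= row["date"] <= end:
            totals[row["channel"]] += float(row["in_period_effect_p50"])
    return dict(totals)

print(credit_by_channel(SAMPLE, "2024-01-01", "2024-01-02"))
```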

cumulative_shift_curves.csv

The cumulative shift curves summarize how long it takes to realize the return from money spent on a given day. For each channel we estimate how much return has been realized within a given number of days of spending the money. These estimates are a percentage of the total realization, so will always be at or below 100%.

Use cases:

  • Estimate how long you will need to wait to see signal from spend in a given channel

Rows:

  • For each channel, the number of days after the original spend (so days=3 indicates what percent of sales are realized on or before the third day after the money was spent).

Columns:

  • channel_name: The channel
  • days: Number of days after original spend
  • p columns: Median estimate and confidence interval for the total percentage realized by that point in time

Note: Our default is to include 180 days out.
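
To estimate how long to wait for signal, one option is to find the first day on which the median cumulative curve crosses a threshold you choose. The sketch below assumes the percentages are stored as fractions between 0 and 1 (they may instead be on a 0 to 100 scale in your file), and the curve values are made up for illustration.

```python
def days_to_reach(curve, threshold):
    """curve: list of (days, cumulative_p50) sorted by days.

    Returns the first days value whose cumulative share meets the threshold,
    or None if the curve never reaches it.
    """
    for days, cumulative in curve:
        if cumulative >= threshold:
            return days
    return None

# Hypothetical median curve for a single channel.
curve = [(0, 0.20), (1, 0.45), (2, 0.70), (3, 0.85), (4, 0.93), (5, 0.97)]
print(days_to_reach(curve, 0.90))  # 4
```

Running the same lookup against the lower and upper percentile columns gives a range of waiting times rather than a single number.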

shift_curves.csv

The shift curves summarize how much return we expect to realize on a given day after the money has been spent into a channel. For a given channel, these numbers represent the % of the total realized gains that we will see on a particular day.

Use cases:

  • Calculate the additional sales you can expect x days after spending into a channel.

Rows:

  • For each channel, the number of days after the original spend (so days=3 indicates what percent of sales are realized on the third day after the money is spent).

Columns:

  • channel_name: The channel
  • days: Number of days after original spend
  • average_effect: the percentage of sales we expect to be realized that day if the return on investment happens about as fast as we expect
  • slow_roi_effect: the percentage of sales we expect to be realized that day if the return on investment is on the slow end of what we expect
  • fast_roi_effect: the percentage of sales we expect to be realized that day if the return on investment is on the fast end of what we expect

Note: We include days until the return realized on a given day is below 0.1%.
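
A minimal sketch of the use case above: given a total effect you expect from one day of spend, multiply it by each day's share from the shift curve. It assumes shares are stored as fractions that sum to roughly 1 (not on a 0 to 100 scale); the total and the curve values are hypothetical.

```python
def expected_by_day(total_effect, shift_curve):
    """shift_curve: list of (days, share) pairs, e.g. the average_effect column.

    Returns a dict mapping days after spend to the expected effect on that day.
    """
    return {days: total_effect * share for days, share in shift_curve}

# Hypothetical curve: half of the effect lands on the day of spend.
curve = [(0, 0.5), (1, 0.3), (2, 0.2)]
schedule = expected_by_day(200, curve)
for days, effect in schedule.items():
    print(f"day {days}: about {effect:.0f}")
```

Swapping in the slow_roi_effect or fast_roi_effect column shows how the timing changes under the slower and faster scenarios.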

predicted.csv

The model's predicted value of the outcome variable for each day in the historical dataset.

Use cases:

  • Compare the model's fit to what actually happened on a given day

Rows:

  • Each day in the historical dataset

Columns:

  • date: The date
  • p columns: the median and confidence interval for our prediction on a given day
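
One way to summarize fit is to join the predictions to the actual outcome (for example from clean_data.csv) and compute the mean absolute percentage error of the median, plus the share of days on which the actual value fell inside the p25 to p75 interval. The function name and the sample numbers below are hypothetical.

```python
def fit_summary(rows):
    """rows: list of (actual, p25, p50, p75) tuples, one per day."""
    if not rows:
        return None
    # Skip days with a zero actual value to avoid dividing by zero.
    pct_errors = [abs(actual - p50) / abs(actual)
                  for actual, _, p50, _ in rows if actual != 0]
    inside = sum(1 for actual, p25, _, p75 in rows if p25 <= actual <= p75)
    return {
        "mape": sum(pct_errors) / len(pct_errors) if pct_errors else None,
        "share_inside_interval": inside / len(rows),
    }

# Hypothetical days: (actual outcome, p25, p50, p75)
days = [(100, 90, 110, 120), (200, 150, 180, 190), (50, 40, 50, 60), (80, 70, 100, 110)]
result = fit_summary(days)
print(result)
```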

clean_data.csv

The data fed into the model, including the outcome variable and the channel spend variables.

Use cases:

  • Compare what data was used in the model to external sources for validation

Rows:

  • Each day in the historical dataset

Columns:

  • date: The date
  • Outcome Variable: the name of your specific outcome variable
  • Channel names: The names of your specific marketing channels

spike_summaries.csv

A summary of the estimated effect of a spike in the model. Spikes can be anything that has a large effect on sales, typically a sale or a holiday that affects normal business operations. Spikes can have a positive component by increasing sales, and a negative component by cannibalizing sales both before and after the spike. The spike summary can be thought of as the difference between what did happen and what would have happened if the special event never happened, but your marketing spend stayed the same.

Use cases:

  • See whether a sales event positively contributed to revenue

Rows:

  • One row for each spike in the model

Columns:

  • date: The "central" day of the spike, typically the day where sales jumped the most.
  • spike_group: The numerical group the spike belongs to
  • spike_group_name: The name of the spike group
  • p columns: the median and confidence interval for the effect of the spike on the outcome variable (in the same units as the outcome variable).
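
A simple way to read these estimates is to check where the interval sits relative to zero: a spike whose p25 is above zero looks like a net gain, one whose p75 is below zero looks like a net loss, and anything straddling zero is unclear. This labeling rule is our own illustration rather than something the file provides, and the spike names and values are hypothetical.

```python
def classify_spike(p25, p50, p75):
    """Label a spike by where its p25 to p75 interval sits relative to zero."""
    if p25 > 0:
        return "positive"
    if p75 < 0:
        return "negative"
    return "uncertain"

# Hypothetical spikes: (spike_group_name, p25, p50, p75)
spikes = [
    ("black_friday", 120.0, 300.0, 480.0),
    ("summer_sale", -40.0, 25.0, 90.0),
    ("new_years_day", -210.0, -150.0, -95.0),
]
labels = {name: classify_spike(p25, p50, p75) for name, p25, p50, p75 in spikes}
print(labels)
```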

spike_group_summaries.csv

A summary of the estimated average effect for a group of spikes in the model. Spikes can be grouped if we expect multiple spikes to have similar behavior (for example, recurring 20% off sales). The summary indicates what effect an average spike in this group will have on the outcome variable, holding marketing spend constant.

Use cases:

  • Rank different types of sales events by their impact on sales

Rows:

  • Each spike group

Columns:

  • spike_group: An ID for that group
  • spike_group_name: The name of that spike group
  • mean column: The mean estimate for the average effect of that group on the outcome variable
  • p columns: A median estimate and confidence interval for the average effect of that group on the outcome variable
  • dates: the dates of the spikes in that spike group

spike_series.csv

The spike series describes how the effect of a spike plays out over time. Each spike can have an effect leading up to the spike and in the days after the spike. This dataset shows whether that effect was positive, negative, or zero for each day.

Use cases:

  • Visualize the effect of a spike event

Rows:

  • One row for each day, for each spike

Columns:

  • spike_date: The date of the spike that the row is summarizing
  • spike_group: The numerical group the spike belongs to
  • spike_group_name: The name of the spike group
  • date: The date we're estimating the effect for
  • daily_p columns: The effect (median + confidence interval) on the outcome variable for that spike on that day
  • cumulative_p columns: The cumulative effect of the spike for all days up to and including that day

response_impact.csv

An estimate of the impact on the outcome variable if you spend a certain amount of money in a certain channel. This is an estimate at the current (non-historical) channel performance only and summarizes how many new customers you will obtain for a given level of spend. It demonstrates the effect of saturation, so as spending increases the estimated response's growth will slow. The impact is how many new acquisitions you will receive in total, not on the first day of spend or any specific number of days of spend.

Use cases:

  • Plan how you can scale a channel to achieve marketing goals

Rows:

  • One row for each channel, at each spend level. Each channel has 100 rows, with spend levels chosen so that they're in the same range as your historical spend.

Columns:

  • channel: The channel
  • spend: An amount of daily spend that could be spent in the channel
  • p columns: The median and confidence interval on the number of acquisitions you can expect from that spend.
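
Because the file only contains 100 spend levels per channel, a spend amount you care about will usually fall between two rows. The sketch below linearly interpolates the median response between neighboring rows; linear interpolation is a simplification we chose for the example, and the curve values are hypothetical.

```python
from bisect import bisect_left

def interpolate_response(curve, spend):
    """curve: list of (spend, response_p50) sorted by spend, for one channel."""
    spends = [s for s, _ in curve]
    if not spends[0] <= spend <= spends[-1]:
        # Refuse to extrapolate beyond the spend range covered by the file.
        raise ValueError("spend is outside the range covered by the curve")
    i = bisect_left(spends, spend)
    if spends[i] == spend:
        return curve[i][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (spend - x0) / (x1 - x0)

# Hypothetical saturating curve: each extra $1,000 adds fewer acquisitions.
curve = [(0, 0.0), (1000, 40.0), (2000, 70.0), (3000, 90.0)]
print(interpolate_response(curve, 1500))  # 55.0
```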

response_cpa.csv

An estimate of the CPA if you spend a certain amount of money in a certain channel. This is an estimate at the current (non-historical) channel performance only, and summarizes how much your average CPA will be for different levels of spend. It demonstrates the effect of saturation, so as spending increases the estimated CPA will increase.

Use cases:

  • Identify how much you can scale a channel while maintaining a certain CPA

Rows:

  • One row for each channel, at each spend level. Each channel has 100 rows, with spend levels chosen so that they're in the same range as your historical spend.

Columns:

  • channel: The channel
  • spend: An amount of daily spend that could be spent in the channel
  • p columns: The median and confidence interval of the CPA you can expect at that spend level
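
For the use case above, a minimal sketch is to scan one channel's rows and keep the largest spend level whose CPA is still at or below your target. The curve values and the target are hypothetical. Passing the upper percentile column instead of the median gives a more conservative answer.

```python
def max_spend_within_cpa(curve, target_cpa):
    """curve: list of (spend, cpa) pairs for one channel.

    Returns the largest spend whose CPA is at or below target_cpa, or None.
    """
    eligible = [spend for spend, cpa in curve if cpa <= target_cpa]
    return max(eligible) if eligible else None

# Hypothetical median CPA at four daily spend levels.
curve = [(500, 30.0), (1000, 38.0), (1500, 47.0), (2000, 58.0)]
print(max_spend_within_cpa(curve, 50.0))  # 1500
```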

response_mcpa.csv

An estimate of the marginal CPA if you spend a certain amount of money in a certain channel. This is an estimate at the current (non-historical) channel performance only, and summarizes how much your marginal CPA will be to acquire the last customer at a given level of spend. It demonstrates the effect of saturation, so as spending increases the estimated marginal CPA will increase.

Use cases:

  • Identify which channels can be scaled to acquire more customers at the lowest price.

Rows:

  • One row for each channel, at each spend level. Each channel has 100 rows, with spend levels chosen so that they're in the same range as your historical spend.

Columns:

  • channel: The channel
  • spend: An amount of daily spend that could be spent in the channel
  • p columns: The median and confidence interval of the marginal CPA you can expect at that spend level
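
As a rough sketch of the use case above, the snippet below looks up each channel's median marginal CPA at the spend level nearest to its current daily spend and returns the channel where the next customer is cheapest. The channel names, curves, current spend figures, and the nearest-row lookup are all assumptions made for the example.

```python
def mcpa_at(curve, spend):
    """Return the marginal CPA at the curve row closest to the given spend."""
    return min(curve, key=lambda point: abs(point[0] - spend))[1]

def cheapest_to_scale(curves, current_spend):
    """Return the channel with the lowest marginal CPA at its current spend."""
    return min(curves, key=lambda ch: mcpa_at(curves[ch], current_spend[ch]))

# Hypothetical (spend, mcpa_p50) rows per channel.
curves = {
    "facebook": [(1000, 60.0), (2000, 90.0)],
    "podcast": [(1000, 45.0), (2000, 55.0)],
}
current_spend = {"facebook": 2000, "podcast": 1000}
print(cheapest_to_scale(curves, current_spend))
```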

marginal_effectiveness.csv

This file is a little different from the other files in that it’s not raw data from the model, but rather a helpful summary of your current state. To increase return on marketing investment, you want to shift money from high marginal CPA channels to those with low marginal CPA. This sheet ranks your currently active channels to highlight which ones you should be moving money into.

Use cases:

  • Identify which channels should be receiving a larger investment

Rows:

  • One row for each channel that is currently active

Columns:

  • channel: The channel
  • mcpa_p columns: The median and confidence interval of the average marginal CPA over the last 30 days.
  • last_30_spend: How much money you’ve spent in that channel in the last 30 days
  • better_than_median: A flag for whether that channel performs better than your median channel
  • upper_funnel_channel: A flag for whether it’s an upper funnel or lower funnel channel (Note: if you do not have lower funnel channels configured in your model, all channels will be upper funnel channels)

ROI Models

Purpose

This document provides a walkthrough of the different data outputs available from Recast.

Each of the available outputs is written to a csv file and stored in an S3 bucket set up by Recast and shared with the client. These csvs update each time the model is rerun (usually weekly). The available csvs are explained in detail below.

Certain outputs may not be relevant to all models. For example, if your model does not use holiday or sale spikes, the csvs relating to spikes will be empty.

A note about model estimates

In the following files, we document many different types of estimates from the statistical model. Whenever something is estimated, the column name will include a p, which stands for percentile. p50 is the median (50th percentile) estimate, or our best estimate of the true value. p25 and p75 represent the interquartile range and are what we use for our confidence intervals throughout the dashboard.

intercept.csv

The intercept is the non-marketing driven sales (sometimes called organic), or a prediction of "what would our volume be if we turned all marketing off." It models yearly seasonality to account for typical sales cycles in the business.

Use cases:

  • Compare marketing-driven vs. non-marketing-driven sales at different periods in time

Rows:

  • One row for each day in the model's historical dataset

Columns:

  • date: The date
  • channel: always "intercept"
  • p columns: median estimate and confidence interval for the intercept in the same units as your outcome variable (e.g. new subscribers)

roi.csv

ROI is the estimated average incremental return on investment for each channel on each day. Days with no spend will have missing values. It will be impacted by how much money is spent on that channel (i.e., it incorporates the effect of saturation).

Use cases:

  • See historical performance of channels
  • Calculate the ROI for a channel, both present day and historically

Rows:

  • One row for each day and each channel in the historical dataset

Columns:

  • date: The date
  • channel: The spend channel
  • roi_p columns: Median estimate and confidence interval for the incremental return on investment in dollars.
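
To summarize ROI over a period, one option is a spend-weighted average: total implied return (spend times ROI on each day) divided by total spend. The sketch below assumes ROI is expressed as return per dollar spent and that daily spend has been joined in (for example from clean_data.csv); the function name and sample numbers are hypothetical, and working from daily medians gives a point estimate only.

```python
def average_roi(rows):
    """rows: iterable of (spend, roi_p50) pairs for one channel; roi may be None."""
    total_spend = 0.0
    total_return = 0.0
    for spend, roi in rows:
        if roi is None or spend <= 0:
            continue  # days with no spend have missing ROI values
        total_spend += spend
        total_return += spend * roi  # return implied by that day's ROI
    if total_spend == 0:
        return None
    return total_return / total_spend

# Hypothetical days: (spend in dollars, median ROI)
days = [(1000.0, 2.0), (0.0, None), (3000.0, 1.0)]
print(average_roi(days))  # 1.25
```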

me.csv

The ME is the estimated marginal effectiveness, or the return on the last dollar spent in a given channel on a given day. Because of channel saturation, this return will always be the same as or lower than the average ROI given in the roi.csv file.

Use cases:

  • Calculate the return for an extra dollar spent, given a certain amount of saturation

Rows:

  • One row for each day and each channel in the historical dataset

Columns:

  • date: The date
  • channel: The spend channel
  • mroi_p columns: Median estimate and confidence interval for the incremental marginal return on investment in dollars.

in_period_effect.csv

The in-period effect is the estimated amount of revenue attributable to a given channel on that day. It takes account of the time shift, so if you spent a lot of money in a channel yesterday, but none today, the in-period effect will be greater than zero to represent the revenue that was driven from spend yesterday but didn't happen until today.

Use cases:

  • Assign credit to a channel for driving a certain amount of revenue in a time frame

Rows:

  • One row per day, per channel

Columns:

  • date: The date
  • channel: The marketing channel
  • in_period_effect_p: The median and confidence interval for the amount of revenue attributable to that channel on that day

cumulative_shift_curves.csv

The cumulative shift curves summarize how long it takes to realize the return from money spent on a given day. For each channel we estimate how much return has been realized within a given number of days of spending the money. These estimates are a percentage of the total realization, so will always be at or below 100%.

Use cases:

  • Estimate how long you will need to wait to see signal from spend in a given channel

Rows:

  • For each channel, a certain number of days after the original spend (so days=3 indicates what percent of sales are realized on or before the third day after the money was spent).

Columns:

  • channel_name: The channel
  • days: Number of days after original spend
  • p columns: Median estimate and confidence interval for the total percentage realized by that point in time

Note: Our default is to include 180 days out.

shift_curves.csv

The shift curves summarize how much return we expect to realize on a given day after the money has been spent into a channel. For a given channel, these numbers represent the % of the total realized gains that we will see on a particular day.

Use cases:

  • Calculate the additional sales you can expect x days after spending into a channel.

Rows:

  • For each channel, the number of days after the original spend (so days=3 indicates what percent of sales are realized on the third day after the money is spent).

Columns:

  • channel_name: The channel
  • days: Number of days after original spend
  • average_effect: the percentage of sales we expect to be realized that day if the return on investment happens about as fast as we expect
  • slow_roi_effect: the percentage of sales we expect to be realized that day if the return on investment is on the slow end of what we expect
  • fast_roi_effect: the percentage of sales we expect to be realized that day if the return on investment is on the fast end of what we expect

Note: We include days until the return realized on a given day is below 0.1%.

predicted.csv

The model's predicted value of the outcome variable for each day in the historical dataset.

Use cases:

  • Compare the model's fit to what actually happened on a given day

Rows:

  • Each day in the historical dataset

Columns:

  • date: The date
  • p columns: the median and confidence interval for our prediction on a given day

clean_data.csv

The data fed into the model, including the outcome variable and the channel spend variables.

Use cases:

  • Compare what data was used in the model to external sources for validation

Rows:

  • Each day in the historical dataset

Columns:

  • date: The date
  • Outcome Variable: the name of your specific outcome variable
  • Channel names: The names of your specific marketing channels

spike_summaries.csv

A summary of the estimated effect of a spike in the model. Spikes can be anything that has a large effect on sales, typically a sale or a holiday that affects normal business operations. Spikes can have a positive component by increasing sales, and a negative component by cannibalizing sales both before and after the spike. The spike summary can be thought of as the difference between what did happen and what would have happened if the special event never happened, but your marketing spend stayed the same.

Use cases:

  • See whether a sales event positively contributed to revenue

Rows:

  • One row for each spike in the model

Columns:

  • date: The "central" day of the spike, typically the day where sales jumped the most.
  • spike_group: The numerical group the spike belongs to
  • spike_group_name: The name of the spike group
  • p columns: the median and confidence interval for the effect of the spike on the outcome variable (in the same units as the outcome variable).

spike_group_summaries.csv

A summary of the estimated average effect for a group of spikes in the model. Spikes can be grouped if we expect multiple spikes to have similar behavior (for example, recurring 20% off sales). The summary indicates what effect an average spike in this group will have on the outcome variable, holding marketing spend constant.

Use cases:

  • Rank different types of sales events by their impact on sales

Rows:

  • Each spike group

Columns:

  • spike_group: An ID for that group
  • spike_group_name: The name of that spike group
  • mean column: The mean estimate for the average effect of that group on the outcome variable
  • p columns: A median estimate and confidence interval for the average effect of that group on the outcome variable
  • dates: the dates of the spikes in that spike group

spike_series.csv

The spike series describes how the effect of a spike plays out over time. Each spike can have an effect leading up to the spike and in the days after the spike. This dataset shows whether that effect was positive, negative, or zero for each day.

Use cases:

  • Visualize the effect of a spike event

Rows:

  • One row for each day, for each spike

Columns:

  • spike_date: The date of the spike that the row is summarizing
  • spike_group: The numerical group the spike belongs to
  • spike_group_name: The name of the spike group
  • date: The date we're estimating the effect for
  • daily_p columns: The effect (median + confidence interval) on the outcome variable for that spike on that day
  • cumulative_p columns: The cumulative effect of the spike for all days up to and including that day

response_impact.csv

An estimate of the impact on the outcome variable if you spend a certain amount of money in a certain channel. This is an estimate at the current (non-historical) channel performance only and summarizes how much new revenue you will obtain for a given level of spend. It demonstrates the effect of saturation, so as spending increases the estimated response's growth will slow. The impact is how much return you will receive in total, not on the first day of spend or any specific number of days of spend.

Use cases:

  • Plan how you can scale a channel to achieve marketing goals

Rows:

  • One row for each channel, at each spend level. Each channel has 100 rows, with spend levels chosen so that they're in the same range as your historical spend.

Columns:

  • channel: The channel
  • spend: An amount of daily spend that could be spent in the channel
  • p columns: The median and confidence interval of the return you can expect from that spend.

response_roi.csv

An estimate of the average ROI if you spend a certain amount of money in a certain channel. This is an estimate at the current (non-historical) channel performance only, and summarizes how much your average ROI will be for different levels of spend. It demonstrates the effect of saturation, so as spending increases the estimated ROI will decrease.

Use cases:

  • Identify how much you can scale a channel while maintaining a certain ROI

Rows:

  • One row for each channel, at each spend level. Each channel has 100 rows, with spend levels chosen so that they're in the same range as your historical spend.

Columns:

  • channel: The channel
  • spend: An amount of daily spend that could be spent in the channel
  • p columns: The median and confidence interval of the ROI you can expect at that spend level

response_mroi.csv

An estimate of the marginal ROI (effectiveness of the last dollar spent) if you spend a certain amount of money in a certain channel. This is an estimate at the current (non-historical) channel performance only, and summarizes the return on investment for your last dollar spent. It demonstrates the effect of saturation, so as spending increases the estimated marginal ROI will decrease.

Use cases:

  • Identify which channels can be scaled to grow revenue at the lowest price.

Rows:

  • One row for each channel, at each spend level. Each channel has 100 rows, with spend levels chosen so that they're in the same range as your historical spend.

Columns:

  • channel: The channel
  • spend: An amount of daily spend that could be spent in the channel
  • p columns: The median and confidence interval of the marginal ROI you can expect at that spend level
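
To explore how extra budget might be spread across channels, one simple heuristic is to add spend in small steps, each time giving the step to the channel with the highest median marginal ROI at its current spend. This greedy rule, the nearest-row lookup, and all of the names and numbers below are assumptions made for illustration; the sketch ignores the uncertainty in the estimates and does not extrapolate beyond the spend levels in the file.

```python
def mroi_at(curve, spend):
    """Return the marginal ROI at the curve row closest to the given spend."""
    return min(curve, key=lambda point: abs(point[0] - spend))[1]

def allocate(curves, start_spend, budget, step):
    """Hand out budget in fixed steps to whichever channel has the best marginal ROI."""
    spend = dict(start_spend)
    remaining = budget
    while remaining >= step:
        best = max(spend, key=lambda ch: mroi_at(curves[ch], spend[ch]))
        spend[best] += step
        remaining -= step
    return spend

# Hypothetical (spend, mroi_p50) rows per channel.
curves = {
    "facebook": [(1000, 1.8), (1500, 1.4), (2000, 1.0)],
    "podcast": [(1000, 1.6), (1500, 1.5), (2000, 1.2)],
}
plan = allocate(curves, {"facebook": 1000, "podcast": 1000}, budget=1000, step=500)
print(plan)
```

In this example the first step goes to facebook (1.8 versus 1.6) and the second to podcast (1.6 versus 1.4), which shows how saturation moves the best next dollar from one channel to another.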

marginal_effectiveness.csv

This file is a little different from the other files in that it’s not raw data from the model, but rather a helpful summary of your current state. To increase return on marketing investment, you want to shift money from low marginal ROI channels to those with high marginal ROI. This sheet ranks your currently active channels to highlight which ones you should be moving money into.

Use cases:

  • Identify which channels should be receiving a larger investment

Rows:

  • One row for each channel that is currently active

Columns:

  • channel: The channel
  • me_p columns: The median and confidence interval on the average marginal ROI over the last 30 days.
  • last_30_spend: How much money you’ve spent in that channel in the last 30 days
  • better_than_median: A flag for whether that channel performs better than your median channel
  • upper_funnel_channel: A flag for whether it’s an upper funnel or lower funnel channel (Note: if you do not have lower funnel channels, all channels will be upper funnel channels)