What’s the best way to measure the success of your in-app marketing campaigns when you don’t have an SDK or a real-time control group?

New in-app marketing strategies are meaningless if you can’t measure the results they produce. In an ideal scenario, marketers conduct controlled experiments that establish the causal relationship between specific strategies and desired outcomes. Generally, this is accomplished by using an A/B testing framework with a control group. But that level of precise experimentation isn’t always possible. Without an SDK or A/B testing framework, marketers are forced to draw conclusions about cause and effect from observational data. Even though that data isn’t randomized, it’s entirely possible for marketers to use it to develop a detailed understanding of their marketing impact. All it takes is the right techniques and methodologies for the data you have available.

At wappier, creating a performance benchmark methodology for Global Pricing was a practical matter. While our customers like that Global Pricing is a no-SDK solution because it requires no engineering effort, they lacked a real-time control group for measuring campaign results. Although most game developers would prefer classic A/B testing on a per-country basis, they can’t run that kind of in-region test, because every user within a single country must be served the same pricing. We developed our performance benchmark methodology to give developers the opportunity to take advantage of per-country pricing without an SDK, while still gaining valuable insights into per-country performance. Let’s look at how the methodology works, starting with the differences between SDK and non-SDK measurement models.

The gold standard: Real-time, randomized controlled experiments

With an SDK in place, conducting randomized controlled experiments is the best way to measure success. Using this methodology, marketers can compare a test group of users exposed to the marketing intervention in question to a control group that was not exposed. This is the most direct and conclusive way to understand the specific results that any one marketing initiative produces. By isolating a single variable (a new pricing strategy, for example) and comparing the test group to the control group, marketers can determine the true incremental value of that marketing strategy.
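To make that comparison concrete, here’s a minimal Python sketch of the calculation. The revenue arrays are simulated stand-ins for real per-user analytics data, and Welch’s t-test is one common way to check whether the measured lift is statistically meaningful.

```python
# Minimal sketch of measuring incrementality in a randomized experiment.
# The revenue arrays are simulated; in practice they would come from your
# analytics pipeline, one value per user.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
test_revenue = rng.gamma(shape=2.0, scale=1.2, size=5000)     # users who saw the new pricing
control_revenue = rng.gamma(shape=2.0, scale=1.0, size=5000)  # users who did not

lift = test_revenue.mean() - control_revenue.mean()           # incremental ARPU
relative_lift = lift / control_revenue.mean()

# Welch's t-test: is the difference likely to be real, or just noise?
t_stat, p_value = stats.ttest_ind(test_revenue, control_revenue, equal_var=False)

print(f"Incremental ARPU: {lift:.3f} ({relative_lift:.1%} lift), p-value: {p_value:.4f}")
```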

Although they are considered the gold standard, randomized controlled experiments are not a holy grail. Setting up an experiment requires an up-front investment of time and effort, in addition to the opportunity cost of withholding the strategy you’re testing (and its realized benefits) from the control group. Meanwhile, outliers that land well above or below the norm of the experiment can skew the data and make comparisons problematic. It’s typically a good idea to use a median-based measure or exclude outliers entirely to ensure that the results you are seeing accurately represent the actions you are trying to measure. Even though the quality of the results derived from randomized experiments may vary, they are still the most accurate way to measure incrementality.
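As an illustration of those safeguards, the sketch below uses invented revenue figures to show how a single whale spender drags the mean upward, while the median and a simple IQR-based outlier fence keep the comparison representative.

```python
# Illustrative sketch: two common ways to keep outliers (e.g., whale spenders)
# from skewing an experiment's revenue comparison. All figures are made up.
import numpy as np

revenue = np.array([0.0, 0.0, 0.99, 1.99, 2.99, 4.99, 4.99, 9.99, 499.99])  # one whale

# Option 1: report the median instead of the mean.
print(f"Mean:   {revenue.mean():.2f}")    # dragged upward by the whale
print(f"Median: {np.median(revenue):.2f}")

# Option 2: exclude outliers entirely, here with a simple IQR fence.
q1, q3 = np.percentile(revenue, [25, 75])
fence = q3 + 1.5 * (q3 - q1)
trimmed = revenue[revenue <= fence]
print(f"Trimmed mean: {trimmed.mean():.2f} "
      f"({len(revenue) - len(trimmed)} outlier(s) removed)")
```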

Plan B: Using observational data techniques to understand results

SDKs allow marketers to deliver custom app experiences to different groups of users, distinguishing between test and control groups and collecting analytics data about each group’s performance. Without an SDK, marketers must focus on making broader changes, like geo-targeted pricing, where all users in the same country see the same price. In that case, direct comparisons derived from specific experiments aren’t possible, so observational data becomes the primary tool for assessing marketing results. The nature of observational data means marketers are dealing with outcomes that have already taken place, with no control group to compare them to. The lack of randomization limits how conclusive the results can be.

In order to derive the most accurate conclusions possible from observational data, marketers should make sure to:

  • Isolate a single marketing strategy as the variable being tested
  • Gather enough data for a sample size that accurately represents the test group
  • Check confidence levels, statistical significance, and error margins (see the sketch after this list)
  • Ensure comparability between the simulated test and control cohorts
  • Identify and exclude outliers, fake users, etc.
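
As a concrete example of the confidence-level check above, here is a minimal sketch that computes a 95% confidence interval and margin of error for a cohort’s ARPU; the per-user revenue is simulated for illustration.

```python
# Sketch: 95% confidence interval and margin of error for a cohort's ARPU.
# The per-user revenue below is simulated for illustration.
import numpy as np
from scipy import stats

def mean_ci(x, confidence=0.95):
    """Mean and margin of error for a sample, using the t-distribution."""
    x = np.asarray(x)
    sem = stats.sem(x)  # standard error of the mean
    margin = sem * stats.t.ppf((1 + confidence) / 2, df=len(x) - 1)
    return x.mean(), margin

rng = np.random.default_rng(7)
cohort = rng.gamma(shape=2.0, scale=1.1, size=2000)  # per-user revenue, simulated

mean, margin = mean_ci(cohort)
print(f"ARPU: {mean:.3f} ± {margin:.3f} (95% CI)")
# If the margin is large relative to the lift you hope to detect,
# the sample is too small to support a confident conclusion.
```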

Even though these techniques can’t inspire the same level of confidence as randomized experiments, they are still the most powerful Plan B approach. What’s more, when using the same treatment for all users is a priority, the benefits of this strategy outweigh the decrease in accuracy. In this case, marketers must synthesize experiments, create benchmarks, and run counterfactual simulations to gain contextual understanding from observational data. These steps allow marketers to compare the actual performance of the marketing initiative with an estimate of what would have happened without it. Here are a few examples of these techniques in action:

Creating a benchmark control group

One way to use observational data to measure marketing results is by creating a benchmark around a control cohort. Using Average Revenue Per User (ARPU) as the primary KPI, marketers can then measure the relative growth of target cohorts against the results in the control group. For example, at wappier we might compare the relative ARPU growth of countries on the platform where we have optimized pricing with the same countries on a platform where we didn’t optimize. This horizontal methodology ensures that we’re comparing apples to apples, and instills confidence that any difference in ARPU growth between the two platform groups can be attributed to our program.
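Here’s a minimal sketch of that benchmark comparison, with all ARPU figures invented for illustration: the benchmark platform’s growth absorbs seasonality and market-wide trends, and whatever growth remains on top of it is treated as the program’s contribution.

```python
# Hypothetical sketch of the benchmark approach: compare relative ARPU growth
# in countries with optimized pricing against the same countries on a platform
# where pricing was left unchanged. All figures are invented.
baseline = {"optimized": 1.00, "benchmark": 1.00}  # ARPU before the campaign
current  = {"optimized": 1.18, "benchmark": 1.05}  # ARPU after the campaign

growth = {k: (current[k] - baseline[k]) / baseline[k] for k in baseline}

# The benchmark's growth captures seasonality and market-wide trends;
# growth above it is attributed to the pricing program.
incremental_growth = growth["optimized"] - growth["benchmark"]
print(f"Optimized: {growth['optimized']:.1%}, benchmark: {growth['benchmark']:.1%}, "
      f"incremental: {incremental_growth:.1%}")
```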

Evidence-based causal impact models

Marketers who require more evidence to support their observational data conclusions should consider CausalImpact, a Bayesian structural time-series model created by Google. CausalImpact was originally designed to estimate the impact of an individual ad campaign on overall sales lift. The wappier team typically uses Average Revenue Per Daily Active User (ARPDAU) as the primary KPI in this case because it provides the model with more data points. Where the benchmark control method allows us to compare platform performance, CausalImpact allows us to assess impact on a country-by-country basis. The model learns from pre-intervention data (and comparable control series) to predict what would have happened without the intervention, so marketers can attribute any difference between actual ARPDAU rates and the model’s prediction to the marketing strategy they’re testing.
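For illustration, here is a sketch of what that analysis might look like using one of the Python ports of Google’s package (for example, the tfcausalimpact distribution, imported as causalimpact). The input file, date ranges, and column names are hypothetical; the key idea is that the first column holds the treated country’s ARPDAU and the remaining columns hold comparable control series.

```python
# Sketch using a Python port of Google's CausalImpact package (e.g., the
# tfcausalimpact distribution, imported as `causalimpact`). The CSV file,
# date ranges, and column names below are hypothetical.
import pandas as pd
from causalimpact import CausalImpact

# First column: treated country's ARPDAU; remaining columns: control series.
df = pd.read_csv("arpdau_by_country.csv", index_col="date", parse_dates=True)

pre_period = ["2023-01-01", "2023-03-31"]   # before the pricing change
post_period = ["2023-04-01", "2023-05-31"]  # after the pricing change

ci = CausalImpact(df, pre_period, post_period)
print(ci.summary())  # estimated absolute and relative ARPDAU effect
ci.plot()            # actual vs. counterfactual prediction
```

The summary reports the estimated effect with a credible interval, giving marketers a probabilistic read on the campaign’s ARPDAU impact rather than a single point estimate.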

Measuring marketing results is a matter of finding the right tool for the job

As marketing strategies evolve, it’s important for all marketers to continuously evaluate their measurement methodologies. While academic researchers take the effectiveness of statistical and analytical measurement seriously, their findings don’t always filter into the measurement tools that marketing professionals have at their disposal. Effectiveness experts need to bridge that gap with robust and transparent open-source research, communicated in language that is accessible to marketers and product managers.

This is particularly important as marketers face increasing scrutiny of their budgets and expenditures; proving their impact on the company’s bottom line requires the highest standard of evidence and analysis. Marketers won’t always be able to derive perfectly provable insights, so they should plan ahead and be pragmatic about the models and methodologies they can build from their available data. Developing an awareness of the uncertainty inherent in estimates built on observational data will empower marketers to make decisions even without randomized controlled experiments. The good news is that the things that can go wrong with these methodologies are often predictable, which means marketers have an opportunity to plan for them accordingly.