Back to Basics: Incidence in Online Surveys in 2017
At the start of my career in Market Research, I was put through something of a sampling boot camp before I was allowed to touch a project or interact with clients. Some (cough cough) years later, I’m still drawing on nuggets of gold from that training.
This was still the phone heyday, largely before online surveys took off, and the brilliant Linda Piekarski and others put us through our paces. We were taught the details of US geography, telephone exchanges, and the incidence rate calculations that data collection call centers required. I learned the nuances among the varieties of IR formulas and how they impacted production rates, lessons later reinforced through the countless telephone and online data collection projects I’ve amassed over the years.
Given that context, it’s striking to me that we now operate in an era where incidence formulas and production rates are not nearly as aligned as they were back then. Buyers and sellers of online sample often hold very different business views of these two metrics and how to calculate them, even though this single metric is vital to determining both feasibility and CPI for online panels. I touched on this in a partner blog with Brookmark Research, where we articulated how “conversion rates” serve as the more operative/relevant formula for online panel firms that manage their supply programmatically, particularly in the US.
Among OR’s clients, incidence can be defined in a variety of ways. So, which is “right”? Well, that depends on which guidelines you follow and, more importantly, how you interpret them on your study. Let’s review a few approaches.
As we explore the available industry definitions, we find broadly similar wording for what Incidence is.
From ESOMAR: Incidence (aka Strike Rate) is the proportion of respondents contacted in a survey who qualify for the survey.
From the Insights Association (the merged entity formed from CASRO and MRA): Any figure referring to the percentage of people in a category.
Insights Association (a second definition in same glossary): The proportion of respondents contacted in a survey who qualify for the survey.
From SSI: Survey Incidence tells us what proportion of participants will qualify and informs how much we should charge per complete for the study.
The high-level themes are common. But as we start applying these definitions to practical situations, the nuances in the approaches become much clearer.
Incidence is as much a production metric as an analytical one, so having a formula for calculating IR on a project is a given. The standard for that formula, however, is not set. Here are a few examples of how these calculations can vary.
1. Survey Incidence = # of completes / (# of completes + # of screenouts)
2. Incidence = # of people who qualify / (# of people who qualify + # of people who do not qualify)
3. IR = # of completes / (# of completes + # of screenouts, including quota-full terms)
4. IR = (# of completes + # of qualified incompletes + # of cheaters) / (# of completes + # of screenouts, including quota-full terms + # of qualified incompletes + # of cheaters)
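To make the differences concrete, here is a minimal Python sketch of the four formulas above. The function and parameter names are my own shorthand, not an industry standard:

```python
def ir_basic(completes, screenouts):
    # Formula 1: completes vs. completes + screenouts
    return completes / (completes + screenouts)

def ir_qualified(qualify, not_qualify):
    # Formula 2: anyone who "does not qualify" counts against IR,
    # which may sweep in cheaters or even early incompletes
    return qualify / (qualify + not_qualify)

def ir_with_oqs(completes, screenouts, overquotas):
    # Formula 3: quota-full terms (OQs) also count against IR
    return completes / (completes + screenouts + overquotas)

def ir_all_clicks(completes, screenouts, overquotas, qual_incompletes, cheaters):
    # Formula 4: qualified incompletes and cheaters appear in both
    # the numerator and the denominator
    numerator = completes + qual_incompletes + cheaters
    denominator = (completes + screenouts + overquotas
                   + qual_incompletes + cheaters)
    return numerator / denominator
```

Running all four on the same disposition counts makes the spread between the definitions obvious.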
The common elements across all four formulas are the complete counts and the terminates (screenouts). In formula 2, the wording on the terminate category is less specific: someone who “does not qualify” could be broader than terms, potentially including cheaters or even early incompletes. In formula 3, respondents who qualify but hit a closed quota also count against the IR. Formula 4 tries to represent nearly all survey clicks by including the qualified incompletes and cheaters equally in the numerator and denominator.
The logical next step is to see how each of these formulas affects the reported IR on a theoretical project. Let’s say an online survey produced a disposition file that looked like this:
Completes: 1000
Terms (screenouts): 500
Demo OQs: 200
Category OQs: 300
QC Kickouts (automated): 100
Dropouts: 100 (50 in screener, 50 after)
Here’s how these counts play out across the four formulas above:
1. 1000 completes / (1000 completes + 500 terms) = 67% IR
2. 1000 / (1000 + 500 terms + 100 “Cheaters”) = 62% IR
3. 1000 / (1000 + 500 terms + 500 OQs) = 50% IR
4. (1000 completes + 50 qual incompletes + 100 cheaters) / (1000 completes + 500 terms + 500 OQs + 50 qual incompletes + 100 “cheaters”) = 53% IR
If we instead used a simple “conversion rate,” the formula becomes:
5. Completes / Starts = 1000 / 2200 = 45% CR
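The spread above can be reproduced in a few lines of Python, using the hypothetical disposition counts from this scenario:

```python
# Disposition counts from the hypothetical project above
completes = 1000
terms = 500            # screenouts
oqs = 200 + 300        # demo OQs + category OQs
cheaters = 100         # automated QC kickouts
qual_inc = 50          # dropped out after qualifying in the screener
screener_drop = 50     # dropped out during the screener

# "Starts" counts every click, whatever the outcome
starts = completes + terms + oqs + cheaters + qual_inc + screener_drop

ir1 = completes / (completes + terms)                    # ~67%
ir2 = completes / (completes + terms + cheaters)         # ~62%
ir3 = completes / (completes + terms + oqs)              # 50%
ir4 = (completes + qual_inc + cheaters) / (
    completes + terms + oqs + qual_inc + cheaters)       # ~53%
cr = completes / starts                                  # ~45%

for name, val in [("IR 1", ir1), ("IR 2", ir2), ("IR 3", ir3),
                  ("IR 4", ir4), ("CR", cr)]:
    print(f"{name}: {val:.1%}")
```

Same project, same dispositions, yet the reported figure ranges from 45% to 67% depending on the formula chosen.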
Which is “right”?
I don’t think there is one right answer. Rather, think of this as a call to align your definition of incidence with the partner(s) you choose. These differences in interpretation can and will have a cost impact, and they drive one’s overall understanding of a project’s relative difficulty. OverQuotas are a perfect demonstration: the targeting and profiling a methodology allows (or a sample source can actually deliver) will directly affect how many OQs a project generates.
Many legacy assumptions hold that one of the biggest benefits of double opt-in panels is the depth of their profiling. But what if that profiling isn’t there like it once was? OQs are where this gap really shows up. In light of all of this, and in the current age of sample tech, we may be entering an era where IR becomes irrelevant. But that’s a topic for another day!
There is no one perfect approach to IR; it depends on the sample design and approach. But this is an area ripe for manipulation by either the buyer or seller of sample if it’s not deeply understood. Sometimes a price is too good to be true; other times, a client spec is.