
The case for borrowed intelligence over custom model training
By: Gavin Wills, SVP, Engineering
Artificial intelligence has become a fixture in market research. From automated coding of open-ended responses to sentiment analysis, churn modelling, and predictive segmentation, researchers have embraced AI as a productivity multiplier. The efficiency gains are real, the applications are proven, and the tooling has matured significantly.
But as the industry moves into more ambitious territory, specifically the use of synthetic panels to simulate opinion at scale, a new question has emerged, one that many teams are answering in the wrong direction.
That question is: should we train our own AI model to power our synthetic panel?
For most organisations, the answer is no. And understanding why requires being honest about where custom AI training genuinely shines – and where it quietly fails.
AI in Market Research: The Legitimate Use Cases
Let’s start with what works. AI has earned its place in research operations across several well-defined problem types.
- Natural language processing has transformed how teams handle open-ended survey responses, turning thousands of verbatims into structured themes in minutes rather than days.
- Predictive models help identify which customers are likely to churn, which segments are most valuable, and which product features drive satisfaction.
- Classification models can flag survey fraud, detect speeders and straight-liners, and filter low-quality responses before they contaminate your data.
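The quality checks in the last bullet can be illustrated with a minimal rule-based sketch. It is not a trained classifier and not any platform's actual method: the field names (`id`, `seconds`, `grid`) and the thresholds are assumptions chosen for illustration only.

```python
from statistics import pstdev


def flag_low_quality(responses, min_seconds=120.0, max_grid_spread=0.0):
    """Return {respondent_id: [reasons]} for suspicious responses.

    Each response is a dict with hypothetical keys:
      "id"      - respondent identifier
      "seconds" - total completion time
      "grid"    - answers to a rating grid, as a list of ints
    """
    flags = {}
    for r in responses:
        reasons = []
        # Speeder: finished faster than the assumed minimum plausible time.
        if r["seconds"] < min_seconds:
            reasons.append("speeder")
        # Straight-liner: no variation at all across the grid items.
        grid = r["grid"]
        if len(grid) > 1 and pstdev(grid) <= max_grid_spread:
            reasons.append("straight-liner")
        if reasons:
            flags[r["id"]] = reasons
    return flags


# Made-up sample data, for illustration only.
sample = [
    {"id": "r1", "seconds": 410.0, "grid": [4, 2, 5, 3, 4]},
    {"id": "r2", "seconds": 45.0, "grid": [3, 4, 2, 5, 1]},
    {"id": "r3", "seconds": 300.0, "grid": [3, 3, 3, 3, 3]},
    {"id": "r4", "seconds": 30.0, "grid": [5, 5, 5, 5, 5]},
]

print(flag_low_quality(sample))
```

Rules like these are a baseline; a classifier trained on a platform's own historical data would replace the fixed thresholds with learned ones.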
In these scenarios, custom model training makes absolute sense. You have proprietary data, a well-defined and stable problem, and specific domain characteristics that generic models won’t capture. Training your own fraud detection model on your platform’s historical data? Smart. Building a churn classifier tuned to your specific customer behaviour patterns? Entirely justified.
The logic holds right up until you try to apply it to synthetic panels.
The Synthetic Panel Promise and the Temptation to Build
Synthetic panels are a genuinely exciting development in market research. The concept is straightforward: you populate a panel with AI-generated personas (statistically grounded, demographically diverse, behaviourally consistent) and survey them directly. At scale, this means hundreds or thousands of simulated responses at a fraction of traditional research costs, and with a speed and flexibility that fundamentally changes what’s possible.
When research teams first encounter this idea, many land on an instinctive response: we’ll train our own model. We have years of survey data. We know our respondent pool. We can fine-tune something bespoke.
It sounds reasonable. It’s actually the wrong problem to solve.
Why Training Your Own Model Undermines the Whole Point
The appeal of custom training is familiarity: you’re working with your data, your respondent profiles, your historical responses. But this is precisely the problem.
A model trained on your existing panel data can only ever reflect what those respondents have already told you. It learns the patterns of past surveys, past topics, past sentiment distributions. It cannot extrapolate genuine human nuance (cultural shifts, emerging attitudes, demographic complexity) because it has never encountered most of it. You’re not simulating human opinion. You’re recirculating your own historical data with a generative wrapper around it.
The failure modes compound quickly. Overfitting to historical respondents means your personas start to look homogeneous – a flattened, averaged version of who you’ve surveyed before. Demographic nuance collapses. Edge cases disappear. The model hallucinates a false consensus that mirrors your existing data rather than the messy, contradictory reality of actual human opinion.
And then there’s the maintenance burden. Consumer attitudes shift. Cultural contexts evolve. A custom-trained model demands continuous retraining to stay relevant, a significant ongoing investment in data, compute, and ML expertise that most research organisations are simply not set up to sustain.
“Your model can only know what it’s already seen. For synthetic panels, that’s not a feature – it’s a fundamental flaw.”
The Digital Twin Problem: Who Owns the Person?
Many custom synthetic respondent approaches lean on digital twins: AI representations built from the data of real, identifiable people. It sounds like a rich training signal. In practice, it creates serious problems.
Did your panel members consent to having their responses used to train an AI that will simulate their opinions indefinitely? In most cases, no. And anonymising the data doesn’t solve it. A model trained on anonymised responses still encodes the behavioural patterns and attitudinal fingerprints of real people; re-identification risk doesn’t vanish because a name has been removed. Regulators are increasingly clear that using data collected for one purpose to train an AI for another is a separate act requiring separate justification.
There are commercial risks too. Panel providers own their respondent relationships. Using their data to reduce your dependence on them sits in a contractually grey area that could damage supplier relationships or see data access withdrawn entirely.
The risk doesn’t stop at your own data practices either. Organisations running surveys through major platforms should be asking hard questions about how their data is being used upstream. There are growing concerns and, in some cases, emerging legal scrutiny around platforms training proprietary AI models on aggregated client survey data. The research your organisation commissioned, the questions you designed, the responses you paid to collect: all of it is potentially feeding a model that a competitor using the same platform will also benefit from. It’s a structural conflict of interest that the industry hasn’t fully reckoned with yet, but that reckoning is coming.
Fabricated personas sidestep all of this. Because they are not derived from any real individual, there’s no consent question, no re-identification risk, no ownership dispute. As clients begin asking harder questions about data provenance, that’s not a minor detail. It’s a foundational advantage.
“Fabricated personas are not traceable to any real person. They are, by design, nobody’s data – and that’s a significant competitive and ethical advantage.”
The Case for Borrowed Intelligence
Here’s the reframe: you don’t need to build the intelligence. You need to borrow it.
The world’s leading AI labs (Anthropic, OpenAI, Google DeepMind) have spent billions of dollars and years of research developing frontier language models of extraordinary capability. These models have been trained on more human expression, cultural context, linguistic diversity, and nuanced reasoning than any proprietary training run could ever replicate. They understand irony, regional dialect, generational difference, and social tension. They can inhabit a persona consistently across a survey instrument, respond to hypotheticals with genuine reasoning, and produce outputs that reflect the full complexity of human thought.
When your synthetic panel runs on frontier models rather than a bespoke fine-tuned system trained on your own limited dataset, you’re not just running a survey; you’re borrowing the most sophisticated representation of human language and cognition ever built. And crucially, you’re doing it without the infrastructure overhead, the maintenance cost, or the ceiling effect of a model frozen at the moment of its last training run.
The fabricated personas in your panel become the framework: the demographic and psychographic scaffolding that structures who is being asked and how they’re likely to think. The borrowed intelligence is what makes those personas actually feel human. And because frontier models are continuously updated and improved by the organisations investing most heavily in AI development, your panel gets smarter over time without any additional effort on your part.
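To make the split between scaffolding and borrowed intelligence concrete, here is a minimal sketch of the scaffolding half. It is an illustration under stated assumptions, not a description of any product's implementation: the attribute names, the target proportions, and the prompt wording are all hypothetical, and the call to a frontier model is deliberately left out because the article does not name a provider or API.

```python
import random
from dataclasses import dataclass

# Hypothetical target distributions (illustrative numbers, not real data).
# A real panel would take these from quota specs or population statistics.
AGE_BANDS = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
REGIONS = {"North": 0.25, "South": 0.25, "East": 0.25, "West": 0.25}
OUTLOOKS = {"price-sensitive": 0.40, "brand-loyal": 0.30, "early adopter": 0.30}


@dataclass(frozen=True)
class Persona:
    """A fabricated respondent: sampled attributes, no real individual behind it."""
    persona_id: str
    age_band: str
    region: str
    outlook: str


def draw(rng, dist):
    """Sample one key from a {value: weight} mapping."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


def build_panel(n, seed=0):
    """Create n fabricated personas; a fixed seed makes the panel reproducible."""
    rng = random.Random(seed)
    return [
        Persona(f"p{i:04d}", draw(rng, AGE_BANDS), draw(rng, REGIONS), draw(rng, OUTLOOKS))
        for i in range(n)
    ]


def render_prompt(persona, question):
    """Turn a persona plus a survey question into text for a language model."""
    return (
        f"You are survey respondent {persona.persona_id}. "
        f"Age band: {persona.age_band}. Region: {persona.region}. "
        f"Outlook: {persona.outlook}. "
        "Stay in this persona for every question.\n"
        f"Question: {question}"
    )


panel = build_panel(3, seed=42)
for p in panel:
    print(render_prompt(p, "How likely are you to try a new grocery delivery service?"))
```

Each rendered prompt would then be sent to whichever frontier model powers the panel. Keeping the persona as structured data, rather than free text typed into a chat window, is what allows responses to be grouped and analysed by attribute afterwards.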
How This Compares to Other AI Approaches
It’s worth being direct about why the alternative AI-driven approaches fall short:
- Custom-trained synthetic models amplify historical bias, create legal complexity around data ownership and digital twin liability, require substantial ongoing investment, and produce personas that lack the cultural breadth and reasoning depth of frontier models. The effort-to-output ratio simply doesn’t justify the approach.
- Unstructured LLM prompting, such as simply asking a language model to “respond as a 35-year-old parent in New York”, gives you inconsistency, no demographic grounding, and no systematic framework for analysis. It produces interesting one-off outputs. It does not produce reliable research.
A frontier-powered synthetic panel built on fabricated personas solves for both. It’s fast, scalable, legally clean, demographically structured, and animated by intelligence that no internal team could independently produce or maintain.
An Honest Word on Limitations
Synthetic panels, however well-powered, are not a wholesale replacement for all research methods. Studies involving highly novel stimuli, deep emotional responses, or the kind of spontaneous behaviour that only emerges in real human interaction still call for real respondents. The synthetic panel is at its most powerful when used to sharpen hypotheses, stress-test assumptions, and prioritise where research attention is best directed.
The Leverage Is in the Model, Not the Training Run
Market researchers thinking seriously about synthetic panels face a genuine strategic choice. They can invest time, money, and expertise in training proprietary models on their own historical data, and inherit all the limitations, legal complexity, and maintenance burden that come with that choice. Or they can recognise that the most powerful AI ever built is already available, that fabricated personas offer a clean and defensible foundation, and that the competitive advantage lies not in replicating frontier intelligence, but in deploying it intelligently.
The researchers who lead the next decade of insight won’t be the ones who tried to out-build the frontier labs. They’ll be the ones who understood the difference between the right AI problem and the wrong one, and borrowed accordingly. For a profession built on understanding people, that’s a generational leap forward.
The best intelligence in the world is already built. The only question is whether you’re smart enough to borrow it.
