How Synthetic Audiences Are Transforming Market Research Through AI Personas

Researchers at UC San Diego and KU Leuven demonstrate that AI can now generate synthetic human profiles with hundreds of attributes, in NeurIPS 2025 workshop research arriving as industry pilots report up to 95% correlation between synthetic and real survey data.



A groundbreaking collaboration between researchers at UC San Diego, KU Leuven, and Meta has demonstrated that artificial intelligence can now generate synthetic human profiles of extraordinary depth, averaging roughly one megabyte of narrative text and encompassing hundreds of structured attributes. The work represents a quantum leap in the granularity with which artificial constructs can approximate human complexity.

The Deeppersona research, led by Zhen Wang at the University of California San Diego and Yufan Zhou at KU Leuven, produces personas roughly two orders of magnitude deeper than those of previous methodologies. The team, which included Zhongyan Luo from UCSD, Lyumanshan Ye from Shanghai Jiao Tong University, Adam Wood from the University of Michigan, Man Yao from Denison University, and Luoshang Pan from Meta, conducted their validation studies during a ten-day period in February 2025.

Presented at the NeurIPS 2025 Workshop on Language, Agent, and World Models, their findings arrive at a moment when nearly three-quarters of market researchers anticipate that synthetic responses will constitute the majority of market research within three years, marking a fundamental transformation in how organizations understand their audiences.

What Are Synthetic Personas and How Do They Work?

Traditional synthetic personas have remained deliberately shallow, typically comprising fewer than thirty manually curated attributes. Wang, Zhou, and their collaborators transcended these limitations through a two-stage generative process: taxonomy construction followed by progressive attribute sampling.

The research team constructed what they term a Human-Attribute Tree comprising 8,496 unique nodes organized hierarchically. This taxonomy emerged from mining thousands of real-world conversations between humans and ChatGPT, specifically 3,000 dialogues from the Puffin dataset, 1,000 from the prefeval_implicit_persona dataset, and 60,000 samples derived from Llama-3.2-3B-HiCUPID. The researchers identified conversational turns that reliably elicit personalized information, yielding 62,224 high-quality personalized question-answer pairs.

The resulting structure spans twelve broad categories, from demographic information and physical characteristics through to core values and media consumption patterns. Unlike static personas, synthetic personas can also evolve dynamically, built on predictive models that analyse data such as web analytics, CRM history, social media activity, and IoT signals.

Where previous synthetic personas might describe someone as “a 35-year-old teacher who enjoys reading”, these enhanced constructs detail specific career trajectories, nuanced belief systems, detailed daily routines, and coherent life histories that span multiple interconnected domains. The team manually seeded the taxonomy with twelve broad first-level attribute categories, then used GPT-4.1-mini to recursively extract and organize fine-grained attributes from each personalized question-answer pair into structured hierarchies.
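
The paper's released code is not reproduced here, but the underlying data structure is straightforward to picture. The following minimal Python sketch shows a hierarchical attribute tree being seeded with broad first-level categories and extended with mined attribute paths, capped at three levels as the researchers describe; the class and function names are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AttributeNode:
    """One node in a hierarchical attribute taxonomy, e.g. a Human-Attribute Tree."""
    name: str
    children: dict[str, "AttributeNode"] = field(default_factory=dict)

    def insert_path(self, path: list[str], max_depth: int = 3) -> None:
        """Insert an attribute path, truncated at max_depth hierarchical levels."""
        node = self
        for name in path[:max_depth]:
            node = node.children.setdefault(name, AttributeNode(name))

    def count_nodes(self) -> int:
        """Total nodes in this subtree, useful for tracking taxonomy size."""
        return 1 + sum(child.count_nodes() for child in self.children.values())

# Manually seed broad first-level categories, then add attribute paths mined
# from personalized question-answer pairs.
root = AttributeNode("Human Attributes")
for category in ["Demographics", "Physical Characteristics",
                 "Core Values", "Media Consumption"]:
    root.insert_path([category])
root.insert_path(["Core Values", "Political Views", "Trust in Institutions"])
print(root.count_nodes())  # 7: the root plus six descendant nodes
```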

AI-Generated User Research: Addressing Bias and Authenticity Concerns

The methodology Wang and Zhou developed addresses a fundamental limitation of naive language model sampling. These systems demonstrate a persistent tendency towards homogenization and stereotype reproduction. Simply instructing an AI system to elaborate upon basic demographic facts invariably produces profiles that reflect majority-culture defaults and optimistic biases inherent to training data.

The research team implemented what they term “bias-free value assignment” for demographic attributes. Rather than allowing the language model to select occupations or locations, these values are drawn from predefined statistical distributions, ensuring genuine demographic breadth. For attributes lacking categorical values such as interests, values, and personal narratives, the system employs a “life-story-driven approach” that constructs coherent backgrounds from which subsequent characteristics naturally emerge.
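
As a rough illustration of the idea rather than the paper's implementation, demographic anchors can be drawn from fixed marginal distributions instead of being generated by the model. The distributions below are invented placeholders standing in for census or survey statistics.

```python
import random

# Hypothetical marginal distributions; in practice these would come from census
# or survey statistics rather than being chosen by the language model.
OCCUPATION_DIST = {"retail worker": 0.22, "teacher": 0.08, "nurse": 0.12,
                   "software engineer": 0.05, "farmer": 0.10, "driver": 0.13,
                   "accountant": 0.07, "unemployed": 0.23}
AGE_BANDS = {"18-29": 0.25, "30-44": 0.28, "45-64": 0.30, "65+": 0.17}

def sample_demographics(rng: random.Random) -> dict[str, str]:
    """Draw demographic anchors from fixed distributions so the LLM never
    chooses them itself, avoiding majority-culture defaults."""
    def draw(dist: dict[str, float]) -> str:
        values, weights = zip(*dist.items())
        return rng.choices(values, weights=weights, k=1)[0]
    return {"occupation": draw(OCCUPATION_DIST), "age_band": draw(AGE_BANDS)}

rng = random.Random(42)
print(sample_demographics(rng))
```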

The progressive sampling mechanism itself operates through stratified selection, dividing the attribute space into three layers based on semantic similarity to core demographic anchors. Attributes are sampled in a ratio of 5:3:2 across near, middle, and far strata, favouring coherence whilst deliberately injecting unexpected traits to prevent rigid stereotyping. This balanced approach enriches character representation whilst maintaining psychological plausibility.
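
A minimal sketch of what such stratified sampling could look like, assuming each candidate attribute carries a semantic-similarity score against the persona's demographic anchor (for instance from an embedding model). The tertile-based partitioning and the function names are assumptions, not the paper's exact procedure; only the 5:3:2 ratio is taken from the description above.

```python
import random

def stratified_sample(attributes: dict[str, float], total: int,
                      rng: random.Random) -> list[str]:
    """Sample attributes in a 5:3:2 ratio across near/middle/far strata, where
    each attribute maps to a semantic-similarity score (0..1) against the
    persona's demographic anchor."""
    ranked = sorted(attributes, key=attributes.get, reverse=True)
    third = max(len(ranked) // 3, 1)
    strata = {"near": ranked[:third],
              "middle": ranked[third:2 * third],
              "far": ranked[2 * third:]}
    quotas = {"near": round(total * 0.5), "middle": round(total * 0.3),
              "far": round(total * 0.2)}
    chosen: list[str] = []
    for name, pool in strata.items():
        k = min(quotas[name], len(pool))
        chosen.extend(rng.sample(pool, k))
    return chosen

candidates = {"daily commute": 0.91, "favourite cuisine": 0.74, "sleep schedule": 0.65,
              "volunteer work": 0.55, "childhood pet": 0.40,
              "stance on space exploration": 0.22, "collects vinyl records": 0.18}
print(stratified_sample(candidates, total=5, rng=random.Random(0)))
```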

Concerns remain around data bias, transparency, and privacy; experts note that these tools should complement rather than replace real-world research, and that they require human oversight to ensure accuracy.

Synthetic Personas vs Traditional Market Research: Performance Comparison

The study subjected these synthetic personas to rigorous intrinsic and extrinsic evaluation. Compared to state-of-the-art baseline methods, including PersonaHub’s billion shallow profiles developed by Tao Ge and colleagues, and OpenCharacter’s style-tuned dialogues created by Xiaoyang Wang’s team, Deeppersona demonstrated a 32% increase in mean attribute count, a 44% improvement in uniqueness scores, and a 5% gain in what researchers term “actionability potential”.

When synthetic personas were employed to personalise large language model responses across ten evaluation dimensions, the deeper profiles yielded an average improvement of 11.6% in response quality. The gains proved particularly pronounced in attribute coverage, where incorporating rich persona context enabled systems to reference 11.8% more distinct characteristics when generating tailored advice.

The research team evaluated their approach using GPT-4o as an independent judge to extract explicit attributes from each persona into nested JSON format. The same judge and extraction method were applied consistently across PersonaHub, OpenCharacter, and Deeppersona, ensuring comparability. Uniqueness scores ranged from 1 (“very generic”) to 5 (“highly unique”) based on novelty and distinctiveness relative to common human profiles.
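
The judging prompt itself is not reproduced in the article, but once a persona has been extracted into nested JSON, the attribute-count metric reduces to counting leaf values. A small illustrative sketch, using a hypothetical profile:

```python
def count_leaf_attributes(profile: dict) -> int:
    """Count leaf attributes in a nested JSON persona profile, in the spirit of
    the mean attribute count metric described above."""
    total = 0
    for value in profile.values():
        if isinstance(value, dict):
            total += count_leaf_attributes(value)
        elif isinstance(value, list):
            total += len(value)
        else:
            total += 1
    return total

profile = {
    "demographics": {"age": 42, "occupation": "paediatric nurse"},
    "core_values": {"environmentalism": "strong", "trust_in_institutions": "low"},
    "media_consumption": {"podcasts": ["true crime", "local news"]},
}
print(count_leaf_attributes(profile))  # -> 6
```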

When EY compared results generated by a thousand synthetic personas against its actual survey results, it found a 95% correlation, demonstrating the practical utility of these approaches in professional settings.

World Values Survey Results: Testing Cultural Authenticity at Scale

Perhaps most striking were the social simulation results. Wang, Zhou, and their collaborators generated synthetic populations representing six diverse nations, from well-represented countries such as the United States and Australia to underrepresented societies including Kenya and Japan, and administered World Values Survey questions.

The synthetic populations demonstrated substantially closer alignment with actual national survey distributions than previous methods. The technique achieved a 43% improvement in Kolmogorov-Smirnov statistical alignment measures and a 32% reduction in Wasserstein distributional distance compared to cultural prompting baselines employed by Yan Tao and colleagues at Cornell University.

The team employed four statistical metrics to measure distributional accuracy: the Kolmogorov-Smirnov statistic, Wasserstein distance, Jensen-Shannon divergence, and Mean Absolute Difference. Across all metrics and countries tested, Deeppersona consistently outperformed existing methods. The Deeppersona personas proved particularly effective at capturing minority viewpoints and avoiding the homogenised optimism that typically characterises AI-generated responses.
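
For a single survey question, all four metrics can be computed with standard NumPy and SciPy routines, as in the sketch below; the binning scheme and the exact definition of Mean Absolute Difference here are assumptions rather than the paper's reported implementation.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance
from scipy.spatial.distance import jensenshannon

def distribution_gap(synthetic: np.ndarray, real: np.ndarray,
                     bins: np.ndarray) -> dict[str, float]:
    """Compare synthetic vs. real survey answers (e.g. 1-10 scale responses)
    with the four metrics named above. Lower is better for all of them."""
    syn_hist, _ = np.histogram(synthetic, bins=bins, density=True)
    real_hist, _ = np.histogram(real, bins=bins, density=True)
    return {
        "ks_statistic": ks_2samp(synthetic, real).statistic,
        "wasserstein": wasserstein_distance(synthetic, real),
        # scipy's jensenshannon returns the distance, i.e. sqrt of the divergence
        "jensen_shannon_div": float(jensenshannon(syn_hist, real_hist)) ** 2,
        "mean_abs_diff": float(np.mean(np.abs(syn_hist - real_hist))),
    }

rng = np.random.default_rng(0)
real = rng.integers(1, 11, size=500)       # stand-in for real survey answers
synthetic = rng.integers(1, 11, size=100)  # stand-in for synthetic answers
print(distribution_gap(synthetic, real, bins=np.arange(0.5, 11.5, 1.0)))
```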

For Argentina, Deeppersona achieved a Kolmogorov-Smirnov statistic of 0.303 compared to 0.653 for cultural prompting and 0.402 for OpenCharacter. In the United States, Deeppersona reduced Wasserstein distance to 0.733 compared to 1.166 for cultural prompting, demonstrating substantially better alignment with real human response distributions.

The researchers also tested their synthetic personas against the Big Five personality test, using questionnaire items from the IPIP inventory and ground-truth response data from OpenPsychometrics. The generated “national citizens” narrowed the gap to real human responses by 17% relative to baseline LLM-simulated citizens, demonstrating the method’s effectiveness across multiple psychological frameworks.

Why Companies Are Adopting Synthetic Research

The practical implications extend far beyond academic validation. Market researchers surveyed across fourteen countries indicated that 89% currently employ AI tools either regularly or experimentally, whilst 83% plan significant AI investment increases throughout 2025.

Traditional qualitative research methodologies demand considerable resources. Recruiting participants, scheduling sessions, conducting lengthy interviews, transcribing responses, and synthesising insights typically requires weeks or months and budgets that smaller organizations simply cannot sustain. Synthetic personas eliminate these bottlenecks by providing instant access to diverse demographic segments at negligible marginal cost.

Synthetic data can be applied to scenario simulation, questionnaire testing, sample augmentation, expanding representation of niche groups or populations, and creating AI-driven chatbots that simulate customer segments for ongoing engagement.

Several commercial platforms have emerged to capitalise on this opportunity. Synthetic Users claims to generate personality profiles functioning as “reptilian brains” around which complete personas are reconstructed, leveraging the billions of parameters modern language models maintain. One early adopter reported that synthetic feedback aligned with subsequent human validation over 95% of the time, suggesting these digital proxies achieve remarkable predictive accuracy. Among these platforms, Deepsona positions itself as an agentic solution implementing methodology similar to Deeppersona’s, aiming to deliver deeper and more accurate synthetic personas commercially.

Yet industry analysts acknowledge persistent challenges. Synthetic respondents express preferences across product categories differently from humans, whilst model biases manifest differently across domains. Early experimentation revealed that synthetic personas appeared to care more about human health than actual consumers did, a form of optimism bias requiring careful methodological correction.

Privacy-Compliant Consumer Research Using AI Personas

The privacy implications are of particular interest. As regulations surrounding personal data collection grow increasingly stringent, and consumers become progressively reluctant to share detailed information, synthetic audiences offer an attractive alternative. These personas require no individual consent, cross no privacy boundaries, and pose no data breach risks, because they represent no actual persons.

This characteristic proves especially valuable when researching sensitive topics or hard-to-reach populations. Traditional research struggles to access certain demographics, whether due to geographic dispersion, social stigma, or simple unwillingness to participate. Synthetic audiences bypass these constraints entirely, though Wang and Zhou acknowledge that results should be used cautiously and tested before drawing definitive conclusions.

Synthetic personas help quantitative researchers study otherwise unreachable demographics and fill gaps in survey data using predictive modelling, with niche audience segments analysed at no additional cost.

The approach does, however, introduce what might be termed “synthetic privacy concerns”. If organizations can generate convincing simulacra of consumer segments, complete with nuanced preferences and behaviours, do these digital constructs possess any claim to ethical consideration? The question remains largely unexamined in current discourse, though it may gain prominence as synthetic personas grow more sophisticated.

Research Methodology and Sample Size Limitations

The research acknowledges several constraints. Sample sizes for validation studies remained modest. The team collected 93 completed surveys for the choice experiment evaluation, whilst social simulation testing employed only 100 synthetic respondents per country. Larger samples would strengthen confidence in the observed correlations and potentially reveal relationships that current data cannot detect.

The discontinuation rate of 23% suggests some participants found the evaluation methodology overly complex, raising questions about whether human assessors can reliably judge synthetic persona quality. If genuine humans struggle to evaluate artificial constructs systematically, the circular dependency becomes apparent. We lack gold-standard benchmarks for measuring synthetic authenticity precisely because authentic humans prove difficult and expensive to study at scale.

Moreover, the taxonomy itself, whilst comprehensive, cannot claim genuine universality. Human attributes potentially extend infinitely, and the decision to terminate hierarchical expansion at three levels reflects pragmatic constraints rather than natural boundaries. The researchers found that most human attributes rarely extend beyond three hierarchical levels, as deeper chains degenerate into idiosyncratic leaf nodes which harm coverage balance and introduce sparsity.

Different cultural contexts might demand alternative organizational structures, and the reliance on English-language ChatGPT conversations introduces linguistic and cultural biases that may limit global applicability. Wang and Zhou note this explicitly as a constraint requiring future research attention.

When Should Synthetic Data Replace Human Surveys?

Philosophical considerations loom large. The research demonstrates that sufficiently deep synthetic personas narrow the gap between simulated and authentic human responses by 31.7% in social surveys and 17% in personality assessments. Yet this raises fundamental questions about what authenticity means in an era of increasingly convincing simulacra.

When synthetic personas answer value-laden questions about abortion justifiability, trust in others, or national pride, whose values do they express? The language models from which they emerge were trained on vast corpora of human-generated text, effectively compressing and recombining the expressed beliefs of millions. Synthetic personas thus represent not individual consciousness but statistical aggregations. They function as composite sketches drawn from humanity’s digital exhaust.

This characteristic renders them simultaneously valuable and problematic. They prove useful precisely because they approximate population-level patterns rather than idiosyncratic individual quirks. Yet this same quality means synthetic audiences cannot, by definition, reveal genuinely novel perspectives. They can only recombine and interpolate amongst viewpoints already present in training data, potentially reinforcing existing patterns rather than surfacing emergent phenomena.

Real human data remains irreplaceable for capturing emotional depth, cultural nuance, and authentic behaviour, with the path forward lying in thoughtful integration.

ROI and Implementation: Optimal Attribute Depth for Marketing Teams

The economic incentives driving synthetic audience adoption prove compelling. Budget constraints, privacy concerns, and demand for real-time information are pushing researchers to adopt AI-driven automations, whilst teams embracing innovation report growth in influence, budgets, and demand for services.

The speed differential alone justifies adoption for many use cases. Where traditional concept testing might require weeks to recruit participants and gather feedback, synthetic audiences provide instant responses at scale. For early-stage product development such as culling long lists of ideas, optimising messaging, or testing packaging variations, this acceleration offers genuine competitive advantage.

Wang and Zhou conducted ablation studies to determine optimal attribute depth. Performance across most metrics improved as the attribute count increased, consistently peaking within the 200-250 range. Further increasing the count to 300 resulted in noticeable performance decline, suggesting that excessive attributes introduce noise. This finding validates targeting 200-250 attributes to achieve optimal balance between descriptive richness and utility.

The research team also validated the model-agnostic nature of Deeppersona through cross-model evaluation, replicating the Germany society simulation task with DeepSeek-v3-0324, GPT-4o-mini, and Gemini-2.5-flash. Although response quality varied with each model’s inherent capabilities, Deeppersona consistently maintained robustness and effectiveness across architectures.

Human Validation Studies Confirm AI Performance Accuracy

To complement automated metrics, the researchers conducted rigorous human evaluation studies. The results strongly confirmed findings from LLM-as-judge evaluation, showing that Deeppersona consistently outperformed both PersonaHub and OpenCharacter. Human evaluators showed clear preference for responses generated by the method, evidenced by win rates ranging from 81.2% to 87.0% and superior ELO ratings across four key dimensions: Personalization-Fit, Attribute Coverage, Diversity of Suggestions, and Goal-Progress Alignment.
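
The article does not specify how the ELO ratings were derived; as a rough illustration, a standard Elo update applied over pairwise preference judgements would look like the following, with the starting ratings, K-factor, and comparison data chosen arbitrarily.

```python
def update_elo(rating_a: float, rating_b: float, a_wins: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update for one pairwise comparison: the evaluator picks the
    persona-conditioned response they prefer, and the winner's rating rises."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Start both methods at 1000 and replay judged head-to-head comparisons.
ratings = {"Deeppersona": 1000.0, "PersonaHub": 1000.0}
comparisons = [("Deeppersona", "PersonaHub", True)] * 8 + \
              [("Deeppersona", "PersonaHub", False)] * 2   # ~80% win rate
for a, b, a_wins in comparisons:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], a_wins)
print(ratings)
```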

Industry surveys indicate that 87% of those who have used synthetic responses report high satisfaction with the results, with synthetic responses proving particularly valuable for testing packaging, names, and messaging.

Future of Market Research: Predictions for Synthetic Audience Adoption

The trajectory appears clear. Synthetic audiences will become standard components of the market research toolkit rather than wholesale replacements for human subjects. Seventy-two per cent of researchers expect AI to predict market trends more accurately than human analysts within three years, suggesting confidence in the technology’s maturation.

Future developments will likely address current limitations. Continuous learning mechanisms could enable synthetic personas to evolve over time, capturing shifting preferences and emerging behaviours. Multi-modal capabilities might incorporate visual and audio elements, enhancing realism in simulated interactions. Cultural customisation could adapt taxonomies to different linguistic and social contexts, improving global applicability.

Toluna has announced expanded availability of its industry-leading synthetic personas, now covering 15 markets and 9 languages, with more than one million unique personas already created, demonstrating the commercial scalability of these approaches.

Yet fundamental questions persist about the appropriate boundaries of synthetic audience deployment. For which decisions do statistical approximations of human preferences suffice, and for which do we require irreducible individual voices? When does the convenience of synthetic research cross into epistemic irresponsibility or ethical negligence?

The Deeppersona research advances the technical frontier significantly, demonstrating that machine learning can now generate personas of genuine depth and utility. Wang, Zhou, and their international collaborators have stated they will release their codebase, taxonomy, and a profile dataset to catalyse research into agentic behaviour simulation and personalised, human-aligned AI.

Whether this capability proves blessing or burden depends largely on how practitioners navigate the tension between efficiency and authenticity, between the seductive convenience of digital proxies and the irreplaceable complexity of actual human beings. As synthetic audiences grow more sophisticated, the challenge will lie not in generating convincing simulacra but in remembering why we sought to understand real people in the first place.