<aside>

WORKING DRAFT FOR PUBLIC FEEDBACK. For more context on this draft, please see here. Please submit feedback here.

</aside>



Overview

Synthetic data is artificially generated information, produced by algorithms, generative models, or simulations, that mimics the behavior and statistical properties of real-world data. It can be created by analyzing existing datasets to impute missing data (e.g., generating data points for groups of people who are underrepresented in the original sample) or by using synthetic participants, which are programmed to mimic survey responses through specific user personas. Synthetic data is now used across several domains (e.g., health, privacy, red teaming) and formats (vision, audio, and text). Its uses include creating datasets with adversarial examples to detect and understand vulnerabilities in models, covering sensitive topics, and improving a model's ability to handle real-world inputs. In each case it stands in for working with real users or experts to collect data from real-world circumstances.
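As a rough illustration of the first approach (generating data points for an underrepresented group), here is a minimal Python sketch. It is not a recommended method. It assumes a hypothetical survey dataset where each row has a `group` label and a numeric `score`, and it assumes scores within a group are roughly normally distributed. The function name `augment_group` and the field names are invented for this example.

```python
import random
import statistics


def augment_group(rows, group, target_count, seed=0):
    """Return synthetic rows so that `group` reaches `target_count` rows in total.

    Assumptions (illustrative only): each row is a dict with the hypothetical
    keys "group" and "score", and a normal distribution fitted to the real
    scores is an acceptable stand-in for the group's true distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = [r["score"] for r in rows if r["group"] == group]
    if len(scores) < 2:
        raise ValueError("need at least two real observations to fit a distribution")

    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)  # sample standard deviation

    missing = max(0, target_count - len(scores))
    # Label every generated row so it is never mistaken for a real response.
    return [
        {"group": group, "score": rng.gauss(mean, stdev), "synthetic": True}
        for _ in range(missing)
    ]


# Toy data: group "B" is underrepresented relative to group "A".
real_rows = [
    {"group": "A", "score": 3.9},
    {"group": "A", "score": 4.1},
    {"group": "A", "score": 3.5},
    {"group": "A", "score": 4.4},
    {"group": "A", "score": 3.8},
    {"group": "A", "score": 4.0},
    {"group": "B", "score": 2.1},
    {"group": "B", "score": 2.9},
    {"group": "B", "score": 2.5},
]

synthetic_rows = augment_group(real_rows, "B", target_count=6)
print(len(synthetic_rows), "synthetic rows generated for group B")
```

Each generated row carries a `synthetic` flag so that downstream analysis can separate real from generated responses. Note that the generated points can only reflect what the few real observations already contain, which is one reason the fitted distribution is labeled an assumption here.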

Strengths and Useful Applications

Potential Risks of Use