Train on Australia
Realistic Australian population data, no privacy paperwork. 27.5M synthetic individuals, 47 demographic variables, validated against ABS Census 2021. Pandas / Polars / PyTorch / TensorFlow ready.
Sample on Hugging Face →
For AI/ML teams
Privacy-preserving synthetic Australian demographic data, calibrated to ABS Census 2021. For ML training, customer profiling, demo data, and AI fine-tuning.
Send your customer data and get it scored against 8 demographic profiles. Understand who your customers are without exposing PII to a third party.
Profile scoring API →
Privacy-safe Australian data for staging environments, QA, customer support training, and sales demos. Looks real, contains zero PII.
Get started →
Adapt your existing models to Australian demographic reality. Generate training corpora for LoRA fine-tuning, RLHF datasets, and instruction tuning.
Fine-tuning use case →

| Feature | AUSynth | Real Census | IPF Synthetic | Faker |
|---|---|---|---|---|
| Privacy-safe | โ | Restricted | โ | โ |
| ABS Census 2021 calibrated | โ | โ | Partial | โ |
| Individual-level records | โ | โ | โ | โ |
| Statistically validated | SRMSE 0.05 | N/A | Varies | โ |
| Suburb-level granularity | 15,343 | 15,343 | Coarser | โ |
| Commercial license | โ | Restricted | Varies | โ |
| Free sample | 5k rows | โ | โ | Synthetic-only |
| ML-ready format | Parquet/CSV | Aggregates | Varies | JSON/CSV |
| Cross-tabs preserved | โ | โ | Partial | โ |
Australian fintechs and insurers building models on customer demographics often can't share real data across teams or with external vendors. AUSynth gives a statistically representative training set with no real individuals attached, so the model can move between environments without privacy review delays.
Score your model against AUSynth's 8 demographic profiles to find where its predictions diverge from population reality. This catches biased income predictions, occupation mismatches, and family-structure assumptions before they affect customers.
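For reference, demographic parity compares the rate of positive predictions across groups. The sketch below computes that gap by hand on toy data. It illustrates the metric only and is not the implementation of `assert_fair`; the function name `demographic_parity_gap`, the age bands, and the predictions are all hypothetical.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, and the per-group rates."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(bool(pred))
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Toy data (made up): binary model outputs and an age-band attribute.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
bands = ["18-34", "18-34", "18-34", "18-34", "65+", "65+", "65+", "65+"]

gap, rates = demographic_parity_gap(preds, bands)
print(f"rates per group: {rates}, gap: {gap:.2f}")
```

On this toy data the positive rate is 0.75 for the first band and 0.25 for the second, so the gap is 0.50, far above a threshold such as 0.05.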
from verosynthea_validator import assert_fair

# `model` and `AUSynth` are not defined in this snippet; both must already be in scope.
assert_fair(
    model,
    AUSynth.sample("national"),
    metric="demographic_parity",
    threshold=0.05,
)

AUSynth is generated from ABS Census 2021 conditional tables using Bayesian reconstruction with Gibbs sampling. Each synthetic individual is drawn from a joint distribution that preserves the real Census cross-tabulations between age, sex, income, occupation, education, industry, household composition and dwelling type. Population marginals match the real Census at 99.9%+ accuracy; cross-tab fidelity sits at a median person-level SRMSE of 0.05 across 15,343 suburbs. The data is adjusted to 2026 for wage and price growth.
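To make the two ideas in this paragraph concrete, here is a minimal sketch of Gibbs sampling from conditional tables, followed by an SRMSE check of the result against the target cross-tab. It is a toy under stated assumptions, not the AUSynth pipeline: the two variables and their counts are made up (not Census figures), the conditionals are derived from a single small table rather than from separate published tables, and the SRMSE variant used (root mean squared difference in cell shares divided by the mean observed share) is one common definition that the text does not confirm.

```python
import math
import random
from collections import Counter

# Made-up target cross-tab of counts: (tenure, dwelling). NOT Census figures.
TARGET = {
    ("owner", "house"): 50, ("owner", "apartment"): 10,
    ("renter", "house"): 15, ("renter", "apartment"): 25,
}
TENURES = ("owner", "renter")
DWELLINGS = ("house", "apartment")


def gibbs_sample(n_draws, rng, burn_in=500):
    """Alternately resample each variable from its conditional given the other."""
    tenure, dwelling = "owner", "house"  # arbitrary starting state
    draws = Counter()
    for step in range(n_draws + burn_in):
        # P(tenure | dwelling): weights are the column of the target table.
        weights = [TARGET[(t, dwelling)] for t in TENURES]
        tenure = rng.choices(TENURES, weights=weights, k=1)[0]
        # P(dwelling | tenure): weights are the row of the target table.
        weights = [TARGET[(tenure, d)] for d in DWELLINGS]
        dwelling = rng.choices(DWELLINGS, weights=weights, k=1)[0]
        if step >= burn_in:  # discard early draws taken before the chain settles
            draws[(tenure, dwelling)] += 1
    return draws


def srmse(observed, synthetic):
    """Standardised RMSE between two cross-tabs, compared as cell shares."""
    cells = list(observed)
    obs_total = sum(observed.values())
    syn_total = sum(synthetic.get(c, 0) for c in cells)
    squared = sum(
        (synthetic.get(c, 0) / syn_total - observed[c] / obs_total) ** 2
        for c in cells
    )
    mean_obs_share = 1.0 / len(cells)  # shares sum to 1 across the cells
    return math.sqrt(squared / len(cells)) / mean_obs_share


synthetic = gibbs_sample(50_000, random.Random(0))
score = srmse(TARGET, synthetic)
print(f"SRMSE of synthetic vs target cross-tab: {score:.4f}")
```

Lower SRMSE means the synthetic cross-tab sits closer to the target; identical tables score 0.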
1 credit = 500 synthetic records. Bundles never expire. The Power bundle covers 2.5M synthetic individuals at the standard rate.

| Bundle | Price | Credits |
|---|---|---|
| Starter | $10 | 10 |
| Small | $40 | 50 |
| Regular | $120 | 200 |
| Pro | $450 | 1,000 |
| Power | $1,500 | 5,000 |

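The credit arithmetic can be checked with a few lines of Python. This sketch uses only the figures listed above (500 records per credit, and each bundle's price and credit count); the names `BUNDLES` and `bundle_summary` are illustrative, not part of any AUSynth tooling.

```python
RECORDS_PER_CREDIT = 500

# Bundle name -> (price in dollars, credits), as listed above.
BUNDLES = {
    "Starter": (10, 10),
    "Small": (40, 50),
    "Regular": (120, 200),
    "Pro": (450, 1_000),
    "Power": (1_500, 5_000),
}


def bundle_summary(price, credits):
    """Return (records covered, price per 1,000 records)."""
    records = credits * RECORDS_PER_CREDIT
    return records, price / records * 1_000


for name, (price, credits) in BUNDLES.items():
    records, per_thousand = bundle_summary(price, credits)
    print(f"{name:8s} {records:>9,d} records  ${per_thousand:.2f} per 1,000 records")
```

Under these figures the Power bundle covers 2,500,000 records, and the price per 1,000 records falls from $2.00 on Starter to $0.60 on Power.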
For population-level features (income, age, occupation, education, family structure) calibrated to a real Census, yes. AUSynth preserves the joint distributions between variables, not just the marginals, so models trained on it learn the same correlations they would learn from the underlying real data. The catch: synthetic data can't substitute for your real customer signal. Use AUSynth for the demographic substrate, your own data for the business outcomes you're modelling.
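The difference between preserving marginals and preserving the joint distribution can be shown with a two-variable toy. The probabilities below are made up for illustration and are not Census figures. The sketch builds a second table that has the same marginals but treats the variables as independent, then compares what each table implies about income given education.

```python
# Made-up joint distribution over (education, income band). NOT Census figures.
JOINT = {
    ("degree", "high"): 0.30, ("degree", "low"): 0.10,
    ("no_degree", "high"): 0.10, ("no_degree", "low"): 0.50,
}


def marginal(joint, axis):
    """Sum the joint over the other variable (axis 0 = education, 1 = income)."""
    totals = {}
    for cell, p in joint.items():
        totals[cell[axis]] = totals.get(cell[axis], 0.0) + p
    return totals


def independent_joint(joint):
    """Table with the same marginals but no correlation between the variables."""
    education, income = marginal(joint, 0), marginal(joint, 1)
    return {(e, i): education[e] * income[i] for e in education for i in income}


def p_high_given(joint, education_level):
    """P(income = high | education = education_level)."""
    return joint[(education_level, "high")] / marginal(joint, 0)[education_level]


marginals_only = independent_joint(JOINT)
print("P(high | degree), joint preserved:", round(p_high_given(JOINT, "degree"), 2))
print("P(high | degree), marginals only: ", round(p_high_given(marginals_only, "degree"), 2))
```

Both tables report that 40% of people hold a degree and 40% have high income, but only the first keeps the link between the two: 0.75 versus 0.40 for the probability of high income given a degree. A model trained on the second table would have no education-income correlation to learn.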
Differential privacy adds calibrated noise to query results from a real dataset. AUSynth doesn't query a real dataset at all: every record is generated from public Census conditionals, so there's no per-individual privacy budget to spend. The trade-off: differential-privacy methods preserve real-data structure perfectly within the noise bound; AUSynth preserves it as well as the Census conditionals let it (SRMSE 0.05 across 15,343 suburbs).
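For contrast, this is roughly what "calibrated noise on a query result" looks like in the differential-privacy setting. The sketch applies the standard Laplace mechanism to a counting query; the income records are made up, and the function names are illustrative. Nothing here is part of AUSynth, which by the description above does not take this approach.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a count under epsilon-differential privacy (Laplace mechanism)."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
# Made-up records standing in for a real, sensitive dataset.
incomes = [52_000, 87_000, 120_000, 43_000, 150_000, 99_000]

noisy = private_count(incomes, lambda income: income > 90_000, epsilon=1.0, rng=rng)
print(f"true count = 3, noisy release = {noisy:.2f}")
```

Each release spends part of the privacy budget `epsilon`: a smaller `epsilon` means more noise and stronger privacy. That per-query accounting is the budget the paragraph above says does not arise when records are generated from public tables.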
Yes. The credit bundles include commercial use rights. Cite as: Verosynthea AUSynth v1.0 (2026). verosynthea.com.
Yes. ABS Census 2026 fieldwork is in 2026, with conditional tables typically published mid-to-late the following year. AUSynth v2 will rebuild against those tables when they're available; v1 will remain accessible for reproducibility.
Not yet. AUSynth is the Australian product. The same Bayesian-reconstruction methodology applies to any country with public conditional Census tables; we'll expand once the Australian product has settled.
Yes. We provide a row-to-text workflow that converts AUSynth records into natural-language descriptions suitable for LoRA fine-tuning, RLHF datasets, and instruction tuning. See /use-cases/llm-fine-tuning for the full pattern and an example.
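As a rough idea of what a row-to-text step can look like, the sketch below turns one record into a sentence and wraps it as a JSON Lines instruction-tuning example. This is not the workflow documented at /use-cases/llm-fine-tuning: the column names, the sample record, the sentence template, and the instruction/input/output layout are all assumptions made for illustration.

```python
import json


def row_to_text(row):
    """Render one record as a single natural-language description."""
    return (
        f"{row['age']}-year-old {row['sex']} from {row['suburb']}, {row['state']}; "
        f"occupation: {row['occupation']}; "
        f"weekly personal income: ${row['weekly_income']:,}."
    )


def to_instruction_example(row):
    """Wrap the record and its description in a generic instruction-tuning layout."""
    return {
        "instruction": "Describe this synthetic Australian resident.",
        "input": json.dumps(row, sort_keys=True),
        "output": row_to_text(row),
    }


# Hypothetical record; real AUSynth column names may differ.
record = {
    "age": 34,
    "sex": "female",
    "suburb": "Marrickville",
    "state": "NSW",
    "occupation": "registered nurse",
    "weekly_income": 1650,
}

text = row_to_text(record)
jsonl_line = json.dumps(to_instruction_example(record))
print(text)
print(jsonl_line)
```

Writing one such line per record gives a JSON Lines file, a format most fine-tuning tools can read once the field names are mapped to whatever they expect.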
Yes. The profile-scoring API takes your demographic records (CSV) and returns a score against each of the 8 Verosynthea demographic profiles. Useful for understanding who your customers actually are without sending raw PII to a third party. See /products/profiling.
Start with the free Hugging Face sample. No signup needed.