For AI/ML teams

Train on Australia. Test for bias. Ship with confidence.

Privacy-preserving synthetic Australian demographic data, calibrated to ABS Census 2021. For ML training, customer profiling, demo data, and AI fine-tuning.

Sample on Hugging Face

Open-source validator on PyPI

What you can do with AUSynth

Train on Australia

Realistic Australian population data, no privacy paperwork. 27.5M synthetic individuals, 47 demographic variables, validated against ABS Census 2021. Pandas / Polars / PyTorch / TensorFlow ready.

Sample on Hugging Face →

Know your customer base

Send your customers' demographic records and get them scored against 8 demographic profiles. Understand who your customers are without exposing PII to a third party.

Profile scoring API →

Realistic data, no real people

Privacy-safe Australian data for staging environments, QA, customer support training, and sales demos. Looks real, contains zero PII.

Get started →

Australian-aware AI

Adapt your existing models to Australian demographic reality. Generate training corpora for LoRA fine-tuning, RLHF datasets, and instruction tuning.

Fine-tuning use case →

How AUSynth compares

| Feature | AUSynth | Real Census | IPF Synthetic | Faker |
|---|---|---|---|---|
| Privacy-safe | ✓ | Restricted | ✓ | ✓ |
| ABS Census 2021 calibrated | ✓ | ✓ | Partial | - |
| Individual-level records | ✓ | - | ✓ | ✓ |
| Statistically validated | SRMSE 0.05 | N/A | Varies | - |
| Suburb-level granularity | 15,343 | 15,343 | Coarser | - |
| Commercial license | ✓ | Restricted | Varies | ✓ |
| Free sample | 5k rows | - | - | Synthetic-only |
| ML-ready format | Parquet/CSV | Aggregates | Varies | JSON/CSV |
| Cross-tabs preserved | ✓ | ✓ | Partial | - |

Common use cases

Training ML models without privacy risk

Australian fintechs and insurers building models on customer demographics often can't share real data across teams or with external vendors. AUSynth gives a statistically representative training set with no real individuals attached, so the model can move between environments without privacy review delays.

Bias testing before deployment

Score your model against AUSynth's 8 demographic profiles to find where its predictions diverge from population reality. This catches biased income predictions, occupation mismatches, and family-structure assumptions before they affect customers.

from verosynthea_validator import assert_fair

# Assumes `model` (your trained model) and `AUSynth` are already defined in scope.
assert_fair(
    model,
    AUSynth.sample("national"),
    metric="demographic_parity",
    threshold=0.05,
)
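
This page does not define the `demographic_parity` metric or what `threshold=0.05` is measured against. One common reading is the largest gap in positive-prediction rate between any two groups. The sketch below assumes that definition, binary predictions, and a hypothetical group label per record. It illustrates the idea only and is not the `verosynthea_validator` implementation.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups.

    predictions: iterable of 0/1 model outputs (assumed binary).
    groups: iterable of group labels, one per prediction (hypothetical field).
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    if not totals:
        raise ValueError("no records supplied")
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Toy data: 4 records per group, made up for illustration.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["18-34", "18-34", "18-34", "18-34", "65+", "65+", "65+", "65+"]

gap, rates = demographic_parity_gap(preds, groups)
print(rates)
print(f"gap={gap:.2f}, within 0.05 threshold: {gap <= 0.05}")
```

Under this reading, a threshold of 0.05 would mean no two groups may differ by more than five percentage points in how often the model predicts the positive class.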

Built for AI/ML workflows

  • ✓ Hugging Face dataset
  • ✓ Pandas / Polars / PyTorch / TensorFlow ready
  • ✓ Parquet / CSV / Arrow
  • ✓ Validated to SRMSE 0.05
  • ✓ 8 demographic profiles
  • ✓ Family + dwelling structure preserved
  • ✓ 15,343 suburb granularity
  • ✓ Geographically coherent
  • ✓ Annual updates
  • ✓ Commercial license

Methodology, briefly

AUSynth is generated from ABS Census 2021 conditional tables using Bayesian reconstruction with Gibbs sampling. Each synthetic individual is drawn from a joint distribution that preserves the real Census cross-tabulations between age, sex, income, occupation, education, industry, household composition and dwelling type. Population marginals match the real Census at 99.9%+ accuracy; cross-tab fidelity sits at a median person-level standardised root mean squared error (SRMSE) of 0.05 across 15,343 suburbs. The data is adjusted to 2026 for wage and price growth.
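
To make the two ideas in this paragraph concrete, here is a toy sketch, not the AUSynth pipeline. It runs a two-variable Gibbs sampler that alternates draws from conditional tables, then scores the resulting cross-tab against the target with SRMSE. The joint distribution, the variable names, and the SRMSE formula (RMSE of cell shares divided by the mean reference cell share, one common definition in population synthesis) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

# Made-up joint distribution over (age_band, income_band); NOT Census figures.
TARGET = {
    ("18-34", "low"): 0.20, ("18-34", "high"): 0.10,
    ("35-64", "low"): 0.15, ("35-64", "high"): 0.30,
    ("65+", "low"): 0.20, ("65+", "high"): 0.05,
}
AGES = ["18-34", "35-64", "65+"]
INCOMES = ["low", "high"]


def conditional(fixed_index, fixed_value):
    """Conditional distribution of the other variable given one fixed value."""
    cells = {k: p for k, p in TARGET.items() if k[fixed_index] == fixed_value}
    total = sum(cells.values())
    free_index = 1 - fixed_index
    return {k[free_index]: p / total for k, p in cells.items()}


def draw(dist, rng):
    """Sample one key from a {value: probability} dict."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def gibbs(n_samples, burn_in=500, seed=0):
    """Alternate draws from P(age | income) and P(income | age)."""
    rng = random.Random(seed)
    age, income = AGES[0], INCOMES[0]
    samples = []
    for step in range(burn_in + n_samples):
        age = draw(conditional(1, income), rng)   # age given current income
        income = draw(conditional(0, age), rng)   # income given new age
        if step >= burn_in:
            samples.append((age, income))
    return samples


def srmse(samples):
    """RMSE of cell shares divided by the mean reference cell share."""
    counts = Counter(samples)
    n = len(samples)
    sq_err = sum((counts[cell] / n - p) ** 2 for cell, p in TARGET.items())
    mean_ref = sum(TARGET.values()) / len(TARGET)
    return math.sqrt(sq_err / len(TARGET)) / mean_ref


records = gibbs(20_000)
print(f"SRMSE vs target joint: {srmse(records):.4f}")
```

The sampler only ever sees the conditional tables, yet the cross-tab of its output approaches the joint they were derived from. An SRMSE of 0 would mean the synthetic cell shares match the reference exactly.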

Read the full methodology →

Pricing for AI/ML teams

Free tier

  • ✓ 5 credits per week, never expires
  • ✓ Free 5k-row sample on Hugging Face
  • ✓ verosynthea-validator free tier

Credit bundles

1 credit = 500 synthetic records. Bundles never expire. The Power bundle covers 2.5M synthetic individuals at the standard rate.

Starter

$10

10 credits

Small

$40

50 credits

Regular

$120

200 credits

Pro

$450

1,000 credits

Power

$1,500

5,000 credits
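
The bundle arithmetic can be checked with a few lines of Python. This is a sketch using only the prices and the 500-records-per-credit rate listed above; the helper name `bundle_summary` is made up for illustration.

```python
RECORDS_PER_CREDIT = 500  # stated above: 1 credit = 500 synthetic records

# (price in dollars, credits) as listed on this page
BUNDLES = {
    "Starter": (10, 10),
    "Small": (40, 50),
    "Regular": (120, 200),
    "Pro": (450, 1_000),
    "Power": (1_500, 5_000),
}


def bundle_summary(price, credits):
    """Return (records covered, price per credit, price per 1,000 records)."""
    records = credits * RECORDS_PER_CREDIT
    return records, price / credits, price / records * 1_000


for name, (price, credits) in BUNDLES.items():
    records, per_credit, per_thousand = bundle_summary(price, credits)
    print(f"{name:<8} {records:>9,} records  "
          f"${per_credit:.2f}/credit  ${per_thousand:.2f} per 1,000 records")
```

The Power row confirms the 2.5M figure quoted above (5,000 credits × 500 records), and the per-credit price falls from $1.00 on Starter to $0.30 on Power.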

FAQ

  • Is synthetic data really good enough for ML training?

    For population-level features (income, age, occupation, education, family structure) calibrated to a real Census, yes. AUSynth preserves the joint distributions between variables, not just the marginals, so models trained on it learn the same correlations they would learn from the underlying real data. The catch: synthetic data can't substitute for your real customer signal. Use AUSynth for the demographic substrate and your own data for the business outcomes you're modelling.

  • How does this compare to differential privacy?

    Differential privacy adds calibrated noise to query results from a real dataset. AUSynth doesn't query a real dataset at all: every record is generated from public Census conditionals, so there's no per-individual privacy budget to spend. The trade-off: differential-privacy methods preserve real-data structure to within the noise bound; AUSynth preserves it as well as the Census conditionals let it (SRMSE 0.05 across 15,343 suburbs).

  • Can I use this for commercial models?

    Yes. The credit bundles include commercial use rights. Cite as: Verosynthea AUSynth v1.0 (2026). verosynthea.com.

  • Will the data be updated for Census 2026?

    Yes. ABS Census 2026 fieldwork is in 2026, with conditional tables typically published mid-to-late the following year. AUSynth v2 will rebuild against those tables when they're available; v1 will remain accessible for reproducibility.

  • Do you support countries beyond Australia?

    Not yet. AUSynth is the Australian product. The same Bayesian-reconstruction methodology applies to any country with public conditional Census tables; we'll expand once the Australian product has settled.

  • Can I use AUSynth to fine-tune LLMs?

    Yes. We provide a row-to-text workflow that converts AUSynth records into natural-language descriptions suitable for LoRA fine-tuning, RLHF datasets, and instruction tuning. See /use-cases/llm-fine-tuning for the full pattern and an example.

  • Can I score my own customer data against AUSynth profiles?

    Yes. The profile-scoring API takes your demographic records (CSV) and returns a score against each of the 8 Verosynthea demographic profiles. Useful for understanding who your customers actually are without sending raw PII to a third party. See /products/profiling.
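
The row-to-text workflow mentioned in the fine-tuning answer above is documented at /use-cases/llm-fine-tuning and is not shown on this page. As a rough sketch of the general idea, a row can be templated into a sentence and wrapped as a prompt/response pair. The field names (`age`, `sex`, `occupation`, `suburb`, `weekly_income`), the record values, and the prompt wording are all hypothetical and are not the AUSynth schema.

```python
import json

# Hypothetical record; field names and values are made up, not the AUSynth schema.
record = {
    "age": 42,
    "sex": "female",
    "occupation": "registered nurse",
    "suburb": "Parramatta",
    "weekly_income": 1850,
}


def row_to_text(row):
    """Render one synthetic record as a plain-English description."""
    return (
        f"A {row['age']}-year-old {row['sex']} {row['occupation']} "
        f"living in {row['suburb']}, earning about "
        f"${row['weekly_income']:,} per week."
    )


def to_instruction_example(row):
    """Wrap the description as one JSON Lines prompt/response pair."""
    return json.dumps({
        "prompt": "Describe this synthetic Australian resident.",
        "response": row_to_text(row),
    })


print(row_to_text(record))
print(to_instruction_example(record))
```

Writing one such JSON object per line gives a JSON Lines file, a format many fine-tuning toolchains accept. The template and prompt would need to match whatever task the model is being tuned for.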

Ready to try it?

Start with the free Hugging Face sample. No signup needed.
