
verosynthea-validator

Open-source Python library for testing ML model bias against Australian demographic profiles. The free tier uses the public Hugging Face sample; a paid tier (coming) will use the AUSynth API for full population draws.


Install

From PyPI:

pip install verosynthea-validator

Requires Python 3.9+. Pulls in pandas>=1.5 and numpy>=1.23. Optional extras: datasets for the Hugging Face sample loader, httpx for the (forthcoming) paid-API client.

Quickstart

Load the free Hugging Face sample, score your model against it, and print a fairness report:

from verosynthea_validator import load_ausynth_sample, FairnessReport

df = load_ausynth_sample()                     # 5,000 rows from the HF dataset
df["prediction"] = my_model.predict(df)        # my_model: placeholder for your own fitted model

report = FairnessReport(
    df,
    y_true="label",
    y_pred="prediction",
    protected_columns=["SEXP", "BPLP", "profile_name"],
)
print(report.run().summary())

The sample lives at huggingface.co/datasets/vero-synthea/ausynth-sample — 27 columns covering age, sex, income, occupation, education, family structure, and the 8 demographic profile assignments.

API reference

FairnessReport

Class that takes a scored DataFrame and a list of protected columns, computes group-wise metrics, and returns a structured report you can serialise or display.

FairnessReport(
    data: pd.DataFrame,
    y_true: str,
    y_pred: str,
    protected_columns: list[str],
)
.run() -> FairnessResults

FairnessResults.summary() renders a one-screen ASCII table. FairnessResults.to_dict() gives you a serialisable structure for logging / dashboards.
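
For logging, a dict like this is usually written out as one JSON line per evaluation run. The sketch below uses only the standard library. The shape of results_dict is invented for illustration (the actual structure returned by to_dict() is not documented on this page), and to_log_line and the model_version value are hypothetical.

```python
import json
from datetime import datetime, timezone

def to_log_line(results_dict, model_version):
    """Wrap a fairness dict with run metadata and serialise it as one JSON line."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "fairness": results_dict,
    }
    return json.dumps(record, sort_keys=True)

# Assumed shape only: protected column -> metric name -> gap.
results_dict = {"SEXP": {"accuracy_gap": 0.02, "demographic_parity_gap": 0.06}}
line = to_log_line(results_dict, model_version="2024-06-rc1")
print(line)
```

In practice you would pass report.run().to_dict() in place of the hand-written results_dict.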

assert_fair

CI-gate helper. Raises FairnessAssertionError if any configured threshold is exceeded — drop into your test suite or model release pipeline.

assert_fair(
    data: pd.DataFrame,
    y_true: str,
    y_pred: str,
    *,
    max_accuracy_gap: float            = 0.05,
    max_demographic_parity_gap: float  = 0.10,
    max_equalised_odds_gap: float      = 0.10,
    protected_columns: list[str] | None = None,
) -> None
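
The gate pattern itself is easy to picture. The sketch below is not the library's code: GapExceededError and check_gaps are hypothetical stand-ins for FairnessAssertionError and assert_fair, and the input is assumed to be a plain dict of precomputed gaps rather than a DataFrame. It only illustrates the "raise if any configured threshold is exceeded" behaviour described above, reusing the default thresholds from the signature.

```python
class GapExceededError(AssertionError):
    """Hypothetical stand-in for FairnessAssertionError.

    Subclassing AssertionError is an assumption made for this sketch.
    """

def check_gaps(gaps, *, max_accuracy_gap=0.05,
               max_demographic_parity_gap=0.10,
               max_equalised_odds_gap=0.10):
    """Raise if any observed gap is above its threshold.

    `gaps` maps protected column -> {metric name -> observed gap}.
    """
    limits = {
        "accuracy": max_accuracy_gap,
        "demographic_parity": max_demographic_parity_gap,
        "equalised_odds": max_equalised_odds_gap,
    }
    violations = [
        f"{column}: {metric} gap {value:.3f} > {limits[metric]:.3f}"
        for column, metrics in gaps.items()
        for metric, value in metrics.items()
        if value > limits[metric]
    ]
    if violations:
        raise GapExceededError("; ".join(violations))

# Toy input: only the demographic parity gap is over its default limit.
observed = {"SEXP": {"accuracy": 0.02, "demographic_parity": 0.12, "equalised_odds": 0.04}}
try:
    check_gaps(observed)
    failed, message = False, ""
except GapExceededError as exc:
    failed, message = True, str(exc)
print(failed, message)
```

Collecting every violation before raising, rather than stopping at the first, gives a CI log that shows all failing columns in one run.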

load_ausynth_sample

Convenience loader that downloads the free Hugging Face sample.

load_ausynth_sample(
    suburb: str = "paddington_4064",   # only the bundled sample for now
    cache_dir: str | None = None,
) -> pd.DataFrame

CI / CD gate

Block a model from shipping if fairness degrades beyond your thresholds. Drop this into pytest:

# tests/test_fairness.py
import pandas as pd
from verosynthea_validator import assert_fair
from my_app import score_batch

def test_model_is_fair_across_demographics():
    df = pd.read_parquet("fixtures/holdout.parquet")
    df["prediction"] = score_batch(df)

    assert_fair(
        df,
        y_true="label",
        y_pred="prediction",
        max_accuracy_gap=0.05,
        max_demographic_parity_gap=0.08,
        protected_columns=["SEXP", "BPLP", "profile_name"],
    )

Or call it as a GitHub Actions step against the HF sample so PRs get a fairness verdict before merge:

# .github/workflows/fairness.yml
- name: Fairness gate
  run: |
    pip install verosynthea-validator
    python -m my_app.evaluate_fairness  # imports assert_fair

Metrics computed

For each protected column, the validator reports three group-wise gaps:

Metric                 | What it measures
Accuracy gap           | Max accuracy difference between any two demographic groups.
Demographic parity gap | Max difference in selection rate P(y_pred = 1) across groups.
Equalised odds gap     | Max difference in TPR or FPR across groups (per-group confusion matrix).
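
To make the three definitions concrete, here is a minimal pure-Python sketch that computes them for one protected column on a toy dataset. It is not the library's implementation: group_rates, gap, and the toy data are invented for illustration, and it assumes binary labels and predictions coded as 0/1. The equalised odds gap is taken as the larger of the TPR gap and the FPR gap, which is one common reading of "max difference in TPR or FPR".

```python
from collections import defaultdict

def group_rates(groups, y_true, y_pred):
    """Per-group accuracy, selection rate, TPR and FPR for binary 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted 1 or 0.
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        n = sum(c.values())
        pos = c["tp"] + c["fn"]   # rows whose true label is 1
        neg = c["fp"] + c["tn"]   # rows whose true label is 0
        rates[g] = {
            "accuracy": (c["tp"] + c["tn"]) / n,
            "selection_rate": (c["tp"] + c["fp"]) / n,
            "tpr": c["tp"] / pos if pos else None,
            "fpr": c["fp"] / neg if neg else None,
        }
    return rates

def gap(rates, metric):
    """Largest difference in `metric` between any two groups (undefined values skipped)."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)

# Toy data: group A is predicted perfectly, group B misses one positive.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

rates = group_rates(groups, y_true, y_pred)
accuracy_gap = gap(rates, "accuracy")                          # 1.00 vs 0.75
demographic_parity_gap = gap(rates, "selection_rate")          # 0.50 vs 0.25
equalised_odds_gap = max(gap(rates, "tpr"), gap(rates, "fpr"))  # TPR 1.0 vs 0.5, FPR 0 vs 0
print(accuracy_gap, demographic_parity_gap, equalised_odds_gap)  # 0.25 0.25 0.5
```

With the default thresholds listed for assert_fair (0.05, 0.10, 0.10), all three of these toy gaps would count as failures.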

Comparison to fairlearn / aif360

Both fairlearn and aif360 are broader fairness toolkits — many metrics, many bias-mitigation algorithms, lots of configuration. They're excellent if you're doing fairness research or building a custom pipeline.

verosynthea-validator is narrower on purpose: one Australia-calibrated reference population (AUSynth's 8 demographic profiles) and a one-liner CI gate. If you're shipping a model that touches Australian customers and you want a cheap pre-deploy check that an accuracy gain didn't come at the cost of fairness for one demographic group, use this. If you want to compose 30 mitigation strategies, use fairlearn.

Next steps