verosynthea-validator
Open-source Python library for testing ML model bias against Australian demographic profiles. The free tier uses the public Hugging Face sample; the paid tier (forthcoming) will draw full-population samples from the AUSynth API.
Install
From PyPI:

    pip install verosynthea-validator

Requires Python 3.9+. Pulls in pandas>=1.5 and numpy>=1.23. Optional extras: datasets for the Hugging Face sample loader, httpx for the (forthcoming) paid-API client.
Quickstart
Load the free Hugging Face sample, score your model against it, and print a fairness report:

    from verosynthea_validator import load_ausynth_sample, FairnessReport

    df = load_ausynth_sample()  # 5,000 rows from the HF dataset
    df["prediction"] = my_model.predict(df)  # my_model is your own fitted model

    report = FairnessReport(
        df,
        y_true="label",
        y_pred="prediction",
        protected_columns=["SEXP", "BPLP", "profile_name"],
    )
    print(report.run().summary())

The sample lives at huggingface.co/datasets/vero-synthea/ausynth-sample and has 27 columns covering age, sex, income, occupation, education, family structure, and the 8 demographic profile assignments.
API reference
FairnessReport
Class that takes a scored DataFrame and a list of protected columns, computes group-wise metrics, and returns a structured report you can serialise or display.

    FairnessReport(
        data: pd.DataFrame,
        y_true: str,
        y_pred: str,
        protected_columns: list[str],
    )
    .run() -> FairnessResults

FairnessResults.summary() renders a one-screen ASCII table. FairnessResults.to_dict() gives you a serialisable structure for logging and dashboards.
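
If the dictionary is headed for a JSON log line, one thing to watch is NumPy scalar types, which the standard json module rejects by default. The sketch below is a minimal illustration under stated assumptions: the `results` payload is hypothetical (the real layout of to_dict() is not documented on this page, and it may already return plain Python numbers), the values are made up, and `to_builtin` is a helper invented for this example.

```python
import json

import numpy as np

# Hypothetical payload standing in for FairnessResults.to_dict();
# keys and values are invented for illustration only.
results = {
    "SEXP": {"accuracy_gap": np.float32(0.25), "n_groups": np.int64(2)},
}


def to_builtin(value):
    """Fallback for json.dumps: convert NumPy scalars to plain Python numbers."""
    if isinstance(value, np.generic):
        return value.item()
    raise TypeError(f"not JSON serialisable: {type(value).__name__}")


# json.dumps calls to_builtin only for objects it cannot encode itself.
line = json.dumps(results, default=to_builtin, sort_keys=True)
print(line)
```

If to_dict() already returns built-in types, the `default` hook is simply never called, so passing it is harmless.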
assert_fair
CI-gate helper. Raises FairnessAssertionError if any configured threshold is exceeded — drop into your test suite or model release pipeline.
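
The raise-on-breach behaviour can be pictured with a small stand-in. This is a sketch of the general pattern, not the library's implementation: `GapThresholdError` and `check_gaps` are hypothetical names, the measured gaps are made-up numbers, and the limits mirror the default thresholds documented for assert_fair. Subclassing AssertionError is an assumption made here so that pytest reports a breach as an ordinary test failure.

```python
class GapThresholdError(AssertionError):
    """Hypothetical stand-in for the library's FairnessAssertionError."""


def check_gaps(gaps: dict[str, float], limits: dict[str, float]) -> None:
    """Raise if any measured gap exceeds its configured limit."""
    breaches = {
        name: (gaps[name], limit)
        for name, limit in limits.items()
        if gaps[name] > limit
    }
    if breaches:
        details = ", ".join(
            f"{name}={value:.3f} > {limit:.3f}"
            for name, (value, limit) in breaches.items()
        )
        raise GapThresholdError(f"fairness thresholds exceeded: {details}")


limits = {
    "accuracy_gap": 0.05,
    "demographic_parity_gap": 0.10,
    "equalised_odds_gap": 0.10,
}
# Made-up measurements: only the demographic parity gap is over its limit.
measured = {
    "accuracy_gap": 0.02,
    "demographic_parity_gap": 0.14,
    "equalised_odds_gap": 0.06,
}

try:
    check_gaps(measured, limits)
    outcome = "passed"
except GapThresholdError as exc:
    outcome = str(exc)

print(outcome)
```

Collecting every breach before raising, rather than stopping at the first, means a single CI run shows all the thresholds that need attention.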

    assert_fair(
        data: pd.DataFrame,
        y_true: str,
        y_pred: str,
        *,
        max_accuracy_gap: float = 0.05,
        max_demographic_parity_gap: float = 0.10,
        max_equalised_odds_gap: float = 0.10,
        protected_columns: list[str] | None = None,
    ) -> None

load_ausynth_sample
Convenience loader that downloads the free Hugging Face sample.

    load_ausynth_sample(
        suburb: str = "paddington_4064",  # only the bundled sample for now
        cache_dir: str | None = None,
    ) -> pd.DataFrame

CI / CD gate
Block a model from shipping if fairness degrades beyond your thresholds. Drop this into pytest:

    # tests/test_fairness.py
    import pandas as pd
    from verosynthea_validator import assert_fair
    from my_app import score_batch

    def test_model_is_fair_across_demographics():
        df = pd.read_parquet("fixtures/holdout.parquet")
        df["prediction"] = score_batch(df)
        assert_fair(
            df,
            y_true="label",
            y_pred="prediction",
            max_accuracy_gap=0.05,
            max_demographic_parity_gap=0.08,
            protected_columns=["SEXP", "BPLP", "profile_name"],
        )

Or call it as a GitHub Actions step against the HF sample so PRs get a fairness verdict before merge:

    # .github/workflows/fairness.yml
    - name: Fairness gate
      run: |
        pip install verosynthea-validator
        python -m my_app.evaluate_fairness  # imports assert_fair

Metrics computed
For each protected column, the validator reports three group-wise gaps:
| Metric | What it measures |
|---|---|
| Accuracy gap | Max accuracy difference between any two demographic groups. |
| Demographic parity gap | Max difference in selection rate P(y_pred = 1) across groups. |
| Equalised odds gap | Max difference in TPR or FPR across groups (per-group confusion matrix). |
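
To make the three definitions concrete, here is a minimal pandas sketch that computes them for one protected column. It follows the table's wording rather than the library's source: `group_gaps` is a hypothetical helper, labels and predictions are assumed to be binary 0/1, and the toy rows are invented.

```python
import pandas as pd


def group_gaps(df: pd.DataFrame, y_true: str, y_pred: str, group_col: str) -> dict:
    """Return the three max-minus-min gaps for one protected column."""
    rows = []
    for group, g in df.groupby(group_col):
        positives = g[g[y_true] == 1]
        negatives = g[g[y_true] == 0]
        rows.append({
            "group": group,
            "accuracy": (g[y_true] == g[y_pred]).mean(),
            "selection_rate": (g[y_pred] == 1).mean(),
            # TPR/FPR are NaN for a group with no positives/negatives;
            # this sketch assumes every group has both.
            "tpr": (positives[y_pred] == 1).mean(),
            "fpr": (negatives[y_pred] == 1).mean(),
        })
    per_group = pd.DataFrame(rows).set_index("group")
    spread = per_group.max() - per_group.min()
    return {
        "accuracy_gap": float(spread["accuracy"]),
        "demographic_parity_gap": float(spread["selection_rate"]),
        "equalised_odds_gap": float(max(spread["tpr"], spread["fpr"])),
    }


# Invented toy data: both groups are 75% accurate, but the model
# selects group "M" three times as often as group "F".
toy = pd.DataFrame({
    "SEXP": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "label": [1, 1, 0, 0, 1, 1, 0, 0],
    "prediction": [1, 0, 0, 0, 1, 1, 1, 0],
})
gaps = group_gaps(toy, "label", "prediction", "SEXP")
print(gaps)
# accuracy_gap 0.0, demographic_parity_gap 0.5, equalised_odds_gap 0.5
```

The toy data shows why accuracy alone is not enough: the accuracy gap is zero while the selection rates (0.25 vs 0.75) and the error rates (TPR 0.5 vs 1.0, FPR 0.0 vs 0.5) differ sharply between the two groups.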
Comparison to fairlearn / aif360
Both fairlearn and aif360 are broader fairness toolkits — many metrics, many bias-mitigation algorithms, lots of configuration. They're excellent if you're doing fairness research or building a custom pipeline.
verosynthea-validator is narrower on purpose: one Australia-calibrated reference population (AUSynth's 8 demographic profiles) and a one-liner CI gate. If you're shipping a model that touches Australian customers and you want a cheap pre-deploy check that it doesn't perform markedly worse for one demographic group than for the others, use this. If you want to compose 30 mitigation strategies, use fairlearn.
Next steps
- Read the README on GitHub for the full repo, tests, and issue tracker.
- Skim the AUSynth dataset card on Hugging Face to see the columns the validator scores against.
- For the full Australian population (not just the 5k sample), buy a data bundle — same calibration, same columns, just bigger.
- For context on what AUSynth is, see /for-ai-labs or the methodology overview.