Population Profile Analysis -- Methodology

What Profiles Represent

AUSynth segments the Australian population into 8 demographic profiles based on a combination of age, sex, marital status, household relationship, income, education, occupation, industry, and labour force engagement.

These profiles are derived from statistical patterns in the data. They are not predefined categories but emerge from clustering analysis applied to real ABS Census data (adjusted to 2025-26 values).

How Profiles Were Created

Profiles are derived from Australian Census records using multiple correspondence analysis (MCA) combined with K-means clustering. MCA reduces 9 categorical variables into a continuous coordinate space while preserving the relationships between categories. K-means then groups people with similar MCA coordinates into distinct profiles.
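The two-step pipeline described above can be sketched roughly as follows. This is a minimal illustration rather than the production code: it assumes NumPy is available, computes MCA as a correspondence analysis of the one-hot indicator matrix, and uses a plain Lloyd's-algorithm K-means. The toy records, the function names, and the choice of two components and three clusters are all hypothetical.

```python
import numpy as np

def indicator_matrix(records):
    """One-hot encode records: one column per (variable index, category) pair."""
    n_vars = len(records[0])
    columns = sorted({(j, rec[j]) for rec in records for j in range(n_vars)})
    position = {col: i for i, col in enumerate(columns)}
    Z = np.zeros((len(records), len(columns)))
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            Z[i, position[(j, value)]] = 1.0
    return Z

def mca_coordinates(Z, n_components=2):
    """Row principal coordinates from a correspondence analysis of Z."""
    P = Z / Z.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :n_components] * sigma[:n_components] / np.sqrt(r)[:, None]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Hypothetical toy records: (age group, marital status, labour force status).
records = [
    ("young", "single", "student"),
    ("young", "single", "employed"),
    ("young", "single", "student"),
    ("senior", "partnered", "retired"),
    ("senior", "partnered", "retired"),
    ("senior", "single", "retired"),
    ("mid", "partnered", "employed"),
    ("mid", "partnered", "employed"),
]

Z = indicator_matrix(records)
coords = mca_coordinates(Z, n_components=2)
labels, centroids = kmeans(coords, k=3)
```

Records with identical answers receive identical MCA coordinates, so they always land in the same cluster. A real run would use all 9 variables and more components than the two shown here.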

Each of the 9 input variables is grouped into broad, commercially meaningful categories (for example, age is grouped into five life stages rather than 21 five-year brackets). This simplification improves the clarity of the resulting profiles without losing meaningful demographic distinctions.
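As an illustration of this kind of grouping, the sketch below collapses an age (or a five-year bracket index) into five life stages. The cut-offs and labels are hypothetical: only the 0-14 child band is suggested elsewhere in this document, and the actual boundaries used by AUSynth are not stated here.

```python
from bisect import bisect_right

# Hypothetical cut-offs (exclusive upper bounds, in years) and labels.
LIFE_STAGE_UPPER_BOUNDS = [15, 25, 45, 65]
LIFE_STAGE_LABELS = ["child", "young adult", "early career", "mid career", "senior"]

def life_stage(age):
    """Map an age in years to one of five broad life-stage groups."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return LIFE_STAGE_LABELS[bisect_right(LIFE_STAGE_UPPER_BOUNDS, age)]

def bracket_to_life_stage(bracket_index):
    """Map a five-year bracket (0 = ages 0-4, ..., 20 = ages 100+) to a life stage.

    Works because every assumed cut-off is a multiple of five, so the
    bracket's lower bound is enough to decide the group.
    """
    if not 0 <= bracket_index <= 20:
        raise ValueError("expected a bracket index between 0 and 20")
    return life_stage(bracket_index * 5)

stages = [bracket_to_life_stage(b) for b in range(21)]
```

With these assumed cut-offs, the 21 brackets collapse into exactly five groups, which is the scale of simplification described above.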

Records with substantive responses across the profiling variables were used to fit the model; the fitted model then assigns a profile to every record in the synthetic population.

A small share of records have extensive non-response in the original Census data. These are still assigned a profile by the fitted model, but with lower confidence -- the model maps them to the nearest demographic cluster using whatever information is available.

No records are excluded from profile assignment. Profile composition in customer reports represents the full population of the selected geography.

How Many Profiles?

We evaluated solutions with 4 to 10 profiles using silhouette scores (a measure of cluster separation) and interpretability. Eight profiles were selected because they best balance statistical fit with practical usefulness: fewer profiles blur important distinctions (for example, merging labourers with trades workers), while more profiles become too granular to be actionable.
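For readers unfamiliar with the metric, here is a minimal sketch of how a mean silhouette score can be computed for a given labelling. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b). The toy points and labellings are hypothetical; in practice the score would be computed on MCA coordinates for each candidate number of profiles.

```python
from math import dist
from statistics import mean

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance).

    Points in single-member clusters score 0 by convention.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # dist(point, point) is 0, so dividing by len(own) - 1 excludes the point itself.
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        b = min(
            mean(dist(point, other) for other in members)
            for key, members in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Hypothetical toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]    # cuts across both groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
```

A labelling that follows the natural grouping scores close to 1, while one that cuts across groups scores near or below 0. As noted above, the score was weighed alongside interpretability rather than used on its own.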

The 8 Profiles

  1. Labourers and operators (7.4%) -- Predominantly male (74%) workers in labouring and machinery operation roles (60%). Spread across working ages, concentrated in transport, logistics, and trade industries. Most work full-time (59%) with mid-range incomes.

  2. Young singles and non-workers (22.0%) -- Single individuals (96%) spanning young adults through to seniors, largely outside paid employment (92%). Includes students, young adults living at home, unemployed individuals, and older singles not in the workforce.

  3. Children (1.3%) -- Children aged 0-14 (100%), living as dependent children in family households. Evenly split by sex.

  4. Non-earning dependants (3.4%) -- Adults with no personal income (99%) -- often partnered adults at home, students, or those in transition. Skews older and female (61%). Most are not in the labour force (70%). Includes non-working spouses, unpaid carers, and some early retirees.

  5. Trades and technical workers (10.9%) -- Overwhelmingly male (91%) with vocational qualifications (46% Certificate III-IV). A mix of active trades workers and those not currently employed. Concentrated in trade and construction.

  6. Established partnered households (16.7%) -- Partnered family heads (56%), predominantly female (87%) and mid-income (49% mid bracket). Most work part-time (66%) in clerical, sales, administration, or community services roles. Spans mid-career to senior ages.

  7. Retired and semi-retired (24.4%) -- The largest profile. Predominantly partnered (82%) family heads (74%), skewing senior (50%). Most are not in paid employment (89%). Represents retirees, semi-retired individuals, and older partnered homemakers.

  8. High-earning professionals (14.0%) -- Tertiary-qualified professionals and managers, predominantly on high or very high incomes. Most are university-educated (59% hold a bachelor degree or higher) and work full-time (75%). The highest-earning profile.

How To Use The Profile Composition Analysis

The grouped bar chart shows the percentage of your area's population in each profile (blue bars) alongside the national average (grey bars). Profiles are sorted by the size of the difference between the two.

What to look for:

  • Over-represented profiles (blue bar taller than grey): these define what makes your area distinctive. An area with 25% high-earning professionals versus 14% nationally is likely to have markedly different service and housing needs.

  • Under-represented profiles (blue bar shorter than grey): these indicate gaps relative to the national mix. An area with few retirees may have less demand for aged care services.

  • Concentration vs diversity: areas where one or two profiles dominate have more predictable characteristics; areas with an even spread require broader planning approaches.
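The comparison behind the chart can be reproduced with a few lines of code. In the sketch below, the national shares are the ones listed in this document, while the area shares are invented for illustration. Sorting by the absolute difference is an assumption about how "size of the difference" is defined.

```python
# National shares (%) as listed in this document.
NATIONAL = {
    "Labourers and operators": 7.4,
    "Young singles and non-workers": 22.0,
    "Children": 1.3,
    "Non-earning dependants": 3.4,
    "Trades and technical workers": 10.9,
    "Established partnered households": 16.7,
    "Retired and semi-retired": 24.4,
    "High-earning professionals": 14.0,
}

# Hypothetical area shares (%), invented for this example.
AREA = {
    "Labourers and operators": 5.0,
    "Young singles and non-workers": 24.0,
    "Children": 1.5,
    "Non-earning dependants": 3.0,
    "Trades and technical workers": 7.5,
    "Established partnered households": 16.0,
    "Retired and semi-retired": 18.0,
    "High-earning professionals": 25.0,
}

def compare_to_national(area, national=NATIONAL):
    """Return (profile, area %, national %, difference) rows, largest gap first."""
    rows = [
        (name, area[name], national[name], round(area[name] - national[name], 1))
        for name in national
    ]
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)

rows = compare_to_national(AREA)
over_represented = [name for name, _, _, diff in rows if diff > 0]
under_represented = [name for name, _, _, diff in rows if diff < 0]

for name, area_pct, national_pct, diff in rows:
    print(f"{name:34s} {area_pct:5.1f} vs {national_pct:5.1f}  ({diff:+.1f})")
```

In this invented area, high-earning professionals show the largest gap (+11.0 points) and retirees the second largest (-6.4 points), so those two profiles would appear first in the chart.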

Variables Used

The 9 variables that define profiles are: age (5 life-stage groups), sex, marital status (partnered/single), relationship in household (4 roles), personal income (5 brackets), highest education (5 levels), occupation (5 groups), industry (5 sectors), and labour force status (4 categories). Geographic variables are deliberately excluded so that profiles describe who people are, not where they live.
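The variable list can be written down as a small specification, which also shows why a dimension-reduction step is useful. The category counts below are taken from the list above, except that sex is assumed to have two categories. Any additional "not stated" category would raise the totals.

```python
from math import prod

# Category counts per variable; the count for sex is an assumption.
PROFILE_VARIABLES = {
    "age": 5,
    "sex": 2,
    "marital_status": 2,
    "household_relationship": 4,
    "personal_income": 5,
    "highest_education": 5,
    "occupation": 5,
    "industry": 5,
    "labour_force_status": 4,
}

n_variables = len(PROFILE_VARIABLES)                      # 9
n_indicator_columns = sum(PROFILE_VARIABLES.values())     # 37 one-hot columns
max_mca_dimensions = n_indicator_columns - n_variables    # at most 28 non-trivial axes
n_possible_combinations = prod(PROFILE_VARIABLES.values())  # 200000 answer patterns
```

Under these assumptions, MCA works from 37 indicator columns and can yield at most 28 non-trivial dimensions, compared with 200,000 possible combinations of answers.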

Methodological Notes

Profile assignments are deterministic: each person is assigned to their nearest cluster centroid in MCA space. The distance to that centroid is also recorded, providing a measure of how typical each person is of their assigned profile. People near cluster boundaries could plausibly belong to adjacent profiles.
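A minimal sketch of nearest-centroid assignment is shown below. The centroid coordinates are hypothetical and use two dimensions for readability. Returning the distance to the second-nearest centroid is an illustrative addition, not something this document says is recorded: the gap between the two distances gives one way to spot people near a cluster boundary.

```python
from math import dist

# Hypothetical centroids in a 2-D MCA space, keyed by profile number.
CENTROIDS = {
    1: (0.0, 0.0),
    2: (1.0, 0.0),
    3: (0.0, 2.0),
}

def assign_profile(point, centroids=CENTROIDS):
    """Return (profile, distance to nearest centroid, distance to second nearest).

    Ties are broken by the lower profile number, so the result is deterministic.
    """
    ranked = sorted((dist(point, centre), profile) for profile, centre in centroids.items())
    (nearest_distance, profile), (second_distance, _) = ranked[0], ranked[1]
    return profile, nearest_distance, second_distance

typical = assign_profile((0.1, 0.0))     # close to centroid 1
borderline = assign_profile((0.5, 0.0))  # halfway between centroids 1 and 2
```

Here the first point sits 0.1 from its centroid and 0.9 from the next one, so it is typical of profile 1. The second point is equally close to two centroids and could plausibly belong to either profile.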

Records with high rates of non-response in the original Census data are still assigned a profile, but flagged with lower confidence. The model assigns them to the nearest demographic cluster based on the information that is available. This means profile composition in your reports always represents the full population -- no records are excluded.

This analysis uses real ABS Census data (adjusted to 2025-26) processed through statistical clustering. The profiles describe actual Australian population structure, not artefacts of the synthesis methodology.