Data Format
The Profile Scoring API accepts a CSV file with one row per person and returns the same file with profile assignments added.
Required columns
Your CSV must include an id column and the nine profiling variables:
| Column | Description |
|---|---|
| id | Your identifier for each row. Can be any string or number. Preserved verbatim in the output. |
| AGE_GROUP | Age / life stage |
| SEX | Sex |
| MARITAL_STATUS | Marital status |
| HOUSEHOLD_ROLE | Role in household |
| INCOME | Personal weekly income bracket |
| EDUCATION | Highest educational attainment |
| OCCUPATION | Occupation group |
| INDUSTRY | Industry of employment |
| LABOUR_FORCE | Labour force status |
Column name matching is case-insensitive, so age_group, AGE_GROUP, and Age_Group all work.
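You can check the header on your side before uploading. The following is a minimal Python sketch of such a pre-flight check; `missing_columns` is a hypothetical helper, not part of the API. It assumes the collapsed column names (not the ABS alternatives) and mirrors only the case-insensitive matching described above.

```python
import csv
import io

# id plus the nine profiling variables, upper-cased for case-insensitive matching.
REQUIRED = {
    "ID", "AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE",
    "INCOME", "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE",
}


def missing_columns(csv_text: str) -> set[str]:
    """Return the required column names absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.upper() for name in header}
    return REQUIRED - present
```

An empty result means every required column was found, in any letter case.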
Accepted values
Each variable accepts collapsed category labels. For example, INCOME accepts "No income", "Low", "Mid", "High", or "Very high". See the Variable Reference for the complete list of valid values for every variable.
You can also submit raw ABS Census labels (for example, "25-29 years" instead of "Early career"). The API maps these to collapsed categories automatically. Both value formats can be mixed within the same file.
Any value that does not match a known label, collapsed or raw ABS, is treated as Not stated, and the row is still scored. The response flags these values in the data_quality object so you can fix your data.
Alternative column names
If your data uses raw ABS variable names, the API accepts those too:
| Collapsed name | ABS name |
|---|---|
| AGE_GROUP | AGE5P |
| SEX | SEXP |
| MARITAL_STATUS | MSTP |
| HOUSEHOLD_ROLE | RLHP |
| INCOME | INCP |
| EDUCATION | HEAP |
| OCCUPATION | OCCP |
| INDUSTRY | INDP |
| LABOUR_FORCE | LFSP |
You can use either set of column names, but not a mix of both in the same file.
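If you combine files from different sources, you can apply the mapping in the table yourself before uploading. This is a minimal Python sketch; `naming_scheme` and `to_collapsed` are hypothetical helpers, and the sketch assumes ABS names are matched case-insensitively, like the collapsed names.

```python
# Collapsed name -> raw ABS name, copied from the table above.
ABS_NAMES = {
    "AGE_GROUP": "AGE5P",
    "SEX": "SEXP",
    "MARITAL_STATUS": "MSTP",
    "HOUSEHOLD_ROLE": "RLHP",
    "INCOME": "INCP",
    "EDUCATION": "HEAP",
    "OCCUPATION": "OCCP",
    "INDUSTRY": "INDP",
    "LABOUR_FORCE": "LFSP",
}
ABS_TO_COLLAPSED = {abs_name: name for name, abs_name in ABS_NAMES.items()}


def naming_scheme(header):
    """Return "abs" or "collapsed" for a CSV header; raise if it mixes both."""
    upper = {name.upper() for name in header}
    uses_collapsed = not upper.isdisjoint(ABS_NAMES)
    uses_abs = not upper.isdisjoint(ABS_TO_COLLAPSED)
    if uses_collapsed and uses_abs:
        raise ValueError("header mixes collapsed and ABS column names")
    return "abs" if uses_abs else "collapsed"


def to_collapsed(header):
    """Rename ABS column names to collapsed names; leave other columns as-is."""
    return [ABS_TO_COLLAPSED.get(name.upper(), name) for name in header]
```

Renaming every file to the collapsed names is one way to avoid a mixed header.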
Handling missing data
If a value is missing, empty, or set to Not stated, the row is still scored. The API does not drop any rows.
Records with three or more Not stated fields receive a profile_confidence value of "low" instead of "high". This is a signal to treat the assignment with caution, not a reason to discard the row.
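To anticipate which rows will come back as "low", you can count gaps per row before submitting. This is a rough Python sketch, assuming collapsed column names and the exact label "Not stated"; `expected_confidence` is a hypothetical helper. Because unrecognized values are also treated as Not stated by the API, this client-side count can only under-count, so it may predict "high" where the API returns "low".

```python
PROFILING_VARIABLES = (
    "AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE", "INCOME",
    "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE",
)


def expected_confidence(row):
    """Estimate profile_confidence for one row (a dict, e.g. from csv.DictReader)."""
    # Upper-case keys to mirror case-insensitive column matching; skip the
    # None key csv.DictReader uses for surplus fields.
    values = {k.upper(): (v or "") for k, v in row.items() if k is not None}
    # Count variables that are absent, empty (whitespace-only counts as empty
    # here, an assumption), or explicitly "Not stated".
    not_stated = sum(
        1 for var in PROFILING_VARIABLES
        if values.get(var, "").strip() in ("", "Not stated")
    )
    return "low" if not_stated >= 3 else "high"
```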
The id column
The id column is required. It passes through scoring untouched and appears in the output CSV in the same position. Use it to join scored results back to your original dataset.
The id values do not need to be unique, but unique IDs make joining easier.
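When your original dataset has columns you did not send for scoring, you can copy the three profile columns back onto it by id. This is a minimal sketch with plain dictionaries; `attach_profiles` is a hypothetical helper. Since id values are preserved verbatim, it compares them as the exact strings in the CSV, and it stops on duplicate ids in the scored file instead of picking one arbitrarily.

```python
NEW_COLUMNS = ("profile_id", "profile_name", "profile_confidence")


def attach_profiles(original_rows, scored_rows, id_column="id"):
    """Return copies of original_rows with the profile columns added, matched on id."""
    by_id = {}
    for row in scored_rows:
        key = row[id_column]
        if key in by_id:
            # A duplicate id makes the match ambiguous, so refuse rather than guess.
            raise ValueError(f"duplicate id in scored output: {key!r}")
        by_id[key] = {col: row[col] for col in NEW_COLUMNS}
    # Rows with no scored match are returned without profile columns.
    return [{**row, **by_id.get(row[id_column], {})} for row in original_rows]
```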
File limits
| Limit | Value |
|---|---|
| Maximum file size | 100 MB |
| Maximum rows per request | 1,000,000 |
Files larger than 100 MB or with more than 1,000,000 rows are rejected with an HTTP 413 error. If your dataset is larger, split it into multiple files and submit each one as a separate request.
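One way to split a large file is sketched below, assuming UTF-8 input and reading "100 MB" conservatively as 100,000,000 bytes. `split_csv` is a hypothetical helper. Each part repeats the header row so it can be submitted on its own, only data rows count toward `max_rows` (whether the API counts the header is not stated here), and rows are parsed with the csv module rather than split on raw newlines, so quoted values that contain line breaks stay intact.

```python
import csv
import io
from pathlib import Path

MAX_ROWS = 1_000_000
MAX_BYTES = 100_000_000  # "100 MB" read conservatively as 10**8 bytes (assumption)


def _row_bytes(row):
    """Size of one row written as CSV (default CRLF ending), in UTF-8 bytes."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return len(buf.getvalue().encode("utf-8"))


def split_csv(src, out_dir, max_rows=MAX_ROWS, max_bytes=MAX_BYTES):
    """Split src into part_001.csv, part_002.csv, ... each starting with the header."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return parts  # empty input: nothing to split
        header_bytes = _row_bytes(header)
        out = writer = None
        rows = size = 0
        try:
            for row in reader:
                row_bytes = _row_bytes(row)
                # Start a new part before this row would break either limit.
                if out is None or rows >= max_rows or size + row_bytes > max_bytes:
                    if out is not None:
                        out.close()
                    path = out_dir / f"part_{len(parts) + 1:03d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(path)
                    rows, size = 0, header_bytes
                writer.writerow(row)
                rows += 1
                size += row_bytes
        finally:
            if out is not None:
                out.close()
    return parts
```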
Output format
The returned CSV contains all your original columns plus three new ones at the end:
| Column | Type | Description |
|---|---|---|
| profile_id | Integer (0-7) | Numeric profile identifier |
| profile_name | String | Human-readable profile name (e.g. "High-earning professionals") |
| profile_confidence | String | "high" or "low" |
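As a sanity check on a returned file, you can confirm the appended columns hold the documented values and tally the results. This is a minimal Python sketch; `summarize_scores` is a hypothetical helper that checks only the documented profile_id range (0-7) and the two confidence values, not profile_name.

```python
import csv
import io
from collections import Counter


def summarize_scores(csv_text):
    """Validate the appended columns and count rows per (profile_id, confidence)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        profile_id = int(row["profile_id"])
        if not 0 <= profile_id <= 7:
            raise ValueError(f"unexpected profile_id {profile_id}")
        confidence = row["profile_confidence"]
        if confidence not in ("high", "low"):
            raise ValueError(f"unexpected profile_confidence {confidence!r}")
        counts[(profile_id, confidence)] += 1
    return counts
```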
The output CSV is available via a presigned download URL that expires after 24 hours.
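A presigned URL can be fetched with any HTTP client and typically needs no extra credentials, since the authorization is embedded in the URL. This is a minimal standard-library sketch; how you obtain the URL from the API is not covered on this page, so the hypothetical `download_results` simply takes it as an argument. `urlopen` raises `urllib.error.HTTPError` if the server rejects the request, which is presumably what happens once the 24-hour window has passed.

```python
import shutil
import urllib.request


def download_results(presigned_url, dest_path):
    """Stream the scored CSV from a presigned URL into a local file."""
    with urllib.request.urlopen(presigned_url, timeout=60) as resp, \
            open(dest_path, "wb") as out:
        # Copy in chunks so large result files are not held in memory.
        shutil.copyfileobj(resp, out)
    return dest_path
```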
Example
Input:
```csv
id,AGE_GROUP,SEX,MARITAL_STATUS,HOUSEHOLD_ROLE,INCOME,EDUCATION,OCCUPATION,INDUSTRY,LABOUR_FORCE
CUST_1000,Senior,Male,Single,Non-family,Mid,Year 11 or below,Not in employment,Knowledge services,Unemployed
```
Output:
```csv
id,AGE_GROUP,SEX,...,LABOUR_FORCE,profile_id,profile_name,profile_confidence
CUST_1000,Senior,Male,...,Unemployed,1,Young singles and non-workers,high
```
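The example input can also be generated from records in code. This is a minimal sketch using Python's `csv.DictWriter`, with values copied from the example; `restval=""` is a choice made here so that any variable a record lacks is written as an empty cell, which the API treats as missing and still scores.

```python
import csv
import io

COLUMNS = ["id", "AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE",
           "INCOME", "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE"]

records = [
    {"id": "CUST_1000", "AGE_GROUP": "Senior", "SEX": "Male",
     "MARITAL_STATUS": "Single", "HOUSEHOLD_ROLE": "Non-family", "INCOME": "Mid",
     "EDUCATION": "Year 11 or below", "OCCUPATION": "Not in employment",
     "INDUSTRY": "Knowledge services", "LABOUR_FORCE": "Unemployed"},
]

buf = io.StringIO()
# Missing keys become empty cells; the csv module ends lines with CRLF by default.
writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="")
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()
```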