Data Format

The Profile Scoring API accepts a CSV file with one row per person and returns the same file with profile assignments added.

Required columns

Your CSV must include an id column and the nine profiling variables:

| Column | Description |
| --- | --- |
| id | Your identifier for each row. Can be any string or number. Preserved verbatim in the output. |
| AGE_GROUP | Age / life stage |
| SEX | Sex |
| MARITAL_STATUS | Marital status |
| HOUSEHOLD_ROLE | Role in household |
| INCOME | Personal weekly income bracket |
| EDUCATION | Highest educational attainment |
| OCCUPATION | Occupation group |
| INDUSTRY | Industry of employment |
| LABOUR_FORCE | Labour force status |

Column name matching is case-insensitive, so age_group, AGE_GROUP, and Age_Group all work.
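Before uploading, it can help to confirm that the header row contains every required column. The following is a minimal local sketch, not part of the API; the function name `missing_columns` is hypothetical, and it only checks the collapsed column names listed above.

```python
import csv
import io

# Required columns as listed in the table above.
REQUIRED = ["id", "AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE",
            "INCOME", "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE"]

def missing_columns(header):
    """Return the required column names absent from header, ignoring case."""
    present = {name.lower() for name in header}
    return [col for col in REQUIRED if col.lower() not in present]

# Sample header with mixed casing and no LABOUR_FORCE column.
sample = "id,age_group,Sex,marital_status,household_role,income,education,occupation,industry\n"
header = next(csv.reader(io.StringIO(sample)))
print(missing_columns(header))  # ['LABOUR_FORCE']
```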

Accepted values

Each variable accepts collapsed category labels. For example, INCOME accepts No income, Low, Mid, High, or Very high. See Variable Reference for the complete list of valid values for every variable.

You can also submit raw ABS Census labels (for example, 25-29 years instead of Early career). The API maps these to collapsed categories automatically. Both formats can be mixed within the same file.

Any value that does not match a known label is treated as Not stated and the row is still scored. The response flags these in the data_quality object so you can fix your data.
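To catch unrecognised values before submitting, a rough local check could look like the sketch below. It covers only INCOME, because that is the only variable whose collapsed labels are listed on this page. It assumes exact, case-sensitive label matching, which may be stricter than the API. It will also report raw ABS labels, which the API accepts, so treat the result as a list of values to review rather than a list of errors. The function name `unrecognised_income` is hypothetical.

```python
from collections import Counter

# Collapsed INCOME labels from this page, plus the Not stated label.
INCOME_LABELS = {"No income", "Low", "Mid", "High", "Very high", "Not stated"}

def unrecognised_income(values):
    """Count values that are not collapsed INCOME labels (exact match)."""
    return Counter(v for v in values if v not in INCOME_LABELS)

print(unrecognised_income(["Mid", "Medium", "Low", "Medium", "n/a"]))
# Counter({'Medium': 2, 'n/a': 1})
```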

Alternative column names

If your data uses raw ABS variable names, the API accepts those too:

| Collapsed name | ABS name |
| --- | --- |
| AGE_GROUP | AGE5P |
| SEX | SEXP |
| MARITAL_STATUS | MSTP |
| HOUSEHOLD_ROLE | RLHP |
| INCOME | INCP |
| EDUCATION | HEAP |
| OCCUPATION | OCCP |
| INDUSTRY | INDP |
| LABOUR_FORCE | LFSP |

You can use either set of column names, but not a mix of both in the same file.
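One way to avoid a mixed header is to detect which naming scheme a file uses and, if needed, rename ABS columns to their collapsed equivalents before upload. This is a local sketch only. The helper names `naming_scheme` and `to_collapsed` are hypothetical, and the sketch assumes the case-insensitive matching described earlier also applies to ABS names.

```python
# ABS variable names mapped to collapsed names, from the table above.
ABS_TO_COLLAPSED = {
    "AGE5P": "AGE_GROUP", "SEXP": "SEX", "MSTP": "MARITAL_STATUS",
    "RLHP": "HOUSEHOLD_ROLE", "INCP": "INCOME", "HEAP": "EDUCATION",
    "OCCP": "OCCUPATION", "INDP": "INDUSTRY", "LFSP": "LABOUR_FORCE",
}

def naming_scheme(header):
    """Classify a header as 'abs', 'collapsed', 'mixed', or 'none'."""
    names = {h.upper() for h in header}
    uses_abs = bool(names & set(ABS_TO_COLLAPSED))
    uses_collapsed = bool(names & set(ABS_TO_COLLAPSED.values()))
    if uses_abs and uses_collapsed:
        return "mixed"
    if uses_abs:
        return "abs"
    if uses_collapsed:
        return "collapsed"
    return "none"

def to_collapsed(header):
    """Rename ABS columns to collapsed names; leave other columns unchanged."""
    return [ABS_TO_COLLAPSED.get(h.upper(), h) for h in header]

print(naming_scheme(["id", "AGE5P", "SEX"]))   # mixed
print(to_collapsed(["id", "AGE5P", "SEXP"]))   # ['id', 'AGE_GROUP', 'SEX']
```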

Handling missing data

If a value is missing, empty, or set to Not stated, the row is still scored. The API does not drop any rows.

Records with three or more Not stated fields receive a profile_confidence value of "low" instead of "high". This is a signal to treat the assignment with caution, not a reason to discard the row.
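The three-field threshold can be approximated locally to estimate how many rows are likely to come back with "low" confidence. The sketch below counts only missing, empty, and literal Not stated values, so it gives a lower bound: values the API cannot match to a known label are also treated as Not stated, and this check cannot see those. The function names are hypothetical, rows are assumed to be dictionaries keyed by the upper-case collapsed column names, and trimming whitespace is a choice made here rather than documented API behaviour.

```python
# The nine profiling variables (collapsed names).
VARIABLES = ["AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE", "INCOME",
             "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE"]

def not_stated_count(row):
    """Count variables that are missing, empty, or literally 'Not stated'."""
    count = 0
    for var in VARIABLES:
        value = (row.get(var) or "").strip()
        if value == "" or value == "Not stated":
            count += 1
    return count

def expected_confidence(row):
    """Apply the documented rule: three or more Not stated fields means 'low'."""
    return "low" if not_stated_count(row) >= 3 else "high"

row = {"AGE_GROUP": "Senior", "SEX": "Male", "MARITAL_STATUS": "Single",
       "HOUSEHOLD_ROLE": "Non-family", "INCOME": "", "EDUCATION": "Not stated",
       "INDUSTRY": "Knowledge services", "LABOUR_FORCE": "Unemployed"}
print(not_stated_count(row), expected_confidence(row))  # 3 low
```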

The id column

The id column is required. It passes through scoring untouched and appears in the output CSV in the same position. Use it to join scored results back to your original dataset.

The id values do not need to be unique, but unique IDs make joining easier.
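Joining on id might look like the following sketch, which uses only the standard library. The `region` column and the helper name `join_profiles` are made up for illustration. With duplicate ids, the dictionary lookup keeps only the last scored row for each id, which is one reason unique ids are easier to work with.

```python
import csv
import io

PROFILE_COLUMNS = ["profile_id", "profile_name", "profile_confidence"]

def join_profiles(original_rows, scored_rows):
    """Attach the three profile columns to each original row, matched on id."""
    by_id = {row["id"]: row for row in scored_rows}  # last row wins on duplicate ids
    joined = []
    for row in original_rows:
        match = by_id.get(row["id"], {})
        profile = {col: match.get(col, "") for col in PROFILE_COLUMNS}
        joined.append({**row, **profile})
    return joined

# Illustrative data: 'region' is a hypothetical column in your own dataset.
original = list(csv.DictReader(io.StringIO("id,region\nCUST_1000,North\n")))
scored = list(csv.DictReader(io.StringIO(
    "id,profile_id,profile_name,profile_confidence\n"
    "CUST_1000,1,Young singles and non-workers,high\n")))
print(join_profiles(original, scored))
```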

File limits

| Limit | Value |
| --- | --- |
| Maximum file size | 100 MB |
| Maximum rows per request | 1,000,000 |

Files over 100 MB or 1 million rows are rejected with a 413 error. If your dataset is larger, split it into multiple files.
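Splitting by row count could be sketched as below. The generator name `split_rows` is hypothetical. The sketch enforces only the row limit, so the 100 MB size limit still needs to be checked for each chunk after it is written out. Each chunk is held in memory, which may matter for very wide files.

```python
import csv
import io
from itertools import islice

MAX_ROWS = 1_000_000  # documented per-request row limit

def split_rows(reader, max_rows=MAX_ROWS):
    """Yield (header, rows) chunks with at most max_rows data rows each."""
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        chunk = list(islice(reader, max_rows))
        if not chunk:
            break
        yield header, chunk

# Small demonstration with a tiny limit instead of one million rows.
data = "id,SEX\n" + "".join(f"r{i},Male\n" for i in range(5))
for header, rows in split_rows(csv.reader(io.StringIO(data)), max_rows=2):
    print(header, len(rows))
```

Every chunk repeats the header row, so each output file can be submitted on its own.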

Output format

The returned CSV contains all your original columns plus three new ones at the end:

| Column | Type | Description |
| --- | --- | --- |
| profile_id | Integer (0-7) | Numeric profile identifier |
| profile_name | String | Human-readable profile name (e.g. "High-earning professionals") |
| profile_confidence | String | "high" or "low" |
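When reading the scored file, the three new columns can be checked against the types and ranges in the table above. The following is a minimal sketch assuming rows have been read as dictionaries of strings, for example with `csv.DictReader`; the function name `parse_profile` is hypothetical.

```python
def parse_profile(row):
    """Convert and check the three profile columns of one scored row."""
    profile_id = int(row["profile_id"])
    if not 0 <= profile_id <= 7:
        raise ValueError(f"profile_id out of range: {profile_id}")
    confidence = row["profile_confidence"]
    if confidence not in ("high", "low"):
        raise ValueError(f"unexpected profile_confidence: {confidence!r}")
    return profile_id, row["profile_name"], confidence

scored_row = {"id": "CUST_1000", "profile_id": "1",
              "profile_name": "Young singles and non-workers",
              "profile_confidence": "high"}
print(parse_profile(scored_row))
```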

The output CSV is available via a presigned download URL that expires after 24 hours.

Example

Input:

id,AGE_GROUP,SEX,MARITAL_STATUS,HOUSEHOLD_ROLE,INCOME,EDUCATION,OCCUPATION,INDUSTRY,LABOUR_FORCE
CUST_1000,Senior,Male,Single,Non-family,Mid,Year 11 or below,Not in employment,Knowledge services,Unemployed

Output:

id,AGE_GROUP,SEX,...,LABOUR_FORCE,profile_id,profile_name,profile_confidence
CUST_1000,Senior,Male,...,Unemployed,1,Young singles and non-workers,high