title: "Data Format"

Data Format

The Profile Scoring API accepts a CSV file with one row per person and returns the same file with profile assignments added.

Required columns

Your CSV must include an id column and the nine profiling variables:

Column	Description
`id`	Your identifier for each row. Can be any string or number. Preserved verbatim in the output.
`AGE_GROUP`	Age / life stage
`SEX`	Sex
`MARITAL_STATUS`	Marital status
`HOUSEHOLD_ROLE`	Role in household
`INCOME`	Personal weekly income bracket
`EDUCATION`	Highest educational attainment
`OCCUPATION`	Occupation group
`INDUSTRY`	Industry of employment
`LABOUR_FORCE`	Labour force status

Column name matching is case-insensitive, so `age_group`, `AGE_GROUP`, and `Age_Group` all work.
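As a rough local pre-flight check, the header can be compared against the required names before upload. The sketch below is illustrative only: `missing_columns` is a hypothetical helper, not part of the API, and it checks the collapsed column names only.

```python
import csv
import io

# Required columns as listed in the table above.
REQUIRED = ["id", "AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE",
            "INCOME", "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE"]

def missing_columns(csv_text):
    """Return required column names absent from the header, ignoring case."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.lower() for name in header}
    return [col for col in REQUIRED if col.lower() not in present]
```

An empty list means every required column was found in the header row.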

Accepted values

Each variable accepts collapsed category labels. For example, `INCOME` accepts `No income`, `Low`, `Mid`, `High`, or `Very high`. See Variable Reference for the complete list of valid values for every variable.

You can also submit raw ABS Census labels (for example, `25-29 years` instead of `Early career`). The API maps these to collapsed categories automatically. Both formats can be mixed within the same file.

Any value that does not match a known label is treated as `Not stated` and the row is still scored. The response flags these values in the `data_quality` object so you can correct your data.
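Unmatched values can also be spotted locally before upload. The following is a minimal sketch, assuming exact, case-sensitive label matching (the API's actual matching rules may be more lenient). `unrecognised_values` is a hypothetical helper, and the label set contains only the `INCOME` labels quoted above plus `Not stated`; raw ABS labels, which the API also accepts, would be reported as unrecognised by this check.

```python
from collections import Counter

# Collapsed INCOME labels quoted above, plus "Not stated".
# Labels for the other variables are in the Variable Reference.
INCOME_LABELS = {"No income", "Low", "Mid", "High", "Very high", "Not stated"}

def unrecognised_values(rows, column, known_labels):
    """Count values in `column` that are not in `known_labels`.

    `rows` is an iterable of dicts, such as those produced by csv.DictReader.
    Missing or empty cells are counted under the empty string.
    """
    values = ((row.get(column) or "") for row in rows)
    return Counter(value for value in values if value not in known_labels)
```

For example, `unrecognised_values(rows, "INCOME", INCOME_LABELS)` returns a count for each distinct value this local check does not recognise.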

Alternative column names

If your data uses raw ABS variable names, the API accepts those too:

Collapsed name	ABS name
AGE_GROUP	AGE5P
SEX	SEXP
MARITAL_STATUS	MSTP
HOUSEHOLD_ROLE	RLHP
INCOME	INCP
EDUCATION	HEAP
OCCUPATION	OCCP
INDUSTRY	INDP
LABOUR_FORCE	LFSP

You can use either set of column names, but not a mix of both in the same file.
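A header can be checked for mixed naming, or renamed to the collapsed names, with a small lookup built from the table above. This is a sketch only: `naming_scheme` and `to_collapsed` are hypothetical helpers, and treating ABS names case-insensitively is an assumption carried over from the collapsed names.

```python
# ABS variable name -> collapsed column name, from the table above.
ABS_TO_COLLAPSED = {
    "AGE5P": "AGE_GROUP", "SEXP": "SEX", "MSTP": "MARITAL_STATUS",
    "RLHP": "HOUSEHOLD_ROLE", "INCP": "INCOME", "HEAP": "EDUCATION",
    "OCCP": "OCCUPATION", "INDP": "INDUSTRY", "LFSP": "LABOUR_FORCE",
}

def naming_scheme(header):
    """Return 'abs', 'collapsed', 'mixed', or 'none' for a list of column names."""
    upper = {name.upper() for name in header}
    uses_abs = bool(upper & set(ABS_TO_COLLAPSED))
    uses_collapsed = bool(upper & set(ABS_TO_COLLAPSED.values()))
    if uses_abs and uses_collapsed:
        return "mixed"
    if uses_abs:
        return "abs"
    if uses_collapsed:
        return "collapsed"
    return "none"

def to_collapsed(header):
    """Rename ABS column names to collapsed names; leave other columns as-is."""
    return [ABS_TO_COLLAPSED.get(name.upper(), name) for name in header]
```

A result of `"mixed"` indicates a header that combines both sets of names.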

Handling missing data

If a value is missing, empty, or set to Not stated, the row is still scored. The API does not drop any rows.

Records with three or more `Not stated` fields receive a `profile_confidence` value of `"low"` instead of `"high"`. This is a signal to treat the assignment with caution, not a reason to discard the row.
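The threshold can be approximated locally to estimate how many rows are likely to come back with low confidence. The sketch below assumes that missing and empty cells count towards the threshold in the same way as `Not stated`. It does not detect unrecognised labels, which the API also treats as `Not stated`, so it may undercount. `expected_confidence` is a hypothetical helper.

```python
# The nine profiling variables (collapsed column names).
VARIABLES = ["AGE_GROUP", "SEX", "MARITAL_STATUS", "HOUSEHOLD_ROLE", "INCOME",
             "EDUCATION", "OCCUPATION", "INDUSTRY", "LABOUR_FORCE"]

def expected_confidence(row):
    """Estimate profile_confidence for one row given as a dict.

    Assumption: missing, empty, and "Not stated" cells all count
    towards the three-field threshold.
    """
    not_stated = sum(
        1 for name in VARIABLES
        if (row.get(name) or "").strip() in ("", "Not stated")
    )
    return "low" if not_stated >= 3 else "high"
```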

The id column

The id column is required. It passes through scoring untouched and appears in the output CSV in the same position. Use it to join scored results back to your original dataset.

The `id` values do not need to be unique, but unique IDs make joining easier because each scored row then matches exactly one original row.
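Joining scored rows back to an original dataset might look like the sketch below. `join_profiles` is a hypothetical helper operating on lists of dicts (for example from `csv.DictReader`). With duplicate IDs, the last scored row for an ID wins in this sketch, which is one reason unique IDs are preferable.

```python
# The three columns the API appends to the output CSV.
PROFILE_COLUMNS = ["profile_id", "profile_name", "profile_confidence"]

def join_profiles(original_rows, scored_rows):
    """Attach profile columns to each original row by matching on `id`."""
    # With duplicate ids, later scored rows overwrite earlier ones.
    by_id = {row["id"]: row for row in scored_rows}
    joined = []
    for row in original_rows:
        scored = by_id.get(row["id"], {})
        # Rows without a scored match get None in the profile columns.
        profile = {col: scored.get(col) for col in PROFILE_COLUMNS}
        joined.append({**row, **profile})
    return joined
```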

File limits

Limit	Value
Maximum file size	100 MB
Maximum rows per request	1,000,000

Files larger than 100 MB or with more than 1,000,000 rows are rejected with an HTTP 413 error. If your dataset is larger, split it into multiple files and submit each one as a separate request.
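One way to split a large file is to cut it into chunks by row count, repeating the header in each chunk. The sketch below is illustrative: `split_csv` is a hypothetical helper, it assumes the row limit counts data rows rather than the header, and it enforces only the row limit, so chunks of very wide files may still need a smaller `max_rows` to stay under 100 MB.

```python
import csv
import io
from itertools import islice

MAX_ROWS = 1_000_000  # row limit from the table above

def split_csv(lines, max_rows=MAX_ROWS):
    """Yield CSV text chunks of at most `max_rows` data rows, each with the header.

    `lines` is any iterable of CSV lines, such as an open text file.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    while True:
        chunk = list(islice(reader, max_rows))
        if not chunk:
            break
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        yield buffer.getvalue()
```

Each yielded string can be written to its own file and uploaded separately.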

Output format

The returned CSV contains all your original columns plus three new ones at the end:

Column	Type	Description
`profile_id`	Integer (0-7)	Numeric profile identifier
`profile_name`	String	Human-readable profile name (e.g. "High-earning professionals")
`profile_confidence`	String	`"high"` or `"low"`

The output CSV is available via a presigned download URL that expires after 24 hours.
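Once the output CSV has been downloaded, the three added columns can be read into typed values. The sketch below assumes rows are parsed into dicts (for example with `csv.DictReader`). The `Profile` class and `parse_profile` function are hypothetical names, and the range and value checks simply mirror the table above.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    profile_id: int
    profile_name: str
    profile_confidence: str

def parse_profile(row):
    """Convert the three added columns of one output row into a Profile."""
    profile_id = int(row["profile_id"])
    if not 0 <= profile_id <= 7:
        raise ValueError(f"profile_id out of range: {profile_id}")
    confidence = row["profile_confidence"]
    if confidence not in ("high", "low"):
        raise ValueError(f"unexpected profile_confidence: {confidence!r}")
    return Profile(profile_id, row["profile_name"], confidence)
```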

Example

Input:

id,AGE_GROUP,SEX,MARITAL_STATUS,HOUSEHOLD_ROLE,INCOME,EDUCATION,OCCUPATION,INDUSTRY,LABOUR_FORCE
CUST_1000,Senior,Male,Single,Non-family,Mid,Year 11 or below,Not in employment,Knowledge services,Unemployed

Output:

id,AGE_GROUP,SEX,...,LABOUR_FORCE,profile_id,profile_name,profile_confidence
CUST_1000,Senior,Male,...,Unemployed,1,Young singles and non-workers,high