AutoMBS — AI-Powered Medicare Benefits Schedule Coding Assistant

Challenge Overview

Presented by NexusMD

The Medicare Benefits Schedule (MBS) is a comprehensive list of medical services subsidised by the Australian Government. Each service has:
• A unique item number
• A description
• Clinical use requirements
• Associated rules (e.g., frequency limits, combinations allowed/disallowed, special conditions for eligibility)
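
As a rough illustration of how one item and its rules might be stored, here is a minimal Python sketch. The field names (item_number, requirements, excludes_same_day, max_claims_per_year) and the JSON layout are assumptions rather than an MBS Online format, and the description and requirements are placeholders, not official descriptors. The single exclusion reuses the brief's own example rule.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MbsItem:
    """One MBS item in a hypothetical knowledge-base schema (not an official format)."""
    item_number: str                                            # stored as text, e.g. "23"
    description: str                                            # placeholder; copy real descriptors from MBS Online
    requirements: list[str] = field(default_factory=list)       # clinical use requirements (illustrative)
    excludes_same_day: list[str] = field(default_factory=list)  # items not claimable on the same day
    max_claims_per_year: Optional[int] = None                   # frequency limit, if any


def load_items(raw_json: str) -> dict[str, MbsItem]:
    """Parse a JSON array of item records into a lookup keyed by item number."""
    return {rec["item_number"]: MbsItem(**rec) for rec in json.loads(raw_json)}


# Sample record encoding the brief's example rule (23 vs 36 on the same day).
SAMPLE = """[
  {"item_number": "23",
   "description": "GP attendance, Level B (placeholder text)",
   "requirements": ["history", "examination", "management plan"],
   "excludes_same_day": ["36"]}
]"""

items = load_items(SAMPLE)
```

Keeping item numbers as strings avoids surprises if identifiers ever carry leading zeros or suffixes.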

Correctly selecting the right MBS item code(s) from clinical documentation is critical for:
• Accurate billing
• Compliance with government regulations
• Maximising reimbursement without violating policy
Manual MBS coding is time-consuming and error-prone, especially when multiple services are provided in one episode of care.
Your challenge: Build an AI-assisted coding tool that automatically suggests appropriate MBS item numbers from clinical notes, explains its reasoning, and estimates confidence — focusing on accuracy and coverage.

Mission Goals

An end-to-end system that:
1. Ingests clinical documentation from simulated patient encounters (e.g., GP consultation, ED visit, specialist review, diagnostic test, procedure).
2. Applies an MBS knowledge base (built from MBS Online data).
3. Generates candidate MBS item numbers with:
• Reasoning (why it applies)
• Evidence from the note
• Confidence score
4. Supports multiple service scenarios (e.g., same-day GP consult + pathology, imaging + report).
5. Measures accuracy (correctness of suggested codes) and coverage (proportion of eligible codes suggested).
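
Goal 5 can be read as precision (accuracy) and recall (coverage) over item numbers. A minimal sketch, assuming each evaluated episode yields a set of suggested items and a set of gold-standard items; counts are pooled across episodes (micro-averaged), which is one choice among several. The missed list per case doubles as the "eligible services missed" count. PATH-A and PATH-B are hypothetical placeholder items.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CaseResult:
    case_id: str
    suggested: set[str]   # items the system proposed
    gold: set[str]        # items a human coder marked as eligible


def score(cases: list[CaseResult]) -> dict:
    """Micro-averaged accuracy (precision) and coverage (recall), plus missed items per case."""
    tp = fp = fn = 0
    missed: dict[str, list[str]] = {}
    for case in cases:
        tp += len(case.suggested & case.gold)
        fp += len(case.suggested - case.gold)
        misses = case.gold - case.suggested
        fn += len(misses)
        if misses:
            missed[case.case_id] = sorted(misses)
    accuracy = tp / (tp + fp) if tp + fp else 0.0   # nothing suggested: no credit
    coverage = tp / (tp + fn) if tp + fn else 1.0   # nothing eligible: nothing to miss
    return {"accuracy": accuracy, "coverage": coverage, "missed": missed}


report = score([
    CaseResult("ep1", suggested={"23"}, gold={"23"}),
    CaseResult("ep2", suggested={"36", "PATH-A"}, gold={"36", "PATH-B"}),
])
```

Here two of three suggestions are correct and two of three eligible items are found, so both figures come out at 2/3, with PATH-B reported as missed in ep2.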

Challenge Scope

Focus on 8–12 common and diverse MBS categories, for example:
• GP attendances (Level A–D)
• Specialist consultations
• Pathology services
• Diagnostic imaging
• Minor procedures (e.g., suturing)
• Telehealth consultations
• ECG, spirometry
• Allied health items (optional stretch)
Partial coverage is fine — document which categories you implement.
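
For the GP attendance category, level selection is largely a duration rule. The sketch below assumes the standard in-rooms items 3, 23, 36 and 44 for Levels A to D, with Level B under 20 minutes, Level C from 20 minutes and Level D from 40 minutes. This is a simplification: the real descriptors also set content requirements and other settings use different item numbers, so check current MBS Online wording before relying on it. The function and table names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# (minimum minutes, item number); simplified in-rooms thresholds, checked top-down.
GP_LEVELS_BY_DURATION = [
    (40, "44"),  # Level D: at least 40 minutes
    (20, "36"),  # Level C: at least 20 minutes
    (0, "23"),   # Level B: less than 20 minutes
]


def gp_attendance_item(minutes: Optional[int], straightforward: bool = False) -> str:
    """Pick a Level A-D GP attendance item from the documented duration (simplified)."""
    if straightforward:
        return "3"  # Level A: obvious, straightforward problem; not time-based
    if minutes is None or minutes < 0:
        raise ValueError("a documented, non-negative duration is needed for Levels B-D")
    for min_minutes, item in GP_LEVELS_BY_DURATION:
        if minutes >= min_minutes:
            return item
    raise AssertionError("unreachable: the 0-minute row always matches")
```

A missing duration raises rather than guessing, which gives the extraction step a clear signal to flag the note for review.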

Inputs & Data Preparation

Required:
Synthetic episodes: Create realistic notes containing:
• History, examination, diagnosis
• Procedures, investigations
• Disposition or follow-up
Include multiple service combinations in single encounters.
Ensure variation in wording so that models cannot simply keyword-match.

Optional Multimodal:
• Include mock attachments (e.g., imaging report text, lab results).
• Provide structured test data in JSON alongside free-text notes.
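
One way to keep the free-text note, mock attachments and gold-standard items together is a single record per episode, serialised to JSON, so the same file drives both the API and the evaluation. A minimal sketch; the Episode field names and the example note are invented for illustration, and the gold item assumes a Level B reading of a roughly 15-minute GP consultation.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Episode:
    """One synthetic encounter (hypothetical schema)."""
    episode_id: str
    setting: str                                            # e.g. "GP", "ED", "specialist"
    note: str                                               # free-text clinical note
    attachments: list[str] = field(default_factory=list)    # mock lab/imaging report text
    gold_items: list[str] = field(default_factory=list)     # expected MBS items for evaluation


episode = Episode(
    episode_id="syn-0001",
    setting="GP",
    note=("45M, 3/7 productive cough, afebrile, chest clear. "
          "Likely viral URTI; supportive care, return if worse. Consult approx. 15 min."),
    gold_items=["23"],
)

payload = json.dumps(asdict(episode), indent=2)   # structured JSON stored beside the note
restored = Episode(**json.loads(payload))         # round-trips back to the same record
```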

Knowledge & Rules

Use MBS Online as the official reference.
Extract:
• Item number, description
• Eligibility requirements
• Restrictions (frequency, same-day rules)

Create a searchable, structured knowledge base (YAML/JSON or a small DB).

Encode rule snippets (e.g., “Item 23 cannot be claimed with Item 36 on the same day”).

Support basic combination logic for at least 3–5 common restriction scenarios.
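
Combination logic can start as small, declarative rule objects plus an evaluator. The sketch below encodes the brief's example same-day exclusion and a rolling-window frequency limit. The class names, the keep-the-higher-confidence policy and the PATH-A item are assumptions for illustration, not MBS policy; real rules should be extracted from MBS Online.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SameDayExclusion:
    item: str
    excluded: str
    reason: str


@dataclass(frozen=True)
class FrequencyLimit:
    item: str
    max_claims: int
    window_days: int


# The brief's example rule, used here as sample data.
SAME_DAY_RULES = [
    SameDayExclusion("23", "36", "Item 23 cannot be claimed with Item 36 on the same day"),
]


def apply_same_day_rules(candidates: dict[str, float],
                         rules: list[SameDayExclusion]) -> tuple[dict[str, float], list[str]]:
    """Drop the lower-confidence item of each excluded pair (ties keep rule.item)."""
    kept = dict(candidates)
    explanations = []
    for rule in rules:
        if rule.item in kept and rule.excluded in kept:
            drop = rule.item if kept[rule.item] < kept[rule.excluded] else rule.excluded
            del kept[drop]
            explanations.append(f"Dropped {drop}: {rule.reason}")
    return kept, explanations


def within_frequency(limit: FrequencyLimit, prior_claims: list[date], service_date: date) -> bool:
    """True if another claim of limit.item still fits inside the rolling window."""
    window_start = service_date - timedelta(days=limit.window_days)
    recent = [d for d in prior_claims if window_start < d <= service_date]
    return len(recent) < limit.max_claims
```

The explanation strings returned by the evaluator can be passed straight through to the response, so a dropped item is still accounted for in the output.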

System Requirements

Core Functional Requirements:
1. Code Suggestion API (POST /mbs-codes)
Returns:
• Item numbers
• Description
• Reasoning with evidence from note
• Confidence score
2. Explainability
For each suggestion: cite which part of the note matched the eligibility criteria.
3. Coverage Tracking
• Count eligible services missed by the model.
4. Accuracy Tracking
• Compare model output to a gold standard for sample cases.
5. Test Automation
• Automated validation against gold-standard cases.
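
The keyword/regex baseline can already satisfy the response shape and the explainability requirement, because every match carries its own evidence. A minimal sketch, assuming a response body made of plain dicts; ECG_ITEM and SPIRO_ITEM are placeholders for item numbers to be taken from MBS Online, the confidences are arbitrary, and the sentence splitting and negation check are deliberately crude (for example, "no chest pain, ECG done" would be wrongly skipped).

```python
from __future__ import annotations

import re

# Hypothetical mapping table: (pattern, item, description, base confidence).
# ECG_ITEM / SPIRO_ITEM are placeholders; take real item numbers from MBS Online.
MAPPING = [
    (re.compile(r"\b(?:ECG|electrocardiogra(?:m|ph))\b", re.IGNORECASE),
     "ECG_ITEM", "ECG tracing and report (placeholder descriptor)", 0.75),
    (re.compile(r"\bspirometry\b", re.IGNORECASE),
     "SPIRO_ITEM", "Spirometry (placeholder descriptor)", 0.70),
]
# Crude negation cue, checked only between the sentence start and the match.
NEGATION = re.compile(r"\b(?:no|not|declined)\b", re.IGNORECASE)


def suggest_codes(note: str) -> list[dict]:
    """Return suggestions shaped like a POST /mbs-codes response body."""
    suggestions = []
    for pattern, item, description, confidence in MAPPING:
        for match in pattern.finditer(note):
            start = note.rfind(".", 0, match.start()) + 1   # naive sentence start
            end = note.find(".", match.end())
            end = len(note) if end == -1 else end
            if NEGATION.search(note[start:match.start()]):
                continue  # e.g. "did not perform spirometry"
            suggestions.append({
                "item": item,
                "description": description,
                "reasoning": f"The note documents '{match.group(0)}', which matches this service.",
                "evidence": {"quote": note[start:end].strip(),
                             "match_start": match.start(), "match_end": match.end()},
                "confidence": confidence,
            })
            break  # one suggestion per item in this sketch
    return suggestions


note = ("Patient reports palpitations. 12-lead ECG performed: sinus rhythm. "
        "Did not perform spirometry today.")
result = suggest_codes(note)
```

For this note only the ECG item is suggested, quoting "12-lead ECG performed: sinus rhythm" as evidence; the character offsets let a UI highlight the matched span.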

Prompt Engineering & Model Strategy (Required)
1. Prompt Templates for:
• Entity extraction
• Mapping to MBS rules
• Generating explanation text
2. Hot-reload: update prompts during testing without redeploying.
3. Backup models: switch to an alternate model, or fall back to rule matching, when the primary model fails.
4. Metrics logging: track prompt versions, rule matches, and performance.
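
Hot-reload and fallback can both be handled without redeploying: re-read a prompt file whenever it changes on disk, and walk an ordered list of backends until one succeeds. A minimal sketch; PromptStore, call_with_fallback, the short SHA-256 prompt version and the backend names are hypothetical, and real model clients would replace the plain callables.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Callable, Optional


class PromptStore:
    """Serves a prompt template, re-reading it from disk when the file changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._stamp: Optional[tuple[int, int]] = None
        self._text = ""

    def get(self) -> tuple[str, str]:
        """Return (template text, short content hash used as the prompt version)."""
        st = self.path.stat()
        stamp = (st.st_mtime_ns, st.st_size)  # size too, in case mtime resolution is coarse
        if stamp != self._stamp:
            self._text = self.path.read_text(encoding="utf-8")
            self._stamp = stamp
        version = hashlib.sha256(self._text.encode("utf-8")).hexdigest()[:12]
        return self._text, version


def call_with_fallback(prompt: str,
                       backends: list[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each (name, backend) in order; return the first success as (name, output)."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # any failure (timeout, bad output) moves to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Because the version is a hash of the template text, logged runs can be grouped by the exact prompt that produced them.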

Suggested Architecture (Resource-Friendly)

1. Ingestion: FastAPI (Python) or Node/Express; JSON/FHIR parsing preferred.
2. Knowledge/Rules: YAML/JSON rules + a small evaluator, or a rules engine (e.g., a simple Python rule runner).
3. NLP/LLM:
• Baseline: keyword/regex + mapping tables.
• Practical: small open models run locally (e.g., 7–13B parameters) for entity extraction & normalization, or hosted APIs if allowed.
• RAG: embed short rule snippets; never store proprietary content.
4. Storage: lightweight (SQLite/Postgres) for prompts, rules, and logs.
5. UI: minimal web UI to upload/select an episode and view outputs/explanations.
6. Testing: Pytest/Playwright + GitHub Actions (or equivalent).
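
For the SQLite option, one table is enough to track prompt versions, rule matches and performance per run. A minimal sketch using Python's built-in sqlite3 module; the table layout and function names are assumptions, and ":memory:" would be swapped for a file path in practice.

```python
from __future__ import annotations

import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    episode_id TEXT NOT NULL,
    prompt_version TEXT NOT NULL,
    backend TEXT NOT NULL,
    suggested TEXT NOT NULL,     -- JSON list of item numbers
    rule_matches TEXT NOT NULL,  -- JSON list of rule explanations
    latency_ms REAL NOT NULL
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_run(conn: sqlite3.Connection, episode_id: str, prompt_version: str, backend: str,
            suggested: list[str], rule_matches: list[str], latency_ms: float) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO runs (ts, episode_id, prompt_version, backend, suggested,"
            " rule_matches, latency_ms) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (time.time(), episode_id, prompt_version, backend,
             json.dumps(suggested), json.dumps(rule_matches), latency_ms),
        )


def stats_by_prompt_version(conn: sqlite3.Connection) -> list[tuple]:
    """Run count and mean latency per prompt version."""
    return conn.execute(
        "SELECT prompt_version, COUNT(*), AVG(latency_ms) FROM runs"
        " GROUP BY prompt_version ORDER BY prompt_version"
    ).fetchall()
```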

Prize

• A$800 cash
• Full-time job opportunities

Join Us

As a participant