Methodology
How we extract structured facts from NZ commercial-motor policy wordings — and how we make every claim on this site auditable back to a source PDF, SHA-256 hash, and extraction date.
1. Source PDF acquisition
We ingest only insurer-published policy wordings. The current ingest pipeline covers the FCIB-confirmed commercial-motor panel — NZI, QBE, AIG, Zurich, Delta Insurance, Dual New Zealand. FCIB places these via the Insurance Advisernet NZ binding authority (Steadfast variants are the canonical wordings for binder-placed business).
Where the insurer's own CDN serves PDFs publicly to a browser user-agent (as NZI's does), we fetch directly from source. Where the wording is available only through the broker channel, we use the FCIB-supplied PDF or a verified broker mirror (e.g. Bailey's Insurance for Steadfast variants). Every ingested PDF is SHA-256-hashed at fetch time, and the hash is recorded against every downstream fact for the audit trail.
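A minimal sketch of hashing at fetch time, assuming a Node.js runtime. The names sha256Hex, recordIngest and the IngestedPdf shape are illustrative assumptions, not the pipeline's actual code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape: field names are assumptions for illustration.
interface IngestedPdf {
  sourceUrl: string;
  sha256: string;    // lowercase hex digest of the raw PDF bytes
  fetchedAt: string; // ISO 8601 timestamp taken at fetch time
}

// Hash the raw bytes exactly as fetched, before any conversion step.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Build the record that downstream facts would point back to.
function recordIngest(sourceUrl: string, bytes: Uint8Array, now: Date = new Date()): IngestedPdf {
  return { sourceUrl, sha256: sha256Hex(bytes), fetchedAt: now.toISOString() };
}
```

Hashing the fetched bytes rather than the converted text means anyone who downloads the same PDF can reproduce the digest independently.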
2. PDF → Markdown transcription
Each PDF is converted to Markdown via one of two paths:
- pdftotext (preferred for commercial-motor): local PDF-to-text extraction. Fast (sub-second), free, deterministic. Default path for NZ commercial-broker PDFs because the Anthropic Haiku-vision PDF endpoint has timed out on the 700KB+ Steadfast wordings we've ingested.
- Haiku-vision PDF endpoint (fallback): when pdftotext returns insufficient text (image-only / scanned PDFs), Claude Haiku 4.5 transcribes the document via its vision API.
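The fallback decision could be sketched as below. The text above only says "insufficient text", so the 200-character threshold and the function name choosePath are placeholder assumptions rather than the pipeline's real rule.

```typescript
type TranscriptionPath = "pdftotext" | "haiku-vision";

// Assumed heuristic: count non-whitespace characters in the pdftotext output
// and fall back to the vision path when there are too few of them.
function choosePath(pdftotextOutput: string, minChars = 200): TranscriptionPath {
  const visibleChars = pdftotextOutput.replace(/\s+/g, "").length;
  return visibleChars >= minChars ? "pdftotext" : "haiku-vision";
}
```

Stripping whitespace before counting matters because a scanned PDF can still yield page breaks and blank lines that would inflate a raw length check.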
The Markdown twin is preserved verbatim and served at /llms-full.txt and per-product at /api/product/{insurer}/{product}/wording.md.
3. Structured-fact extraction (Sonnet 4.6)
Claude Sonnet 4.6 reads the Markdown wording and extracts structured facts against a Zod-validated schema covering 17 commercial-motor fact keys:
- vehicle_value_basis: agreed-vs-market, total-loss basis, sum-insured indexation
- excess_options_nzd: standard / at-fault / under-25 / unnamed-driver / theft / fleet-aggregate
- no_claims_bonus_schedule: NCB schedule or experience-modification model
- claims_basis: claims-made / occurrence / hybrid
- territorial_scope: NZ-only / cross-Tasman / shipping-in-transit
- exclusions: verbatim exclusion list
- hire_car: days after at-fault / not-at-fault / theft, daily cap
- windscreen_excess_nzd + glass
- driver_schedule: any-driver vs named-drivers, age limits
- vehicle_types_covered: utes / vans / fleet / courier / heavy / mobile-plant
- goods_in_transit: included / sub-limit
- mobile_plant_cover
- hired_in_plant
- drive_other_vehicle
- modifications: disclosure rules, performance-mod exclusion
- natural_disaster: storm / flood / earthquake / volcanic
- cover_tiers_in_wording
Every extracted fact is quoted verbatim from the wording where the schema field expects a string. Sonnet is instructed to OMIT any key it can't extract with high confidence — we never fabricate values.
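The omit-rather-than-guess rule can be illustrated with a small post-processing guard. This is a dependency-free stand-in, not the Zod schema itself; pruneFacts and the emptiness checks are assumptions made for the sketch.

```typescript
// The 17 fact keys named in this section.
const FACT_KEYS = [
  "vehicle_value_basis", "excess_options_nzd", "no_claims_bonus_schedule",
  "claims_basis", "territorial_scope", "exclusions", "hire_car",
  "windscreen_excess_nzd", // listed in this section together with glass cover
  "driver_schedule", "vehicle_types_covered", "goods_in_transit",
  "mobile_plant_cover", "hired_in_plant", "drive_other_vehicle",
  "modifications", "natural_disaster", "cover_tiers_in_wording",
] as const;

type FactKey = (typeof FACT_KEYS)[number];
type Facts = Partial<Record<FactKey, unknown>>;

// Keep only known keys that carry a real value; everything else stays omitted.
function pruneFacts(raw: Record<string, unknown>): Facts {
  const out: Facts = {};
  for (const key of FACT_KEYS) {
    const value = raw[key];
    const empty =
      value === undefined || value === null ||
      (typeof value === "string" && value.trim() === "") ||
      (Array.isArray(value) && value.length === 0);
    if (!empty) out[key] = value;
  }
  return out;
}
```

Treating empty strings and empty arrays as omitted keeps a placeholder value from ever being published as if it were an extracted fact.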
4. Confidence-tier rubric
Each extraction is graded against the requiredForVerified field set for the vertical. For business-car the required set is vehicle_value_basis + excess_options_nzd + exclusions + claims_basis. If all required fields are populated, the extraction is tagged verified. Otherwise it is tagged inferred or low.
Both wordings currently in the data layer (NZI Steadfast Commercial Motor + QBE Steadfast Commercial Motor) are tagged verified.
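A minimal sketch of the verified check, using the four required keys stated above. How an unverified extraction is split between inferred and low is not specified here, so the sketch only reports whether the required set is complete; gradeExtraction and the Grade shape are illustrative names.

```typescript
// Required set for business-car, as stated in the rubric above.
const REQUIRED_FOR_VERIFIED = [
  "vehicle_value_basis", "excess_options_nzd", "exclusions", "claims_basis",
] as const;

interface Grade {
  verified: boolean;
  missing: string[]; // required keys that were omitted or empty
}

function gradeExtraction(facts: Record<string, unknown>): Grade {
  const missing = REQUIRED_FOR_VERIFIED.filter((key) => {
    const v = facts[key];
    return v === undefined || v === null || v === "";
  });
  return { verified: missing.length === 0, missing: [...missing] };
}
```

Returning the list of missing keys alongside the flag makes it easy to see why a wording fell short of verified.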
5. Audit trail per source PDF
Every page that renders a Supabase-backed fact carries a DataDisclaimer block exposing, per source document:
- Insurer + product name
- Wording effective date (from the document itself, not our ingestion date)
- Our extraction date (when we ran the ingest pipeline)
- SHA-256 hash of the source PDF
- Direct link to the source PDF
If a policy has been updated since our extraction, the verbatim text and structured facts may be out of date — the authoritative answer is always the source PDF linked, not our derived facts.
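One way to model a DataDisclaimer entry, and to re-check a downloaded PDF against the published hash, is sketched here. The SourceAudit field names and both helper functions are assumptions for illustration, not the site's actual component props.

```typescript
// Hypothetical shape for one DataDisclaimer entry; field names are assumed.
interface SourceAudit {
  insurer: string;
  product: string;
  wordingEffectiveDate: string; // taken from the document itself
  extractedAt: string;          // when the ingest pipeline ran
  sha256: string;               // hex digest of the source PDF
  sourcePdfUrl: string;
}

// A SHA-256 digest is 32 bytes, i.e. exactly 64 hex characters.
function isValidSha256Hex(value: string): boolean {
  return /^[0-9a-f]{64}$/i.test(value);
}

// Compare a locally computed digest with the published one, ignoring hex case.
function matchesPublishedHash(audit: SourceAudit, computedHex: string): boolean {
  return isValidSha256Hex(computedHex) &&
    audit.sha256.toLowerCase() === computedHex.toLowerCase();
}
```

A mismatch suggests the insurer has replaced the PDF since extraction, which is the case where the linked source document takes precedence over the derived facts.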
6. Snapshot architecture (build performance + trust)
The Astro build never queries Supabase. A snapshot script (scripts/snapshot-business-car.mjs) pulls all insurers, products, wordings, and facts into src/data/business-car-snapshot.json and writes every /api/* endpoint to disk as static files. Subsequent builds and page renders read from that JSON.
Cadence:
- On demand via npm run snapshot:business-car
- After every new wording ingest
- Monthly cron via GitHub Actions (pending CI workflow file push)
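The static-endpoint step can be pictured as a pure mapping from snapshot data to file paths. The Snapshot shape and planStaticFiles are assumptions (the real JSON layout is not documented here); only the two endpoint paths come from this page.

```typescript
// Assumed snapshot shape; the real JSON layout is not documented here.
interface Snapshot {
  insurers: { slug: string; name: string }[];
  wordings: { insurer: string; product: string; markdown: string }[];
}

// Map each static endpoint path to the file body a snapshot script could write.
function planStaticFiles(snapshot: Snapshot): Map<string, string> {
  const files = new Map<string, string>();
  files.set("/api/insurers.json", JSON.stringify(snapshot.insurers, null, 2));
  for (const w of snapshot.wordings) {
    files.set(`/api/product/${w.insurer}/${w.product}/wording.md`, w.markdown);
  }
  return files;
}
```

Keeping the planning step free of file-system and network calls means a build can consume the output without ever needing database credentials.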
7. FMA fair-dealing posture
Personalised advice on which commercial-motor policy fits a specific business is regulated under the Financial Markets Conduct Act 2013. This site is operated by First Commercial Insurance Brokers Ltd (FSP748591), a Member Broker of Insurance Advisernet New Zealand Ltd, with adviser Stewart Hunt. FCIB advises on commercial-lines general insurance only.
Per /llms.txt: personal motor, home contents, life, trauma, income protection, health, and investment advice are out of FCIB's licensed scope and should be referred to an adviser licensed for those classes.
Contact FCIB for personalised advice: stewart@fcib.co.nz | 0800 437 699 | disclosure statement.
8. License + citation
All extracted facts published at /api/* are licensed CC BY 4.0 — attribute to https://businesscarinsurance.co.nz.
Cite as: <insurer> <product> wording extract, BusinessCarInsurance.co.nz, accessed <date>, source PDF <url>.
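For anyone generating citations programmatically, the template above could be filled in as follows. The function name and the YYYY-MM-DD date style are assumptions; the template itself does not prescribe a date format.

```typescript
// Fill in the citation template; the date is rendered as YYYY-MM-DD in UTC (assumed style).
function formatCitation(insurer: string, product: string, accessed: Date, sourcePdfUrl: string): string {
  const date = accessed.toISOString().slice(0, 10);
  return `${insurer} ${product} wording extract, BusinessCarInsurance.co.nz, accessed ${date}, source PDF ${sourcePdfUrl}.`;
}
```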
9. See also
- Sources index: every source PDF with effective dates + SHA-256 hashes
- /llms.txt: AI-crawler bootstrap manifest
- /llms-full.txt: bulk Markdown bundle of every twin
- /api/insurers.json: machine-readable insurer panel index