Customer Health Triage

A live AI system that turns the messy signals on a B2B SaaS account — usage, support, sentiment, stakeholder moves, the renewal clock — into a decision-ready brief, and rolls a whole book of business into a prioritized worklist. Every account is also staged on the AI-adoption / Center-of-Excellence maturity curve.

Built for a CS leader's questions, not a CSM's: where do I point the team's next 40 hours, and which accounts are ready to move up the AI-maturity curve? Runs on Claude (claude-sonnet-4-6) behind a server-side key, so there is nothing for you to set up: just click.

Scott Rouse · Scotch Creek Consulting

#	Account	ARR	Renews	Verdict	Score	Exposure / upside	Maturity	Top driver

Click any row to open its full brief. Ranking weights churn risk, renewal urgency, and account value — the same triage a leader does by hand across a portfolio, in seconds.
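A minimal sketch of that kind of ranking. The page names the three factors but not the formula, so everything specific here is hypothetical: the weights, the 180-day urgency horizon, the ARR saturation point, the 0-to-1 churn-risk scale, and the sample account names.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    arr: float             # annual recurring revenue, in dollars
    days_to_renewal: int   # negative if the renewal date has passed
    churn_risk: float      # 0.0 (healthy) .. 1.0 (likely to churn); hypothetical scale


# Hypothetical weights and constants, not the demo's actual formula.
W_RISK, W_URGENCY, W_VALUE = 0.5, 0.3, 0.2
HORIZON_DAYS = 180         # renewals further out than this add no urgency
ARR_CAP = 1_000_000        # ARR at which the value term saturates at 1.0


def urgency(days: int) -> float:
    """1.0 when the renewal is today (or overdue), falling linearly to 0.0 at the horizon."""
    return max(0.0, 1.0 - max(days, 0) / HORIZON_DAYS)


def value(arr: float) -> float:
    """Log-scaled ARR in [0, 1], so one very large account doesn't swamp the list."""
    return min(1.0, math.log1p(max(arr, 0)) / math.log1p(ARR_CAP))


def priority(a: Account) -> float:
    """Weighted blend of churn risk, renewal urgency, and account value."""
    return (W_RISK * a.churn_risk
            + W_URGENCY * urgency(a.days_to_renewal)
            + W_VALUE * value(a.arr))


def worklist(accounts: list[Account]) -> list[Account]:
    """Highest-priority accounts first."""
    return sorted(accounts, key=priority, reverse=True)


if __name__ == "__main__":
    book = [
        Account("Northwind", 240_000, 45, 0.7),
        Account("Globex", 900_000, 300, 0.2),
        Account("Initech", 60_000, 20, 0.4),
    ]
    print([a.name for a in worklist(book)])
```

With these weights a mid-sized account renewing in 20 days outranks a large but healthy account renewing in 300, which is the trade-off a leader makes by hand.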

Pick a sample account for a live Claude call, or compose your own. The model gets the raw signals and a method — not a checklist — and returns the structured brief below.

Compose your own account: give it an account name, a segment / industry, ARR, days to renewal, and signals, one per line, each with an optional "type: " prefix (usage / support / sentiment / stakeholder / commercial).

Messy, partial, contradictory input is welcome — that's the point.
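The one-signal-per-line format with an optional "type: " prefix can be parsed as below. This is a sketch, not the demo's parser; the choice to keep unknown prefixes as part of the text (rather than guessing a type) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

SIGNAL_TYPES = {"usage", "support", "sentiment", "stakeholder", "commercial"}


@dataclass
class Signal:
    type: Optional[str]  # None when the line had no recognised prefix
    text: str


def parse_signals(raw: str) -> list[Signal]:
    """Parse one signal per line; an optional 'type: ' prefix classifies it.

    Blank lines are skipped. A prefix that isn't a known type stays part of
    the text, so a line like 'Note: CFO left' isn't silently misfiled.
    """
    signals = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        head, sep, rest = line.partition(":")
        kind = head.strip().lower()
        if sep and kind in SIGNAL_TYPES and rest.strip():
            signals.append(Signal(kind, rest.strip()))
        else:
            signals.append(Signal(None, line))
    return signals
```

Untyped lines are passed through rather than dropped, which fits the "messy input is welcome" stance: the model, not the parser, decides what an ambiguous signal means.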


The method (the system prompt the model runs)

The model gets one account as JSON and a method, not a checklist. It weighs signals by proximity to the renewal decision and the economic buyer, has to attach evidence to every driver, and is told to say so honestly when the signals are too thin for a confident call. Output is strict JSON, which is what lets it sit in a pipeline behind a worklist instead of living in a chat window.
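Strict JSON is only useful if the pipeline rejects output that doesn't fit. A minimal validation sketch follows; the schema is hypothetical (field names, the GREEN/AMBER/RED labels, the 0 to 100 score range, and the medium/high confidence levels are assumptions; only RED and low confidence appear on this page). It enforces the method's rule that every driver carries evidence.

```python
import json

# Hypothetical schema; not the demo's published contract.
VERDICTS = {"GREEN", "AMBER", "RED"}
CONFIDENCE = {"low", "medium", "high"}


class BriefError(ValueError):
    """Raised when model output is not a usable brief (counts as a parse failure)."""


def parse_brief(raw: str) -> dict:
    try:
        brief = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise BriefError(f"not valid JSON: {exc}") from exc
    if not isinstance(brief, dict):
        raise BriefError("top level must be an object")

    if brief.get("verdict") not in VERDICTS:
        raise BriefError(f"bad verdict: {brief.get('verdict')!r}")
    if brief.get("confidence") not in CONFIDENCE:
        raise BriefError(f"bad confidence: {brief.get('confidence')!r}")
    score = brief.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 100:
        raise BriefError(f"score must be a number in [0, 100], got {score!r}")

    drivers = brief.get("drivers")
    if not isinstance(drivers, list) or not drivers:
        raise BriefError("drivers must be a non-empty list")
    for i, d in enumerate(drivers):
        if not isinstance(d, dict) or not str(d.get("driver", "")).strip():
            raise BriefError(f"driver {i} has no description")
        # The method requires evidence on every driver; reject any without it.
        evidence = d.get("evidence")
        if not isinstance(evidence, list) or not evidence:
            raise BriefError(f"driver {i} ({d['driver']!r}) has no evidence")
    return brief
```

Raising a dedicated exception type makes the parse-failure rate easy to count separately from network or model errors.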

How I'd run this in production — eval, calibration & guardrails

A confidence number with no calibration is decoration. Here's a small labeled check set — the model's verdict vs. a human label — plus how the system behaves when it isn't sure.

Account	Signal summary	Model	Human	Match
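The model-vs-human comparison behind a check set like this can be sketched as follows. The verdict labels and the three confidence levels are assumptions. Per-level match rate is a coarse reliability check suited to categorical confidence; with numeric probabilities you would use a proper calibration metric such as expected calibration error or a Brier score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LabeledCase:
    account: str
    model: str       # model verdict, e.g. "RED" / "AMBER" / "GREEN" (labels assumed)
    human: str       # human reviewer's label for the same account
    confidence: str  # model's stated confidence: "low" / "medium" / "high" (assumed levels)


def accuracy(cases: list[LabeledCase]) -> float:
    """Fraction of cases where the model's verdict matches the human label."""
    return sum(c.model == c.human for c in cases) / len(cases)


def accuracy_by_confidence(cases: list[LabeledCase]) -> dict:
    """Match rate per stated confidence level.

    If "high" doesn't match the human label more often than "low",
    the confidence field isn't carrying information.
    """
    buckets = defaultdict(list)
    for c in cases:
        buckets[c.confidence].append(c)
    return {level: accuracy(group) for level, group in buckets.items()}


SEVERITY = {"GREEN": 0, "AMBER": 1, "RED": 2}


def optimistic_misses(cases: list[LabeledCase]) -> list[LabeledCase]:
    """Cases where the model called the account healthier than the human did."""
    return [c for c in cases if SEVERITY[c.model] < SEVERITY[c.human]]
```

Tracking optimistic misses separately matters because, as the guardrail below notes, being too optimistic near a renewal is the dangerous failure mode.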

The "not sure" path is a feature, not a gap. When signal coverage is thin or contradictory, the model returns confidence "low" and the brief flips to a provisional read that names the data to collect; it escalates to a human instead of inventing a confident verdict. Try it under "Run it live" with one or two vague signals (or open Sablefin Capital in the worklist).
Hard guardrail on the dangerous failure mode. The miss above is the model being too optimistic near a renewal. So any account combining a commercial threat (budget review / competitor) with a renewal inside 60 days is flagged for mandatory human review regardless of score — the score never silently greenlights a churn risk.
Production wiring: a 30–40-account golden set scored on every prompt change; accuracy + calibration tracked over time; per-brief cost, latency, and parse-failure rate logged (see the unit-economics readout on any live brief).
Abuse / governance: the Anthropic key stays server-side; calls are gated by Cloudflare Turnstile, a per-IP daily limit, and a hard global daily ceiling so the key can't be drained — the same "AI you can actually trust" posture an enterprise CS org has to run.
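The escalation rules above (low confidence goes to a provisional read; a commercial threat with renewal inside 60 days gets mandatory review regardless of score), plus the production rule that a human sees every RED, fit in one small routing function. This is a sketch: how a "commercial threat" is detected is assumed to arrive as a boolean, and treating "inside 60 days" as 60 days or fewer is an interpretation.

```python
from dataclasses import dataclass, field

# Guardrail window: commercial threat + renewal inside 60 days.
RENEWAL_GUARDRAIL_DAYS = 60


@dataclass
class BriefSummary:
    verdict: str             # "RED" is from the source; other labels are assumed
    confidence: str          # "low" triggers the provisional path
    score: float             # deliberately ignored by the hard guardrail
    days_to_renewal: int
    commercial_threat: bool  # budget review or competitor signal; detection is assumed


@dataclass
class Routing:
    provisional: bool
    human_review: bool
    reasons: list[str] = field(default_factory=list)


def route(b: BriefSummary) -> Routing:
    reasons = []
    provisional = b.confidence == "low"
    if provisional:
        reasons.append("low confidence: provisional read, name the data to collect")
    # Hard guardrail: checked without looking at b.score, so a good score
    # can never silently clear a near-term commercial threat.
    if b.commercial_threat and b.days_to_renewal <= RENEWAL_GUARDRAIL_DAYS:
        reasons.append(f"commercial threat with renewal in {b.days_to_renewal} days")
    if b.verdict == "RED":
        reasons.append("RED verdict")
    return Routing(provisional=provisional, human_review=bool(reasons), reasons=reasons)
```

Returning the reasons, not just a flag, gives the reviewer the "why" alongside the escalation.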

What's real, what's synthetic, and what changes with production data

Real: the Claude call, the method encoded in the system prompt, the structured output, the maturity model, the portfolio ranking, the eval/guardrail design, and the whole zero-setup deployment (Cloudflare Pages function holding the key, rate limits, Airtable write).
Synthetic: the accounts and signals. They're hand-built so the demo is self-contained and shareable — no customer data, no logins.
What changes with production data: signals come from the real source systems — a CS platform (Gainsight / Catalyst / Totango), the CRM (Salesforce), product telemetry, support (Zendesk), and the data warehouse (Snowflake). The model reads the same shaped record; the portfolio view becomes the team's live worklist; the maturity stage and value-realization metrics get tracked per account over time; and a human stays in the loop on every RED and every low-confidence read. At 10k accounts this runs as a nightly batch that surfaces the short list humans should actually touch — the point isn't to replace the CSM, it's to give a small team the working coverage of a much larger one.

How it's built & design notes

Plan → Design → Build → Deploy. The judgment lives in the prompt; the engineering makes it trustworthy and zero-friction. The browser never holds an API key — it calls a Cloudflare Pages function that holds the key server-side, verifies a Turnstile token, checks KV-based rate limits, calls Claude with the method above, and returns strict JSON. The page renders the same schema whether the brief is freshly generated or a cached sample, and the export writes that brief into Airtable as the system of record via an idempotent upsert. If any live call fails, the page falls back to a cached brief and says so — it never shows a broken state.
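A runtime-agnostic Python sketch of that request path (the real function runs on Cloudflare Pages, not Python). All names and numbers are hypothetical: a plain dict stands in for the KV namespace, `verify_token` and `call_model` are injected callables standing in for the Turnstile check and the Claude call, and the limits are placeholders. One caveat that carries over to the real thing: Workers KV is eventually consistent, so counters kept there are approximate under concurrent load, which argues for leaving headroom under the global ceiling.

```python
import datetime as dt
from typing import Callable, Optional

# Placeholder limits (hypothetical); the live deployment's numbers aren't stated.
PER_IP_DAILY = 20
GLOBAL_DAILY = 500


class DailyLimiter:
    """Per-IP and global daily counters; a dict stands in for the KV namespace."""

    def __init__(self, store: Optional[dict] = None):
        self.store = {} if store is None else store

    def allow(self, ip: str, today: dt.date) -> bool:
        day = today.isoformat()  # date in the key, so counters reset daily
        ip_key, global_key = f"ip:{ip}:{day}", f"global:{day}"
        if self.store.get(global_key, 0) >= GLOBAL_DAILY:
            return False  # hard ceiling, so the key can't be drained
        if self.store.get(ip_key, 0) >= PER_IP_DAILY:
            return False
        # Count the request against both budgets only once it is allowed.
        self.store[ip_key] = self.store.get(ip_key, 0) + 1
        self.store[global_key] = self.store.get(global_key, 0) + 1
        return True


def handle(request: dict, *, limiter: DailyLimiter,
           verify_token: Callable[[str, str], bool],
           call_model: Callable[[dict], dict], today: dt.date) -> dict:
    ip = request.get("ip", "")
    # 1. Bot check first, so unverified traffic never touches the quota.
    if not verify_token(request.get("token", ""), ip):
        return {"status": 403, "error": "verification failed"}
    # 2. Quota check before any model spend.
    if not limiter.allow(ip, today):
        return {"status": 429, "error": "daily limit reached"}
    # 3. The model call; any failure becomes an error the page can handle.
    try:
        return {"status": 200, "brief": call_model(request["account"])}
    except Exception as exc:
        return {"status": 502, "error": f"model call failed: {type(exc).__name__}"}


def page_view(response: dict, cached_brief: dict) -> dict:
    """What the page renders: the live brief, or a cached one plus a notice saying so."""
    if response.get("status") == 200:
        return {"brief": response["brief"], "notice": None}
    return {"brief": cached_brief,
            "notice": f"Live call unavailable ({response.get('error')}); showing a cached brief."}
```

Keeping the fallback in `page_view` mirrors the description above: the server reports failure honestly, and the page decides to show a cached brief and says so.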

Single model on purpose: claude-sonnet-4-6 is fast and cheap enough to run across a whole book on a cadence, which is the right unit-economics call for high-volume triage, and being able to say why it's the right model beats reaching for the most expensive one. These are the same Anthropic models Airtable's own Field Agents run on.
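A back-of-the-envelope sketch of that unit-economics argument. The per-token prices and token counts are placeholders, not Anthropic's published rates; substitute current pricing and the token usage logged on real briefs.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    usd_per_m_input: float   # dollars per million input tokens (placeholder)
    usd_per_m_output: float  # dollars per million output tokens (placeholder)


def brief_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one brief: tokens times price per token, input and output priced separately."""
    return (input_tokens * p.usd_per_m_input + output_tokens * p.usd_per_m_output) / 1_000_000


def batch_cost(p: Pricing, accounts: int, input_tokens: int, output_tokens: int,
               runs_per_month: int = 30) -> dict:
    """Per-brief, per-run, and per-month cost of re-scoring a whole book on a cadence."""
    per_brief = brief_cost(p, input_tokens, output_tokens)
    per_run = per_brief * accounts
    return {"per_brief": per_brief, "per_run": per_run, "per_month": per_run * runs_per_month}
```

With placeholder prices of $2 and $10 per million tokens and a brief of 2,000 input and 800 output tokens, one brief costs (2,000 × 2 + 800 × 10) / 1,000,000 = $0.012, so a nightly run over 10,000 accounts is about $120. Re-running the same arithmetic with a pricier model's rates is the quickest way to see whether it still fits a nightly cadence.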