Whitepaper

Nomira Whitepaper — Cost Forensics & the Auditor Method

The auditor method — cache, regimes, attribution.

Abstract

AI coding-assistant spend is large, opaque, and unevenly distributed across developers and actions. Existing tools answer what happened in a call (observability) or what the org spent (FinOps), but not why a specific action or developer cost what it did. We show that (1) most of the cost lives in cache and reasoning tokens that naive calculators ignore, (2) accurate attribution requires application-/session-layer data that proxies and invoices cannot see, and (3) this data exists locally for some assistants today. Nomira is a privacy-preserving, multi-provider cost-forensics tool built on these facts.

1. The cost is not where you think

A single real Claude Code call we measured: input=6, output=119 visible tokens — but cache_read=22,830 and cache_write(1h)=28,134. A naive input·rate + output·rate calculation captures < 1% of the true cost of that call.

Aggregated over 44 local sessions: $10,231 true cost, of which ~91% is cache. A naive tool would report ~$970 — wrong by ~10×. Any tool that gets this wrong is worse than nothing for a finance audience, because it looks authoritative while being off by an order of magnitude.

2. The pricing model is multi-dimensional

Real per-call cost depends on, per provider and over time:

input vs output rates (output is typically 4–5× input),
cached read (Anthropic 0.10×; OpenAI 4o ~0.50×; GPT-5 ~0.10×; Gemini ~0.10× + hourly storage),
cache write premiums (Anthropic 1.25× for 5-min TTL, 2.0× for 1-hour TTL),
reasoning/thinking tokens (billed as output),
tool-use units (e.g. web search), batch (−50%), long-context surcharges.

Nomira models these explicitly in a versioned, per-provider, per-model rate table with per-model cache overrides. Unknown models are flagged and excluded — never priced from a guess.

Cost formula (per call)

cost = input·in_rate + output·out_rate
     + cache_read·in_rate·read_mult
     + cache_write_5m·in_rate·write_5m_mult
     + cache_write_1h·in_rate·write_1h_mult
     + tool_units·tool_rate

All token counts come from the provider's own usage object (Anthropic) or are recovered by diffing cumulative totals (Codex), so we never double-count.

3. Two billing regimes

Subscription (Claude Code, Codex, Cursor): fixed price + usage limits +

top-ups. The dollar is an API-equivalent shadow value — useful to judge "am I getting my plan's worth" and how much allowance was consumed.

API / token-billed: the dollar is the actual bill.

Conflating these misleads; Nomira labels which it reports and surfaces allowance for subscription tools.

4. Why attribution must live at the session layer

A proxy or an invoice sees model · tokens · key. It cannot know an action was "feature X, workflow Y, free-tier user, developer A" unless the application/session says so. The combinatorial key-per-dimension workaround does not scale. Therefore accurate business attribution requires session-layer data — which is exactly what local assistant transcripts provide.

5. Data availability (honest scope)

Claude Code: full per-call usage incl. cache split — rich forensics. ✓
Codex: cumulative token + reasoning + rate-limit (allowance) data. ✓
Cursor: meters tokens server-side; no local token data — forensics not

possible from disk; only the Cursor Admin/usage API could supply it. We report this rather than fabricate. ✗ (local)

6. Privacy as a correctness property

Nomira transmits/stores token counts + business tags only — never content. This is schema-enforced and test-verified. It is both an ethical stance and a market requirement: the auditor cannot be trusted if it exfiltrates the very data it audits.

7. Reconciliation (roadmap)

The gold standard is reconciling computed cost against the provider's real invoice/usage API. Nomira's design (versioned rates + a coverage/“% of spend tagged” signal) is built toward this; until reconciled, absolute dollars are "matched to published pricing," while relative breakdowns are already reliable.

8. Conclusion

The defensible product is not a dashboard; it is being right about cost when others are wrong, privately, across providers, for the coding-assistant workloads nobody else instruments. That is the auditor.

← PreviousArchitecture Next →Deploy