Nomira Whitepaper — Cost Forensics & the Auditor Method

The auditor method — cache, regimes, attribution.

Abstract

AI coding-assistant spend is large, opaque, and unevenly distributed across developers and actions. Existing tools answer what happened in a call (observability) or what the org spent (FinOps), but not why a specific action or developer cost what it did. We show that (1) most of the cost lives in cache and reasoning tokens that naive calculators ignore, (2) accurate attribution requires application-/session-layer data that proxies and invoices cannot see, and (3) this data exists locally for some assistants today. Nomira is a privacy-preserving, multi-provider cost-forensics tool built on these facts.

1. The cost is not where you think

A single real Claude Code call we measured: input=6, output=119 visible tokens — but cache_read=22,830 and cache_write(1h)=28,134. A naive input·rate + output·rate calculation captures < 1% of the true cost of that call.

Aggregated over 44 local sessions: $10,231 true cost, of which ~91% is cache. A naive tool would report ~$970 — wrong by ~10×. Any tool that gets this wrong is worse than nothing for a finance audience, because it looks authoritative while being off by an order of magnitude.

2. The pricing model is multi-dimensional

Real per-call cost depends on, per provider and over time:

Nomira models these explicitly in a versioned, per-provider, per-model rate table with per-model cache overrides. Unknown models are flagged and excluded — never priced from a guess.
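A minimal sketch of what such a rate table could look like, assuming a simple in-memory structure keyed by provider and model. The names `RateCard`, `RATE_TABLE`, and `lookup_rates`, the provider and model strings, and every rate and multiplier below are hypothetical placeholders, not Nomira's actual schema or any provider's published prices.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RateCard:
    effective_from: date   # first day this version of the rates applies
    in_rate: float         # USD per input token (placeholder)
    out_rate: float        # USD per output token (placeholder)
    read_mult: float       # per-model cache-read multiplier on in_rate
    write_5m_mult: float   # per-model 5-minute cache-write multiplier
    write_1h_mult: float   # per-model 1-hour cache-write multiplier


# (provider, model) -> every version of that model's rates.
RATE_TABLE: dict[tuple[str, str], list[RateCard]] = {
    ("example-provider", "example-model"): [
        RateCard(date(2024, 1, 1), 4e-6, 20e-6, 0.1, 1.25, 2.0),
        RateCard(date(2025, 1, 1), 3e-6, 15e-6, 0.1, 1.25, 2.0),
    ],
}


def lookup_rates(provider: str, model: str, on: date) -> Optional[RateCard]:
    """Return the rate card in force on `on`, or None when nothing applies.

    None means "flag and exclude": the caller must not price the call.
    """
    cards = RATE_TABLE.get((provider, model))
    if not cards:
        return None  # unknown model: never fall back to a guessed rate
    applicable = [c for c in cards if c.effective_from <= on]
    if not applicable:
        return None  # call predates the earliest known rate version
    return max(applicable, key=lambda c: c.effective_from)
```

Returning `None` rather than a default rate card is the point of the design in this sketch: an unpriced call shows up as a coverage gap instead of a silently wrong number.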

Cost formula (per call)

cost = input·in_rate + output·out_rate
     + cache_read·in_rate·read_mult
     + cache_write_5m·in_rate·write_5m_mult
     + cache_write_1h·in_rate·write_1h_mult
     + tool_units·tool_rate
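
The formula translates directly into code. The sketch below is illustrative only: the `Usage` and `Rates` types and the `call_cost` and `naive_cost` helpers are hypothetical names, and the rates and multipliers in the usage example are assumed values chosen for demonstration, not a real rate card. The token counts are the ones from the measured call in section 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    tool_units: int = 0


@dataclass(frozen=True)
class Rates:
    in_rate: float          # USD per input token
    out_rate: float         # USD per output token
    read_mult: float        # cache-read multiplier on in_rate
    write_5m_mult: float    # 5-minute cache-write multiplier on in_rate
    write_1h_mult: float    # 1-hour cache-write multiplier on in_rate
    tool_rate: float = 0.0  # USD per tool unit


def call_cost(u: Usage, r: Rates) -> float:
    """Per-call cost, term by term as in the formula above."""
    return (
        u.input * r.in_rate + u.output * r.out_rate
        + u.cache_read * r.in_rate * r.read_mult
        + u.cache_write_5m * r.in_rate * r.write_5m_mult
        + u.cache_write_1h * r.in_rate * r.write_1h_mult
        + u.tool_units * r.tool_rate
    )


def naive_cost(u: Usage, r: Rates) -> float:
    """What a calculator that ignores cache and tool terms would report."""
    return u.input * r.in_rate + u.output * r.out_rate


# Token counts from the measured call; every rate here is an assumption.
usage = Usage(input=6, output=119, cache_read=22_830, cache_write_1h=28_134)
rates = Rates(in_rate=3e-6, out_rate=15e-6,
              read_mult=0.1, write_5m_mult=1.25, write_1h_mult=2.0)
print(f"naive={naive_cost(usage, rates):.6f}  full={call_cost(usage, rates):.6f}")
```

Under these assumed rates the cache terms dominate the total; the exact split depends on the real rate card in force for the model and date of the call.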

All token counts come from the provider's own usage object (Anthropic) or are recovered by diffing cumulative totals (Codex), so we never double-count.
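
Recovering per-call counts from cumulative totals is a subtraction of consecutive snapshots. The sketch below assumes each snapshot is a plain mapping of counter name to running total; the field names and the counter-reset handling are assumptions for illustration, not a description of the actual Codex log format.

```python
def deltas_from_cumulative(snapshots: list[dict[str, int]]) -> list[dict[str, int]]:
    """Turn running totals into per-call counts by diffing each snapshot
    against the previous one, so no token is counted twice."""
    calls: list[dict[str, int]] = []
    prev: dict[str, int] = {}
    for snap in snapshots:
        delta = {name: total - prev.get(name, 0) for name, total in snap.items()}
        if any(value < 0 for value in delta.values()):
            # Assumption: a total that goes backwards means the counter was
            # reset (e.g. a new session), so the snapshot itself is the delta.
            delta = dict(snap)
        calls.append(delta)
        prev = snap
    return calls


# Hypothetical running totals after three calls in one session.
totals = [
    {"input": 100, "output": 10},
    {"input": 250, "output": 30},
    {"input": 400, "output": 45},
]
per_call = deltas_from_cumulative(totals)
```

Summing `per_call` reproduces the final running total, which is a cheap invariant to check in tests.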

3. Two billing regimes

4. Why attribution must live at the session layer

A proxy or an invoice sees model · tokens · key. It cannot know an action was "feature X, workflow Y, free-tier user, developer A" unless the application/session says so. The combinatorial key-per-dimension workaround does not scale. Therefore accurate business attribution requires session-layer data — which is exactly what local assistant transcripts provide.
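
To make the point concrete, here is a small sketch of attribution as a join between per-call cost and session-level tags. The record shapes, the tag names, and the `cost_by_dimension` helper are hypothetical; the only claim is that the tags come from the session, not from the call itself.

```python
from collections import defaultdict


def cost_by_dimension(
    calls: list[dict],
    session_tags: dict[str, dict[str, str]],
    dimension: str,
) -> dict[str, float]:
    """Sum call cost per value of one business dimension.

    Calls whose session carries no value for the dimension are grouped
    under "untagged" rather than dropped, so the totals still add up.
    """
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        tags = session_tags.get(call["session_id"], {})
        totals[tags.get(dimension, "untagged")] += call["cost"]
    return dict(totals)


# What a proxy sees: cost per call, plus an opaque session id at best.
calls = [
    {"session_id": "s1", "cost": 0.50},
    {"session_id": "s1", "cost": 0.25},
    {"session_id": "s2", "cost": 1.00},
    {"session_id": "s3", "cost": 0.125},
]
# What only the session layer knows.
session_tags = {
    "s1": {"feature": "feature-x", "developer": "dev-a"},
    "s2": {"feature": "feature-y", "developer": "dev-a"},
}
by_feature = cost_by_dimension(calls, session_tags, "feature")
```

Adding a new dimension here means adding a tag to the session, not minting a new API key per combination.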

5. Data availability (honest scope)

6. Privacy as a correctness property

Nomira transmits and stores only token counts and business tags, never content. This is schema-enforced and test-verified. It is both an ethical stance and a market requirement: the auditor cannot be trusted if it exfiltrates the very data it audits.
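
One way to enforce such a guarantee is an allowlist schema that rejects anything it does not recognize. The sketch below is an assumption about how that could be done, not Nomira's actual schema: the field names, the tag keys, the length cap, and the `to_export_record` helper are all hypothetical.

```python
ALLOWED_TOP_LEVEL = frozenset({"counts", "tags"})
ALLOWED_COUNTS = frozenset({
    "input", "output", "cache_read",
    "cache_write_5m", "cache_write_1h", "tool_units",
})
ALLOWED_TAGS = frozenset({"feature", "workflow", "tier", "developer"})
MAX_TAG_LEN = 64  # assumed cap so a tag value cannot smuggle prompt text


def to_export_record(raw: dict) -> dict:
    """Validate a record against the allowlist and return a clean copy.

    Anything outside the allowlist raises instead of being silently
    dropped, so a leak attempt fails loudly in tests.
    """
    extra = set(raw) - ALLOWED_TOP_LEVEL
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    counts = raw.get("counts", {})
    tags = raw.get("tags", {})
    for name, value in counts.items():
        if name not in ALLOWED_COUNTS:
            raise ValueError(f"unknown count field: {name}")
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"count {name!r} must be a non-negative integer")
    for key, value in tags.items():
        if key not in ALLOWED_TAGS:
            raise ValueError(f"unknown tag: {key}")
        if not isinstance(value, str) or len(value) > MAX_TAG_LEN:
            raise ValueError(f"tag {key!r} must be a short string")
    return {"counts": dict(counts), "tags": dict(tags)}


record = to_export_record({
    "counts": {"input": 6, "output": 119, "cache_read": 22_830},
    "tags": {"feature": "feature-x"},
})
```

A test suite can then assert that records carrying a `prompt`, `completion`, or any other unlisted field raise `ValueError`, which is what makes the property verifiable rather than a policy statement.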

7. Reconciliation (roadmap)

The gold standard is reconciling computed cost against the provider's real invoice/usage API. Nomira's design (versioned rates plus a coverage signal, the "% of spend tagged") is built toward this. Until reconciled, absolute dollars are only "matched to published pricing," while relative breakdowns are already reliable.
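
Both signals reduce to simple ratios. The sketch below is a hypothetical illustration of how they might be computed; the function names, the `"untagged"` bucket convention, and the sign convention for the gap are assumptions, and no real invoice API is involved.

```python
def tagged_coverage(costs_by_tag: dict[str, float],
                    untagged_key: str = "untagged") -> float:
    """Fraction of computed spend that carries a business tag (0.0 to 1.0)."""
    total = sum(costs_by_tag.values())
    if total == 0:
        return 0.0  # nothing spent, so nothing is covered
    return (total - costs_by_tag.get(untagged_key, 0.0)) / total


def reconciliation_gap(computed: float, invoiced: float) -> float:
    """Relative difference between computed and invoiced spend.

    Positive means the computed figure is above the invoice.
    """
    if invoiced == 0:
        raise ValueError("cannot reconcile against a zero invoice")
    return (computed - invoiced) / invoiced


coverage = tagged_coverage({"feature-x": 3.0, "untagged": 1.0})
gap = reconciliation_gap(computed=110.0, invoiced=100.0)
```

Reporting the two numbers side by side keeps the claim honest: coverage says how much of the spend the breakdown explains, and the gap says how far the absolute dollars are from what was actually billed.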

8. Conclusion

The defensible product is not a dashboard; it is being right about cost when others are wrong, privately, across providers, for the coding-assistant workloads nobody else instruments. That is the auditor.