Architecture

Architecture — Getting Usage Data to a Management Dashboard

Three ingestion modes, one privacy invariant.

The Stage-0 CLI reads local logs for one person. To serve a team/management dashboard we need a way to collect usage from many developers and apps. The developer asked to weigh two shapes: a cloud gateway (OpenRouter-style) vs. an "inner plugin" that transmits only token-usage outputs. Here's the analysis.

The non-negotiable: a privacy invariant

The whole product is "the trusted auditor." So the wire format carries only: provider · model · token counts (by type: input/output/cache-read/cache-write-5m/1h/reasoning) · business tags (feature/workflow/tier/customer/env) · timestamps · subscription allowance %. Never prompt or response content. This is enforceable, auditable, and it's the one line cloud gateways and SaaS FinOps tools cannot honestly draw.

Three deployment shapes (Phase B, shipped 2026-06-02)

Mode	What runs where	When it fits
Self-host (sovereign)	Everything on customer infra — shipper + collector + dashboard via Docker.	Regulated industries, EU, anyone who can't share any usage data with a third party.
SaaS (default for most)	Shipper on customer infra (prompts NEVER leave); usage-only events POSTed to a Nomira-hosted collector + dashboard. Per-tenant API keys (`nomira --create-tenant NAME`).	Teams that want zero ops; 30-second onboarding. Same privacy guarantee as self-host — content never enters an event.
Log-load (paranoid)	Shipper exports `events.json` locally (`nomira --export events.json`); a human uploads that file at `/import` whenever they choose. No live network connection between machine and dashboard.	Air-gapped or "I'd rather batch-upload manually" teams.

All three share the same wire format and the same privacy invariant: events carry counts + business tags only, schema-enforced, server-side rejected if content sneaks in. The mode is a transport choice, not a privacy choice.

Three ingestion modes (the building blocks under the deployment shapes)

Mode A — Local log readers (what exists today)

Read Claude Code / Codex transcripts on the machine. Zero integration, zero egress. Perfect for the individual wedge and the 5-person validation gate. Limit: only where rich local logs exist; one machine at a time.

Mode B — Usage-only collector / plugin ← recommended core

A thin hook that emits usage events only (the schema above) to a collector:

App code: a wrapper/callback around the provider SDK that forwards response.usage + business tags.
Framework hooks: LangChain/LlamaIndex/Vercel AI SDK callbacks, or an OpenTelemetry exporter.
Coding assistants: a background "usage shipper" that reuses our Mode-A adapters to tail transcript/rollout files and push usage-only events — no proxy, no content, no latency.

Pros: privacy-preserving by construction; off the critical path; multi-provider; works for both regimes (carries allowance for subscription tools). Cons: needs an integration point; coverage drift (mitigated by adapters + a "% of spend tagged" signal).

Mode C — Gateway (OpenRouter-style), optional + self-hostable

All traffic routes through a proxy that meters and forwards to providers. Pros: zero app changes; captures everything automatically; enables routing/budgets. Cons: sits on the critical path (latency + an availability dependency), and it sees full prompts/responses — only acceptable if self-hosted so content never leaves the customer's infra. Offer it for teams that prefer routing over instrumentation; never as a mandatory cloud middleman.

Comparison

	A: Local readers	B: Usage-only plugin	C: Gateway
Integration effort	none	low (hook/SDK/OTel)	medium (reroute traffic)
Sees content?	no	no	yes (self-host to contain)
Critical path?	no	no	yes
Multi-provider	per-adapter	yes	yes
Team/management scale	no	yes	yes
Fits "trusted auditor"	yes	best	only self-hosted

Recommendation

Now: Mode A as the individual wedge (done — Claude Code + Codex).
Team product: Mode B (usage-only collector) — it is the privacy/trust

differentiator, scales to management, and avoids becoming a fragile middleman. The coding-assistant "usage shipper" is just Mode A adapters running as a daemon that pushes the usage-only event upstream.

Optional: Mode C as a self-hosted gateway for teams who want zero code

changes — clearly labelled with its content/critical-path tradeoffs.

The leverage point across all three is one normalized usage-event schema (already seeded by NormalizedUsage + the business dimensions). If that schema is adopted, it — not the code — is the standard.

How today's code maps

nomira/transcripts.py, nomira/codex.py = Mode-A adapters → become Mode-B shippers by adding an uploader.
nomira/pricing.py NormalizedUsage = the wire schema's usage core.
The collector + dashboard (Mode B server side) are the next build once the wedge validates.

Mode-B integration paths (today)

The team product accepts events from three complementary sources, all funneling into the same usage-only schema. Pick whichever fits where AI calls actually live:

Path	Where it runs	What it instruments	Notes
SDK adapters (`nomira.integrations`)	inside your app	LangChain callback (`NomiraCallbackHandler`), OpenAI client wrapper (`wrap_client`), Vercel AI SDK helper (`track_ai_sdk_response` / TS snippet)	Lazy imports — module loads without the optional framework installed; clear error only if you try to use it. Posts via `nomira.track()` → local SQLite or remote `/ingest`.
Cursor Admin API (`nomira.cursor_admin`)	server-side fetch	per-team / per-user Cursor spend, normalized to events	The only real source of Cursor data — local Cursor logs carry no tokens. CLI: `--cursor-fetch`.
Invoice reconciliation (`nomira.reconciliation`)	server-side fetch / import	Anthropic Cost API + CSV/JSON bill imports	The TRUE source of API-regime cost. Delta vs computed is the auditor's final answer.

All three paths obey the privacy invariant: counts + tags only, never prompt or completion content. The nomira.events.assert_no_content guard is the final wall at the collector.

← PreviousUser guide Next →Whitepaper