Architecture — Getting Usage Data to a Management Dashboard
Three ingestion modes, one privacy invariant.
The Stage-0 CLI reads local logs for one person. To serve a team/management dashboard we need a way to collect usage from many developers and apps. The developer asked to weigh two shapes: a cloud gateway (OpenRouter-style) vs. an "inner plugin" that transmits only token-usage outputs. Here's the analysis.
The non-negotiable: a privacy invariant
The whole product is "the trusted auditor." So the wire format carries only: provider · model · token counts (by type: input/output/cache-read/cache-write-5m/1h/reasoning) · business tags (feature/workflow/tier/customer/env) · timestamps · subscription allowance %. Never prompt or response content. This is enforceable, auditable, and it's the one line cloud gateways and SaaS FinOps tools cannot honestly draw.
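A minimal sketch of what such a usage-only event could look like in Python. The class and field names here (`TokenCounts`, `UsageEvent`, `allowance_pct`, `to_wire`) are illustrative assumptions, not the project's actual schema. The point is that the type has no slot where prompt or response text could go.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenCounts:
    # One counter per token type listed in the invariant (names assumed).
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    reasoning: int = 0


@dataclass(frozen=True)
class UsageEvent:
    provider: str
    model: str
    tokens: TokenCounts
    timestamp: str  # ISO 8601, UTC
    # Business tags: feature / workflow / tier / customer / env.
    tags: dict = field(default_factory=dict)
    # Share of a subscription allowance consumed, if the tool reports one.
    allowance_pct: Optional[float] = None

    def to_wire(self) -> str:
        """Serialize to JSON; there is deliberately no content field."""
        return json.dumps(asdict(self), sort_keys=True)


event = UsageEvent(
    provider="anthropic",
    model="example-model",  # placeholder model name
    tokens=TokenCounts(input=1200, output=350, cache_read=800),
    timestamp=datetime(2026, 6, 2, 12, 0, tzinfo=timezone.utc).isoformat(),
    tags={"feature": "search", "env": "prod"},
)
wire = event.to_wire()
```

Because the dataclass is the only thing that gets serialized, adding a content field would require a visible schema change, which is what makes the invariant auditable.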
Three deployment shapes (Phase B, shipped 2026-06-02)
| Mode | What runs where | When it fits |
|---|---|---|
| Self-host (sovereign) | Everything on customer infra — shipper + collector + dashboard via Docker. | Regulated industries, EU, anyone who can't share any usage data with a third party. |
| SaaS (default for most) | Shipper on customer infra (prompts NEVER leave); usage-only events POSTed to a Nomira-hosted collector + dashboard. Per-tenant API keys (nomira --create-tenant NAME). | Teams that want zero ops; 30-second onboarding. Same privacy guarantee as self-host — content never enters an event. |
| Log-load (paranoid) | Shipper exports events.json locally (nomira --export events.json); a human uploads that file at /import whenever they choose. No live network connection between machine and dashboard. | Air-gapped or "I'd rather batch-upload manually" teams. |
All three share the same wire format and the same privacy invariant: events carry counts + business tags only, schema-enforced, server-side rejected if content sneaks in. The mode is a transport choice, not a privacy choice.
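Server-side rejection can be sketched as an allow-list check: anything outside the known fields is refused rather than silently dropped. This is an assumption-laden illustration, not the collector's real code. `validate_event`, `ContentRejected`, the field names, and the 64-character cap are all hypothetical.

```python
ALLOWED_FIELDS = {"provider", "model", "tokens", "tags", "timestamp", "allowance_pct"}
TOKEN_KEYS = {"input", "output", "cache_read", "cache_write_5m", "cache_write_1h", "reasoning"}
TAG_KEYS = {"feature", "workflow", "tier", "customer", "env"}
MAX_STR_LEN = 64  # assumed cap so a string value cannot smuggle a prompt


class ContentRejected(ValueError):
    """Raised when an event carries anything outside the usage-only schema."""


def validate_event(event: dict) -> dict:
    # Allow-list, not deny-list: unknown keys are rejected outright.
    unknown = set(event) - ALLOWED_FIELDS
    if unknown:
        raise ContentRejected(f"unexpected fields: {sorted(unknown)}")

    for name in ("provider", "model", "timestamp"):
        value = event.get(name, "")
        if not isinstance(value, str) or len(value) > MAX_STR_LEN:
            raise ContentRejected(f"{name!r} must be a short string")

    tokens = event.get("tokens", {})
    if not isinstance(tokens, dict) or set(tokens) - TOKEN_KEYS:
        raise ContentRejected("tokens must map known token types to counts")
    for name, count in tokens.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            raise ContentRejected(f"token count {name!r} must be a non-negative int")

    tags = event.get("tags", {})
    if not isinstance(tags, dict) or set(tags) - TAG_KEYS:
        raise ContentRejected("tags must use known business dimensions")
    for name, value in tags.items():
        if not isinstance(value, str) or len(value) > MAX_STR_LEN:
            raise ContentRejected(f"tag {name!r} must be a short string")

    return event
```

The same function can run in all three transports (self-host, SaaS ingest, manual file import), which is one way to keep the mode a transport choice only.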
Three ingestion modes (the building blocks under the deployment shapes)
Mode A — Local log readers (what exists today)
Read Claude Code / Codex transcripts on the machine. Zero integration, zero egress. Perfect for the individual wedge and the 5-person validation gate. Limit: only where rich local logs exist; one machine at a time.
Mode B — Usage-only collector / plugin ← recommended core
A thin hook that emits usage events only (the schema above) to a collector:
- App code: a wrapper/callback around the provider SDK that forwards response.usage + business tags.
- Framework hooks: LangChain/LlamaIndex/Vercel AI SDK callbacks, or an OpenTelemetry exporter.
- Coding assistants: a background "usage shipper" that reuses our Mode-A adapters to tail transcript/rollout files and push usage-only events — no proxy, no content, no latency.
Pros: privacy-preserving by construction; off the critical path; multi-provider; works for both regimes (carries allowance for subscription tools). Cons: needs an integration point; coverage drift (mitigated by adapters + a "% of spend tagged" signal).
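The app-code path can be sketched as a wrapper that copies only the usage counters off a response and hands them to an emitter. Everything here is hypothetical: `with_usage_tracking` is not a Nomira API, `fake_create` stands in for a provider SDK call, and the `usage.input_tokens` / `usage.output_tokens` attribute names are an assumption that varies by provider.

```python
from types import SimpleNamespace
from typing import Any, Callable


def with_usage_tracking(
    call: Callable[..., Any],
    emit: Callable[[dict], None],
    *,
    provider: str,
    tags: dict,
) -> Callable[..., Any]:
    """Wrap an SDK call so each response's usage is forwarded via emit()."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        response = call(*args, **kwargs)
        usage = getattr(response, "usage", None)
        if usage is not None:
            event = {
                "provider": provider,
                "model": getattr(response, "model", "unknown"),
                # Only counters are copied; args, kwargs and the response
                # body are never read into the event.
                "tokens": {
                    "input": getattr(usage, "input_tokens", 0),
                    "output": getattr(usage, "output_tokens", 0),
                },
                "tags": dict(tags),
            }
            try:
                emit(event)
            except Exception:
                # Metering must stay off the critical path: a failed
                # emit should never break the application call.
                pass
        return response

    return wrapped


def fake_create(**kwargs: Any) -> SimpleNamespace:
    # Stand-in for a provider SDK call (assumed response shape).
    return SimpleNamespace(
        model="example-model",
        content="secret completion text",
        usage=SimpleNamespace(input_tokens=12, output_tokens=34),
    )


sent: list = []
create = with_usage_tracking(
    fake_create, sent.append, provider="example", tags={"feature": "search"}
)
reply = create(prompt="secret prompt text")
```

In a real deployment `emit` would queue the event for a background uploader rather than append to a list, so the wrapped call adds no network latency.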
Mode C — Gateway (OpenRouter-style), optional + self-hostable
All traffic routes through a proxy that meters and forwards to providers. Pros: zero app changes; captures everything automatically; enables routing/budgets. Cons: sits on the critical path (latency + an availability dependency), and it sees full prompts/responses — only acceptable if self-hosted so content never leaves the customer's infra. Offer it for teams that prefer routing over instrumentation; never as a mandatory cloud middleman.
Comparison
| | A: Local readers | B: Usage-only plugin | C: Gateway |
|---|---|---|---|
| Integration effort | none | low (hook/SDK/OTel) | medium (reroute traffic) |
| Sees content? | no | no | yes (self-host to contain) |
| Critical path? | no | no | yes |
| Multi-provider | per-adapter | yes | yes |
| Team/management scale | no | yes | yes |
| Fits "trusted auditor" | yes | best | only self-hosted |
Recommendation
- Now: Mode A as the individual wedge (done — Claude Code + Codex).
- Team product: Mode B (usage-only collector). It is the privacy/trust differentiator, scales to management, and avoids becoming a fragile middleman. The coding-assistant "usage shipper" is just Mode A adapters running as a daemon that pushes the usage-only event upstream.
- Optional: Mode C as a self-hosted gateway for teams who want zero code changes, clearly labelled with its content/critical-path tradeoffs.
The leverage point across all three is one normalized usage-event schema (already seeded by NormalizedUsage + the business dimensions). If that schema is adopted, it — not the code — is the standard.
How today's code maps
- nomira/transcripts.py, nomira/codex.py = Mode-A adapters → become Mode-B shippers by adding an uploader.
- nomira/pricing.py NormalizedUsage = the wire schema's usage core.
- The collector + dashboard (Mode B server side) are the next build once the wedge validates.
Mode-B integration paths (today)
The team product accepts events from three complementary sources, all funneling into the same usage-only schema. Pick whichever fits where AI calls actually live:
| Path | Where it runs | What it instruments | Notes |
|---|---|---|---|
| SDK adapters (nomira.integrations) | inside your app | LangChain callback (NomiraCallbackHandler), OpenAI client wrapper (wrap_client), Vercel AI SDK helper (track_ai_sdk_response / TS snippet) | Lazy imports: the module loads without the optional framework installed and raises a clear error only if you try to use it. Posts via nomira.track() → local SQLite or remote /ingest. |
| Cursor Admin API (nomira.cursor_admin) | server-side fetch | per-team / per-user Cursor spend, normalized to events | The only real source of Cursor data; local Cursor logs carry no tokens. CLI: --cursor-fetch. |
| Invoice reconciliation (nomira.reconciliation) | server-side fetch / import | Anthropic Cost API + CSV/JSON bill imports | The TRUE source of API-regime cost. Delta vs computed is the auditor's final answer. |
All three paths obey the privacy invariant: counts + tags only, never prompt or completion content. The nomira.events.assert_no_content guard is the final wall at the collector.
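The "delta vs computed" idea from the reconciliation row can be illustrated with a small worked example. This is a sketch under stated assumptions: the per-million-token prices are made up, and `computed_cost` and `reconcile` are hypothetical helpers, not functions from nomira.reconciliation.

```python
# Hypothetical USD prices per million tokens; real prices come from the provider.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def computed_cost(events: list) -> float:
    """Cost implied by usage events: token counts times list price."""
    total = 0.0
    for event in events:
        for kind, count in event["tokens"].items():
            total += count * PRICE_PER_MTOK[kind] / 1_000_000
    return total


def reconcile(events: list, invoiced_usd: float) -> dict:
    """Compare event-derived cost with the billed amount."""
    computed = computed_cost(events)
    delta = invoiced_usd - computed
    # Positive delta: the bill is higher than the events explain,
    # for example because some usage was never instrumented.
    delta_pct = delta / invoiced_usd * 100 if invoiced_usd else 0.0
    return {
        "computed_usd": round(computed, 2),
        "invoiced_usd": round(invoiced_usd, 2),
        "delta_usd": round(delta, 2),
        "delta_pct": round(delta_pct, 2),
    }


events = [
    {"tokens": {"input": 600_000, "output": 100_000}},
    {"tokens": {"input": 400_000, "output": 100_000}},
]
report = reconcile(events, invoiced_usd=6.50)
```

With these made-up numbers the events explain 6.00 USD of a 6.50 USD bill, so roughly 7.7% of the invoice is unaccounted for. Note that only counts and prices enter the calculation; no content is needed to audit the bill.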