Architecture — Getting Usage Data to a Management Dashboard

Three ingestion modes, one privacy invariant.

The Stage-0 CLI reads local logs for one person. To serve a team/management dashboard we need a way to collect usage from many developers and apps. The developer asked to weigh two shapes: a cloud gateway (OpenRouter-style) vs. an "inner plugin" that transmits only token-usage outputs. Here's the analysis.

The non-negotiable: a privacy invariant

The whole product is "the trusted auditor." So the wire format carries only: provider · model · token counts (by type: input/output/cache-read/cache-write-5m/1h/reasoning) · business tags (feature/workflow/tier/customer/env) · timestamps · subscription allowance %. Never prompt or response content. This is enforceable, auditable, and it's the one line cloud gateways and SaaS FinOps tools cannot honestly draw.
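
As a minimal sketch of what that wire format could look like in code, the dataclasses below model one event. The class and field names (`UsageEvent`, `TokenCounts`, `to_wire`) are illustrative assumptions, not the project's actual `NormalizedUsage` definition; the point is only that the type has no field where prompt or response text could go.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TokenCounts:
    # One counter per token type named in the invariant (names are assumed).
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    reasoning: int = 0


@dataclass(frozen=True)
class UsageEvent:
    provider: str
    model: str
    tokens: TokenCounts
    timestamp: str  # ISO 8601, UTC
    # Business tags: feature / workflow / tier / customer / env.
    tags: dict = field(default_factory=dict)
    # Share of a subscription allowance consumed, when applicable.
    subscription_allowance_pct: Optional[float] = None

    def to_wire(self) -> str:
        """Serialize to JSON; there is deliberately no content field."""
        return json.dumps(asdict(self), sort_keys=True)


event = UsageEvent(
    provider="anthropic",
    model="example-model",  # placeholder model name
    tokens=TokenCounts(input=1200, output=350, cache_read=800),
    timestamp=datetime(2026, 6, 2, 12, 0, tzinfo=timezone.utc).isoformat(),
    tags={"feature": "code-review", "env": "prod"},
)
wire = json.loads(event.to_wire())
```

Because the event type is closed, adding content to an event would require a visible schema change rather than a quiet extra key.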

Three deployment shapes (Phase B, shipped 2026-06-02)

| Mode | What runs where | When it fits |
| --- | --- | --- |
| Self-host (sovereign) | Everything on customer infra: shipper + collector + dashboard via Docker. | Regulated industries, EU, anyone who can't share any usage data with a third party. |
| SaaS (default for most) | Shipper on customer infra (prompts NEVER leave); usage-only events POSTed to a Nomira-hosted collector + dashboard. Per-tenant API keys (nomira --create-tenant NAME). | Teams that want zero ops; 30-second onboarding. Same privacy guarantee as self-host: content never enters an event. |
| Log-load (paranoid) | Shipper exports events.json locally (nomira --export events.json); a human uploads that file at /import whenever they choose. No live network connection between machine and dashboard. | Air-gapped or "I'd rather batch-upload manually" teams. |

All three share the same wire format and the same privacy invariant: events carry counts + business tags only, schema-enforced, server-side rejected if content sneaks in. The mode is a transport choice, not a privacy choice.
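
One way to make "schema-enforced, server-side rejected" concrete is an allow-list check at the collector: anything not explicitly permitted is refused. The sketch below is an assumption about how such a check might look, not the actual `nomira.events.assert_no_content` implementation; `validate_event`, `ContentRejected`, the key sets, and the tag length cap are all hypothetical.

```python
# Hypothetical allow-lists; the real schema may differ.
ALLOWED_TOP_LEVEL = {
    "provider", "model", "tokens", "tags",
    "timestamp", "subscription_allowance_pct",
}
ALLOWED_TOKEN_KEYS = {
    "input", "output", "cache_read",
    "cache_write_5m", "cache_write_1h", "reasoning",
}
ALLOWED_TAG_KEYS = {"feature", "workflow", "tier", "customer", "env"}
MAX_TAG_LENGTH = 64  # assumed cap so a tag cannot smuggle a prompt


class ContentRejected(ValueError):
    """Raised when an event carries anything outside the usage schema."""


def validate_event(event: dict) -> dict:
    unknown = set(event) - ALLOWED_TOP_LEVEL
    if unknown:
        raise ContentRejected(f"unexpected fields: {sorted(unknown)}")

    tokens = event.get("tokens", {})
    if not isinstance(tokens, dict):
        raise ContentRejected("tokens must be an object of counters")
    for key, value in tokens.items():
        if key not in ALLOWED_TOKEN_KEYS:
            raise ContentRejected(f"unexpected token type: {key}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ContentRejected(f"token count for {key} must be a non-negative int")

    tags = event.get("tags", {})
    if not isinstance(tags, dict):
        raise ContentRejected("tags must be an object")
    for key, value in tags.items():
        if key not in ALLOWED_TAG_KEYS:
            raise ContentRejected(f"unexpected tag: {key}")
        if not isinstance(value, str) or len(value) > MAX_TAG_LENGTH:
            raise ContentRejected(f"tag {key} must be a short string")
    return event


good = {"provider": "openai", "model": "example-model",
        "tokens": {"input": 10, "output": 2}, "tags": {"env": "prod"}}
bad = dict(good, prompt="please summarize this contract")

accepted = validate_event(good)
try:
    validate_event(bad)
    rejected = False
except ContentRejected:
    rejected = True
```

An allow-list fails closed: a new field is rejected until someone deliberately adds it, which is the property that keeps the check the same across all three transports.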

Three ingestion modes (the building blocks under the deployment shapes)

Mode A — Local log readers (what exists today)

Read Claude Code / Codex transcripts on the machine. Zero integration, zero egress. Perfect for the individual wedge and the 5-person validation gate. Limit: only where rich local logs exist; one machine at a time.
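
A local reader in this style can be sketched as a pass over a JSONL transcript that sums the usage counters and ignores everything else. The record layout assumed here (one JSON object per line, with an optional `message.usage` object using the Anthropic API's usage field names) is an assumption for illustration; real transcript formats vary by tool and version, and `sum_usage` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable

# Counter names as they appear in Anthropic API usage objects.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def sum_usage(lines: Iterable[str]) -> Counter:
    """Sum token counters across transcript lines, never touching content."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partially written or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for name in USAGE_FIELDS:
            value = usage.get(name)
            if isinstance(value, int):
                totals[name] += value
    return totals


sample = [
    '{"message": {"content": "ignored", "usage": {"input_tokens": 10, "output_tokens": 5}}}',
    '{"type": "summary"}',
    "not json",
    '{"message": {"usage": {"input_tokens": 3, "cache_read_input_tokens": 100}}}',
]
totals = sum_usage(sample)
```

In practice the lines would come from an open file handle; the function only needs an iterable of strings.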

Mode B — Usage-only collector / plugin ← recommended core

A thin hook that emits usage events only (the schema above) to a collector.
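
A rough sketch of such a hook, assuming the provider client returns a dict with `model` and `usage` keys: the wrapper copies the counters into an event, hands it to an `emit` callable, and returns the response untouched. `with_usage_hook`, `emit`, and the response shape are hypothetical names chosen for illustration, not the actual nomira API.

```python
from datetime import datetime, timezone
from typing import Any, Callable


def with_usage_hook(
    call: Callable[..., dict],
    emit: Callable[[dict], Any],
    *,
    provider: str,
    tags: dict,
) -> Callable[..., dict]:
    """Wrap a provider call so each response also emits a usage-only event."""

    def wrapped(*args: Any, **kwargs: Any) -> dict:
        response = call(*args, **kwargs)
        usage = response.get("usage", {})
        event = {
            "provider": provider,
            "model": response.get("model", "unknown"),
            # Only counters are copied; prompt and reply text are never read.
            "tokens": {
                "input": int(usage.get("input_tokens", 0)),
                "output": int(usage.get("output_tokens", 0)),
            },
            "tags": dict(tags),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        try:
            emit(event)
        except Exception:
            # Metering must not break the application call.
            pass
        return response

    return wrapped


def fake_client(prompt: str) -> dict:
    # Stand-in for a real provider SDK call.
    return {
        "model": "example-model",
        "content": "secret reply",
        "usage": {"input_tokens": 42, "output_tokens": 7},
    }


collected = []
tracked = with_usage_hook(
    fake_client, collected.append, provider="example", tags={"feature": "search"}
)
reply = tracked("secret prompt")
```

Swallowing `emit` failures keeps the hook off the critical path: if the collector is down, the application still gets its response and only the usage record is lost.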

Mode C — Gateway (OpenRouter-style), optional + self-hostable

All traffic routes through a proxy that meters and forwards to providers. Pros: zero app changes; captures everything automatically; enables routing/budgets. Cons: sits on the critical path (latency + an availability dependency), and it sees full prompts/responses — only acceptable if self-hosted so content never leaves the customer's infra. Offer it for teams that prefer routing over instrumentation; never as a mandatory cloud middleman.

Comparison

| | A: Local readers | B: Usage-only plugin | C: Gateway |
| --- | --- | --- | --- |
| Integration effort | none | low (hook/SDK/OTel) | medium (reroute traffic) |
| Sees content? | no | no | yes (self-host to contain) |
| Critical path? | no | no | yes |
| Multi-provider | per-adapter | yes | yes |
| Team/management scale | no | yes | yes |
| Fits "trusted auditor" | yes | best | only self-hosted |

Recommendation

  1. Now: Mode A as the individual wedge (done: Claude Code + Codex).
  2. Team product: Mode B (usage-only collector). It is the privacy/trust differentiator, scales to management, and avoids becoming a fragile middleman. The coding-assistant "usage shipper" is just Mode A adapters running as a daemon that pushes the usage-only event upstream.
  3. Optional: Mode C as a self-hosted gateway for teams who want zero code changes, clearly labelled with its content/critical-path tradeoffs.

The leverage point across all three is one normalized usage-event schema (already seeded by NormalizedUsage + the business dimensions). If that schema is adopted, it — not the code — is the standard.

How today's code maps

Mode-B integration paths (today)

The team product accepts events from three complementary sources, all funneling into the same usage-only schema. Pick whichever fits where AI calls actually live:

| Path | Where it runs | What it instruments | Notes |
| --- | --- | --- | --- |
| SDK adapters (nomira.integrations) | inside your app | LangChain callback (NomiraCallbackHandler), OpenAI client wrapper (wrap_client), Vercel AI SDK helper (track_ai_sdk_response / TS snippet) | Lazy imports: the module loads without the optional framework installed and raises a clear error only if you try to use it. Posts via nomira.track() → local SQLite or remote /ingest. |
| Cursor Admin API (nomira.cursor_admin) | server-side fetch | per-team / per-user Cursor spend, normalized to events | The only real source of Cursor data; local Cursor logs carry no tokens. CLI: --cursor-fetch. |
| Invoice reconciliation (nomira.reconciliation) | server-side fetch / import | Anthropic Cost API + CSV/JSON bill imports | The TRUE source of API-regime cost. Delta vs computed is the auditor's final answer. |
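
The "delta vs computed" idea in the reconciliation row can be illustrated with a small calculation: price the recorded token counts, then compare against the invoiced amount. The prices, token counts, and function names below are made-up assumptions for the example, not real rates or the `nomira.reconciliation` API.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def computed_cost(tokens: dict, price_per_million: dict) -> Decimal:
    """Price recorded token counts using a per-million-token rate card."""
    total = Decimal("0")
    for kind, count in tokens.items():
        total += Decimal(count) * price_per_million[kind] / MILLION
    return total


def reconcile(invoiced: Decimal, computed: Decimal) -> dict:
    """Return the gap between the bill and the usage-derived estimate."""
    delta = invoiced - computed
    pct = (delta / invoiced * 100) if invoiced else Decimal("0")
    return {
        "invoiced": invoiced,
        "computed": computed,
        "delta": delta,
        "delta_pct": pct.quantize(Decimal("0.01")),
    }


# Hypothetical rate card (USD per million tokens) and monthly totals.
prices = {"input": Decimal("3.00"), "output": Decimal("15.00")}
usage = {"input": 2_000_000, "output": 500_000}

estimate = computed_cost(usage, prices)
report = reconcile(Decimal("14.00"), estimate)
```

`Decimal` is used instead of floats so that currency arithmetic stays exact; a non-zero delta points at usage the events did not capture or at pricing the rate card got wrong.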

All three paths obey the privacy invariant: counts + tags only, never prompt or completion content. The nomira.events.assert_no_content guard is the final wall at the collector.