Changelog
Changelog
What shipped in each round.
Session 1 — 2026-05-27/28
Research & alignment
- Extracted and digested the Nomira deck (11 slides) →
docs/references/nomira-deck.md. - Researched the LLM cost / AI-FinOps landscape; verified every deck claim →
docs/references/market-research.md. - Wrote deep problem analysis (
ANALYSIS.md), strategy (STRATEGY.md), draft spec, and independent conclusions (CONCLUSIONS.md) — all before reading the developer's doc. - Read developer's
Financial Aspects of Token Use.docx; wrote a structured comparison (COMPARISON.md). Independent convergence on the deck's flaws; adopted the developer's stronger forensics reframe.
Decision
- Pivot: from cost aggregation to cost forensics for AI coding assistants.
- Stage 0: a local CLI that reads Claude Code transcripts and explains where the money went. Focus order 1 → 3 → 2.
Built (Stage 0, focus 1 + 3)
nomira/package (stdlib-only): transcript loader, cache-aware pricing engine (the auditor core), per-turn forensics, waste signals, terminal + JSON report, CLI.- Verified on real data: a 679-call session = $217 (unverified Opus rates), 87% of it cache a naive tool would miss. Across all 44 local transcripts: ~$10,231, 91% cache (naive view would show ~$970 — off ~10×).
- Waste signals surface cache-rebuild-heavy turns.
- Regression tests (
tests/test_cost_engine.py) — 4/4 passing; caught an arithmetic slip in the expected values (the gate working).
Built (continued — multi-provider + regime + focus 2)
- Developer raised the two cost regimes: subscription (Claude Code/Codex/Cursor — fixed price + limits + top-up; dollar = API-equivalent shadow value) vs API/token-based (dollar = the bill). Modeled both; report now labels which regime it shows.
- Refactored the pricing engine into a generic provider registry with per-provider cache semantics (Anthropic exact; OpenAI cached-input 0.5×/no write cost; Google scaffold) + a
NormalizedUsageseam for future source/provider adapters. OpenAI path sanity-checked. - Focus 2 shipped:
--compare(by session) and--compare --by-project— efficiency table (cost, $/call, $/1k output, cache-reuse %). On local data:butt-dialpriciest project (~$5,008 API-equiv),agentsleast efficient per output. True dev-vs-dev deferred (needs cross-machine identity). - Tests still 4/4.
Built (continued — pricing verify, Codex adapter, architecture)
- (a) Pricing verified vs published rates: Anthropic confirmed (my originals were right), Gemini cache fixed to 0.10×, added GPT-4.1/o3/GPT-5-codex models with per-model cache overrides + per-model
verifiedflags. Report now flags only unverified models. - (b) Codex adapter shipped: reads
~/.codex/sessions, diffs cumulative token totals into per-call usage, captures subscription allowance (used_percent). Added aNormalizedUsageseam so forensics prices any provider. On real local data: 2 Codex sessions, ~$5.73 API-equivalent, 64% cached input, allowance 23% of the 5h window. Regime now follows the source (Codex → subscription). - Architecture design (ARCHITECTURE.md): local readers vs usage-only plugin vs gateway. Recommendation: usage-only plugin as the team core (privacy invariant: counts + tags, never content); gateway only if self-hosted.
- GTM/DD backlog added to TODO (branding, site, whitepaper, deck, product desc, user guide, DD-ready product) — what the 5-person gate actually needs.
- Tests now 6/6 (added Codex + provider-cache-semantics tests).
Built (continued — "do all": team product, Cursor, readiness package)
- Team product (Mode B):
store.py(SQLite, no content columns),events.py(usage-only schema + content-rejection guard),ship.py(local → events → store),collector.py(http.server/ingest+/api/summary+ server-rendered dashboard). CLI--ship/--serve/--developer/--db/--port. Shipped 34,949 real events ($10,300, 24 projects, both providers) and smoke-tested the server (content ingest rejected with 400). - Cursor adapter: investigated local data — Cursor stores no per-call token usage (0/98 conversations; server-side metering). Adapter reports availability honestly and points to the Cursor Admin API;
--source cursor. We do not fabricate costs. - Readiness package: LICENSE (MIT), PRIVACY.md, SECURITY.md, branding, product one-pager, user guide, whitepaper, investor deck, and a self-contained landing page (
site/index.html). HTML validated. - Tests now 9/9 (added
tests/test_team.py— privacy + store round-trip + idempotency).
Built (continued — dynamic pricing + server running)
- Dynamic multi-source pricing (
prices.py): live OpenRouter feed (356 models, cached 24h,--update-prices) → local override → published fallback. Per-call source provenance shown in reports ("rate source: openrouter:DATE").USE_DYNAMICtoggle keeps tests deterministic. - Version-aware matching:
claude-opus-4-7→ Opus 4.7 ($5/$25), not Opus 4 ($15/$75) or-fast($30/$150). Excludes fast/free/preview variants. - Idempotency fix: event keys exclude cost, so re-ship after a rate change won't duplicate. Re-shipped store: 34,872 events, $10,320 (market-sourced).
- Dashboard server running at http://127.0.0.1:8787 (verified 200).
- Tests 9/9 (pinned to published rates for determinism).
Built (continued — regime/plan correction)
- Subscription reframed as a plan, not a bill. Added
plans.py(Claude Code Max $200/mo default; configurable). Reports/dashboard now show subscription usage as API-equivalent VALUE vs your plan ("52× value"), explicitly not a bill; value-vs-plan shown only at period scope. Dashboard KPIs: API-equivalent value · Value vs $200/mo plan (52×) · Cache share (91%) · Developers.--plan-priceflag. - Dashboard project labels shortened to the folder name with the full path in the tooltip.
- Server restarted on :8787 with all changes; tests 9/9.
Built (continued — optimizer drill-down)
- Project drill-down shipped.
nomira/optimizer.py+/project/<slug>page. Click any project on the dashboard → KPIs (value, vs-plan, sub-agents, Bash), attributed cost by tool, most-repeated file reads with $ saving estimates, sub-agent usage, top sessions, and waste→savings recommendations. - Privacy preserved: optimizer data computed on demand from local transcripts; file paths / tool inputs never enter the team store (schema invariant intact).
- Concrete payoff on real data: found a file read 171× in butt-dial — a clear pin / .claudeignore optimization target.
- Tests 9/9; server live on :8787.
Built (continued — developer comparison + drill-down)
/compare/developers— efficiency leaderboard: sessions, projects, calls, cost, $/call, $/1k output, cache reuse %. Auto-headline "A spent N× more than B; least efficient X" when 2+ devs ship. Answers Pain-1./developer/<name>— drill-down: KPIs (value, share of team total, $/1k output, cache reuse), daily trend, cost by project (linked), cost by model.- Dashboard "Cost by developer" bars are clickable; card header has a
compare →link. - Per-developer share-of-value KPI on project pages already; ditto here. Tests 9/9.
Built (continued — six items in one round)
- Sharpened optimizer attribution (per-turn cache share across Reads in the same call; top files ranked by attributed cost, not raw frequency).
- Per-developer optimizer detail on
/developer/<name>, shown only when the dev matches this machine (privacy intact). - Invoice reconciliation —
billstable,--reconcile-import(CSV/JSON),--reconcile-fetch(Anthropic Cost API with admin key). Dashboard card +/reconciledaily comparison. Subscription excluded from the comparison (not a bill). - Time-window controls (today/7d/30d/all) on all aggregate views via
?period=. - Remote collector with bearer token (
NOMIRA_INGEST_TOKEN) +--hostfor LAN bind;ship.py --remote URL --token. - SDK
nomira.track()for the user's own app — local SQLite or remote POST, privacy invariant verified by test. - Tests 10/10 (added SDK test).
Built (continued — packaging & docs for public deploy)
pyproject.toml+__main__.py:pip install nomiragives anomiraconsole script.Dockerfile+docker-compose.yml+.dockerignore: one-command self-hosted team dashboard.- Optional dashboard auth (HTTP Basic via
NOMIRA_DASHBOARD_PASSWORD); startup warns if binding non-localhost without it. tools/build_docs.pyrenders everydocs/*.mdto a styled HTML page insite/docs/(stdlib only) — for non-engineers / DD reviewers. Landing page links to it.docs/DEPLOY.md: comprehensive deployment guide with onboarding scenarios, hosting recipes (Docker, Fly.io, Render, Cloud Run, bare VM), security posture, and a sizing note.
Built (continued — branded deck)
site/deck.html: 13-slide, self-contained, keyboard-navigable investor/sales deck. Brand-consistent dark+teal palette. Embeds real numbers from this build (cache gap, butt-dial drill-down, 52× plan leverage). Print-friendly. Linked from the marketing landing nav.- Note: the ChatGPT share link provided as additional context didn't expose its content to the fetcher (login shell); the deck was built from project material. Easy to revise once that review's content is pasted.
Built (continued — public web hosting)
.github/workflows/pages.yml: pushes tomainre-render docs and publish thesite/folder (landing + deck + docs) to GitHub Pages automatically.- DEPLOY.md gained a "Hosting the public site" section: GH Pages (recommended) · Netlify Drop / Vercel / Cloudflare Pages (drag-and-drop) · own Caddy server. Notes on custom domain + the marketing-vs-dashboard subdomain split.
- Verified site/ is fully self-contained — zero external CSS or JS references; drop on any host and it just works.
Session 2 — 2026-06-02
SDK framework adapters (nomira/integrations/)
- LangChain —
NomiraCallbackHandler(lazy import; safe to import without langchain installed).on_llm_end/on_chat_model_endextract token usage from the LLMResult / ChatResult and funnel intonomira.track()with constructor-supplied default tags. - OpenAI —
wrap_client(client, **tags)patcheschat.completions.createandresponses.createin place. Idempotent (rewrap is a no-op). Preserves return values and exceptions. Tolerant of pydantic v1/v2/dict usage shapes and of cached-input details. - Vercel AI SDK — both flavors: a
trackNomiraTypeScript helper (embedded as a string for the wiki) that POSTs a usage-only event to/ingestfromonFinish, plus a Pythontrack_ai_sdk_response(payload, **tags)for the AI SDK Python flavor. Maps camelCase (promptTokens,promptTokensDetails.cachedTokens) to the engine's normalized usage. - Tests:
tests/test_integrations.py(4/4) — module imports without optional deps;wrap_clientrecords a priced content-free event; the LangChain handler records via a stubbedBaseCallbackHandler(no langchain required); AI SDK helper handles camelCase. All forbidden content fields rejected. Pricing pinned to published rates. - Privacy invariant unchanged: only token counts + business tags reach the store.
Cursor Admin API adapter (nomira/cursor_admin.py)
- Server-side fetch into the same event store, via the documented endpoints (
POST /teams/filtered-usage-events,POST /teams/daily-usage-data). Basic auth (key as username, empty password). Defensive error handling: a failed endpoint leavesverified=Falseand lists the error rather than silently shipping an empty result. events_from_cursor()produces usage-only events: token-based rows get full token detail (input/output/cache-read/cache-write) plus the provider-charged cost as the source of truth; request-priced rows ship with the charged cost and zero tokens (clearly labeled). Cache-share cost split is heuristic, called out as UNVERIFIED — Cursor doesn't itemize per-component cost over the API.- Unknown models stay as
cursor/<raw>placeholders — never fabricate a price. Recognized models (Claude/GPT) map into the existing registry so they aggregate cleanly with Claude Code / Codex events. - CLI:
--cursor-fetch [--cursor-from YYYY-MM-DD] [--cursor-to YYYY-MM-DD]. Default window: last 30 days. cursor.py::availability_report()now also reports whetherNOMIRA_CURSOR_ADMIN_KEYis set and what command to run next.- Tests:
tests/test_cursor_admin.py(4/4) —urllib.request.urlopenmocked. Verifies Basic-auth header, epoch-ms date conversion, clean event shape with no content fields, request-priced vs token-based paths, and that 404s surface as errors instead of being swallowed.
Reconciliation polish (nomira/reconciliation.py, nomira/collector.py)
- Cost API parsing toughened:
_walk_resultsnow walks nestedresults/groups/data/items/rowscontainers depth-first, so newer Anthropic Cost API shapes (per-model breakdowns nested undergroups) parse without code changes._coerce_amountaccepts plain numbers,{amount, currency}, AND{value, currency}; non-USD rows are skipped, never silently converted. - Structured error reporting: HTTP errors include the response body (Anthropic's JSON error message is very informative), network errors are caught separately from JSON-decode errors, and the raised
RuntimeErrortext is now safe to surface in a UI. - Missing-day callout:
reconciliation.missing_days(pairs)classifies days intobilled_no_compute(provider charged us where we observed nothing — probably missing ingest) andcomputed_no_bill(observed API spend with no matching bill — probably an import gap). The/reconcilepage renders these as a warning card above the daily-comparison table, with a clear next step. - Verified
store.computed_by_dayalready returns a daily series and pairs cleanly even when one side is empty. - Tests:
tests/test_reconciliation.py(6/6) — CSV parser (incl. quoted-comma costs, mixed date formats, garbage rows), JSON parser (list + envelope), Cost API shape walker,_coerce_amount, missing-day classifier, full store round-trip.
Total tests: 24/24 (6 cost engine + 4 team + 4 integrations + 4 cursor admin + 6 reconciliation).
Session 4 — 2026-06-02 (hybrid MVP)
Shipped: multi-tenant store + API keys + log-load mode.
- Multi-tenant store.
tenant_idoneventsandbills; defensive ALTER TABLE for existing DBs; every aggregate scoped via_scope_where/_scope_and. Backward-compatible — self-host installs sit under tenantdefault. - API keys.
api_keystable, SHA-256 hash only (raw key shown once on create).--create-tenant NAMEmints;--list-tenantsshows registry. - Auth on
/ingest. Three layers: legacyNOMIRA_INGEST_TOKEN, per-tenant bearer keys, or open (localhost). Authenticated tenant is forced onto every event server-side. nomira --export events.json. Walks Claude Code + Codex sources, dumps usage-only events with the configured--tenanttag./importpage. Drag-and-drop upload form. The log-load mode: zero live connection between developer machine and dashboard.- Event idempotency key now includes tenant + developer.
- Tests: 26/26 (added tenant isolation + import-page check).
Built (continued — dashboard tenant context)
- Every dashboard route accepts
?tenant=<id>; queries scoped via the existing_scope_where/_scope_andhelpers. - Period picker, project bars, developer bars, "compare developers" links — all preserve
?tenant=X(single helper:_with_tenant). - Visible "viewing tenant X" chip when a non-default tenant is being viewed, with a "switch →" link back to default.
- Smoke-tested: cross-tenant isolation holds at the rendered-HTML layer (acme's developer-a never appears under globex view, and vice versa).
Open / next
- Per-tenant rate limits, signup/billing UI.
- Managed cloud Phase B — only on proven demand.