Nomira — cost forensics for AI coding assistants
What Nomira is, in one screen.
Install: pip install nomira · Self-host the team dashboard: docker compose up -d · Full deployment guide: docs/DEPLOY.md
Stage 0 prototype. Answers a question no other tool does:
"Why did this conversation / turn cost so much?" — read straight from your
local Claude Code transcripts, with cache-aware (accurate) pricing.
Helicone, Langfuse and the FinOps tools all measure production app spend. Nobody explains coding-assistant spend. Nomira does — and shows you the part a naive token calculator misses: in real sessions ~90% of the cost is cache tokens, not the visible input/output.
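To make the cache-token point concrete, here is a minimal sketch of a cache-aware cost split. It is not Nomira's cost engine: the `Usage` class, the function names, and the rates in `RATES` are all hypothetical placeholders chosen for illustration, not published prices.

```python
from dataclasses import dataclass

# Illustrative USD rates per million tokens. These are placeholders for the
# sketch, not real provider prices.
RATES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}


@dataclass
class Usage:
    """Token counts for one turn or one session (hypothetical shape)."""
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0


def cost_breakdown(usage: Usage, rates: dict = RATES) -> dict:
    """Return the USD cost of each component."""
    return {
        name: getattr(usage, name) * rate / 1_000_000
        for name, rate in rates.items()
    }


def cache_share(usage: Usage, rates: dict = RATES) -> float:
    """Fraction of total cost that comes from cache reads and writes."""
    parts = cost_breakdown(usage, rates)
    total = sum(parts.values())
    if total == 0:
        return 0.0
    return (parts["cache_read"] + parts["cache_write"]) / total


if __name__ == "__main__":
    # A made-up long session: little visible input/output, lots of cache traffic.
    session = Usage(input=2_000, output=8_000,
                    cache_read=4_000_000, cache_write=200_000)
    parts = cost_breakdown(session)
    naive = parts["input"] + parts["output"]  # what a naive calculator sees
    print(f"naive: ${naive:.3f}  total: ${sum(parts.values()):.3f}  "
          f"cache share: {cache_share(session):.0%}")
```

With these made-up numbers the cache components dominate the total, which is the gap a calculator that only counts visible input and output would miss.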
Run it (no install, Python 3.9+)
python nomira.py # analyze your newest Claude Code session
python nomira.py --all # aggregate across every local transcript
python nomira.py --compare # efficiency comparison across sessions
python nomira.py --compare --by-project # ...grouped by project
python nomira.py --list # list transcripts it can see
python nomira.py path/to/session.jsonl --json
python nomira.py --regime api # force "this is the bill" framing
python nomira.py --source codex # analyze OpenAI Codex sessions (~/.codex/sessions)
python nomira.py --source codex --all
Reads Claude Code (~/.claude/projects) and OpenAI Codex (~/.codex/sessions) today. The cost engine is multi-provider (Anthropic verified; OpenAI/Codex + Gemini included), each with its own cache pricing. For Codex it also shows your subscription allowance used (e.g. "23% of your 5h window").
Team dashboard (self-hosted, usage-only)
python nomira.py --ship --developer alice # write usage-only events to ~/.nomira/usage.db
python nomira.py --serve # http://127.0.0.1:8787
Each teammate ships; the dashboard shows cost by developer, project, model, and trend. Only token counts + tags are stored — never prompt/response content (enforced by the schema; the ingest endpoint rejects content). See PRIVACY.md.
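The usage-only guarantee can be pictured as a table with no column that could hold content, plus an ingest step that refuses unknown fields. The sketch below is an assumption about how such enforcement might look, not Nomira's actual schema: the table name `usage_events`, the column names, and the `ingest` function are hypothetical.

```python
import sqlite3

# Hypothetical allowlist: tags and token counts only.
ALLOWED_FIELDS = {
    "ts", "developer", "project", "model",
    "input_tokens", "output_tokens",
    "cache_read_tokens", "cache_write_tokens",
}

# Hypothetical schema: there is simply no column for prompt or response text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS usage_events (
    id                 INTEGER PRIMARY KEY,
    ts                 TEXT    NOT NULL,
    developer          TEXT    NOT NULL,
    project            TEXT    NOT NULL,
    model              TEXT    NOT NULL,
    input_tokens       INTEGER NOT NULL DEFAULT 0,
    output_tokens      INTEGER NOT NULL DEFAULT 0,
    cache_read_tokens  INTEGER NOT NULL DEFAULT 0,
    cache_write_tokens INTEGER NOT NULL DEFAULT 0
)
"""


def ingest(conn: sqlite3.Connection, event: dict) -> None:
    """Insert one usage event, rejecting any field outside the allowlist."""
    extra = set(event) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"rejected non-usage fields: {sorted(extra)}")
    # Column names come only from the allowlist, so interpolating them is safe.
    cols = sorted(event)
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT INTO usage_events ({', '.join(cols)}) VALUES ({placeholders})",
        [event[c] for c in cols],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    ingest(conn, {"ts": "2025-01-01T00:00:00Z", "developer": "alice",
                  "project": "demo", "model": "example-model",
                  "input_tokens": 120, "output_tokens": 40})
    try:
        ingest(conn, {"ts": "2025-01-01T00:00:01Z", "developer": "alice",
                      "project": "demo", "model": "example-model",
                      "prompt": "secret text"})
    except ValueError as err:
        print(err)
```

Rejecting unknown fields outright, rather than silently dropping them, makes a misconfigured client fail loudly instead of appearing to work.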
Two cost worlds
- Subscription (Claude Code, Codex, Cursor): fixed monthly + limits + top-ups. The dollar shown is the API-equivalent value of your usage — "am I getting my plan's worth", not a bill.
- API / token-based (Claude API, Gemini, OpenAI): the dollar shown is the bill.
Nomira labels which world it's showing. The cost engine is multi-provider (Anthropic exact today; OpenAI/Google scaffolded) with each provider's own cache pricing.
Reads ~/.claude/projects/<project>/<session>.jsonl. Nothing leaves your machine. No SDK, no proxy, no server.
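As a rough picture of what reading a transcript locally involves, the sketch below sums token usage from JSONL lines. The record layout is an assumption: it supposes each line is a JSON object whose `message.usage` carries `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. This README does not specify the transcript format, and `sum_usage` and `newest_transcript` are hypothetical helpers, not Nomira functions.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Optional

# Assumed usage field names; adjust to whatever the transcripts actually contain.
USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def sum_usage(lines: Iterable[str]) -> Counter:
    """Sum token counts over JSONL lines, skipping lines without usage data."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or partially written line
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for key in USAGE_KEYS:
            value = usage.get(key, 0)
            if isinstance(value, int):
                totals[key] += value
    return totals


def newest_transcript(root: Optional[Path] = None) -> Optional[Path]:
    """Return the most recently modified <project>/<session>.jsonl, if any."""
    root = root or Path.home() / ".claude" / "projects"
    files = list(root.glob("*/*.jsonl"))
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


if __name__ == "__main__":
    path = newest_transcript()
    if path is not None:
        with path.open(encoding="utf-8") as fh:
            print(path.name, dict(sum_usage(fh)))
```

Everything here is plain file reading with the standard library, which is consistent with the claim that no SDK or proxy is needed.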
What it tells you
- Total cost, split by component (input / output / cache-read / cache-write 5m & 1h / tools).
- The most expensive turns, and why each one was expensive.
- Waste signals: turns that kept rebuilding cache, repeated identical tool calls, repeated file reads.
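One of the waste signals above, repeated identical tool calls, can be detected with a simple count over canonicalized calls. This is a sketch under assumptions, not Nomira's detector: it supposes each tool call is available as a `(name, args)` pair with JSON-serializable arguments, and `repeated_tool_calls` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def repeated_tool_calls(
    calls: Iterable[Tuple[str, dict]], min_repeats: int = 2
) -> List[Tuple[str, dict, int]]:
    """Return (tool, args, count) for calls seen at least min_repeats times.

    Arguments are serialized with sorted keys so that two calls with the same
    arguments in a different key order count as identical.
    """
    counts = Counter(
        (name, json.dumps(args, sort_keys=True)) for name, args in calls
    )
    return [
        (name, json.loads(encoded), n)
        for (name, encoded), n in counts.most_common()
        if n >= min_repeats
    ]


if __name__ == "__main__":
    session_calls = [
        ("Read", {"path": "app.py"}),
        ("Bash", {"cmd": "ls"}),
        ("Read", {"path": "app.py"}),
        ("Read", {"path": "app.py"}),
    ]
    for tool, args, n in repeated_tool_calls(session_calls):
        print(f"{tool} {args} was called {n} times")
```

Repeated file reads fall out of the same approach when the tool in question is a file-read tool and its argument is the path.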
Pricing (dynamic, multi-source)
Rates resolve through a source chain, best first:
- Local override: ~/.nomira/prices.local.json (you edit; closest to "true").
- OpenRouter: live per-token prices for 350+ models, cached locally (python nomira.py --update-prices, auto-refreshed daily).
- Built-in table: published-rate fallback in nomira/pricing.py.
Reports show which source priced the run. Matching is version-aware (claude-opus-4-7 → Opus 4.7, not an older or -fast variant). The true source — reconciliation against your real provider invoice — is the remaining goal.
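The source chain amounts to "first source that knows the model wins, and remember which one it was". The sketch below illustrates that idea only: `resolve_price`, the table shapes, the model name, and the rates are hypothetical, and it uses exact-name lookup rather than the version-aware matching described above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: model name -> {component: USD per million tokens}.
PriceTable = Dict[str, Dict[str, float]]


def resolve_price(
    model: str, sources: List[Tuple[str, PriceTable]]
) -> Optional[Tuple[str, Dict[str, float]]]:
    """Walk the sources best-first; return (source_name, rates) or None."""
    for source_name, table in sources:
        rates = table.get(model)
        if rates is not None:
            return source_name, rates
    return None


if __name__ == "__main__":
    # Made-up rates, ordered best first as in the list above.
    chain = [
        ("local override", {}),
        ("openrouter cache", {"example-model": {"input": 2.9, "output": 14.5}}),
        ("built-in table", {"example-model": {"input": 3.0, "output": 15.0}}),
    ]
    hit = resolve_price("example-model", chain)
    if hit is not None:
        source, rates = hit
        print(f"priced by {source}: {rates}")
```

Returning the source name alongside the rates is what lets a report state which source priced the run.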
Tests
python tests/test_cost_engine.py