I run Claude Code most days — for writing, for refactors, for the kind of grunt edits that used to live in a Jupyter notebook. The bill adds up. Over the last week I started running a plugin called caveman mode, and watched my output token usage drop by about three-quarters while the actual work stayed identical.
This post is what caveman actually is, how to install it from GitHub, what it cuts vs. what it preserves, and the two places where I switch it back off.
What "caveman mode" is
Caveman is a Claude Code plugin that rewrites the assistant's response style — not its reasoning. Articles (a/an/the), filler (just/really/basically/actually), pleasantries (sure/certainly/happy to), and hedging all get dropped. Fragments are allowed. Synonyms get shorter ("fix" instead of "implement a solution for"). The technical substance — file paths, function names, error strings, code blocks — stays exact.
It registers a SessionStart hook, so it activates the moment you open a session and persists across turns. Three intensity levels ship by default: lite, full, and ultra. A wenyan variant compresses further by leaning on Classical Chinese grammar.
Why the fluff is expensive
A typical Claude Code response wraps the answer in social glue: "Sure! I'd be happy to help. The issue you're experiencing is..." That intro is roughly fifteen tokens before any technical content appears. Multiply across hundreds of turns a week and you have a real line item on the bill.
The before/after the README quotes is roughly:
Not: "Sure! I'd be happy to help you with that. The issue
you're experiencing is likely caused by..."
Yes: "Bug in auth middleware. Token expiry check use `<`
not `<=`. Fix:"
Same information. ~75% fewer tokens. The model thinks the same thoughts; you just stop paying for the wrapping paper.
Install
Caveman ships as a Claude Code plugin and lives on GitHub. Install from a session prompt:
bash/plugin install caveman
Then any of these in-session:
bash/caveman # toggle on (full level)
/caveman lite # lighter compression
/caveman ultra # maximum compression
/caveman-stats # show real token savings from session log
/caveman-commit # write compressed conventional commit
/caveman-review # one-line PR review comments
Typing "stop caveman" or "normal mode" reverts. Level persists until the session ends.
What gets dropped, what stays
- Dropped: articles, hedging, pleasantries, filler intensifiers, trailing "let me know if…" sign-offs.
- Preserved: code blocks (untouched), error messages (quoted exact), file paths, function/class names, command names, URLs.
- Auto-disabled: security warnings, irreversible-action confirmations, multi-step sequences where fragment order risks misread.
That last bullet matters. Caveman quietly turns itself back on after the careful part is done. You do not have to babysit the toggle.
Real numbers from one week
I tracked usage via /caveman-stats, which reads the session log directly — no model estimation, no marketing math. A representative day on a mid-sized refactor:
sessions: 14
output tokens: ~118k (estimated baseline ~430k)
savings: ~73%
caveman level: full
That lines up with the README's headline of ~75% within rounding. Input tokens are unchanged — your prompts, file reads, and tool results are not compressed. Output tokens are where the win lives, and output is the more expensive direction on most pricing tiers.
Where it bites in the GitHub flow
Three places I let caveman drive the work end-to-end:
- /caveman-commit — generates Conventional Commit messages. Subject ≤50 chars, body only when the "why" is not obvious from the diff. No more "feat: add new functionality to handle the case where the user..."
- /caveman-review — PR review comments collapse to one line each: `path:line: severity: problem. fix.` Much easier to scan a 30-comment review.
- /ultrareview — multi-agent cloud review of the current branch. Not strictly caveman, but the output format is similarly compressed and pairs well.
Commits and PR descriptions land in normal prose — caveman is for conversation, not for artifacts another human reads out of context.
Where I switch it off
Two cases:
- Pair-debugging with someone watching my terminal. Fragment-style replies read as terse to another human in a way they do not read to me alone.
- Documentation drafts and onboarding writeups. Anything that becomes Markdown someone else reads cold needs the articles put back.
A communication style optimized for one reader is not free when there are two.
Verdict
Caveman pays for itself the first day. Output stays technically correct, the bill drops, the only adjustment is reading fragments instead of paragraphs in your own session. If you live in Claude Code, install it.
Next post: same measurement applied to /loop and the cron-based scheduling routines — to see if scheduled background work has the same headroom.