Run Claude Code on an 8 GB Mac: RAM + Cost Optimised

Summary. Claude Code is brilliant, but a busy Sonnet session can push the Claude Code process to 3–6 GB of RAM and pin a CPU core. On older M1 Airs and other 8 GB MacBooks, the rest of your apps slow to a crawl mid-session. Here are the concrete settings and workflow adjustments that keep Claude Code fast without degrading output quality: model tiering, context capping, CLAUDE.md pre-loading, and the /compact command you should be using on every long session.

Tools needed: Claude Code (CLI), an Anthropic API key, 15 minutes to configure.

The problem

Claude Code defaults to claude-sonnet-4-5. Each turn loads a context window that can grow to 200K tokens across a long session. In practice a busy refactor session — with file reads, edits, bash runs, and back-and-forth corrections — can push RAM to 3–6 GB just for the Claude Code process. Combined with VS Code (1.5–2 GB), a browser (1–2 GB), and Slack (0.5 GB), an 8 GB MacBook hits swap constantly and response latency doubles.
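
Before changing anything, it helps to measure what the process actually uses on your machine rather than rely on typical figures. A minimal sketch, assuming the CLI shows up in ps under a command name containing claude (on some installs it may appear as node, so adjust the pattern); sum_rss_mb is a hypothetical helper name:

```shell
# Sum resident memory (RSS) for every process whose command matches a pattern.
# Reads "rss_kb command" lines on stdin, as printed by: ps -A -o rss=,comm=
sum_rss_mb() {
  awk -v pat="$1" '
    {
      rss = $1
      cmd = $0
      sub(/^[ \t]*[0-9]+[ \t]+/, "", cmd)   # strip the RSS column, keep the command
      if (cmd ~ pat) total += rss
    }
    END { printf "%.0f\n", total / 1024 }   # ps reports kilobytes; print megabytes
  '
}

# Usage (the pattern is an assumption; try "node" if "claude" matches nothing):
#   ps -A -o rss=,comm= | sum_rss_mb claude
```

Run it once at the start of a session and again after a long refactor to see how much the footprint actually grows on your setup.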

The good news: most tasks in a typical coding session do not need Sonnet. File reads, search-and-replace, log analysis, renaming symbols, and small single-function edits are all tasks where Haiku 3.5 performs nearly identically at roughly one-tenth the per-turn API cost and a fraction of the RAM footprint. You can tier your usage and get the same output on a machine that stays quiet.

Step-by-step configuration

Install Claude Code if you have not already:
```
npm install -g @anthropic-ai/claude-code
```
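Then authenticate: run claude and follow the API key prompt.
Set Haiku as your default model: start claude, type /model inside the session, and select the Haiku model (claude-3-5-haiku-latest is the API alias for Haiku 3.5). To keep the choice across sessions, set the model key in ~/.claude/settings.json, as in the settings file shown further down. You can override per-session with /model claude-sonnet-4-5.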
Create tiered shell aliases in ~/.zshrc:
```
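# Cheapest tier first; model IDs change over time, so confirm current names with /model
alias cc="claude --model claude-3-5-haiku-latest"
alias ccs="claude --model claude-sonnet-4-5"
alias ccop="claude --model claude-opus-4-20250514"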
```
Use cc (Haiku) for everyday work. Use ccs (Sonnet) for multi-file refactors and architectural decisions. Use ccop (Opus) only for the hardest debugging sessions.
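Cap the context window to avoid unnecessary RAM allocation. Add the following to ~/.claude/settings.json (create it if missing), and check the key names against the settings reference for your installed Claude Code version, since supported keys change between releases: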
```
{
  "model": "claude-3-5-haiku-latest",
  "maxTokens": 4096,
  "contextWindow": 80000
}
```
The default 200K context is overkill for most tasks. 80K covers a full source file plus conversation history. Cutting the window from 200K to 80K (a 60% reduction) drops RAM by roughly 40%.
Write a CLAUDE.md in every project root. This is the highest-leverage optimisation on this list. Instead of explaining your stack, conventions, and banned patterns at the start of every session (burning 2–5K tokens), put it all in CLAUDE.md. Claude Code reads it automatically and uses it as persistent context without re-prompting. Example:
```
# Project: MyApp
Stack: Next.js 15, TypeScript strict, Drizzle ORM, Cloudflare Workers
Test runner: Vitest. Run tests with: pnpm test
Do not use class components. Prefer server components.
All DB queries go in src/db/queries/. Never query inline.
Commit convention: feat/fix/chore(scope): description
```
A good CLAUDE.md reduces back-and-forth corrections by 30–50%, which directly reduces total tokens per session.
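
To make this a habit across projects, a small helper can drop a starter file into any repository. This is a sketch only: init_claude_md is a hypothetical function name, and the placeholder fields simply mirror the example above, so replace them with your project's real details:

```shell
# Create a starter CLAUDE.md in the given directory (default: current directory).
# Refuses to overwrite an existing file so hand-written conventions are never lost.
init_claude_md() {
  dir="${1:-.}"
  file="$dir/CLAUDE.md"
  if [ -e "$file" ]; then
    echo "exists: $file" >&2
    return 1
  fi
  cat > "$file" <<'EOF'
# Project: <name>
Stack: <frameworks, language settings, database layer, deploy target>
Test runner: <tool>. Run tests with: <command>
Conventions: <patterns to prefer and patterns to avoid>
Commit convention: <format>
EOF
  echo "created: $file"
}

# Usage:
#   init_claude_md ~/code/myapp
```

Keeping the template short is deliberate: the file is loaded into every session, so each line in it costs tokens on every run.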
Use /compact aggressively. Once a session grows past 40–50 turns, type /compact. Claude Code summarises the session into a compressed context block and continues from the summary. This is the single biggest RAM relief valve on long sessions — it can drop active context from 80K to 8K tokens in one command. Get in the habit of running it after completing each logical chunk ("finished the auth refactor, now moving to the API layer").
Monitor your token burn in real time. Set ANTHROPIC_LOG=info in your shell environment:
```
export ANTHROPIC_LOG=info
```
Every response will show input/output token counts. You will quickly learn which operations are expensive (reading a full 1,500-line file costs roughly 15,000 input tokens at about 10 tokens per line of code) and adjust your prompts to read specific functions instead of entire files.
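
To gauge a file before asking Claude Code to read it, a rough estimate is enough. This sketch assumes the common rule of thumb of about 4 characters per token, which is only an approximation (real counts depend on the tokenizer and the content); est_tokens is a hypothetical helper:

```shell
# Estimate the token count of a file from its size in bytes.
# Assumption: ~4 bytes per token, rounded up. Treat the result as a ballpark only.
est_tokens() {
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 1
  fi
  bytes=$(wc -c < "$1" | tr -d '[:space:]')   # strip the padding some wc versions add
  echo $(( (bytes + 3) / 4 ))
}

# Usage:
#   est_tokens src/db/queries/users.ts
```

If the estimate runs to tens of thousands of tokens, ask for a specific function or line range instead of the whole file.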
Set a hard API spend cap. In the Anthropic console (console.anthropic.com), find the spend limit in the billing and limits settings and set a hard monthly cap. $15–30/month covers heavy daily Claude Code use on Haiku with Sonnet for hard tasks. Without a cap, a big debugging session on Opus can run $5–10 in a single afternoon.

Task routing guide — which model for which job

Haiku (cc): Reading and summarising files, search-and-replace, generating boilerplate from a template, writing docstrings, explaining what a function does, small single-function bug fixes, running and parsing bash output.
Sonnet (ccs): Multi-file refactors, debugging logic errors that span multiple modules, writing tests with edge case coverage, schema design, architecture questions, code reviews of complex diffs.
Opus (ccop): Hard algorithmic problems, debugging race conditions or intermittent failures, designing systems from scratch, writing production migration scripts. Use sparingly.
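
The routing guide above can be written down as a lookup so the choice becomes mechanical. A minimal sketch: the task labels and the route_model name are invented for illustration, and the output is just the alias name (cc, ccs, ccop) from the aliases defined earlier:

```shell
# Map a task label to the alias that should handle it.
# Labels are hypothetical shorthand for the categories in the routing guide.
route_model() {
  case "$1" in
    read|search|boilerplate|docstring|explain|smallfix) echo "cc" ;;    # Haiku
    refactor|tests|schema|architecture|review)          echo "ccs" ;;   # Sonnet
    algorithm|race|design|migration)                    echo "ccop" ;;  # Opus
    *)                                                  echo "cc" ;;    # unknown: start cheap
  esac
}

# Usage:
#   route_model refactor    # prints: ccs
```

Defaulting unknown tasks to the cheapest tier matches the advice in this guide: start on Haiku and escalate only when it struggles.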

Expected outcome

After the configuration: idle RAM on the Claude Code process drops from ~3 GB to ~600–800 MB. Per-turn API cost drops from ~$0.08–0.12 (Sonnet) to ~$0.008–0.012 (Haiku). Monthly API spend drops from $80–150 to $15–30 for a typical daily use pattern. The MacBook fans stay quiet during file reads and small edits. Sonnet handles the 15–20% of tasks that genuinely need it, and /compact keeps long sessions from degrading into slow, expensive turns.
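
The arithmetic behind these estimates can be sketched as follows. The per-turn costs are midpoints of the ranges quoted above; the 40 turns per day, the 22 working days, and the helper name monthly_cost are illustrative assumptions, not measurements:

```shell
# Estimate monthly API spend from turns per day and the share of turns sent to Sonnet.
# Per-turn costs are midpoints of the ranges in this article: $0.01 Haiku, $0.10 Sonnet.
monthly_cost() {
  turns_per_day="$1"
  sonnet_share="$2"        # fraction between 0 and 1
  days="${3:-22}"          # working days per month (assumed default)
  awk -v t="$turns_per_day" -v s="$sonnet_share" -v d="$days" 'BEGIN {
    haiku = 0.01
    sonnet = 0.10
    per_day = t * (s * sonnet + (1 - s) * haiku)
    printf "%.2f\n", per_day * d
  }'
}

# Usage:
#   monthly_cost 40 1      # all Sonnet, prints: 88.00
#   monthly_cost 40 0.2    # 20% Sonnet, prints: 24.64
```

Plug in your own turn counts from the token logs to see where your usage falls.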

Gotchas

Haiku is noticeably worse at multi-step planning and complex debugging. If you find yourself correcting it more than twice in a row on the same problem, run /model claude-sonnet-4-5 mid-session rather than grinding through it with the wrong tool.
The contextWindow setting in settings.json limits how much of your file Claude Code will read in a single pass. If you work on large files regularly, bump it to 100K rather than capping at 80K.
CLAUDE.md is version-controlled alongside your code. This is intentional — every team member benefits from the same conventions being pre-loaded. Treat it as a living document and update it when your stack or conventions change.
ANTHROPIC_LOG=info outputs to stderr. Pipe stderr to a log file if you want to review it: claude 2>> ~/claude-tokens.log.
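The /compact command can occasionally lose context about specific file states. To reduce that risk, pass focus instructions with the command (for example, /compact keep the list of files changed so far). After compacting, run a quick "summarise what we have done so far" prompt to verify the summary is accurate before continuing.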

Time to set up: 15 min. Estimated savings: roughly $65–120/mo on Anthropic API (from $80–150 down to $15–30), plus significantly lower system resource usage during all-day coding sessions.