v1.17.1 · 219 tests · MIT

Compress your
AI context window

Local proxy that compresses tool outputs, deduplicates file reads, and strips noise. Save thousands of tokens per session with zero workflow changes.

30+
patterns
7
compression layers
97%
max compression
Compatibility

Works with your tools

Auto-detects API format from request headers. Zero per-tool config.

Claude Code
Anthropic Messages API
OpenAI Codex
Chat Completions API
Aider
OpenAI-compatible
Gemini CLI
Google AI API
Ollama
Local inference
LM Studio
Local inference
Continue
VS Code & JetBrains
Soon
Cursor IDE
Coming soon
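The auto-detection above can be sketched as a small dispatch on the request shape. This is a hypothetical illustration, not squeezr's actual detection code: the paths and header names are the public conventions of each API, and the function name is invented.

```typescript
// Hypothetical sketch: infer the upstream API dialect from the request.
// Paths/headers are the public conventions of each provider's API.
type Dialect = "anthropic" | "openai" | "google";

function detectDialect(path: string, headers: Record<string, string>): Dialect {
  // Anthropic Messages API: POST /v1/messages, authenticated via x-api-key.
  if (path.startsWith("/v1/messages") || "x-api-key" in headers) return "anthropic";
  // Google AI API: model endpoints end in :generateContent.
  if (path.includes(":generateContent")) return "google";
  // Everything else: Chat Completions and OpenAI-compatible tools
  // (Aider, Ollama, LM Studio, Continue).
  return "openai";
}
```

One dispatch point like this is what makes "zero per-tool config" possible: the proxy never needs to be told which client is talking to it.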
Compression Gains

See the difference

Real compression results from actual coding sessions. Every byte counts.

78% avg. compressed
7
Layers
30+
Patterns
$$$
Saved
Test Output (vitest · 188 tests): 2,340 chars → 198 chars (-92%)
File Read (server.ts · 3200 lines): 3,200 chars → 84 chars (-97%)
Git Diff (feature branch · 47 files): 1,800 chars → 320 chars (-82%)
System Prompt (Claude Code · 13KB): 13,000 chars → 600 chars (-95%)
Architecture

7-Layer Pipeline

Every request passes through seven independent stages; each layer catches what the previous one missed.

01 · System Prompt

~13KB → 600 tokens

95%

02 · Read Dedup

Collapse duplicate reads

80%

03 · Noise Strip

ANSI, progress bars, spinners

30%

04 · Tool Patterns

30+ specific compressors

60%

05 · Line Dedup

Repeated lines & stacks

25%

06 · AI Compress

Haiku / GPT-mini / Flash

85%

07 · Session Cache

KV cache warming

90%
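The layered design can be sketched as a fold over string transforms, so each layer sees the previous layer's output. This is an illustrative skeleton, not squeezr's implementation: most layers are stubs here, with only a trivial ANSI strip and line dedup filled in.

```typescript
// Hypothetical sketch of the 7-layer pipeline: each layer is a pure
// string transform, applied in order. Layer names mirror the list
// above; most bodies are stand-ins.
type Layer = (input: string) => string;

const layers: Layer[] = [
  (s) => s,                                      // 01 system-prompt compaction (stub)
  (s) => s,                                      // 02 read dedup (stub)
  (s) => s.replace(/\x1b\[[0-9;]*m/g, ""),       // 03 strip ANSI color codes
  (s) => s,                                      // 04 tool-specific patterns (stub)
  (s) => [...new Set(s.split("\n"))].join("\n"), // 05 drop repeated lines
  (s) => s,                                      // 06 AI fallback (stub)
  (s) => s,                                      // 07 session cache (stub)
];

const compress = (input: string): string =>
  layers.reduce((acc, layer) => layer(acc), input);
```

Ordering matters: noise stripping runs before line dedup so that two lines differing only in color codes collapse into one.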
Features

Everything you need

Deterministic

30+ Patterns

Git diffs, test runners, build tools, Docker, Terraform, package managers — each has a dedicated compressor that knows exactly what to keep.

PASS src/config.test.ts (12 tests)
PASS src/cache.test.ts (8 tests)
FAIL src/server.test.ts (2 failed)
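A pattern compressor for output like the above might keep failing lines verbatim and collapse passing files into a single summary. Squeezr's real vitest compressor is not shown here; this sketch (and its summary format) is an assumption.

```typescript
// Hypothetical test-runner compressor: failures survive verbatim,
// passing files collapse to one summary line.
function compressTestOutput(output: string): string {
  const lines = output.split("\n").filter((l) => l.trim() !== "");
  const passes = lines.filter((l) => l.startsWith("PASS"));
  const rest = lines.filter((l) => !l.startsWith("PASS"));
  const summary = passes.length > 0 ? [`PASS: ${passes.length} files`] : [];
  return [...summary, ...rest].join("\n");
}
```

The asymmetry is the point: the model rarely needs to know *which* tests passed, but always needs the exact failures.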
Smart

AI Fallback

When no pattern matches, Haiku, GPT-4o-mini, or Gemini Flash compresses the output to under 150 tokens. The best model wins.

Haiku
120ms
GPT-mini
95ms
Flash
80ms
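"The best model wins" might look like the following: query the candidates in parallel and keep the shortest result that fits the token budget. The model calls are mocked here, and the selection rule is an assumption about squeezr's behavior, not a documented one.

```typescript
// Hypothetical AI-fallback race: all models summarize in parallel;
// the shortest result under the budget wins. A failed call falls
// back to the original text.
type Summarizer = (text: string) => Promise<string>;

async function aiCompress(
  text: string,
  models: Summarizer[],
  maxTokens = 150,
): Promise<string> {
  const results = await Promise.all(models.map((m) => m(text).catch(() => text)));
  const tokens = (s: string) => Math.ceil(s.length / 4); // rough estimate
  const withinBudget = results.filter((r) => tokens(r) <= maxTokens);
  const pool = withinBudget.length > 0 ? withinBudget : results;
  return pool.reduce((best, r) => (r.length < best.length ? r : best));
}
```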
Dedup

File Dedup

Read the same file 5 times? Only the latest stays full. Earlier reads become lightweight references.
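In sketch form, dedup means tracking the last read of each path and rewriting earlier reads as references. The reference marker format here is invented for illustration; squeezr's actual marker may differ.

```typescript
// Hypothetical read dedup: only the most recent read of each path
// keeps its full content; earlier reads become short references.
interface FileRead { path: string; content: string; }

function dedupReads(reads: FileRead[]): FileRead[] {
  const lastIndex = new Map<string, number>();
  reads.forEach((r, i) => lastIndex.set(r.path, i));
  return reads.map((r, i) =>
    lastIndex.get(r.path) === i
      ? r
      : { path: r.path, content: `[see latest read of ${r.path}]` },
  );
}
```

Keeping the *latest* copy (rather than the first) matters because the file may have been edited between reads.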

Cache

Session Cache

Identical compressed strings reuse API provider KV cache — up to 90% cost reduction on cache hits.
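The cache-friendliness comes from determinism: identical inputs must yield byte-identical compressed strings. A minimal sketch, memoizing by content hash (not squeezr's internals):

```typescript
// Hypothetical session cache: memoize compression by content hash so
// identical tool outputs produce byte-identical compressed strings,
// letting the provider's prompt/KV cache hit on repeat requests.
import { createHash } from "node:crypto";

const cache = new Map<string, string>();

function cachedCompress(input: string, compress: (s: string) => string): string {
  const key = createHash("sha256").update(input).digest("hex");
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const out = compress(input);
  cache.set(key, out);
  return out;
}
```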

Lossless

Expand Tool

The AI can call squeezr_expand() to retrieve any original content. Nothing is permanently lost.
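Lossless expansion implies the proxy keeps the originals and hands the model a handle to each one. The id format and the `compressWithReceipt` helper below are invented for illustration; only the `squeezr_expand` tool name comes from the text above.

```typescript
// Hypothetical expand-tool plumbing: store each original before
// compressing, append an id the model can pass back to expand.
const originals = new Map<string, string>();
let nextId = 0;

function compressWithReceipt(original: string, compressed: string): string {
  const id = `sq-${nextId++}`;
  originals.set(id, original);
  return `${compressed} [expand: ${id}]`;
}

function squeezr_expand(id: string): string {
  return originals.get(id) ?? `[unknown id: ${id}]`;
}
```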

Simple

Zero Config

One install, one command, works immediately. Optional TOML config for fine-grained control.

Real Examples

See the compression

Before and after from real coding sessions. Click to toggle.

Before (vitest · 188 tests)
vitest · 188 tests
✓ config (12) cache (8) expand (15)
✓ compressor (24) deterministic (89)
✗ server.test.ts (40 | 2 failed)
FAIL streaming — expected 500 to be 200
FAIL health — Cannot read undefined
Test Files: 1 failed | 5 passed · Tests: 2 failed | 186 passed
2,340 chars → 198 chars
How it works

Three steps. Thirty seconds.

From install to savings in under a minute. No configuration required.

01

Install & Setup

One npm install, one setup command. Auto-detects your OS, configures env vars, and starts the daemon.

terminal
$ npm i -g squeezr-ai
$ squeezr setup
✓ Done
02

Proxy Intercepts

Your AI tool sends requests through localhost. Squeezr intercepts transparently — no code changes needed.

proxy
→ POST /v1/messages
12,847 tokens input
Compressing...
03

Savings Begin

Compressed requests go to the API. Your AI gets all essential info with a fraction of the tokens.

stats
✓ 42 requests processed
✓ 34,291 tokens saved
✓ 78% average compression
Calculator

Estimate your savings

See how much you could save based on your usage.

Inputs: 60 · 8K
Tokens saved / session: 374,400
Tokens saved / month: 24.7M (~3 sessions/day × 22 days)
Cost saved / month: $74.13
Based on Claude (Sonnet) input pricing
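The calculator's numbers are internally consistent. The interpretation of the two sliders (requests per session, tokens per request) and the $3 per 1M input tokens price are assumptions chosen because they reproduce the displayed figures; check current Claude pricing before relying on them.

```typescript
// Reproducing the calculator's arithmetic (slider meanings and price
// are assumptions that match the displayed results).
const requestsPerSession = 60;   // assumed meaning of the "60" slider
const tokensPerRequest = 8_000;  // assumed meaning of the "8K" slider
const avgCompression = 0.78;     // 78% average compression (from stats)

const perSession = requestsPerSession * tokensPerRequest * avgCompression; // 374,400
const perMonth = perSession * 3 * 22;   // ~3 sessions/day × 22 days → 24,710,400 ≈ 24.7M
const costSaved = (perMonth / 1e6) * 3; // assumed $3 per 1M input tokens → ≈ $74.13
```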

Ready to compress?

Three commands. Thirty seconds. That's it.

terminal
$
MIT Licensed · Zero Config · < 30s Setup