v1.17.1 · 219 tests · MIT

Compress your
AI context window

Local proxy that compresses tool outputs, deduplicates file reads, and strips noise. Save thousands of tokens per session with zero workflow changes.

30+
patterns
7
compression layers
97%
max compression
Compatibility

Works with your tools

Auto-detects API format from request headers. Zero per-tool config.

Claude Code
Anthropic Messages API
OpenAI Codex
Chat Completions API
Aider
OpenAI-compatible
Gemini CLI
Google AI API
Ollama
Local inference
LM Studio
Local inference
Continue
VS Code & JetBrains
Soon
Cursor IDE
Coming soon
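The auto-detection above can be sketched as a small dispatch on the request shape. This is a hypothetical illustration, not squeezr's actual detection code: the paths and header names are the public conventions of each API, and the function name is invented.

```typescript
// Hypothetical sketch: infer the upstream API dialect from the request.
// Paths/headers are the public conventions of each provider's API.
type Dialect = "anthropic" | "openai" | "google";

function detectDialect(path: string, headers: Record<string, string>): Dialect {
  // Anthropic Messages API: POST /v1/messages, authenticated via x-api-key.
  if (path.startsWith("/v1/messages") || "x-api-key" in headers) return "anthropic";
  // Google AI API: model endpoints end in :generateContent.
  if (path.includes(":generateContent")) return "google";
  // Everything else: Chat Completions and OpenAI-compatible tools
  // (Aider, Ollama, LM Studio, Continue).
  return "openai";
}
```

One dispatch point like this is what makes "zero per-tool config" possible: the proxy never needs to be told which client is talking to it.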
Compression Gains

See the difference

Real compression results from actual coding sessions. Every byte counts.

78% avg. compressed
7
Layers
30+
Patterns
$$$
Saved
Test Output (vitest · 188 tests): 2,340 chars → 198 chars (-92%)
File Read (server.ts · 3200 lines): 3,200 chars → 84 chars (-97%)
Git Diff (feature branch · 47 files): 1,800 chars → 320 chars (-82%)
System Prompt (Claude Code · 13KB): 13,000 chars → 600 chars (-95%)
Architecture

7-Layer Pipeline

Every request passes through seven independent stages; each layer catches what the previous one missed.

01 · System Prompt

~13KB → 600 tokens

95%

02 · Read Dedup

Collapse duplicate reads

80%

03 · Noise Strip

ANSI, progress bars, spinners

30%

04 · Tool Patterns

30+ specific compressors

60%

05 · Line Dedup

Repeated lines & stacks

25%

06 · AI Compress

Haiku / GPT-mini / Flash

85%

07 · Session Cache

KV cache warming

90%
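The layered design can be sketched as a fold over string transforms, so each layer sees the previous layer's output. This is an illustrative skeleton, not squeezr's implementation: most layers are stubs here, with only a trivial ANSI strip and line dedup filled in.

```typescript
// Hypothetical sketch of the 7-layer pipeline: each layer is a pure
// string transform, applied in order. Layer names mirror the list
// above; most bodies are stand-ins.
type Layer = (input: string) => string;

const layers: Layer[] = [
  (s) => s,                                      // 01 system-prompt compaction (stub)
  (s) => s,                                      // 02 read dedup (stub)
  (s) => s.replace(/\x1b\[[0-9;]*m/g, ""),       // 03 strip ANSI color codes
  (s) => s,                                      // 04 tool-specific patterns (stub)
  (s) => [...new Set(s.split("\n"))].join("\n"), // 05 drop repeated lines
  (s) => s,                                      // 06 AI fallback (stub)
  (s) => s,                                      // 07 session cache (stub)
];

const compress = (input: string): string =>
  layers.reduce((acc, layer) => layer(acc), input);
```

Ordering matters: noise stripping runs before line dedup so that two lines differing only in color codes collapse into one.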
Features

Everything you need

Deterministic

30+ Patterns

Git diffs, test runners, build tools, Docker, Terraform, package managers — each has a dedicated compressor that knows exactly what to keep.

PASS src/config.test.ts (12 tests)
PASS src/cache.test.ts (8 tests)
FAIL src/server.test.ts (2 failed)
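A pattern compressor for output like the above might keep failing lines verbatim and collapse passing files into a single summary. Squeezr's real vitest compressor is not shown here; this sketch (and its summary format) is an assumption.

```typescript
// Hypothetical test-runner compressor: failures survive verbatim,
// passing files collapse to one summary line.
function compressTestOutput(output: string): string {
  const lines = output.split("\n").filter((l) => l.trim() !== "");
  const passes = lines.filter((l) => l.startsWith("PASS"));
  const rest = lines.filter((l) => !l.startsWith("PASS"));
  const summary = passes.length > 0 ? [`PASS: ${passes.length} files`] : [];
  return [...summary, ...rest].join("\n");
}
```

The asymmetry is the point: the model rarely needs to know *which* tests passed, but always needs the exact failures.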
Smart

AI Fallback

When no pattern matches, Haiku, GPT-4o-mini, or Gemini Flash compresses the output to under 150 tokens. The best model wins.

Haiku
120ms
GPT-mini
95ms
Flash
80ms
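"The best model wins" might look like the following: query the candidates in parallel and keep the shortest result that fits the token budget. The model calls are mocked here, and the selection rule is an assumption about squeezr's behavior, not a documented one.

```typescript
// Hypothetical AI-fallback race: all models summarize in parallel;
// the shortest result under the budget wins. A failed call falls
// back to the original text.
type Summarizer = (text: string) => Promise<string>;

async function aiCompress(
  text: string,
  models: Summarizer[],
  maxTokens = 150,
): Promise<string> {
  const results = await Promise.all(models.map((m) => m(text).catch(() => text)));
  const tokens = (s: string) => Math.ceil(s.length / 4); // rough estimate
  const withinBudget = results.filter((r) => tokens(r) <= maxTokens);
  const pool = withinBudget.length > 0 ? withinBudget : results;
  return pool.reduce((best, r) => (r.length < best.length ? r : best));
}
```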
Dedup

File Dedup

Read the same file 5 times? Only the latest stays full. Earlier reads become lightweight references.
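In sketch form, dedup means tracking the last read of each path and rewriting earlier reads as references. The reference marker format here is invented for illustration; squeezr's actual marker may differ.

```typescript
// Hypothetical read dedup: only the most recent read of each path
// keeps its full content; earlier reads become short references.
interface FileRead { path: string; content: string; }

function dedupReads(reads: FileRead[]): FileRead[] {
  const lastIndex = new Map<string, number>();
  reads.forEach((r, i) => lastIndex.set(r.path, i));
  return reads.map((r, i) =>
    lastIndex.get(r.path) === i
      ? r
      : { path: r.path, content: `[see latest read of ${r.path}]` },
  );
}
```

Keeping the *latest* copy (rather than the first) matters because the file may have been edited between reads.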

Cache

Session Cache

Identical compressed strings reuse API provider KV cache — up to 90% cost reduction on cache hits.
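The cache-friendliness comes from determinism: identical inputs must yield byte-identical compressed strings. A minimal sketch, memoizing by content hash (not squeezr's internals):

```typescript
// Hypothetical session cache: memoize compression by content hash so
// identical tool outputs produce byte-identical compressed strings,
// letting the provider's prompt/KV cache hit on repeat requests.
import { createHash } from "node:crypto";

const cache = new Map<string, string>();

function cachedCompress(input: string, compress: (s: string) => string): string {
  const key = createHash("sha256").update(input).digest("hex");
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const out = compress(input);
  cache.set(key, out);
  return out;
}
```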

Lossless

Expand Tool

The AI can call squeezr_expand() to retrieve any original content. Nothing is permanently lost.
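Lossless expansion implies the proxy keeps the originals and hands the model a handle to each one. The id format and the `compressWithReceipt` helper below are invented for illustration; only the `squeezr_expand` tool name comes from the text above.

```typescript
// Hypothetical expand-tool plumbing: store each original before
// compressing, append an id the model can pass back to expand.
const originals = new Map<string, string>();
let nextId = 0;

function compressWithReceipt(original: string, compressed: string): string {
  const id = `sq-${nextId++}`;
  originals.set(id, original);
  return `${compressed} [expand: ${id}]`;
}

function squeezr_expand(id: string): string {
  return originals.get(id) ?? `[unknown id: ${id}]`;
}
```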

Simple

Zero Config

One install, one command, works immediately. Optional TOML config for fine-grained control.

Real Examples

See the compression

Before and after from real coding sessions. Click to toggle.

Before (vitest · 188 tests)
vitest · 188 tests
✓ config (12) cache (8) expand (15)
✓ compressor (24) deterministic (89)
✗ server.test.ts (40 | 2 failed)
FAIL streaming — expected 500 to be 200
FAIL health — Cannot read undefined
Test Files: 1 failed | 5 passed · Tests: 2 failed | 186 passed
2,340 chars → 198 chars
How it works

Three steps. Thirty seconds.

From install to savings in under a minute. No configuration required.

01

Install & Setup

One npm install, one setup command. Auto-detects your OS, configures env vars, and starts the daemon.

terminal
$ npm i -g squeezr-ai
$ squeezr setup
✓ Done
02

Proxy Intercepts

Your AI tool sends requests through localhost. Squeezr intercepts transparently — no code changes needed.

proxy
→ POST /v1/messages
12,847 tokens input
Compressing...
03

Savings Begin

Compressed requests go to the API. Your AI gets all essential info with a fraction of the tokens.

stats
✓ 42 requests processed
✓ 34,291 tokens saved
✓ 78% average compression
Calculator

Estimate your savings

See how much you could save based on your usage.

Inputs: 60 · 8K
Tokens saved / session: 374,400
Tokens saved / month: 24.7M (~3 sessions/day × 22 days)
Cost saved / month: $74.13
Based on Claude (Sonnet) input pricing
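The calculator's numbers are internally consistent. The interpretation of the two sliders (requests per session, tokens per request) and the $3 per 1M input tokens price are assumptions chosen because they reproduce the displayed figures; check current Claude pricing before relying on them.

```typescript
// Reproducing the calculator's arithmetic (slider meanings and price
// are assumptions that match the displayed results).
const requestsPerSession = 60;   // assumed meaning of the "60" slider
const tokensPerRequest = 8_000;  // assumed meaning of the "8K" slider
const avgCompression = 0.78;     // 78% average compression (from stats)

const perSession = requestsPerSession * tokensPerRequest * avgCompression; // 374,400
const perMonth = perSession * 3 * 22;   // ~3 sessions/day × 22 days → 24,710,400 ≈ 24.7M
const costSaved = (perMonth / 1e6) * 3; // assumed $3 per 1M input tokens → ≈ $74.13
```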

Ready to compress?

Three commands. Thirty seconds. That's it.

terminal
$
MIT Licensed · Zero Config · < 30s Setup