Edgee
Edgee is an edge-native AI gateway that compresses prompts before they reach LLM providers, exposing a single OpenAI-compatible API that routes requests across 200+ models.
What is Edgee?
Edgee is an edge-native AI gateway that sits between your application or coding agents and LLM providers. Its core job is to compress prompts before they reach model providers, with the goal of reducing token usage (and therefore lowering cost and latency) while preserving intent.
It exposes a single OpenAI-compatible API to route requests across 200+ models and adds an “edge intelligence” layer for routing policies, cost controls, private models, shared tools, and observability.
Key Features
- Token compression for prompts: Reduces prompt size before requests are sent to LLM providers, targeting lower token counts for long contexts, RAG pipelines, and multi-turn agent runs.
- OpenAI-compatible gateway API: Provides one API interface that can route traffic across 200+ models rather than requiring separate provider-specific integrations.
- Transparent proxy mode for coding agents: Designed to work without code changes for agents, with compression applied starting on the first request.
- Routing policies and cost controls: Adds edge-level controls for how requests are routed and how model usage is managed.
- Tools at the edge: Supports invoking shared tools managed by Edgee and also deploying your own private tools closer to users and providers for tighter control and lower latency.
- Bring your own keys or Edgee-managed keys: Lets you use Edgee's keys for convenience or plug in your own provider keys to maintain billing control and custom model configurations.
- Observability: Tracks latency, errors, and usage including cost per model, per app, and per environment.
- Private model deployment via serverless open-source LLMs: Deploys serverless open-source models on demand and exposes them through the same gateway API alongside public providers.
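To make the compression idea concrete, here is a deliberately simple sketch in Python. It is not Edgee's algorithm (which is not described on the site); it only illustrates the general technique of shrinking a prompt, here by collapsing whitespace and dropping verbatim duplicate lines, the kind of redundancy that accumulates in long multi-turn contexts.

```python
import re

def compress_prompt(prompt: str) -> str:
    """Toy compression pass: collapse runs of whitespace and drop exact
    duplicate lines. Illustrative only; a real compressor must preserve
    intent, not just shave characters."""
    seen = set()
    kept = []
    for line in prompt.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if line and line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

# Multi-turn contexts often repeat the same system/context lines verbatim.
prompt = (
    "You are a helpful   assistant.\n"
    "You are a helpful assistant.\n"
    "Summarize the report."
)
compressed = compress_prompt(prompt)
print(len(prompt), "->", len(compressed))
```

Even this naive pass shortens the prompt; a gateway applying a smarter version of the same idea on every request is where the claimed token savings come from.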
How to Use Edgee
- Install the Edgee CLI: Run the installation command shown on the site: curl -fsSL https://install.edgee.ai | bash
- Connect Edgee to your agent or app: For coding agents, use the CLI/launch flow to connect Edgee as a transparent proxy so it can compress tokens without code changes.
- Send requests through the gateway API: Your application or agent sends requests to Edgee using the OpenAI-compatible interface; Edgee applies token compression and any configured routing, tools, and controls.
- Monitor results: Use Edgee’s observability to review latency, errors, and usage/cost by model, app, and environment.
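Because the gateway is OpenAI-compatible, sending a request through it means building the same chat-completions payload you would send to OpenAI, just aimed at the gateway's URL. The sketch below constructs (without sending) such a request; the endpoint URL and model name are placeholders, so check Edgee's documentation for the real values.

```python
import json

# Hypothetical gateway endpoint -- not Edgee's real URL; see their docs.
GATEWAY_URL = "https://gateway.edgee.example/v1/chat/completions"

def build_chat_request(model: str, user_message: str, api_key: str):
    """Build an OpenAI-compatible chat request. Any OpenAI-style client
    pointed at the gateway base URL sends exactly this shape."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, payload

headers, payload = build_chat_request("gpt-4o-mini", "Hello", "sk-example")
print(json.dumps(payload))
```

In practice you would not build requests by hand: existing OpenAI SDKs typically let you override the base URL, which is how a single-line configuration change can redirect an app's traffic through a gateway.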
Use Cases
- Coding agents with repeated, long context: Use Edgee to compress prompts for coding assistants so multi-turn coding sessions and long-context interactions consume fewer tokens.
- RAG pipelines: Place Edgee in front of your LLM calls in retrieval-augmented generation flows to reduce the token footprint of prompts containing retrieved context.
- Applications using multiple LLM providers: Integrate once via the OpenAI-compatible gateway API and route requests across many models through Edgee instead of managing separate provider-specific logic.
- Teams needing usage and cost visibility: Use observability to break down latency, errors, and usage/cost per model, per app, and per environment.
- Deploying private models and custom tools: Expose serverless open-source LLMs and private tools through the same gateway API, keeping model and tool execution controlled at the edge.
FAQ
Is Edgee a proxy for existing agents?
Edgee is described as working as a transparent proxy for coding agents, with token compression enabled starting on the first request and without code changes required.
Does Edgee use an OpenAI-compatible API?
Yes. The site states that Edgee sits behind a single OpenAI-compatible API.
How does Edgee reduce costs?
Edgee reduces token usage by compressing prompts before they reach LLM providers, which the site links to lower bills and lower latency—especially for long contexts and multi-turn agents.
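The cost claim is simple arithmetic: input-token spend scales linearly with prompt size, so a given compression ratio translates directly into savings. The numbers below (prompt size, request volume, price per million tokens, and a 40% reduction) are illustrative assumptions, not figures published by Edgee.

```python
def monthly_prompt_cost(tokens_per_request: int, requests: int,
                        usd_per_million_tokens: float) -> float:
    """Input-token cost for one month of traffic."""
    return tokens_per_request * requests * usd_per_million_tokens / 1_000_000

# Assumed numbers: 8k-token prompts, 100k requests/month, $3 per million input tokens.
baseline = monthly_prompt_cost(8_000, 100_000, 3.00)
# Assumed 40% token reduction from compression.
reduced = monthly_prompt_cost(4_800, 100_000, 3.00)
print(baseline, reduced, baseline - reduced)  # 2400.0 1440.0 960.0
```

Latency benefits follow the same logic: fewer input tokens means less data to upload and less prefill work for the provider.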
Can I use my own provider API keys?
The site says you can either use Edgee’s keys for convenience or plug in your own provider keys for billing control and custom models.
What can Edgee show in observability?
Edgee’s observability includes latency, errors, and usage/cost per model, per app, and per environment.
Alternatives
- Provider-specific SDK integrations: Instead of routing through a gateway, integrate directly with one or more LLM providers. This can be simpler but typically requires separate handling for each provider and fewer shared controls across models.
- RAG and prompt-optimization layers without a gateway: Tools that focus only on prompt construction, summarization, or truncation can reduce tokens, but they may not centralize routing policies, tool management, or multi-model observability.
- Self-hosted proxy/gateway solutions: A custom or open-source gateway/proxy can centralize API compatibility and logging, but token compression, tool execution, and private model deployment would require additional implementation effort.