FreeLLMAPI icon

FreeLLMAPI

FreeLLMAPI is an OpenAI-compatible proxy for routing requests across free tiers of multiple LLM providers through one /v1 endpoint, with failover, encrypted keys, and an admin dashboard.

FreeLLMAPI

Overview

FreeLLMAPI is an OpenAI-compatible proxy for routing requests across the free tiers of multiple LLM providers behind a single /v1 endpoint. The project positions itself as a way to combine individual free plans into one shared inference surface for personal experimentation.

It supports a long list of provider integrations, plus any custom OpenAI-compatible endpoint such as llama.cpp, LM Studio, vLLM, or a local Ollama instance. The proxy handles model routing, automatic failover, encrypted storage for upstream keys, and a dashboard for managing keys and reviewing usage.

Features

Stacks multiple free providers

Aggregates the free tiers of providers including Google, Groq, Cerebras, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, HuggingFace, Z.ai, Ollama, Kilo, Pollinations, LLM7, OVH AI Endpoints, and OpenCode Zen behind one OpenAI-compatible surface.

Automatic routing and failover

Uses a router that selects a model for each request, falls back to the next provider when one is rate-limited, returns 429/5xx, or times out, and keeps short cooldowns for failed keys.

Per-key usage tracking

Tracks RPM, RPD, TPM, and TPD per provider, model, and key, and keeps sticky sessions on the same model for about 30 minutes during multi-turn conversations.

Encrypted keys and unified app access

Stores provider API keys encrypted with AES-256-GCM in SQLite, while clients authenticate to the proxy with a single unified bearer token.

OpenAI-compatible API surface

Exposes /v1/chat/completions, /v1/models, /v1/responses, /v1/embeddings, streaming, non-streaming, and OpenAI-style tool calling for compatible clients.

Built-in dashboard and analytics

Includes a React + Vite admin dashboard for managing keys, ordering fallback chains, viewing analytics, and running prompts in a playground.

Use Cases

  • Use a single API endpoint for LLM apps

    Point an OpenAI SDK, LangChain, LlamaIndex, Continue, or similar client at the proxy and keep the same application code while swapping the upstream path to /v1.

  • Spread requests across free tiers

    Add provider keys for several free-tier services and let the router choose an available model, then fail over automatically when one provider is throttled or unavailable.

  • Self-host a personal proxy stack

    Run the Docker Compose setup locally or on a small server to keep the API, dashboard, and SQLite data in one self-hosted environment.

  • Manage keys and monitor usage

    Use the admin dashboard to reorder fallback chains, inspect latency and token usage, and test prompts before wiring a client into the proxy.

  • Route to custom local or remote endpoints

    Connect a custom OpenAI-compatible backend such as LM Studio, llama.cpp, vLLM, or local Ollama through the same unified router.

Pros and Cons

Pros

  • Combines many free-tier providers behind one OpenAI-compatible endpoint.
  • Supports automatic fallback when a provider is rate-limited, errors, or times out.
  • Stores upstream keys encrypted at rest with AES-256-GCM.
  • Works with a broad set of OpenAI-compatible clients and SDKs by changing the base URL.
  • Includes a self-hosted dashboard for key management and analytics.

Cons

  • It is explicitly scoped to personal experimentation and a single-user setup, not multi-tenant team billing.
  • Several OpenAI API areas are not implemented, including image generation, audio, legacy completions, moderation, and n > 1 completions.

FAQ

What clients can use FreeLLMAPI?

FreeLLMAPI is designed to work with OpenAI-compatible clients. The README says you can point any OpenAI SDK or compatible client such as LangChain, LlamaIndex, Continue, or Hermes at the proxy by changing the base URL.

How is FreeLLMAPI typically deployed?

The Docker guide says Docker Compose is the recommended way to run it for personal use. It serves the API and dashboard from one process on port 3001, with SQLite persisted in a named volume.

Which OpenAI-style endpoints and workflows are supported?

The README says the proxy implements /v1/chat/completions, /v1/models, /v1/responses, /v1/embeddings, streaming and non-streaming responses, and OpenAI-style tool calling. It does not implement image generation, audio, legacy completions, moderation, multiple completions per request, or per-user billing.

Can teams use it with multi-tenant authentication?

The project is built around a single-user setup. The README explicitly says per-user billing and multi-tenant auth are not supported yet.

Quick Facts

Category
Developer Tool
Primary use
OpenAI-compatible LLM proxy
Deployment
Docker Compose or Node 20+ self-hosting
Auth model
Unified bearer token for apps; email/password admin login
Source domain
github.com
Pricing
Open source project; GitHub’s pricing page was reviewed for hosting context, but the product itself does not present a paid plan