Introduction

Crabllm is a high-performance LLM API gateway written in Rust. It sits between your application and LLM providers, exposing an OpenAI-compatible API surface.

One API format. Many providers. Low overhead.

What It Does

You send requests in OpenAI format to crabllm. It routes them to the configured provider — OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, or Ollama — translating the request and response as needed.
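For example, a chat completion request to crabllm uses the standard OpenAI request body regardless of which provider ultimately serves it. A minimal sketch (the endpoint URL and model name here are illustrative, not crabllm defaults):

```python
import json

# An OpenAI-format chat completion request. crabllm accepts this shape
# for every configured provider; the URL below is a hypothetical example.
CRABLLM_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "gpt-4o",  # or any model alias configured in crabllm
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
# Send `body` with the HTTP client of your choice (e.g. urllib.request),
# including an `Authorization: Bearer <virtual-key>` header.
print(body)
```

Because the request shape never changes, swapping the backing provider is a gateway configuration change, not an application change.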

Your application talks to one endpoint. Crabllm handles the rest:

  • Provider translation — Anthropic, Google, and Bedrock have their own API formats. Crabllm translates automatically.
  • Routing — Weighted random selection across multiple providers for the same model. Automatic fallback when a provider fails.
  • Streaming — SSE streaming proxied without buffering.
  • Auth — Virtual API keys with per-key model access control.
  • Extensions — Rate limiting, caching, cost tracking, budget enforcement.
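To make the translation step concrete, here is a simplified sketch of the kind of mapping an OpenAI-to-Anthropic translation performs. This is illustrative pseudologic, not crabllm's actual code: Anthropic's Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`, so both must be rewritten.

```python
def openai_to_anthropic(body: dict) -> dict:
    """Illustrative sketch: translate an OpenAI chat request into the
    Anthropic Messages API shape. System messages move to a top-level
    `system` field, and `max_tokens` becomes mandatory."""
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    messages = [m for m in body["messages"] if m["role"] != "system"]
    out = {
        "model": body["model"],
        "messages": messages,
        # Anthropic requires max_tokens; 1024 is an arbitrary example default.
        "max_tokens": body.get("max_tokens", 1024),
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

translated = openai_to_anthropic({
    "model": "claude-sonnet-4",
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
})
print(translated)
```

The reverse mapping is applied to the response, so the caller only ever sees the OpenAI shape.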

Why Rust

  • Sub-millisecond overhead — no GC pauses, no interpreter startup.
  • Memory safety — without runtime cost.
  • Concurrency — Tokio async runtime handles thousands of concurrent streaming connections efficiently.
  • Deployment — single static binary. No interpreter, no virtualenv, no Docker required.

Feature Comparison

| Feature | LiteLLM | Crabllm |
|---|---|---|
| /chat/completions | yes | yes |
| /embeddings | yes | yes |
| /models | yes | yes |
| OpenAI provider | yes | yes |
| Anthropic provider | yes | yes |
| Google Gemini provider | yes | yes |
| Azure OpenAI provider | yes | yes |
| AWS Bedrock provider | yes | yes |
| Tool/function calling | yes | yes |
| SSE streaming | yes | yes |
| Virtual keys + auth | yes | yes |
| Weighted routing | yes | yes |
| Model aliasing | yes | yes |
| Retry + fallback | yes | yes |
| Rate limiting (RPM/TPM) | yes | yes |
| Cost/usage tracking | yes | yes |
| Budget enforcement | yes | yes |
| Request caching | yes | yes |
| Image/audio endpoints | yes | yes |
| Storage (memory) | yes | yes |
| Storage (persistent) | Postgres | SQLite |
| Redis storage | yes | yes |