Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Benchmarks

Gateway overhead measured against a mock LLM server with instant responses — numbers reflect pure proxy cost.

Latency: P50 / P99 in milliseconds. Lower is better.

Chat Completions

RPSdirectcrabllmbifrostlitellm
1000.38 / 0.631.00 / 1.311.10 / 1.645.35 / 10.79
5000.28 / 0.420.66 / 1.070.36 / 0.91168.79 / 223.69
10000.15 / 0.310.44 / 0.830.27 / 0.46172.00 / 201.55
20000.17 / 0.330.29 / 0.880.29 / 0.53169.99 / 194.34
50000.13 / 0.330.26 / 0.570.26 / 0.48159.86 / 492.82

Streaming

RPSdirectcrabllmbifrostlitellm
1000.45 / 0.6243.53 / 48.141.51 / 2.20670.25 / 3357.70
5000.34 / 0.5442.90 / 47.140.51 / 0.93659.97 / 3569.92
10000.22 / 0.4244.18 / 48.300.45 / 0.98645.59 / 2797.66
200044.04 / 48.2344.25 / 48.5244.18 / 48.64596.90 / 2678.08
500044.04 / 48.2344.24 / 48.5044.20 / 48.66571.96 / 2563.73

Embeddings

RPSdirectcrabllmbifrostlitellm
1000.39 / 0.471.18 / 1.481.15 / 1.707.09 / 10.72
5000.30 / 0.420.78 / 1.150.43 / 1.03356.71 / 414.36
10000.17 / 0.270.51 / 0.910.38 / 0.85332.53 / 6516.44
20000.18 / 0.320.36 / 1.080.39 / 0.94317.53 / 365.68
50000.14 / 0.320.34 / 0.640.39 / 1.57305.91 / 8778.06

Memory (Peak RSS)

GatewayPeak RSS
direct15.3 MB
crabllm34.9 MB
bifrost171.7 MB
litellm541.8 MB