Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

0018 - Protocol

  • Feature Name: Wire Protocol
  • Start Date: 2026-03-27
  • Discussion: #18
  • Crates: core

Summary

A protobuf-based wire protocol defining all client-server communication for the crabtalk daemon, with a Server trait for dispatch and a Client trait for typed request methods.

Motivation

The daemon mediates between multiple clients (CLI, Telegram, web) and multiple agents. A well-defined wire protocol decouples client and server implementations and makes the contract explicit. Protobuf was chosen for compact binary encoding, language-neutral schema, and generated code via prost.

Design

Wire messages (crabtalk.proto)

Two top-level envelopes using oneof:

ClientMessage — 15 variants:

VariantPurpose
SendRun agent, return complete response
StreamRun agent, stream response events
PingKeepalive
SessionsList active sessions
KillClose a session
GetConfigRead daemon config
SetConfigReplace daemon config
ReloadHot-reload runtime
SubscribeEventsStream agent events
ReplyToAskAnswer a pending ask_user prompt
GetStatsDaemon stats
CreateCronCreate cron entry
DeleteCronDelete cron entry
ListCronsList cron entries
CompactCompact session history

ServerMessage — 11 variants:

VariantPurpose
ResponseComplete agent response
StreamStreaming event (see below)
ErrorError with code and message
PongKeepalive ack
SessionsSession list
ConfigConfig JSON
AgentEventAgent event (for subscriptions)
StatsDaemon stats
CronInfoCreated cron entry
CronListAll cron entries
CompactCompaction summary

Streaming events

StreamEvent is itself a oneof with 8 variants representing the lifecycle of a streamed agent response:

  • Start { agent, session } — stream opened.
  • Chunk { content } — text delta.
  • Thinking { content } — thinking/reasoning delta.
  • ToolStart { calls[] } — tool invocations beginning.
  • ToolResult { call_id, output, duration_ms } — single tool result.
  • ToolsComplete — all pending tool calls finished.
  • AskUser { questions[] } — agent needs user input.
  • End { agent, error } — stream closed (error is empty on success).

The client reads StreamEvents until it receives End, which is the terminal sentinel.

Agent events

AgentEventMsg carries a kind enum (TEXT_DELTA, THINKING_DELTA, TOOL_START, TOOL_RESULT, TOOLS_COMPLETE, DONE) plus agent name, session ID, content, and timestamp. Used by SubscribeEvents for live monitoring of all agent activity across sessions.

AgentEventMsg overlaps with StreamEvent — both represent the agent execution lifecycle. StreamEvent is the per-request streaming format (rich, typed variants). AgentEventMsg is the cross-session monitoring format (flat, single struct with a kind tag). The duplication exists because monitoring clients need a simpler, uniform shape to filter and display events from multiple agents.

Server trait

One async method per ClientMessage variant. Implementations receive typed request structs and return typed responses:

#![allow(unused)]
fn main() {
trait Server: Sync {
    fn send(&self, req: SendMsg) -> Future<Output = Result<SendResponse>>;
    fn stream(&self, req: StreamMsg) -> Stream<Item = Result<StreamEvent>>;
    fn ping(&self) -> Future<Output = Result<()>>;
    // ... one method per operation
}
}

The provided dispatch(&self, msg: ClientMessage) -> Stream<Item = ServerMessage> method routes a raw ClientMessage to the correct handler. Request-response operations yield exactly one ServerMessage; streaming operations yield many. Errors are mapped to ErrorMsg { code, message } using HTTP status codes with their standard semantics: 400 (bad request), 404 (not found), 500 (internal error).

Client trait

Two required transport primitives:

  • request(ClientMessage) -> Result<ServerMessage> — single round-trip.
  • request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — raw streaming read.

Typed provided methods (send, stream, ping, get_config, set_config) handle message construction, response unwrapping, and sentinel detection. The stream() method consumes events via take_while until StreamEnd and maps each frame through TryFrom<ServerMessage> for type-safe event extraction.

Conversions (message::convert)

From impls wrap typed messages into envelopes (SendMsg -> ClientMessage, SendResponse -> ServerMessage). TryFrom impls unwrap in the other direction, returning an error for unexpected variants. This keeps call sites clean — no manual enum construction.

Alternatives

JSON over WebSocket. Simpler to debug with curl, but larger payloads and no schema enforcement. Protobuf catches schema mismatches at compile time.

gRPC service definitions. Would provide streaming and code generation out of the box, but brings HTTP/2, tower middleware, and tonic as dependencies. The current approach is lighter: raw protobuf frames over a length-prefixed stream, with hand-written trait dispatch.

Separate request/response ID correlation. The protocol is connection-scoped and sequential — one outstanding request per connection at a time. This is a fundamental design constraint: clients must wait for a response before sending the next request. No need for request IDs or multiplexing. If multiplexing is needed later, it belongs in the transport layer, not the protocol.

Unresolved Questions

  • Should the protocol negotiate a version on connect to detect client/server mismatches?
  • Should StreamEnd carry structured error information (code + message) instead of a plain string?
  • Should there be a ClientMessage variant for subscribing to a specific session’s events rather than all events?