Introduction

This is the crabtalk development book — the knowledge base you check before building. It captures what crabtalk stands for, how the system is shaped, and the design decisions that govern its evolution.

For user-facing documentation (installation, configuration, commands), see crabtalk.ai.

How this book is organized

  • Manifesto — What crabtalk is and what it stands for.
  • RFCs — Design decisions and features.

RFCs

Code tells you what the system does. Git history tells you when it changed. RFCs tell you why — the problem, the alternatives considered, and the reasoning behind the choice. When you’re about to build something new, RFCs are where you check whether the problem has been thought through before.

Not every change needs an RFC. Bug fixes, refactors, and small improvements go through normal pull requests. RFCs are for decisions that establish rules, contracts, or interfaces that others need to know about before building.

Format

Each RFC is a markdown file with the following structure:

  • Header — Feature name, start date, link to discussion, affected crates.
  • Summary — One paragraph describing the decision.
  • Motivation — What problem does this solve? What use cases does it enable?
  • Design — The technical design. Contracts, responsibilities, interfaces.
  • Alternatives — What else was considered and why it was rejected.
  • Unresolved Questions — Open questions for future work.

Lifecycle

  1. Open an issue on GitHub describing the feature or design problem.
  2. Implement it. Iterate through PRs until it’s merged.
  3. Once merged, write the RFC documenting the decision and add it to SUMMARY.md.

The RFC number is the issue number or the PR number that introduced the feature. RFCs are written after implementation, not before — they record decisions that were made, not proposals for decisions to come.

Manifesto

Ownership is necessary for an open agent ecosystem.

Ownership is not configuration. A configured agent is one where you picked from someone else’s menu. An owned agent is one where you decided what’s on the menu. Ownership is the power to compose your own stack.

Every agent application today rebuilds session management, command dispatch, and event streaming from scratch — then bundles it alongside search, browser automation, PDF parsing, TTS, image processing, and dozens of tools you didn’t ask for into one process. If you want a Telegram bot with search, you carry nineteen other channels and every integration. If you want a coding agent, you carry TTS and image generation. The process is theirs. The choices are theirs. You run it.

This happens because the daemon layer is missing. Without it, every application must become the daemon. And a daemon that is also an application ships its opinion of what your agent should be.

CrabTalk is that daemon layer. It manages sessions, dispatches commands, and streams the full execution lifecycle to your client. It does not bundle search. It does not bundle gateways. It does not bundle tools. You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts. The daemon never loads them.

An agent daemon is not an agent application. An agent daemon empowers you to build the application you want — and only the application you want. This is the essence of ownership.

We cannot expect agent platforms to give us ownership out of their beneficence. It is to their advantage to bundle, to lock in, to ship their choices as yours. We should expect that they will bundle. The only way to preserve choice is to never take it away in the first place.

We don’t much care if you prefer a batteries-included experience. You could build an OpenClaw-like assistant or a Hermes-like agent on top of CrabTalk. You can’t build a CrabTalk underneath them. The daemon must come first. The architecture must be right. Everything else follows.

Let us proceed.

Conversations

A conversation is the unit of agent interaction. It holds the message history an agent uses as working context, together with the state associated with that history.

Identity

A conversation is identified by the pair (agent, sender).

  • agent is the name of an agent configured in the daemon.
  • sender is a client-provided string identifying the counterparty. Clients choose their own convention, such as "user", "tg:12345", or "delegate:42".

The pair is the conversation’s only externally addressable name. The wire protocol carries no conversation identifier.

Lifetime

A conversation is created on first reference to a pair (agent, sender) that does not yet exist, and persists across daemon restarts. Persistence is delegated to the configured Storage backend.

At most one conversation exists for any given (agent, sender) pair.

Addressing

Protocol messages that operate on a conversation carry agent and sender fields. The pair resolves to the conversation on which the operation acts.

  • StreamMsg — Append user content, run the agent, stream the response.
  • KillMsg — Cancel the in-flight run, if any.
  • CompactMsg — Compact the current history into an archive (see Memory).
  • ReplyToAsk — Supply content for a pending ask_user call.

StreamMsg.sender is optional. When omitted, the daemon resolves a default sender determined by the transport.

State

A conversation holds:

  • History — an ordered sequence of history entries.
  • Title — a short human-readable label assigned by the set_title tool.
  • Working directory — the filesystem path used by OS-level tools during a run.
  • Archives — compacted prefixes of the history (see Memory).

History ordering is total. New entries are appended; no entry is reordered or removed except through compaction.

Working directory

Each conversation has a default working directory. StreamMsg.cwd, when set, overrides the default for the duration of the resulting run. The override does not modify the conversation’s default.

Message attribution

Each assistant message in the history carries an agent field.

  • An empty agent field denotes a message produced by the conversation’s primary agent, the one named by the conversation’s identity.
  • A non-empty agent field denotes a guest turn (see Multi-agent).

Messages produced by the daemon for protocol framing are marked as auto-injected and stripped from the history before each run.

Dispatch

The daemon accepts client messages on its transports and produces a stream of server messages in response. Each message is handled independently, with no central event loop mediating between the transport and the operations.

Entry point

Every transport (UDS, TCP, future additions) feeds ClientMessage values into the same dispatch callback. The callback spawns a Tokio task per message and polls the resulting stream, forwarding each ServerMessage back to the transport’s reply channel. When the stream ends or the reply channel closes, the task terminates.

Concurrency is unbounded at this layer: nothing throttles or serializes incoming messages before they reach their handler.

Dispatch function

Server::dispatch(ClientMessage) -> Stream<ServerMessage> is the single entry into the daemon’s operations. It inspects the ClientMessage variant and routes to the corresponding method on the Server trait.

  • Request-response operations (ping, kill_conversation, compact_conversation, administrative calls) yield exactly one ServerMessage.
  • Streaming operations (stream, subscribe_events) yield many ServerMessage values over time.
  • Unknown or empty messages yield a single error response.

The function is defined once in the core Server trait. Any implementor — the daemon, a test harness, a future alternative server — routes client messages the same way.
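
A minimal sketch of that routing shape, with illustrative variants standing in for the generated protobuf envelopes; the real dispatch fans out to the typed handler methods on the Server trait:

use futures::stream::{self, BoxStream, StreamExt};

// Illustrative stand-ins for the generated envelope types.
enum ClientMessage { Ping, Stream { content: String }, Unknown }
enum ServerMessage { Pong, Chunk(String), Error(String) }

fn dispatch(msg: ClientMessage) -> BoxStream<'static, ServerMessage> {
    match msg {
        // request-response: exactly one ServerMessage
        ClientMessage::Ping => stream::once(async { ServerMessage::Pong }).boxed(),
        // streaming: many ServerMessage values over time
        ClientMessage::Stream { content } => stream::iter(vec![
            ServerMessage::Chunk(content),
            // ... the real daemon yields the full StreamEvent lifecycle here
        ])
        .boxed(),
        // unknown or empty messages: a single error response
        ClientMessage::Unknown => {
            stream::once(async { ServerMessage::Error("unrecognized message".into()) }).boxed()
        }
    }
}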

No central event loop

There is no serializing queue, no DaemonEvent enum, and no actor that owns mutation. Operations reach into shared state directly and hold locks for the duration of the critical section.

Shared state is protected by parking_lot::Mutex or parking_lot::RwLock. Event bus subscriptions, conversation working-directory overrides, pending ask_user replies, and cron state each live behind their own lock. Locks are acquired, the work is done, and the lock is released. Ordering between operations is whatever Tokio’s scheduler produces.

Ordering guarantees

Within a single conversation, message ordering is total: StreamMsg appends to history in the order the daemon receives them. Clients that require strict ordering for a conversation are responsible for serializing their own sends.

Between conversations, no ordering is guaranteed. Two StreamMsg values addressed to different (agent, sender) pairs may run in either order regardless of arrival time.

Cancellation

KillMsg cancels the in-flight run for its (agent, sender) pair. Cancellation propagates through the runtime to the active agent step, interrupting tool calls and LLM requests at the next await point. Already-emitted ServerMessage values are not retracted.

A cancelled conversation remains valid. The next StreamMsg for the same pair resumes against the history as it existed at the point of cancellation.

Event bus

The event bus is a subscription table, not a router. publish(source, payload) iterates subscriptions, invokes the fire callback for each match inline, and removes any subscription marked once. The callback fires under the bus’s lock; implementations must not reacquire it.

The bus has no queue and no scheduler. Fan-out is as fast as the callback runs for each matching subscription.
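
A minimal sketch of the publish path described above; the subscription and payload shapes are simplified stand-ins:

use parking_lot::Mutex;

struct Subscription {
    source: String,                  // what this subscription matches on
    once: bool,                      // removed after its first fire
    fire: Box<dyn Fn(&str) + Send>,  // callback invoked inline
}

struct EventBus {
    subs: Mutex<Vec<Subscription>>,
}

impl EventBus {
    fn publish(&self, source: &str, payload: &str) {
        let mut subs = self.subs.lock(); // callbacks run under this lock
        subs.retain(|sub| {
            if sub.source != source {
                return true; // not a match, keep it
            }
            (sub.fire)(payload); // must not reacquire the bus lock
            !sub.once            // drop subscriptions marked once
        });
    }
}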

Multi-agent

Multi-agent conversations let a second agent speak into an existing conversation as a guest. A guest turn is a first-class message from the guest agent; it is not a tool call, a delegation, or a paraphrase.

Guest turns

A guest turn runs a named guest agent against the primary conversation’s history and appends the guest’s response to that history. The primary agent of the conversation is unchanged.

A guest turn is requested by setting StreamMsg.guest to the name of the guest agent. The conversation is still addressed by the primary’s (agent, sender) pair; guest selects who speaks on this turn, not whose conversation it is.

Flow

When StreamMsg { agent: A, sender: S, guest: G, content: C } is dispatched:

  1. The conversation (A, S) is resolved, creating it if necessary.
  2. The user content C is appended to the history.
  3. The daemon runs agent G against the history using G’s system prompt and instructions.
  4. The response is appended to the history, tagged with agent: G.

The primary agent is not invoked on a guest turn. A subsequent StreamMsg without guest resumes normal operation with the primary agent against the updated history.
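
Illustratively, a guest turn is an ordinary StreamMsg with guest set. The construction below is schematic; field names follow this chapter rather than the exact generated type:

let msg = StreamMsg {
    agent: "crab".into(),            // primary agent: the conversation's identity
    sender: Some("user".into()),     // conversation's identity
    guest: Some("reviewer".into()),  // who speaks on this turn
    content: "What did we miss in the design above?".into(),
    ..Default::default()
};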

Tools on guest turns

A guest turn is text-only. The guest agent’s tool schemas are not attached to the request, and any tool call emitted by the guest is rejected.

Tool-using work belongs to the primary agent. A guest is a voice in the conversation, not a worker.

Attribution

Each message in the history carries an agent field.

  • agent empty — the message originates from the conversation’s primary agent.
  • agent non-empty — the message originates from a guest. The value is the guest agent’s name.

Attribution survives compaction: archive entries preserve the agent field of each archived message.

Framing

When building a request, the runtime auto-injects framing messages that are not persisted between runs. Two framings exist:

  • Guest framing. Injected when a guest is running. It tells the guest that it is joining a conversation and explains the <from agent="..."> tag convention.
  • Primary framing. Injected when the primary is running and the history contains at least one message with a non-empty agent. It tells the primary that some messages are from guest agents and it should continue responding as itself.

Framing messages are marked auto-injected. They are stripped from the history at the start of each run and re-injected for that run only. The history on disk never contains framing messages.

Tagging

Assistant messages with a non-empty agent field are prefixed with <from agent="{name}"> when they appear in an LLM request. The prefix makes the speaker visible to whichever agent is currently reading the history.

A message without an agent field carries no prefix.

Cancellation

KillMsg addresses the conversation by (agent, sender). It cancels whichever run is in flight, whether that run is the primary or a guest. A cancelled guest turn leaves the user’s content appended to the history; the guest’s partial response is discarded.

Memory

Memory is a single-file entry store, shared by an agent across its conversations. It holds two kinds of content: notes that the agent writes deliberately, and archives that accumulate as conversations are compacted. Search is lexical (BM25); there are no embeddings.

Entries

An entry has:

  • id — monotonic integer, assigned on insert.
  • name — the entry’s primary identifier. Unique within the memory.
  • aliases — alternative names that resolve to the same entry.
  • content — the entry’s text.
  • kind — Note or Archive.
  • created_at — creation timestamp.

Entries are addressed by name or by any of their aliases. A name is rebindable through aliasing; the canonical name is whatever the agent most recently chose.
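
A sketch of the entry shape; field types are assumptions for illustration, not the storage crate’s exact definitions:

use std::time::SystemTime;

enum Kind {
    Note,    // written deliberately by the agent
    Archive, // produced by compaction
}

struct Entry {
    id: u64,              // monotonic, assigned on insert
    name: String,         // primary identifier, unique within the memory
    aliases: Vec<String>, // alternative names resolving to this entry
    content: String,      // the entry's text
    kind: Kind,
    created_at: SystemTime,
}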

Kinds

Note entries are the agent’s long-term store. The agent adds, renames, aliases, and rewrites them through memory operations.

Archive entries are produced by compaction. Their content is the summary of a compacted conversation prefix. Archive entries are not rewritten after creation.

Both kinds share the same index and search path. A search over memory returns both, ranked by relevance.

Compaction

Compaction compresses a prefix of a conversation’s history into a summary and records a boundary in the history at the point of compression.

When a conversation is compacted:

  1. The daemon summarizes the history prefix.
  2. The summary is written to the memory as an Archive entry with a generated name.
  3. A compact marker is appended to the conversation’s history, carrying the archive_name and archived_at timestamp.

On the next run, the history is replayed from the latest compact marker. Entries before the marker are dropped from the working context; the archive remains available through memory search and by explicit name.

A conversation can be compacted any number of times. Each compaction leaves one additional marker and one additional archive entry.

Persistence

The memory is a single file. The file holds all entries, all aliases, and the search index snapshot. A write operation mutates memory in RAM and writes an atomic snapshot of the file on each successful apply.

Opening an existing path reads the snapshot into RAM. Opening a non-existent path creates an empty memory; the file is written on the first successful apply.

Search is BM25 over the tokenized content and name of each entry. Results include the entry and its score. The caller chooses the cutoff — the store does not filter by relevance.

The token set is the union of tokens from content and name; aliases do not contribute tokens. Aliases are resolution, not search.

Operations

Memory exposes a closed set of write operations:

  • Add — Create a new entry with a given name, content, and kind.
  • Rename — Change an entry’s canonical name.
  • Alias — Bind an additional name to an existing entry.
  • Write — Replace an entry’s content.
  • Remove — Delete an entry and all its aliases.

Operations on Archive entries are permitted but not expected; the agent works with Note entries.
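
As a sketch, the closed set can be pictured as a single enum; payload shapes are assumptions, and Kind refers to the entry sketch above:

enum MemoryOp {
    Add { name: String, content: String, kind: Kind },
    Rename { name: String, new_name: String },
    Alias { name: String, alias: String },
    Write { name: String, content: String },
    Remove { name: String },
}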

Runtime

The runtime is the engine that drives agents. It owns conversations in memory, runs agent steps, dispatches tool calls, and applies compaction. It does not open sockets, accept connections, or schedule time. Capabilities that require I/O are provided to the runtime by its environment.

Composition

A runtime is parameterized by a Config that names three associated types:

  • Storage — Persistence of conversations, skills, and memory.
  • Provider — LLM request and streaming.
  • Env — Node-specific capabilities and tool dispatch.

A binary supplies one Config. The daemon’s Config wires filesystem storage, a configured provider, and a node environment that owns hooks and event broadcasting. Tests supply a Config with in-memory storage, a stub provider, and () as the environment.
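
A sketch of that composition; the trait shape follows this chapter, while the test types are illustrative:

pub trait Config {
    type Storage;  // persistence of conversations, skills, and memory
    type Provider; // LLM requests and streaming
    type Env;      // node-specific capabilities and tool dispatch
}

// A test binary might wire in-memory storage, a stub provider, and () as Env.
struct TestConfig;
struct InMemoryStorage;
struct StubProvider;

impl Config for TestConfig {
    type Storage = InMemoryStorage;
    type Provider = StubProvider;
    type Env = ();
}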

Responsibilities

The runtime handles:

  • Loading and saving conversations through Storage.
  • Building an agent request from the current history, instructions, and tool schemas.
  • Streaming responses from Provider and applying them to the conversation.
  • Dispatching tool calls through Env.
  • Emitting AgentEvent values for each step, tool call, and compaction.
  • Producing compaction summaries and appending archive markers.

Boundary

The runtime does not:

  • Bind listeners or accept transport connections.
  • Spawn tasks for message routing or scheduling.
  • Interpret protocol messages.
  • Read the system clock for scheduling purposes.
  • Manage process state such as PID files or signals.

These belong to the server that hosts the runtime.

Env

Env is the runtime’s only outward-facing capability surface. It provides:

  • hook() — the composite Hook that exposes tool schemas, dispatches tool calls, and participates in lifecycle events.
  • on_agent_event(agent, conversation_id, event) — hook point for side effects, such as event broadcasting or persistence of step traces.
  • subscribe_events() — optional subscription to a cross-conversation event stream, for servers that expose agent events to external clients.
  • discover_instructions(cwd) — collect instruction files applicable to a working directory.
  • effective_cwd(conversation_id) — resolve the working directory for a run, honoring any per-conversation override.

Methods that the runtime does not need in a given context have default implementations. An Env implementation may leave event broadcasting, instruction discovery, or CWD management at their defaults.
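
A sketch of the Env surface with non-core types stubbed locally; the real signatures live in the runtime crate and may differ in detail. The point is the defaults: an implementation fills in only what its host needs.

use std::path::{Path, PathBuf};

pub struct AgentEvent;  // stand-in for the runtime's event type
pub trait Hook {}       // stand-in; see the Hook section below

pub trait Env: Send + Sync {
    fn hook(&self) -> &dyn Hook;

    fn on_agent_event(&self, _agent: &str, _conversation_id: u64, _event: &AgentEvent) {}

    fn subscribe_events(&self) -> Option<tokio::sync::broadcast::Receiver<AgentEvent>> {
        None
    }

    fn discover_instructions(&self, _cwd: &Path) -> Vec<String> {
        Vec::new()
    }

    fn effective_cwd(&self, _conversation_id: u64) -> Option<PathBuf> {
        None
    }
}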

Hook

Hook is the single point through which the runtime reaches node-specific tools. A hook:

  • Advertises tool schemas for the LLM request.
  • Dispatches tool calls by name, returning a future that yields the tool’s result.
  • Participates in step lifecycle, observing starts, completions, and errors.

A hook is composite: the daemon’s hook owns sub-hooks (OS tools, ask_user, delegation, event subscription, memory). Order of sub-hooks is fixed by the composite; the runtime sees a single Hook.

Tool dispatch

A tool call from the agent carries the tool name, arguments, the originating agent and sender, and the conversation id. The runtime invokes Env::hook().dispatch(name, call). If no sub-hook claims the name, the dispatch yields an error result; the agent receives the error as the tool’s output.

Dispatch is asynchronous. The runtime awaits the tool future at the next step boundary and applies the result to the conversation before the following step.

Daemon

The daemon is the long-lived process that hosts the runtime, owns transports, and persists state. Clients are transient; the daemon is not. A single daemon process serves all configured agents, all active conversations, and all connected clients.

Responsibilities

The daemon owns:

  • Transports — UDS and TCP listeners. Listening endpoints belong to the daemon, not to individual clients or agents.
  • Runtime — a single shared runtime instance behind RwLock. Agents share the runtime; the runtime is never cloned per conversation.
  • Hooks — the composite Hook assembled from sub-hooks (OS tools, ask_user, delegation, event subscription, memory).
  • Event bus — subscription table and fire callback. File-backed by events/subscriptions.toml under the config directory.
  • MCP handler — connections to external MCP servers and routing to the tools they advertise.
  • Configuration — current DaemonConfig, reloaded in place on explicit reload.

The daemon does not interpret tool semantics. Tool dispatch is the runtime’s responsibility, routed through the composite hook.

Process model

The daemon runs as a single OS process. All work happens on a single Tokio runtime. There is one listener task per configured transport, one reply task per connected client, and one task per in-flight dispatch. Shutdown is initiated by a broadcast channel; every long-lived task subscribes and exits when the channel fires.

A daemon process owns at most one configuration directory and at most one set of transport endpoints.

Config directory

The daemon is rooted at a configuration directory supplied at startup. The directory holds:

  • config.toml — Node configuration.
  • agents/ — Agent definitions.
  • sessions/ — Conversation JSONL logs, one file per conversation.
  • memory/ — Per-agent memory databases, one file per agent.
  • skills/ — Skill bundles loadable by agents.
  • events/subscriptions.toml — Event subscription recovery file.

All paths are resolved relative to the configuration directory. The daemon writes nothing outside this directory.

Lifecycle

Startup. The daemon reads config.toml, constructs the provider, assembles hooks, opens storage, builds the shared runtime, loads event subscriptions from disk, binds transports, and begins accepting client messages.

Runtime. The daemon serves the Server trait. Each client message is dispatched into a spawned task that produces a stream of server messages.

Reload. A ReloadMsg causes the daemon to re-read config.toml and rebuild the shared runtime in place. Existing in-flight dispatches complete against the previous runtime; new dispatches see the reloaded runtime. Transports are not re-bound.

Shutdown. The daemon broadcasts a shutdown signal. Transport listeners stop accepting new connections. Active dispatches complete or cancel at the next await point. The daemon writes no final state on shutdown; state is persisted on each mutating operation, not at exit.

Persistence boundary

The daemon persists state through the Storage trait. Operations that mutate conversations, memory, or agent definitions write synchronously through storage before acknowledging the caller. Cron and event subscription files are written directly by the daemon.

A daemon restart recovers all state from the config directory. No state is held only in the process.

Client addressing

Clients do not address the daemon. Clients connect to a transport and send ClientMessage values. The transport’s reply channel delivers ServerMessage values back until the connection closes. A client that reconnects and addresses the same (agent, sender) pair resumes the same conversation; no client-side resume token is required.

Providers

Providers are the sole point of contact between the daemon and an LLM. The provider layer is external: its trait, types, and concrete implementations live upstream in crabllm. Crabtalk consumes providers but does not define them.

Boundary

The crabllm-core crate defines the Provider trait and the shared types that flow across it: ChatCompletionRequest, Message, Tool, ToolCall, Role, Usage, ApiError. These types are the contract between crabtalk and any LLM backend.

The crabllm-provider crate defines concrete provider implementations. ProviderRegistry assembles them and yields one Provider value constructed from the node configuration.

Crabtalk depends on both crates as external dependencies. It does not vendor provider code. Changes to provider internals — authentication, request formatting, streaming, error decoding, retry policy — are made upstream.

Usage

A runtime is parameterized by Config::Provider. The daemon’s default config resolves Provider by calling ProviderRegistry::build with the user’s configuration. The runtime holds a single provider instance for its lifetime and calls it once per agent step.

The provider is asked to produce:

  • A non-streaming completion for synchronous operations.
  • A streaming completion for StreamMsg operations, yielding chunks that the runtime accumulates into a Message.

The runtime does not interpret provider-specific errors. ApiError is surfaced to the client as a protocol error; the provider is responsible for mapping backend failures into ApiError values.

Tools across the boundary

Tool schemas are declared in crabllm-core::Tool. The runtime collects schemas from the composite hook, attaches them to the request, and lets the provider format them for the backend. Tool calls returned by the provider arrive as ToolCall values; the runtime dispatches each call through Env::hook().dispatch.

The shape of tool schemas is fixed by crabllm-core. A tool that cannot be expressed in that shape is not expressible to crabtalk.

Configuration

Provider configuration is read from the node’s config.toml and passed to ProviderRegistry. The daemon does not inspect provider-specific configuration; it forwards the relevant sections to the registry and accepts the resulting Provider.

Adding a new backend is a change to crabllm-provider. It is not a change to crabtalk.

Upstream

crabllm is maintained at crabtalk/crabllm. Bug fixes, new backends, and trait changes are filed there. Crabtalk upgrades its crabllm dependency on release.

0009 - Transport

  • Feature Name: UDS and TCP Transport Layers
  • Start Date: 2026-03-27
  • Discussion: #9
  • Crates: transport, core

Summary

A transport layer providing Unix domain socket (UDS) and TCP connectivity between clients and the crabtalk daemon, built on a shared length-prefixed protobuf codec defined in core.

Motivation

The daemon needs to accept connections from local CLI clients and remote clients (Telegram, web gateways). UDS is the natural choice for same-machine communication — no port management, filesystem-based access control. TCP is required for remote access and cross-platform support (Windows has no UDS).

Both transports share identical framing and message types. The codec and message definitions belong in core so that every transport can use them without the transports depending on one another. The transport crate provides the concrete connection machinery.

Design

Codec (core::protocol::codec)

Wire format: [u32 BE length][protobuf payload]. The length prefix counts payload bytes only, excluding the 4-byte header itself.

Two generic async functions operate over any AsyncRead/AsyncWrite:

  • write_message<W, T: Message>(writer, msg) — encode, length-prefix, flush.
  • read_message<R, T: Message + Default>(reader) — read length, read payload, decode.

Maximum frame size is 16 MiB. Frames exceeding this limit produce a FrameError::TooLarge. EOF during the length read produces FrameError::ConnectionClosed (clean disconnect, not an error).
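
A minimal sketch of the codec over tokio and prost; errors are collapsed into std::io::Error here, where the real codec distinguishes FrameError variants:

use prost::Message;
use tokio::io::{AsyncRead, AsyncReadExt, AsyncWrite, AsyncWriteExt};

const MAX_FRAME: usize = 16 * 1024 * 1024; // 16 MiB cap

pub async fn write_message<W, T>(writer: &mut W, msg: &T) -> std::io::Result<()>
where
    W: AsyncWrite + Unpin,
    T: Message,
{
    let payload = msg.encode_to_vec();
    writer.write_u32(payload.len() as u32).await?; // big-endian length prefix
    writer.write_all(&payload).await?;
    writer.flush().await
}

pub async fn read_message<R, T>(reader: &mut R) -> std::io::Result<T>
where
    R: AsyncRead + Unpin,
    T: Message + Default,
{
    let len = reader.read_u32().await? as usize; // EOF here is the clean-disconnect case
    if len > MAX_FRAME {
        return Err(std::io::Error::new(std::io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload).await?;
    T::decode(payload.as_slice())
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))
}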

Server accept loop

Both UDS and TCP servers share the same pattern:

accept_loop(listener, on_message, shutdown)

  • listener — UnixListener or TcpListener.
  • on_message: Fn(ClientMessage, Sender<ServerMessage>) — called for each decoded client message. The sender is per-connection; the callback can send multiple ServerMessages (streaming responses) or exactly one (request-response). The channel is unbounded because messages are small and flow-controlled by the protocol — the agent produces responses at LLM speed, far slower than socket drain speed.
  • shutdown — oneshot::Receiver<()> for graceful stop.

Each accepted connection spawns two tasks: a read loop that decodes ClientMessages and calls on_message, and a send task that drains the UnboundedSender and writes ServerMessages back. When the read loop ends (EOF or error), the sender is dropped, which terminates the send task.

TCP specifics

  • Default port: 6688. If the port is in use, bind fails — another daemon may already be running.
  • TCP_NODELAY is set on all connections (low-latency interactive protocol).
  • bind() returns a std::net::TcpListener (non-blocking).

UDS specifics

  • Unix-only (#[cfg(unix)]).
  • Socket path is caller-provided (typically ~/.crabtalk/daemon.sock).
  • No port management or collision handling — the filesystem path is the identity.

Client trait (core::protocol::api::Client)

Two required transport primitives:

  • request(ClientMessage) -> Result<ServerMessage> — single round-trip.
  • request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — send one message, read responses until the stream ends.

Both UDS Connection and TCP TcpConnection implement Client identically: split the socket into owned read/write halves, write via codec, read via codec. The request_stream implementation reads indefinitely; typed provided methods on Client (e.g., stream()) handle sentinel detection (StreamEnd).

Connections are not Clone — one connection per session. The client struct (CrabtalkClient / TcpClient) holds config and produces connections on demand.

Alternatives

tokio-util LengthDelimitedCodec. Would save the manual length-prefix code but adds a dependency for ~50 lines of straightforward framing. The hand-rolled codec is simpler to audit and has no extra allocations.

gRPC / tonic. Full RPC framework with HTTP/2 transport. Heavyweight for a local daemon protocol. The current design is simpler: raw protobuf over a length-prefixed stream, no HTTP layer, no service definitions beyond the Server trait.

Shared generic transport trait. UDS and TCP accept loops are nearly identical but kept as separate modules. A generic Transport trait would save ~20 lines of duplication but add an abstraction with exactly two implementors. Not worth it.

Unresolved Questions

  • Should the transport support TLS for TCP connections in non-localhost deployments?
  • Should there be a connection timeout or keepalive at the transport level, or is the protocol-level Ping/Pong sufficient?

0018 - Protocol

  • Feature Name: Wire Protocol
  • Start Date: 2026-03-27
  • Discussion: #18
  • Crates: core

Summary

A protobuf-based wire protocol defining all client-server communication for the crabtalk daemon, with a Server trait for dispatch and a Client trait for typed request methods.

Motivation

The daemon mediates between multiple clients (CLI, Telegram, web) and multiple agents. A well-defined wire protocol decouples client and server implementations and makes the contract explicit. Protobuf was chosen for compact binary encoding, language-neutral schema, and generated code via prost.

Design

Wire messages (crabtalk.proto)

Two top-level envelopes using oneof:

ClientMessage — 15 variants:

  • Send — Run agent, return complete response
  • Stream — Run agent, stream response events
  • Ping — Keepalive
  • Sessions — List active sessions
  • Kill — Close a session
  • GetConfig — Read daemon config
  • SetConfig — Replace daemon config
  • Reload — Hot-reload runtime
  • SubscribeEvents — Stream agent events
  • ReplyToAsk — Answer a pending ask_user prompt
  • GetStats — Daemon stats
  • CreateCron — Create cron entry
  • DeleteCron — Delete cron entry
  • ListCrons — List cron entries
  • Compact — Compact session history

ServerMessage — 11 variants:

  • Response — Complete agent response
  • Stream — Streaming event (see below)
  • Error — Error with code and message
  • Pong — Keepalive ack
  • Sessions — Session list
  • Config — Config JSON
  • AgentEvent — Agent event (for subscriptions)
  • Stats — Daemon stats
  • CronInfo — Created cron entry
  • CronList — All cron entries
  • Compact — Compaction summary

Streaming events

StreamEvent is itself a oneof with 8 variants representing the lifecycle of a streamed agent response:

  • Start { agent, session } — stream opened.
  • Chunk { content } — text delta.
  • Thinking { content } — thinking/reasoning delta.
  • ToolStart { calls[] } — tool invocations beginning.
  • ToolResult { call_id, output, duration_ms, is_error } — single tool result. is_error signals the handler reported failure; output carries the text in either case so clients can render it. UIs use the flag to style errors distinctly; agents can use it for retry decisions without string-matching on error messages.
  • ToolsComplete — all pending tool calls finished.
  • AskUser { questions[] } — agent needs user input.
  • End { agent, error } — stream closed (error is empty on success).

The client reads StreamEvents until it receives End, which is the terminal sentinel.

Tool result ordering. When a single agent step produces N tool calls, the runtime dispatches them concurrently and emits ToolResult events in completion order — fast tools are reported as soon as they finish, slow siblings report later. The event stream is therefore not ordered by the call index in ToolStart.calls[]. Clients correlate by call_id, which is the primary key; do not assume positional alignment with the ToolStart call list.
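
A sketch of the client-side consequence: read until End and correlate tool results by call_id, never by position. The enum below is an illustrative subset of the real StreamEvent:

use std::collections::HashMap;

enum StreamEvent {
    Chunk { content: String },
    ToolStart { calls: Vec<(String, String)> }, // (call_id, tool name)
    ToolResult { call_id: String, output: String, is_error: bool },
    End { error: String }, // empty error means success
}

fn consume(events: impl IntoIterator<Item = StreamEvent>) {
    let mut pending: HashMap<String, String> = HashMap::new();
    for event in events {
        match event {
            StreamEvent::Chunk { content } => print!("{content}"),
            StreamEvent::ToolStart { calls } => {
                pending.extend(calls); // remember every call by id
            }
            StreamEvent::ToolResult { call_id, output, is_error } => {
                // results arrive in completion order; look the call up by id
                let name = pending.remove(&call_id).unwrap_or_default();
                let tag = if is_error { "error" } else { "ok" };
                eprintln!("[{name}] {tag}: {output}");
            }
            StreamEvent::End { error } => {
                if !error.is_empty() {
                    eprintln!("stream failed: {error}");
                }
                break; // terminal sentinel
            }
        }
    }
}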

Agent events

AgentEventMsg carries a kind enum (TEXT_DELTA, THINKING_DELTA, TOOL_START, TOOL_RESULT, TOOLS_COMPLETE, DONE) plus agent name, session ID, content, and timestamp. Used by SubscribeEvents for live monitoring of all agent activity across sessions. For TOOL_RESULT events, the tool_is_error field mirrors the streaming protocol’s is_error — monitoring clients use it to aggregate error rates per tool type without parsing output strings.

AgentEventMsg overlaps with StreamEvent — both represent the agent execution lifecycle. StreamEvent is the per-request streaming format (rich, typed variants). AgentEventMsg is the cross-session monitoring format (flat, single struct with a kind tag). The duplication exists because monitoring clients need a simpler, uniform shape to filter and display events from multiple agents.

Server trait

One async method per ClientMessage variant. Implementations receive typed request structs and return typed responses:

trait Server: Sync {
    fn send(&self, req: SendMsg) -> Future<Output = Result<SendResponse>>;
    fn stream(&self, req: StreamMsg) -> Stream<Item = Result<StreamEvent>>;
    fn ping(&self) -> Future<Output = Result<()>>;
    // ... one method per operation
}

The provided dispatch(&self, msg: ClientMessage) -> Stream<Item = ServerMessage> method routes a raw ClientMessage to the correct handler. Request-response operations yield exactly one ServerMessage; streaming operations yield many. Errors are mapped to ErrorMsg { code, message } using HTTP status codes with their standard semantics: 400 (bad request), 404 (not found), 500 (internal error).

Client trait

Two required transport primitives:

  • request(ClientMessage) -> Result<ServerMessage> — single round-trip.
  • request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — raw streaming read.

Typed provided methods (send, stream, ping, get_config, set_config) handle message construction, response unwrapping, and sentinel detection. The stream() method consumes events via take_while until StreamEnd and maps each frame through TryFrom<ServerMessage> for type-safe event extraction.

Conversions (message::convert)

From impls wrap typed messages into envelopes (SendMsg -> ClientMessage, SendResponse -> ServerMessage). TryFrom impls unwrap in the other direction, returning an error for unexpected variants. This keeps call sites clean — no manual enum construction.

Alternatives

JSON over WebSocket. Simpler to debug with curl, but larger payloads and no schema enforcement. Protobuf catches schema mismatches at compile time.

gRPC service definitions. Would provide streaming and code generation out of the box, but brings HTTP/2, tower middleware, and tonic as dependencies. The current approach is lighter: raw protobuf frames over a length-prefixed stream, with hand-written trait dispatch.

Separate request/response ID correlation. The protocol is connection-scoped and sequential — one outstanding request per connection at a time. This is a fundamental design constraint: clients must wait for a response before sending the next request. No need for request IDs or multiplexing. If multiplexing is needed later, it belongs in the transport layer, not the protocol.

Unresolved Questions

  • Should the protocol negotiate a version on connect to detect client/server mismatches?
  • Should StreamEnd carry structured error information (code + message) instead of a plain string?
  • Should there be a ClientMessage variant for subscribing to a specific session’s events rather than all events?

0027 - Model

  • Feature Name: Model Abstraction Layer
  • Start Date: 2026-01-25
  • Discussion: #27
  • Crates: model, core

Summary

A provider registry that wraps multiple LLM backends (OpenAI, Anthropic, Google, Bedrock, Azure) behind a unified Model trait, with per-model provider instances, runtime model switching, and retry logic with exponential backoff.

Motivation

The daemon talks to LLMs. Which LLM, from which provider, through which API — that’s configuration, not architecture. The agent code should call model.send() and not care whether it’s hitting Anthropic directly or an OpenAI-compatible proxy.

This requires:

  • A single trait that all providers implement.
  • A registry that maps model names to provider instances.
  • Runtime switching between models without restarting.
  • Retry logic for transient failures (rate limits, timeouts).
  • Type conversion between crabtalk’s message types and each provider’s wire format.

Design

Model trait (core)

Defined in wcore::model:

pub trait Model: Clone + Send + Sync {
    async fn send(&self, request: &Request) -> Result<Response>;
    fn stream(&self, request: Request) -> impl Stream<Item = Result<StreamChunk>>;
    fn context_limit(&self, model: &str) -> usize;
}

The trait is in core because agents are generic over Model. The implementation lives in the model crate.

Provider

Wraps crabllm_provider::Provider (the external multi-backend LLM library) behind the Model trait. Each Provider instance is bound to a specific model name and carries:

  • The backend connection (OpenAI, Anthropic, Google, Bedrock, Azure).
  • A shared HTTP client.
  • Retry config: max_retries (default 2) and timeout (default 30s).

Base URL normalization strips endpoint suffixes (/chat/completions, /messages) so both bare origins and full paths work in config.

ProviderRegistry

Implements Model by routing requests to the correct provider based on the model name in the request.

ProviderRegistry
├── providers: BTreeMap<String, Provider>   # keyed by model name
├── active: String                          # default model
└── client: reqwest::Client                 # shared across providers

  • Construction: one ProviderDef can list multiple model names. Each gets its own Provider instance. Duplicate model names across definitions are rejected at validation time.
  • Routing: send() and stream() look up the provider by request.model. Callers get a clone of the provider — the registry lock is not held during LLM calls.
  • Switching: switch(model) changes the active default. Agents can still override per-request via the model field.
  • Hot add/remove: providers can be added or removed at runtime without rebuilding the registry.

Retry logic

Non-streaming send() retries transient errors (rate limits, timeouts) with exponential backoff and full jitter:

  • Initial backoff: 100ms, doubling each retry.
  • Jitter: random duration in [backoff/2, backoff].
  • Max retries: configurable per provider (default 2).
  • Non-transient errors (auth failures, invalid requests) fail immediately.

Streaming does not retry — the connection is already established.
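
A sketch of that policy under the stated parameters (100 ms initial backoff, doubling, jitter drawn from [backoff/2, backoff], default 2 retries); the transient-error predicate stands in for the provider’s own classification:

use rand::Rng;
use std::future::Future;
use std::time::Duration;

async fn send_with_retry<F, Fut, T, E>(
    mut call: F,
    max_retries: u32,
    is_transient: impl Fn(&E) -> bool,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut backoff_ms: u64 = 100;
    let mut attempt = 0;
    loop {
        match call().await {
            Ok(response) => return Ok(response),
            Err(err) if attempt < max_retries && is_transient(&err) => {
                // jitter: a random wait in [backoff/2, backoff]
                let wait = rand::thread_rng().gen_range(backoff_ms / 2..=backoff_ms);
                tokio::time::sleep(Duration::from_millis(wait)).await;
                backoff_ms *= 2;
                attempt += 1;
            }
            Err(err) => return Err(err), // non-transient, or out of retries
        }
    }
}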

Type conversion

A convert module translates between wcore::model types (Request, Response, Message, StreamChunk) and crabllm_core types (ChatCompletionRequest, ChatCompletionResponse). This isolates the external library’s types from the rest of the codebase.

Alternatives

Direct provider calls without a registry. Each agent holds its own provider. Rejected because runtime model switching and centralized configuration require a shared registry.

Trait objects instead of enum dispatch. Box<dyn Model> instead of the concrete Provider enum. Rejected because Model has generic return types (impl Stream) that prevent object safety. The enum dispatch via crabllm_provider::Provider handles this naturally.

Unresolved Questions

  • Should the registry support fallback chains (try provider A, fall back to B)?
  • Should streaming requests retry on connection failures before the first chunk?

0036 - Skill Loading

  • Feature Name: Skill Loading
  • Start Date: 2026-03-27
  • Discussion: #36
  • Crates: runtime

Summary

How crabtalk discovers, loads, dispatches, hot-reloads, and scopes skills. The skill format follows the agentskills.io convention — this RFC covers the loading mechanism, not the format.

Motivation

Agents need extensible behavior without recompilation. Skills are the simplest unit that works: a markdown file with a name, description, and a prompt body. No code generation, no plugin API, no runtime linking.

The format is defined by agentskills.io. What crabtalk needs to decide is how skills are found on disk, how they’re resolved at runtime, how they stay current without restarts, and how agents are restricted to subsets of available skills.

Design

Format

SKILL.md follows the agentskills.io convention. Required fields: name, description. Optional: allowed-tools. The markdown body is the skill prompt.

Discovery

SkillHandler::load(dirs) scans a list of directories (in config-defined order) recursively for SKILL.md files. Each skill lives in its own directory:

skills/
  check-feeds/
    SKILL.md
  summarize/
    SKILL.md

Nested organization is supported (skills/category/my-skill/SKILL.md). Hidden directories (.-prefixed) are skipped. Duplicate names across directories are detected and skipped with a warning — first-loaded wins, in config-defined directory order.

Registry

A Vec<Skill> wrapped in Mutex inside SkillHandler. Linear scan — the registry is small enough that indexing is unnecessary. Supports add, upsert (replace by name), contains, and skills (list all).

Dispatch

Exposed as a tool the agent can call. Input: { name: string }.

Resolution order:

  1. Scope check — if the agent has a skill scope and the name is not in it, reject.
  2. Path traversal guard — reject names containing .., /, or \.
  3. Exact load from disk — for each skill directory, check {dir}/{name}/SKILL.md. If found, parse it, upsert into the registry, return the body.
  4. Fuzzy fallback — if no exact match, substring search the registry by name and description. If input is empty, list all available skills (respecting scope).
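
A sketch of steps 1 and 2, the checks that run before any disk access; the function name and error type are illustrative:

fn guard(name: &str, scope: &[String]) -> Result<(), String> {
    // 1. scope check: an empty scope means unrestricted
    if !scope.is_empty() && !scope.iter().any(|allowed| allowed == name) {
        return Err(format!("skill '{name}' is not in this agent's scope"));
    }
    // 2. path traversal guard
    if name.contains("..") || name.contains('/') || name.contains('\\') {
        return Err(format!("invalid skill name '{name}'"));
    }
    Ok(())
}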

Hot reload

The upsert on exact load (step 3) is the hot-reload mechanism. When a skill is invoked, it’s always loaded fresh from disk and upserted into the registry. Skills can be updated on disk and picked up on next invocation without daemon restart.

Slash command resolution

Before a message reaches the agent, preprocess resolves leading /skill-name commands. For each skill directory, it checks {dir}/{name}/SKILL.md. If found, the skill body is wrapped in a <skill> tag and injected into the message. This happens before tool dispatch — it’s prompt injection, not a tool call.
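
A sketch of that resolution, assuming the wrapper tag carries the skill name; the directory layout follows this RFC and the function name is illustrative:

use std::fs;
use std::path::Path;

fn resolve_slash_command(content: &str, skill_dirs: &[&Path]) -> Option<String> {
    let name = content.strip_prefix('/')?.split_whitespace().next()?;
    for dir in skill_dirs {
        let path = dir.join(name).join("SKILL.md");
        if let Ok(body) = fs::read_to_string(&path) {
            // prompt injection, not a tool call: the agent sees the skill body inline
            return Some(format!("<skill name=\"{name}\">\n{body}\n</skill>"));
        }
    }
    None // no matching skill: the message passes through unchanged
}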

Scoping

Agents can be restricted to a subset of skills via AgentScope.skills. If non-empty, only listed skills are available. Empty means unrestricted. Scoping applies to exact load, fuzzy listing, and slash resolution.

Alternatives

Code-based plugins (dylib / WASM). Far more powerful but far more complex. Skills are prompt injection, not code execution. The simplicity of markdown files is the point.

Database-backed registry. Adds persistence complexity for a registry that rebuilds in milliseconds from disk. Not needed.

Unresolved Questions

  • Should skills support arguments beyond the skill name (parameterized prompts)?
  • Should allowed-tools be enforced at the runtime level? Currently it is not enforced — it exists in the format but has no runtime effect.

0043 - Component System

  • Feature Name: Component System
  • Start Date: 2026-02-15
  • Discussion: #43
  • Crates: command

Summary

Crabtalk components are independent binaries that install as system services and connect to the daemon via auto-discovery. They crash alone, swap without restarts, and the daemon never loads them. This is the manifesto’s composition model made concrete.

Motivation

The manifesto says: “You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts.”

This requires a system where components — search, gateways, tool servers — are not subprocesses of the daemon. They’re independent programs that run as system services. The daemon discovers them at runtime. A broken component cannot take the daemon down.

Other projects spawn MCP servers as child processes. If the child hangs or crashes, it can take the daemon with it: zombie processes, leaked file descriptors, blocked event loops. The subprocess model creates shared fate. The component model eliminates it.

Design

The contract

A component is a binary that:

  1. Installs itself as a system service (launchd, systemd, or schtasks).
  2. Writes a port file to ~/.crabtalk/run/{name}.port on startup.
  3. Serves an HTTP API (MCP protocol) on that port.

The daemon scans ~/.crabtalk/run/*.port at startup and discovers components automatically. No configuration needed — drop a component on PATH, install it, and the daemon finds it.

Service trait

pub trait Service {
    fn name(&self) -> &str;        // "search"
    fn description(&self) -> &str; // human readable
    fn label(&self) -> &str;       // "ai.crabtalk.search"
}

The trait provides default start, stop, and logs methods:

  • start — renders a platform-specific service template, installs and launches.
  • stop — uninstalls the service and removes the port file.
  • logs — tails ~/.crabtalk/logs/{name}.log.

MCP service

Components that expose tools to agents extend McpService:

pub trait McpService: Service {
    fn router(&self) -> axum::Router;
}

run_mcp binds a TCP listener on 127.0.0.1:0, writes the port to the run directory, and serves the router. The daemon discovers it on next scan.

Platform support

Service templates are platform-specific:

  • macOS — launchd plist (~/Library/LaunchAgents/)
  • Linux — systemd user unit
  • Windows — schtasks with XML task definition

Auto-discovery

The daemon scans ~/.crabtalk/run/*.port for port files not already connected. Each file contains a port number. The daemon connects via http://127.0.0.1:{port}/mcp. No subprocess management, no shared fate.

Crash? The daemon doesn’t care — it was never the component’s parent process. Restart? New port file, the daemon picks it up on next reload. Update a component? Install the new version, restart the service — the daemon sees the new port on next scan.
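
A sketch of the scan; paths follow this RFC, error handling is simplified, and the function name is illustrative:

use std::fs;
use std::path::Path;

/// Returns (component name, MCP endpoint URL) for every port file in the run directory.
fn discover_components(run_dir: &Path) -> Vec<(String, String)> {
    let mut endpoints = Vec::new();
    let Ok(entries) = fs::read_dir(run_dir) else {
        return endpoints; // no run directory yet: nothing installed
    };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.extension().and_then(|ext| ext.to_str()) != Some("port") {
            continue;
        }
        let name = path.file_stem().and_then(|s| s.to_str()).unwrap_or_default().to_string();
        if let Ok(port) = fs::read_to_string(&path) {
            endpoints.push((name, format!("http://127.0.0.1:{}/mcp", port.trim())));
        }
    }
    endpoints
}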

Entry point

The run() function handles tracing init and tokio bootstrap for all component binaries.

Alternatives

Subprocess management. The daemon spawns and manages components as child processes. Rejected because shared fate — a broken child can break the daemon. This is the approach we explicitly designed against.

Docker / containerization. Run components in containers. Rejected because crabtalk is local-first. System services are the right abstraction for a personal daemon on your machine.

Shell scripts for service management. Works on Unix, breaks on Windows, drifts across components. A shared Rust crate is portable and stays consistent.

Unresolved Questions

  • Should the Service trait support health checks?
  • Should the daemon watch the run directory for new port files instead of scanning only at startup/reload?

0075 - Hook

  • Feature Name: Hook Lifecycle
  • Start Date: 2026-03-15
  • Discussion: #75
  • Crates: core, runtime, daemon

Updated by 0162 (Hook-as-plugin) and 0189 (Policy at the Edge). Hooks now own their tools (per-hook schema() + dispatch()) rather than registering through a shared ToolRegistry. on_before_run was removed and replaced by on_register_agent / on_unregister_agent for state tracking. preprocess returns Option<String> (None = pass through).

Summary

The Hook trait is the central extensibility point for agent lifecycle. Each subsystem (skills, memory, MCP, scoping, OS tools) implements Hook to provide schemas, dispatch tool calls, contribute system-prompt fragments, observe events, preprocess messages, and track per-agent state. The runtime composes hooks behind a single facade and never reaches into a subsystem directly.

Motivation

When the runtime was split out of the daemon (#75), a clean interface was needed between the runtime (which executes agents) and the hook implementations (which customize them). The runtime must not know about skills, memory, MCP, or daemon infrastructure. It only knows it has a Hook and calls its methods at the right times.

This separation enables two modes: the daemon (full hook with skills, MCP, memory, event broadcasting) and embedded use (no hook, or a minimal one).

Design

The trait

pub trait Hook: Send + Sync {
    fn schema(&self) -> Vec<Tool> { vec![] }
    fn system_prompt(&self) -> Option<String> { None }
    fn on_build_agent(&self, config: AgentConfig) -> AgentConfig { config }
    fn on_register_agent(&self, name: &str, config: &AgentConfig) {}
    fn on_unregister_agent(&self, name: &str) {}
    fn on_event(&self, agent: &str, conversation_id: u64, event: &AgentEvent) {}
    fn preprocess(&self, agent: &str, content: &str) -> Option<String> { None }
    fn scoped_tools(&self, config: &AgentConfig) -> (Vec<String>, Option<String>);
    fn dispatch<'a>(&'a self, name: &'a str, call: ToolDispatch) -> Option<ToolFuture<'a>> { None }
}

All methods have default no-op implementations. () implements Hook.

Lifecycle points

schema — the tools this hook owns. The composite hook unions every sub-hook’s schema() to expose the runtime-wide tool set. There is no shared ToolRegistry — each hook is the source of truth for its tools.

system_prompt — optional fragment appended to agent system prompts at build time. Used by hooks that always inject standing instructions (e.g. memory’s behavioural guidance).

on_build_agent — called when an agent is registered. Receives the agent config, returns a possibly-modified config. The composite implementation chains: environment block (OS, shell, platform), per-hook system_prompt() fragments, resource hints (available MCP servers, available skills), and a <scope> block when the agent restricts its tools/skills.

on_register_agent / on_unregister_agent — called when an agent is added to or removed from the runtime registry. Hooks that track per-agent state (scopes, descriptions, MCP fingerprint refcounts) record and clean up here. Symmetric: by the time Runtime::agent() returns the new agent, hook state is in place; by the time the agent is invisible, hook state has been dropped.

preprocess — called before a user message enters the conversation. Returns Some(modified) to transform, None to pass through. Slash-command resolution (/skill-name args → wrapped <skill> body) lives here.

scoped_tools — given an agent config, returns the subset of this hook’s tools the agent may call, plus an optional <scope> prompt line. Default: include every tool from schema() with no scope line. Hooks override to gate inclusion on AgentConfig fields (e.g. memory only when enabled, skill tool only when the agent has a skills list).

dispatch — called when an agent issues a tool call. Returns Some(future) if this hook owns the tool name, None otherwise. The composite walks hooks in order and dispatches to the first owner.

on_event — called after each agent step. Receives every AgentEvent (text deltas, tool calls, completions). DaemonHook uses this to broadcast events to subscribers.

Composition

DaemonHook is the daemon’s composite hook. It holds a map of named sub-hooks (skill, memory, mcp, os, delegate, ask_user) and orchestrates them: schema() unions, dispatch() walks the registered owners, on_build_agent chains the system-prompt fragments, on_register_agent/on_unregister_agent fan out, on_event broadcasts.

For embedded use, () implements Hook as a full no-op so the runtime works without any subsystems.

Tool dispatch

Dispatch is part of the Hook trait. When an agent produces a tool call, the runtime walks the composite hook and calls dispatch(name, call) until one returns Some(future). Each hook owns the tools it declared via schema(); nothing else can claim them. Scope enforcement happens at the composite layer before walking sub-hooks.

Dispatch returns Result<String, String>. Ok carries normal tool output; Err carries a handler-reported failure (invalid args, not found, scope rejection, operation error) or a dispatch-level failure (no tool sender, tool channel closed, reply dropped). The same convention applies to server-specific tools owned by the daemon (ask_user, delegate). The distinction propagates to the AgentEvent::ToolResult.output field and to the wire protocol’s is_error flag so UIs can render errors distinctly and agents can make retry decisions without string-matching error messages. HistoryEntry::tool still stores the inner string regardless of the arm — the LLM wire format has no is_error field, so the model sees the text either way.

When an agent step produces multiple tool calls, the runtime dispatches them concurrently via FuturesUnordered; tool results are appended to history in the original call order (positional pairing with tool_calls is load-bearing for providers that correlate by index), but the ToolResult events fire in completion order so UIs show fast tools immediately without waiting on slow siblings.
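
A sketch of that concurrency pattern with stand-in types: calls run through FuturesUnordered, each result could be reported as it completes, and the collected outputs are slotted back into call order before being appended to history:

use futures::stream::{FuturesUnordered, StreamExt};

// Stand-in for Hook::dispatch on a single tool call.
async fn run_tool(name: &str) -> Result<String, String> {
    Ok(format!("{name}: done"))
}

async fn dispatch_all(calls: Vec<String>) -> Vec<Result<String, String>> {
    let mut in_flight: FuturesUnordered<_> = calls
        .iter()
        .enumerate()
        .map(|(index, name)| async move { (index, run_tool(name).await) })
        .collect();

    let mut results: Vec<Option<Result<String, String>>> = vec![None; calls.len()];
    while let Some((index, result)) = in_flight.next().await {
        // a ToolResult event would fire here, in completion order
        results[index] = Some(result); // history keeps the original call order
    }
    results.into_iter().map(|r| r.expect("every call completes")).collect()
}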

Alternatives

Separate traits per concern. One trait for prompt building, one for tools, one for events. Rejected because they always compose together and the single trait is simpler to implement and reason about.

Closure-based hooks. Pass lambdas instead of a trait. Rejected because the hook needs shared state (skill registry, MCP connections, memory) that closures make awkward.

Unresolved Questions

  • Should on_build_agent be async to support hooks that need I/O during agent construction?
  • Should preprocess support returning multiple messages (e.g. for multi-skill invocation)?

0080 - Cron

  • Feature Name: Cron Scheduler
  • Start Date: 2026-03-20
  • Discussion: #80, #183
  • Crates: apps/cron

Summary

Cron triggers skills into agents on a schedule. The scheduler runs as a standalone service outside the daemon and speaks the existing StreamMsg protocol — no cron-specific daemon knowledge, no cron-specific wire messages. The apps/cron crate is desktop-oriented; alternate consumers (e.g. multi-tenant cloud schedulers) model their own entry shape, storage, and time-zone semantics — the shared surface between them is just the daemon’s StreamMsg protocol.

Motivation

Agents need periodic behavior — checking feeds, running maintenance, sending reminders. Time-based triggering is one form of trigger; chat messages, webhooks, and file-watch events are others. All of them produce the same shape: something happens → an agent runs with a payload. Cron is the first concrete implementation of this trigger role and deliberately uses the same StreamMsg path that chat gateways use.

The session already carries the agent and sender. A cron entry needs the skill to fire, the agent to run, the sender to attribute it to, and the schedule expression — nothing else.

Design

Data model

[[cron]]
id = 1
schedule = "0 */2 * * * *"
skill = "check-feeds"
agent = "crab"
sender = "cron"
quiet_start = "23:00"
quiet_end = "07:00"
once = false

  • id — auto-incremented on create.
  • schedule — standard cron expression, validated on create and load.
  • skill — fired as /{skill} content into the target conversation.
  • agent — agent running the conversation.
  • sender — sender attribution (default "cron").
  • quiet_start / quiet_end — optional HH:MM window in local time. If the fire time falls inside, the tick is skipped silently. No queuing, no catch-up. Both must be set; otherwise quiet hours are ignored (see the sketch after this list).
  • once — fire once then delete.
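
A sketch of the quiet-window check referenced above, using chrono’s local time; the helper name is illustrative:

use chrono::{Local, NaiveTime};

fn is_quiet(quiet_start: Option<NaiveTime>, quiet_end: Option<NaiveTime>) -> bool {
    // both bounds must be set, otherwise quiet hours are ignored
    let (Some(start), Some(end)) = (quiet_start, quiet_end) else {
        return false;
    };
    let now = Local::now().time();
    if start <= end {
        now >= start && now < end // same-day window, e.g. 01:00 to 06:00
    } else {
        now >= start || now < end // overnight window, e.g. 23:00 to 07:00
    }
}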

Architecture

Cron is a separate binary (crabtalk-cron). It runs as a system service managed by crabup / launchd / systemd. The daemon has no cron code — no field, no handlers, no protocol messages.

The binary uses the standard #[command::command] macro — start, stop, run, logs, same as every other service. No admin subcommands; schedule edits are direct file edits.

Persistence

The scheduler reads $CRABTALK_HOME/config/crons.toml. To add, remove, or change a schedule, edit this file in place. The running service polls the mtime every 2 seconds and reconciles timers on change — abort removed schedules, start new ones. Atomic write (tmp + rename) keeps readers consistent.

For once schedules the service deletes the entry after firing — the only mutation the service itself makes.

Firing

On a scheduled tick the service calls ConnectionInfo::stream from the SDK with:

StreamMsg {
    agent: "<from entry>",
    content: "/<skill>",
    sender: Some("<from entry>"),
    ..Default::default()
}

The reply stream is drained and discarded — output goes to conversation history through the daemon’s normal path. Failures surface as Err items on the receiver; the schedule continues on the next tick.

Alternatives

Keep cron inside the daemon. Rejected. Cron forced a P: Provider bound on the daemon struct for no reason other than that CronStore called runtime.send_to. Embedding also prevented alternate schedulers (e.g. multi-tenant cloud schedulers) from reusing the daemon without running cron.

Introduce an InvokeSkill protocol variant. Rejected. Cron already has everything it needs from SendMsg / StreamMsg — the content /{skill} pattern is what chat-driven slash commands use too. Adding a variant would fragment the trigger contract across consumers and force every downstream scheduler (e.g. external ones) to learn a new wire format.

Cron as a peer protocol endpoint with its own socket for admin. Rejected. The admin surface is thin and routing it through a dedicated socket multiplies client complexity (TUI would need to discover and connect to cron too). Instead, admin is direct file editing — the running service picks up changes via mtime polling.

Put CronEntry + validators in wcore for downstream consumers to reuse. Rejected. Multi-tenant cloud schedulers need different entry fields (tenant id, different trigger payload), different storage (database rows, not TOML), and different time-zone semantics (UTC, not local). The cron::Schedule::from_str “validator” is one line; reimplementing is_quiet for UTC is trivial. Sharing a struct would force alignment on details that shouldn’t align.

Introduce a Trigger trait / crates/trigger library. Premature. Cron is the only trigger today (chat gateways are structurally similar but each has its own domain-specific code — Telegram auth, WeChat sync, etc.). A common trait only becomes clear once a second non-chat trigger lands (webhook, file-watch).

Unresolved Questions

  • Should cron support a time-zone override per entry (instead of local time for everything)?
  • Should there be a max-concurrent-fires limit so a quiet window ending doesn’t burst?

0082 - Scoping

  • Feature Name: Agent Scoping
  • Start Date: 2026-03-22
  • Discussion: #82
  • Crates: runtime, core

Updated by 0193 (Agent-Owned MCP) (2026-04-28). AgentScope.mcps was removed: agents now embed their MCP server configurations by value, so MCP scoping is intrinsic to the agent’s declaration and no separate allowlist is needed.

Summary

A whitelist-based scoping system that restricts what an agent can access: tools and skills. Enforced at dispatch time and advertised in the system prompt. This is a security boundary, not a hint. MCP scoping is no longer part of AgentScope — see 0193 for the replacement model.

Delegation is not scoped: crabtalk is a single-user runtime, and any registered agent can delegate to any other. Multi-tenant identity-based access control, if ever needed, belongs in a wrapper above the runtime, not inside AgentConfig.

Motivation

In multi-agent setups, a delegated sub-agent should not have the same capabilities as the primary agent. A research agent doesn’t need bash. Without scoping, every agent has access to everything — which means a misbehaving or confused agent can call tools it was never intended to use.

Scoping solves this by letting agent configs declare exactly what resources are available. The runtime enforces it.

Design

AgentScope

pub struct AgentScope {
    pub tools: Vec<String>,     // empty = unrestricted
    pub skills: Vec<String>,    // empty = all skills
}

Empty list means unrestricted. Non-empty means only listed items are allowed. This is an inclusive whitelist, not a denylist. MCPs are not part of AgentScope: AgentConfig.mcps: Vec<McpServerConfig> makes the declaration itself the scope (RFC 0193).

Whitelist computation

When an agent has any scoping (non-empty skills), the runtime computes a tool whitelist during on_build_agent:

  1. Start with BASE_TOOLS: bash, ask_user, read, edit — always available.
  2. If memory is enabled: add recall, remember, memory, forget.
  3. If skills list is non-empty: add skill tool.
  4. MCP tools the agent declared in AgentConfig.mcps are always included — declaration is the gate.

The computed whitelist replaces config.tools. Tools not on the list are invisible to the agent. The delegate tool is always available — delegation is not gated by scope.
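
A sketch of that computation under the steps above; constant and parameter names are illustrative, not the runtime's actual identifiers:

// Hypothetical reconstruction of the whitelist steps.
const BASE_TOOLS: &[&str] = &["bash", "ask_user", "read", "edit"];
const MEMORY_TOOLS: &[&str] = &["recall", "remember", "memory", "forget"];

fn compute_whitelist(skills: &[String], memory_enabled: bool) -> Vec<String> {
    let mut tools: Vec<String> = BASE_TOOLS.iter().map(|t| t.to_string()).collect();
    if memory_enabled {
        tools.extend(MEMORY_TOOLS.iter().map(|t| t.to_string()));
    }
    if !skills.is_empty() {
        tools.push("skill".to_string());
    }
    // delegate is always available; MCP tools come from the agent's own mcps declaration
    tools.push("delegate".to_string());
    tools
}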

Prompt injection

A <scope> block is appended to the system prompt listing the agent’s allowed resources:

<scope>
skills: check-feeds, summarize
</scope>

This tells the agent what it can use. The agent doesn’t need to guess or discover — its boundaries are stated upfront. MCP servers are listed separately in the resource-hints block from the agent’s own mcps declaration.

Enforcement

Scoping is enforced at two dispatch points:

  • Tool dispatch — rejects tool calls not in the agent’s tool whitelist.
  • Skill dispatch — rejects skill names not in the agent’s skill list.

MCP dispatch needs no explicit gate: the agent only sees the MCPs it declared, so calls outside that set are structurally impossible.

Enforcement happens at runtime, not just at prompt time. Even if the LLM ignores the <scope> block and tries to call a restricted tool, the dispatch layer rejects it.

Sender restrictions

Not all base tools are available to all senders. bash is blocked for non-CLI senders (gateway agents from Telegram, WeChat, etc.) because it grants arbitrary shell access. read and edit have no sender restriction — they are read-only or scoped mutations that are safe for gateway agents. See #67.

Delegate CWD isolation

When delegating parallel tasks, the orchestrating agent can assign each sub-agent a separate working directory via the cwd field on DelegateTask. Tools resolve relative paths against the conversation CWD, so isolated CWDs prevent concurrent sub-agents from trampling each other’s files. The edit tool’s unique-match requirement provides a second layer: if another agent changed the file between read and edit, old_string won’t match and the edit fails — optimistic concurrency without locks.

Default agent

The default agent (primary) has no scope restrictions — empty lists on both dimensions. Scoping is for sub-agents that need constrained access.

Alternatives

Denylist instead of whitelist. List what’s forbidden instead of what’s allowed. Rejected because allowlists are safer by default — a new tool or server is inaccessible until explicitly granted. Denylists require updating every time a new resource is added.

Prompt-only scoping. Tell the agent its restrictions in the prompt but don’t enforce at dispatch. Rejected because LLMs don’t reliably follow instructions — a determined or confused model will call tools it was told not to. Enforcement must be at the dispatch layer.

Unresolved Questions

  • Should scoping support wildcard patterns (e.g. mcp: search-*)?
  • Should scope violations be logged as security events for monitoring?

0121 - Event Bus

  • Feature Name: Unified Event Bus
  • Start Date: 2026-04-04
  • Discussion: #121
  • Crates: daemon, core, runtime
  • Updates: 0080 (Cron)

Summary

A daemon-level event bus that routes named events to target agents via exact-match subscriptions. Agent completion is the first built-in event source. The bus also enables non-blocking delegation and ad-hoc worker agents.

Motivation

The daemon can trigger agents on a schedule (cron) and run agents on user request (protocol). But there’s no way for one agent’s completion to trigger another agent. The Signal pipeline (crabtalk/app#59) needs exactly this:

RSS fetch → Scout classifies → Crab enriches → client notification

Each stage produces a result that the next stage consumes. Without an event system, this requires the client to orchestrate the chain — polling, waiting, re-sending. The daemon should own this.

Separately, delegate blocks the parent agent until all tasks complete. For background research or parallel work, this is a limitation. If the daemon can route agent completion events, non-blocking delegation falls out for free.

Design

Event bus

An in-memory subscription table that matches events by exact source string and fires target agents with the event payload as message content.

# events.toml
[[subscription]]
id = 1
source = "agent:scout:done"
target_agent = "crab"
once = false

Follows the CronStore pattern: HashMap-backed, TOML-persisted, auto-incrementing IDs, atomic writes (tmp + rename). Survives runtime reloads.

Event sources

Events are namespaced strings. Two source types exist today:

| Source | Example | Emitter |
|---|---|---|
| Agent completion | agent:scout:done | Daemon, via on_agent_event hook |
| External | rss:fetch, signal:classified | Client or adapter, via PublishEvent |

Agent completion events are emitted automatically when a conversation stream ends. The payload is the agent’s final text response.

External events are published via the PublishEvent protocol message — any client, adapter, or webhook handler can fire events into the bus.

Routing

Event arrives (via DaemonEvent::PublishEvent)
  → event loop calls EventBus::publish() inline (no spawn)
  → exact match source against subscription table
  → for each match: fire target agent via SendMsg (fire-and-forget)
  → if once: remove subscription, persist

Events always start new work. There is no injection into active conversations — that’s a separate concern (#117).

Fired agents receive the payload as message content with sender "event:{source}". This follows the established convention ("delegate:{id}", "cron") for non-user senders.
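
A sketch of the subscription table and the exact-match publish path, with the SendMsg dispatch abstracted behind a fire callback (shapes are hypothetical):

use std::collections::HashMap;

struct Subscription {
    source: String,
    target_agent: String,
    once: bool,
}

struct EventBus {
    subs: HashMap<u64, Subscription>,
}

impl EventBus {
    // Exact-match the source, fire each target, drop `once` subscriptions.
    fn publish(&mut self, source: &str, payload: &str, fire: impl Fn(&str, &str, &str)) {
        let mut spent = Vec::new();
        for (id, sub) in &self.subs {
            if sub.source == source {
                // fire-and-forget SendMsg with sender "event:{source}"
                fire(&sub.target_agent, payload, &format!("event:{source}"));
                if sub.once {
                    spent.push(*id);
                }
            }
        }
        for id in spent {
            self.subs.remove(&id); // the TOML persistence rewrite would follow here
        }
    }
}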

Protocol

Four new operations on the Server trait:

message SubscribeEventMsg {
  string source = 1;
  string target_agent = 2;
  bool once = 3;
}

message UnsubscribeEventMsg { uint64 id = 1; }
message ListSubscriptionsMsg {}
message PublishEventMsg { string source = 1; string payload = 2; }

Responses: SubscriptionInfo for subscribe, Pong for unsubscribe/publish, SubscriptionList for list.

DaemonEvent::PublishEvent

All publish paths route through a single DaemonEvent::PublishEvent variant in the central event loop. This avoids lock-ordering issues — the event bus mutex is only acquired inside the sequential event loop, never from the protocol handler or hook callbacks directly.

DaemonEvent::PublishEvent { source, payload } => {
    self.events.lock().await.publish(&source, &payload);
}

Non-blocking delegation

The delegate tool gains a background: bool field. When true:

  1. Tasks are spawned via the existing spawn_agent_task mechanism
  2. dispatch_delegate returns immediately with task IDs
  3. The parent agent continues working
  4. When each task completes, the daemon emits agent:{name}:done
  5. Event bus routes the completion to any matching subscriptions

No new mechanism — just the existing spawn infrastructure plus the event bus.

Worker pseudo-agent

A built-in worker agent registered at daemon startup alongside crab. Always available as a delegate target without pre-configuration:

  • Inherits the system agent’s thinking setting
  • Gets the full tool registry (no explicit filter)
  • Ephemeral — sessions are killed after task completion (existing behavior)
  • Always a valid delegate target (delegation is not scoped)

This eliminates the friction of configuring named agents for ad-hoc tasks like “read these files and summarize” or “search for X in the codebase.”

What this is NOT

  • Not a message broker. No durability, no exactly-once delivery, no dead letter queues. Fire-and-forget with best-effort delivery.
  • Not an orchestration DAG. No conditional routing, no fan-out/fan-in. Agents subscribe to events — that’s it.
  • Not a replacement for delegate. Delegation is synchronous and returns results inline. Events are asynchronous and deliver results out-of-band. background: true bridges the two.

Updates

0080 - Cron

The cron system continues to work as-is. Cron entries fire skills via the daemon event channel — this is unchanged. A future iteration may refactor cron as an event source adapter, emitting cron:{id}:fired events into the bus, but this is not in scope. The event bus is additive, not a cron replacement.

Alternatives

Agent completion triggers (no bus). A simpler design where completion of agent X directly triggers agent Y, without a general subscription mechanism. Rejected because the Signal pipeline needs external events (RSS fetch results) alongside agent completions — a bus handles both uniformly.

Glob matching on source patterns. The RFC originally proposed wildcard subscriptions like "agent:*:done". Rejected for v1 — exact match covers all current use cases. Glob matching can be added when a real consumer needs it.

Template interpolation. The RFC originally proposed {{payload}} interpolation in a prompt_template field. Rejected — agents are the template engine. The payload goes in as-is; the agent’s instructions handle interpretation.

Unresolved Questions

  • Should there be a max subscription count?
  • Should the bus detect infinite loops (agent A triggers B triggers A)? Currently fire-and-forget prevents stack overflow but allows unbounded chains of spawned tasks.

0135 - Agent-First Protocol

Summary

Replace session-centric protocol addressing with agent-centric addressing. Users talk to agents, not sessions. Introduce guest turns for multi-agent conversations and compaction archives as the agent’s long-term memory.

Motivation

The original protocol was session-centric: clients managed session IDs to kill, reply, compact, and route messages. This leaked an implementation detail (the session ID) into every client and forced multi-agent interaction into either permanent agent switching or invisible delegation.

Problems with the session model:

  1. Session IDs leak everywhere. Every client (CLI, Telegram, WeChat, IDE) must track session IDs to route replies, kill conversations, and handle ask_user prompts. If a client loses the ID, the conversation is orphaned.

  2. Multi-agent is invisible. When agent A delegates to agent B, the result comes back as a tool result string. The user hears A’s summary of B’s answer, never B’s actual voice. There’s no multi-agent conversation.

  3. Session ≠ conversation. “Session” conflated device connections (CWD, transport state) with agent memory (message history, compaction). These are different lifecycles — connections are ephemeral, conversations persist.

Design

Core model

Each agent has one continuous conversation per user. Conversations are keyed by (agent, sender) — no session IDs in the protocol.

Client: StreamMsg { agent: "crab", content: "hello", sender: "user" }
Daemon: resolves (crab, user) → internal conversation, runs agent, streams response

Conversation vs session

|  | Session | Conversation |
|---|---|---|
| What | Device ↔ daemon connection | Agent's memory with a user |
| Key | connection/device ID | (agent, sender) |
| Lifetime | ephemeral | persistent |
| State | CWD, transport | messages, title, JSONL, archives |

Sessions are daemon-internal. Conversations are the protocol-visible abstraction.

Protocol changes

Client messages address conversations by (agent, sender):

message StreamMsg {
  string agent = 1;
  string content = 2;
  optional string sender = 4;
  optional string cwd = 5;
  optional string guest = 6;  // guest turn
}

message KillMsg {
  string agent = 1;
  string sender = 2;
}

message ReplyToAsk {
  string agent = 1;
  string sender = 2;
  string content = 3;
}

message CompactMsg {
  string agent = 1;
  string sender = 2;
}

Removed from the protocol: session (u64 ID), new_chat, resume_file.

Server responses no longer include session IDs:

message StreamStart {
  string agent = 1;  // no session field
}

Guest turns

The guest field on StreamMsg enables multi-agent conversations. When set, the daemon runs the guest agent against the primary agent’s conversation history — text-only, no tool dispatch.

Flow:

  1. Client sends StreamMsg { agent: "twin", content: "question", guest: "crab" }
  2. Daemon finds twin’s conversation
  3. Adds user message to twin’s history
  4. Injects guest framing (auto-injected system message)
  5. Runs crab against twin’s history with crab’s system prompt (no tools)
  6. Tags response with agent: "crab"
  7. Appends to twin’s history

The guest’s response appears as a first-class message in the conversation, attributed to the guest. No delegation, no tool results, no paraphrasing.

Bidirectional framing

Both guest and primary need context about multi-agent conversation:

  • Guest framing (injected when a guest runs): “You are joining a conversation as a guest. Messages wrapped in <from agent="..."> tags are from other agents.”
  • Primary framing (injected when the primary runs and guest messages exist in history): “Messages wrapped in <from agent="..."> tags are from guest agents. Continue responding as yourself.”

Both are auto_injected — stripped before each run, re-injected fresh. Zero accumulation.

Message attribution

The Message struct gains an agent field:

pub struct Message {
    // ...existing fields...
    #[serde(default, skip_serializing_if = "String::is_empty")]
    pub agent: String,
}

Empty = the conversation’s primary agent. Non-empty = a guest. When building LLM requests, assistant messages with non-empty agent are prefixed with <from agent="..."> XML tags so every agent can distinguish speakers.

Message::with_agent_tag() handles the prefixing — one function, used by both build_request and guest_stream_to.
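
A minimal sketch of the prefixing; the real Message carries more fields, and the exact tag shape shown is illustrative:

struct Message {
    agent: String,   // empty = the conversation's primary agent
    content: String,
}

impl Message {
    // Prefix guest messages so every agent can tell speakers apart.
    fn with_agent_tag(&self) -> String {
        if self.agent.is_empty() {
            self.content.clone()
        } else {
            format!("<from agent=\"{}\"> {}", self.agent, self.content)
        }
    }
}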

Compaction as memory

Compaction markers become archive boundaries. Each compact marker stores a title (first sentence of the summary, max 60 chars) and a timestamp:

{"compact":"Summary of pricing discussion...","title":"Pricing analysis for solo dev tools.","archived_at":"2026-04-03T10:00:00Z"}

The conversation is continuous — compaction doesn’t create a new conversation, it archives a segment of the existing one. Archived segments are browsable via Conversation::load_archives() and available to the recall tool as long-term memory.

Crab's memory:
├── [active] Current conversation
├── "Pricing analysis for solo dev tools." — 2 days ago
├── "Auth module refactor plan." — 5 days ago
└── "HN competitor signal analysis." — last week

What dies

  • Session IDs in the protocol — replaced by (agent, sender)
  • new_chat — the conversation is continuous, compaction handles the window
  • resume_file — one conversation per (agent, user), always active
  • Client-side @mention logic (0078) — guest turns handle it daemon-side
  • Session forking — agents are the abstraction, not sessions

Supersedes

0064 - Session

The session model is replaced by conversations. The JSONL file format is preserved (backward compatible with added title and archived_at fields on compact markers, and agent field on messages). The Session struct is renamed to Conversation. Session IDs are removed from the protocol.

0078 - Compact Session

The compact-then-handoff pattern for @mentions is replaced by guest turns. The daemon handles multi-agent conversation natively — no client-side compact logic needed.

Updates

0018 - Protocol

Session-addressed messages are replaced with (agent, sender) addressing. StreamMsg and SendMsg gain a guest field. SessionInfo becomes ActiveConversationInfo. See protocol changes section above.

0038 - Memory

Compaction archives become the primary long-term memory mechanism. The recall tool searches across archived segments. See #101 (revised) for the pluggable memory provider aligned with this model.

0150 - Memory Store

  • Feature Name: Memory Store
  • Start Date: 2026-04-14
  • Discussion: #38
  • Crates: memory, crabtalk, runtime
  • Supersedes: 0038 (Memory)

Updated by 0189 (2026-04-28). Auto-recall (Memory::before_run) was removed; recall is now strictly model-driven. See 0189 for the rationale.

Summary

A standalone crabtalk-memory crate backing agent memory with a single binary db file, atomic persistence, and BM25 recall. The markdown tree is a human-facing export — not the primary store. Entries come in two kinds: Note (agent-written via remember/forget) and Archive (compaction output). The agent’s system prompt is human-managed via Crab.md (existing layered-instructions mechanism) — the memory store has no opinion on it.

Motivation

RFC 0038 bet on file-per-entry markdown as the primary store. In practice that premise did not hold:

  • Atomic writes don’t compose across many files. Every remember/forget touched an entry file plus a sidecar index; a crash mid-op left the tree inconsistent. A single-file db is atomic by rename+fsync.
  • Compaction archives need a store. Agent-First (0135) made compaction archives first-class long-term memory. Archives share recall and lifecycle with notes, but aren’t user-editable text — they’re generated output. A kind-typed entry in the db is the right home.
  • Aliases improve recall. Humans reach for an entry under several names (“release” / “ship” / “deploy”). BM25 needs them as indexable terms, which frontmatter had no slot for.
  • Dump/load still matters for humans. Users want to read and edit memory with a text editor or mdbook. That’s solved by exporting the db as a markdown tree on demand, not by making the tree the source of truth.

A separate observation that shaped the API surface:

  • The system prompt is not memory. 0038 carried a MEMORY.md curated overview that the agent could rewrite via a dedicated memory tool. That conflated two different things: persistent recall (the agent’s notes) and instructions (the human’s prompt). It also gave the agent a footgun — overwriting the whole thing in a single tool call with no diff. Killed: the memory tool, the Prompt entry kind, and the reserved global name. The system prompt now lives in Crab.md (already a file, already layered, already human-edited). If a human wants the agent to edit it, they grant that in prose inside Crab.md and the agent uses the standard file-edit tools.

Design

Crate layout

crabtalk-memory is a standalone crate. The crabtalk hooks own one Memory handle and expose a SharedStore = Arc<RwLock<Memory>> to the runtime so compaction can write archives and session resume can read them.

Binary file format (CRMEM v1)

All integers are little-endian. Strings are UTF-8, length-prefixed by a u32 byte count (no NUL terminator). The whole file is one contiguous blob — no sections, no index, no padding.

Header — 16 bytes:

offset  size  field      value
------  ----  ---------  -------------------------------------------------
 0       6    magic      "CRMEM\0"
 6       4    version    u32  (= 1)
10       2    flags      u16  (= 0; unknown bits rejected on read)
12       4    reserved   [u8; 4] (= 0)

Body:

size  field        notes
----  -----------  -----------------------------------------------------
 8    next_id      u64   monotonic EntryId allocator; persisted so
                         IDs stay stable across open/close
 4    entry_count  u32
 *    entries      entry_count repetitions of the per-entry record

Per entry:

size  field        notes
----  -----------  -----------------------------------------------------
 8    id           u64
 8    created_at   u64   unix seconds
 4    kind         u32   0 = Note, 1 = Archive
 4    name_len     u32
 *    name         utf8 bytes, name_len long
 4    content_len  u32
 *    content      utf8 bytes, content_len long
 4    alias_count  u32
 *    aliases      alias_count repetitions of { u32 len + utf8 bytes }

kind is u32 rather than u8 so the fixed entry prefix stays 4-byte aligned — cheap hygiene for any future on-disk index work. The inverted BM25 index is not persisted; it’s rebuilt from entries on load. Keeps the file small and the format boring.

Reader invariants: magic mismatch, wrong version, non-zero flags, truncated body, invalid UTF-8, or an unknown kind value all fail the open with BadFormat. A missing file opens an empty db (the file is created on the first successful write).
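
A sketch of the header check under that layout; the error type and helper name are illustrative:

// Hypothetical reader for the 16-byte CRMEM v1 header; little-endian throughout.
fn read_header(buf: &[u8]) -> Result<(), String> {
    if buf.len() < 16 {
        return Err("BadFormat: truncated header".into());
    }
    if &buf[0..6] != b"CRMEM\0" {
        return Err("BadFormat: bad magic".into());
    }
    let version = u32::from_le_bytes(buf[6..10].try_into().unwrap());
    if version != 1 {
        return Err("BadFormat: unsupported version".into());
    }
    let flags = u16::from_le_bytes(buf[10..12].try_into().unwrap());
    if flags != 0 {
        return Err("BadFormat: unknown flags".into());
    }
    Ok(()) // body (next_id, entry_count, entries) starts at offset 16
}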

Persistence

Every apply(Op) mutates RAM then flushes atomically. The flush sequence is:

  1. Encode the entire db to an in-memory Vec<u8>.
  2. create_dir_all(parent) if needed.
  3. Write to a sibling temp file {name}.tmp and fsync it.
  4. rename(tmp, path) — atomic on POSIX when on the same filesystem.
  5. fsync the parent directory so the rename itself is durable.

A flush failure leaves RAM ahead of disk until the next successful op or the next open (which re-reads the file). WAL closes that window in v2. Memory::checkpoint() forces the same flush without a mutation.
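
A sketch of the flush sequence above, with paths and error handling simplified and encoding assumed already done by the caller:

use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

fn flush_atomic(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let parent = path.parent().expect("db path has a parent");
    fs::create_dir_all(parent)?;

    let tmp = path.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?;                    // fsync the temp file

    fs::rename(&tmp, path)?;          // atomic on POSIX when on the same filesystem
    File::open(parent)?.sync_all()?;  // fsync the parent directory (Unix) so the rename is durable
    Ok(())
}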

Entry model

enum EntryKind { Note, Archive }

struct Entry {
    id: u64,
    name: String,
    content: String,
    aliases: Vec<String>,
    created_at: u64,
    kind: EntryKind,
}
  • Note — remember/forget entries.
  • Archive — compaction output. Written by the runtime during compaction, surfaced by recall as long-term memory (per 0135).

Kind is immutable per entry: Update rewrites content and aliases but keeps kind; use Remove + Add to change it.

Write ops

Writes go through an Op enum:

enum Op {
    Add    { name: String, content: String, aliases: Vec<String>, kind: EntryKind },
    Update { name: String, content: String, aliases: Vec<String> },
    Alias  { name: String, aliases: Vec<String> },
    Remove { name: String },
}

Memory::apply(op) mutates + flushes. Callers never touch fs::write directly.

Recall

BM25 with Lucene-style IDF (ln((n - df + 0.5)/(df + 0.5) + 1.0)), k1=1.2, b=0.75. The index is an inverted index of tokens from entry content and aliases, keyed by EntryId. Search walks the posting lists for query terms instead of rescanning every entry on every query.
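
The same formula in code form, as a small sketch; the real index walks posting lists, this shows only the per-term math:

// BM25 parameters from this RFC.
const K1: f64 = 1.2;
const B: f64 = 0.75;

// Lucene-style IDF: ln((n - df + 0.5) / (df + 0.5) + 1.0)
fn idf(n: f64, df: f64) -> f64 {
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

// Score contribution of one query term for one entry.
// tf = term frequency in the entry, len = entry length in tokens, avg_len = corpus average.
fn term_score(tf: f64, len: f64, avg_len: f64, n: f64, df: f64) -> f64 {
    idf(n, df) * (tf * (K1 + 1.0)) / (tf + K1 * (1.0 - B + B * len / avg_len))
}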

Recall is model-driven

There is no auto-recall. RFC 0189 removed the per-turn injection: the runtime never silently searches memory or prepends <recall> blocks. Recall happens only when the model calls the recall tool itself, or when a client explicitly searches memory before sending a user message. The Memory::before_run helper is gone; MemoryHook no longer participates in on_before_run.

System prompt

The hook contributes one <system_prompt> fragment: the contents of prompts/memory.md, which tells the agent when to use the memory tools (tool signatures come from each input struct’s /// doc comment via schemars). The agent’s identity / behavior prompt is not the memory store’s responsibility — it’s Crab.md, layered from <config_dir>/Crab.md and any project-local Crab.md walked up from CWD (see daemon::host::discover_instructions).

Tools

Three tools exposed to the agent:

  • remember(name, content, aliases) — upsert a Note.
  • forget(name) — delete a Note.
  • recall(query, limit) — BM25 search, returns formatted results.

There is no memory tool. Editing the agent’s system prompt is a human action against Crab.md. If the human wants to delegate that authority, they say so in Crab.md and the agent uses the standard file-edit tools — no special-case tool, no reserved entry name, no parallel write path.

Dump / load

Memory::dump(dir) writes the db as an mdbook-ready tree for humans:

brain/
  book.toml               ← seeded on first dump; user edits survive re-dumps
  SUMMARY.md              ← mdbook ToC (ignored on load)
  notes/{name}.md
  archives/{name}.md

The seeded book.toml sets src = "." so mdbook serve brain/ works against the tree as-is — no shuffling into an src/ subdirectory. It’s only written when absent; any customizations survive later dumps.

Each entry file starts with an HTML metadata block, followed by pure markdown content:

<div id="meta">
<dl>
  <dt>Created</dt>
  <dd><time datetime="2026-04-14T10:23:45Z">2026-04-14T10:23:45Z</time></dd>
  <dt>Aliases</dt>
  <dd><ul><li>ship</li><li>release</li></ul></dd>
</dl>
</div>

prod rollout steps ...

Chosen for mdbook: <dl> / <dt> / <dd> is the semantic HTML for key-value metadata, renders as a labeled info card, and doesn’t pollute mdbook’s heading tree. <time datetime="..."> round-trips the exact unix timestamp. A file that doesn’t start with <div id="meta"> is treated as pure content with no metadata.

Memory::load(dir) reads the tree and replaces the db. It validates fully before mutating — a mid-load error leaves the current state untouched. Each kind’s subdirectory is cleared on dump so renames and deletes don’t leave orphan files behind; anything else in dir (e.g. a customized book.toml, a theme/ directory) is left alone.

Alternatives

Stay with file-per-entry (0038). Rejected — compaction archives need a real store, and atomic multi-file writes would require WAL anyway. A single file gets atomicity for free.

SQLite. Overkill for 10²–10³ entries, adds a dependency and schema migrations. A 200-line hand-rolled format is simpler and easier to inspect with xxd.

Embedding-based search. Still rejected for the same reasons as 0038: requires a vector store and embedding model. BM25 is fast, dependency-free, and works well at the entry sizes agents produce.

Unresolved Questions

  • WAL for crash safety in the window between the RAM mutation and the atomic flush.
  • Should load() merge instead of replace?
  • Should archives expire or be garbage-collected past some age / count?

0184 - crabup

  • Feature Name: crabup
  • Start Date: 2026-04-24
  • Discussion: #184
  • Crates: new crabup binary; consumes command; shrinks crabtalkd
  • Updates: 0043 (Component System)

Summary

crabup is a thin wrapper over cargo install that also owns launchd/systemd/schtasks lifecycle for every crabtalk binary. crabup install crabtalkd spawns cargo install crabtalkd. The value add is service management — the one thing cargo install doesn’t do — not distribution, not version coordination, not a registry.

Motivation

Two real problems today, both about service management, not about distribution:

  1. The daemon is its own installer. crabtalkd start generates and loads a platform unit for itself via the command crate; every other binary (crabtalk-telegram, crabtalk-wechat, …) does the same thing with the same code. A daemon shouldn’t install itself, and the install path shouldn’t live in three places.
  2. No one-stop service surface. ps, logs, start, stop are duplicated per binary and absent for most. Users need a single tool that knows about all crabtalk services on the machine, not one subcommand per binary.

Distribution is already handled: every crabtalk crate publishes to crates.io with version.workspace = true, so cargo install crabtalkd is the install story today. It will remain the install story under crabup — crabup just renames the command and wraps service management around it.

RFC 0043 defined how components talk to the daemon (port-file discovery, MCP contract). This RFC defines how they get installed and stay alive.

Design

Command surface

crabup pull <name> [--version X]      # cargo install crabtalk-<name> (or crabtalkd)
crabup rm <name>                      # cargo uninstall
crabup update                         # bump every installed crabtalk-* crate to latest
crabup list                           # installed crabtalk-* crates
crabup ps                             # all crabtalk services, one view

crabup <name> start                   # install + load platform unit
crabup <name> stop
crabup <name> restart
crabup <name> logs [-f]

<name> is a short name from the resolution table below, so crabup daemon start, crabup telegram start, crabup search logs -f. Each short name is both a pull/rm target and a service-command namespace. pull/rm install and remove crabtalk binaries via cargo install; pkg add/pkg remove install and remove crabtalk packages (manifests + cached source repos), so the user has one tool for both install surfaces.

crabup update is always batch — it bumps every installed crabtalk-* crate to the latest version on crates.io, same shape as rustup update over its components. There is no per-component update verb: if you only want to change one crate, that’s crabup pull <name> --version <X>. This makes “keep the set aligned” the default behavior of the only tool users will reach for when they want newer bits, without needing atomic-set machinery to enforce it.

That’s it. No pin, no doctor, no component add vs pull split — cargo install already handles versions; a component is just a crate you can run as a service. No atomic-set enforcement; if a user mixes versions and breaks the wire, the fix is crabup pull <name> --version <matching> for the mismatched one or crabup update to bump everything.

pull is a pass-through

crabup pull <name>
  ↓ resolve name → crate ("tui" → "crabtalk-tui"; "daemon" → "crabtalkd")
  ↓ cargo install <crate> [--version X]

Name resolution is a small table compiled into crabup:

| Short name | crates.io crate | Role |
|---|---|---|
| daemon | crabtalkd | daemon |
| tui | crabtalk-tui | REPL client |
| telegram | crabtalk-telegram | Telegram gateway |
| wechat | crabtalk-wechat | WeChat gateway |
| search | crabtalk-search | meta-search service |
| cron | crabtalk-cron | scheduler |

crabup pull <short> resolves via the table; crabup pull <anything-else> passes through verbatim so crabup pull some-third-party-crabtalk-gateway still works without a table edit. New first-party binaries get a row added when they ship.
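
A sketch of the table-plus-passthrough resolution; the table contents mirror the rows above:

// Hypothetical resolver; unknown names pass through verbatim.
fn resolve_crate(name: &str) -> String {
    match name {
        "daemon" => "crabtalkd".to_string(),
        "tui" => "crabtalk-tui".to_string(),
        "telegram" => "crabtalk-telegram".to_string(),
        "wechat" => "crabtalk-wechat".to_string(),
        "search" => "crabtalk-search".to_string(),
        "cron" => "crabtalk-cron".to_string(),
        other => other.to_string(), // third-party crates install as-is
    }
}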

Binaries land in ~/.cargo/bin, where cargo install has always put them. crabup list reads ~/.cargo/.crates.toml and filters for crabtalk*. There is no parallel state file; if .crates.toml is wrong, cargo is wrong, and crabup being wrong with it is the correct behavior.

Prerequisite: cargo on PATH. If missing, crabup prints one line pointing at https://rustup.rs and exits. No auto-install, no curl-pipe — the daemon doing that was part of what motivated this RFC.

Service management (the real content)

The command crate already renders launchd.plist, systemd.service, and schtasks.xml and exposes install/uninstall/log-tail helpers. It stays. What changes is the caller: today each binary calls command::install from its own CLI; after this RFC only crabup calls into command. crabtalkd start/stop/ps/logs are deleted; so are the mirrored flags in crabtalk-tui.

crabup <name> start is:

  1. Find the binary on PATH (fail fast if not installed).
  2. Look up service metadata in crabup’s name table — the same table that resolves short names to crates also carries label (mechanical: ai.crabtalk.<name>) and description. crabup is the package manager; it owns this metadata, the binaries don’t need to expose it.
  3. Render the platform unit via command and load it.

crabup ps is the one piece that needs more than wrapping: it scans ~/.crabtalk/run/*.port (the same directory RFC 0043 already defines) and checks each listener, then cross-references with whatever the platform’s service manager reports for ai.crabtalk.* labels. One view, all services.

Component model

RFC 0043 stands unchanged. A component is a binary that writes a port file on startup and serves MCP on that port. crabup doesn’t alter the contract — it just installs and service-manages those binaries the same way it does crabtalkd. “Install a component” and “install the daemon” are the same operation under different names.

crabllm as a managed service (optional, motivated)

Today crabllm-provider is a library linked into crabtalkd. Making crabllm a separate service is worth doing only if at least one of these is concrete:

  • One set of provider credentials serves multiple daemons on the same machine.
  • Central place for provider fallback, rate-limit smoothing, or caching.
  • Swap models or provider SDKs without restarting crabtalkd.

None of those are pressing yet. When one is, crabtalk-llmd becomes another crate crabup installs and service-manages, same as any gateway. The RFC doesn’t need to anticipate it.

Impact on crabtalkd

| Removed from crabtalkd | Replaced by |
|---|---|
| Command::Start { force } | crabup daemon start (first install: crabup pull daemon) |
| Command::Stop, Restart | crabup daemon stop / crabup daemon restart |
| Command::Ps | crabup ps (all services, one place) |
| Command::Logs | crabup daemon logs |
| ensure_config + attach::setup_llm on first start | crabup daemon start first-run flow |
| Duplicate forwarding in TUI (--start, --stop) | Removed |

After this, crabtalkd’s CLI is run (the long-running process the service unit invokes, equivalent to today’s --foreground), reload, and events. Package install/uninstall live in crabup as pkg add/pkg remove, not in the daemon CLI.

Alternatives

Plain cargo install, no crabup. Installs are one command, but users hand-write launchd/systemd units per binary, and ps/logs across services don’t exist. The service-management gap is the whole reason crabup is a separate tool.

A real package manager with its own manifest, signed pre-built binaries, version coordination, atomic-set installs. Previously drafted; cut. Infrastructure we don’t need — crates.io is the registry, workspace-version inheritance is the coordination, and the non-developer audience that would need pre-built binaries doesn’t exist yet. If that audience materializes, pre-built becomes a second crabup pull backend alongside the cargo install path.

Keep each binary’s start/stop/logs subcommand, just delete the cross-binary dispatcher. Leaves three copies of the same install code and no one-stop service view. Cuts nothing meaningful.

Dynamic plugin loading (shared objects). Rejected by RFC 0043 — shared fate with the daemon is the exact thing the component model avoids.

Unresolved Questions

  • Windows service layer. schtasks is weaker than launchd/systemd (no restart-on-failure, limited log routing). Acceptable for v1, or not?
  • rm scope. Should crabup rm daemon also remove ~/.crabtalk/config/? Leaning no (rm is binary-only; data stays); confirm.
  • Multiple daemon instances. If two crabtalkd instances run on one machine, what owns ~/.crabtalk/? Out of scope for v1.

0185 - Session Search and Storage Primitives

Updated by 0189 (2026-04-28). The “automatic compaction on overflow as a safety net” carve-out and auto-title generation were both removed; clients drive both via compact_conversation and a future generate_title RPC, gated on the new AgentEvent::ContextUsage events. See 0189 for the rationale.

Summary

Collapse the topic subsystem. Sessions persist unconditionally and carry a small runtime-managed meta blob. Recall gains a second BM25 index — this one over conversation messages — returning windowed excerpts with bounded size. The runtime exposes narrow session primitives and two search tools; client UX owns /clear, /new, /compact, titling, and session routing. The “topic” concept dissolves: content-derived session search (BM25) replaces tag-based grouping, and any curated grouping that survives is a client concern.

Motivation

RFC 0171 introduced topic switching to partition a single (agent, sender) pair into N parallel threads keyed by title, with tmp chats that skip persistence until the agent “promotes” them by entering a topic. In practice it conflated four independent concerns into one knot — routing (“which conversation does this message land in?”), persistence policy (“should this session hit storage?”), recall indexing (“how do we find related past work?”), and lifecycle UX (“when does a chat end and a new one begin?”). Each wanted a different home, and riding one mechanism for all of them produced the TopicRouter reservation/rollback dance, the tmp/promote split, and agent-upfront title commitment on what should have been retrospective categorization.

The reframe driving this RFC: a topic is not a thing. It was a name trying to be a routing key, a memory kind, a session tag, and a recall index simultaneously. With BM25 over session messages, content-derived recall eats the tag’s lunch — the agent searches “cron refactor” and gets back the conversations that actually discussed it, without any of them ever being classified upfront. What remains worth keeping is a summary field that boosts search ranking when one happens to exist (piggybacking on work the runtime already does during overflow compaction).

Design

Layering: runtime vs. client

The runtime’s job is to provide mechanical primitives. UX and policy decisions — when to clear, when to compact, when to title, when to recall, which session to route a message to, how to surface archival browsing — belong one layer up in the client. RFC 0189 finished the move: the runtime no longer auto-compacts on overflow, no longer spawns title generation, no longer auto-recalls. Clients drive compact_conversation, future generate_title, and explicit memory search themselves, gated on AgentEvent::ContextUsage events.

Runtime primitives (policy-free):

  • new_session(agent, sender) -> id — always creates, always persists. No tmp, no deferred-persistence gate.
  • append_message(id, msg) — writes to storage and incrementally updates the session BM25 index.
  • list_sessions(filters?) -> [SessionSummary] — meta rows only, paginated.
  • list_messages(id, offset, limit) -> [Message] — paginated browse for when a caller wants to walk a session linearly.
  • get_session_meta(id) -> ConversationMeta — cheap lookup of current meta snapshot.

Search tools (agent-facing):

  • search_memory(query) -> [Entry] — unchanged. BM25 over memory entries; returns whole entries because entries are small.
  • search_sessions(query, context_before=4, context_after=4, filters?) -> [SessionHit] — new. BM25 over message text; returns bounded windowed excerpts.

Auto-behaviors: none. Both auto-titling and overflow compaction were removed by RFC 0189. The summary field on ConversationMeta is still populated when a client triggers compact_conversation, and session search still boosts on it; the runtime just doesn’t initiate either step on its own.

Client-owned (explicit non-goals for the runtime):

  • /clear, /new, /compact, “resume session by title”, session picker UX — composed from the primitives above.
  • Saved searches, archival browsing, “wiki view” — pure presentation.
  • Routing decisions — the client tells the runtime which session_id to append to; the runtime does not infer this from topic state.

ConversationMeta

The target shape, replacing the current struct in crates/core/src/storage.rs:

pub struct ConversationMeta {
    pub agent: String,            // immutable, set at creation
    pub created_by: String,       // immutable, set at creation
    pub created_at: String,       // immutable, set at creation
    pub title: String,            // empty until a client sets one (no auto-title; no wire RPC yet)
    pub updated_at: String,       // bumped on every append_message
    pub message_count: u64,       // bumped on every append_message
    pub summary: Option<String>,  // populated when a client calls compact_conversation
}

Removed: topic (subsumed by session search), uptime_secs (replaced by updated_at; uptime is derivable if a caller still needs it).

Writers:

| Field | Writer | When |
|---|---|---|
| agent, created_by, created_at | runtime | session creation |
| title | (none) | empty by default; client-driven titling is a follow-up (no wire RPC yet) |
| updated_at, message_count | runtime | every append_message |
| summary | runtime | when a client triggers compact_conversation |

Meta is not an agent-writable blob. The runtime owns every field. If a later RFC needs an agent-curated field (e.g., session-to-entry back-links to optimize resume hydration), it lands as a separate proposal with a measured recall-failure case justifying the code cost — not speculatively in this one.

Schema migration

Zero-touch upgrade. All meta fields added by this RFC use #[serde(default)]; removed fields (topic, uptime_secs) are silently ignored on deserialize. On the next meta rewrite for a given session (any append_message triggers one), the removed fields are dropped from disk. No migration pass, no version bump, no operator intervention. Old session JSONL files mix cleanly with new writes.

Serde config on ConversationMeta:

  • #[serde(default)] on updated_at, message_count, summary.
  • #[serde(default, skip_serializing)] on the removed fields during the transition window if a Deserialize derive would otherwise reject unknown keys — standard #[serde(default)] struct-level behavior covers this without explicit skip.
  • No deny_unknown_fields anywhere on this struct.

Session search — BM25 over messages

The memory crate already ships a 157-line hand-rolled inverted BM25 index (crates/memory/src/bm25.rs, zero external deps). Session search reuses this primitive. Two choices, to be decided during implementation: (a) lift bm25::Index into a shared module used by both the memory crate and a new session index, or (b) instantiate a parallel index owned by the runtime. Either way, no new workspace deps.

Field weights, inherited from the community Claude Code conversation-search pattern (alexop.dev, raine/claude-history):

  • summary — 3.0× (when present; skipped when absent)
  • title — 2.0×
  • user messages — 1.5×
  • assistant messages — 1.0×
  • tool-use turns — 1.3× (proxy for “a solution was applied”)

Hit shape with explicit bounds. Messages can contain large tool results, blobs, or attachments. Returning raw Message objects in search windows would defeat the bounding the windowing was meant to provide. The hit type projects to a fixed small shape, not full messages:

pub struct SessionHit {
    pub session_id: u64,
    pub msg_idx: usize,
    pub score: f64,
    pub meta: SessionSummary,              // title, created_at, updated_at, message_count
    pub window: Vec<WindowItem>,           // context_before + match + context_after
}

pub struct WindowItem {
    pub role: Role,
    pub msg_idx: usize,
    pub snippet: String,                   // truncated to MAX_SNIPPET_BYTES
    pub truncated: bool,
    pub tool_name: Option<String>,         // for tool-use turns
}

Hard limits:

  • MAX_SNIPPET_BYTES = 1024 per window item.
  • MAX_WINDOW_ITEMS = context_before + 1 + context_after, capped at 16 regardless of caller request.
  • MAX_HITS_PER_QUERY = 20.

A full-message read always goes through list_messages(session_id, offset, limit) — there is no “load entire session” primitive, by design.

Performance budget and cold-start

Concrete targets this RFC commits to:

  • search_sessions query latency: p99 ≤ 50ms at 100k indexed messages; p99 ≤ 200ms at 1M. CPU-only — the index is in memory.
  • append_message indexing overhead: ≤ 1ms added per append at any index size up to 1M messages. Pure CPU.
  • Cold-start index rebuild: dominated by storage I/O, not BM25. The CPU portion is sub-second at 100k messages, but a real FsStorage rebuild does one load_session per persisted session — at 100k messages spread across 2k sessions, end-to-end rebuild is on the order of 10–20 seconds on local SSD. Rebuild runs in the background after daemon startup; live appends index immediately, so new work is always findable. Old sessions become searchable as the rebuild progresses. A future RFC can add on-disk index checkpointing if cold-rebuild latency becomes a felt operational concern.

These targets are verified by a criterion bench against FsStorage rooted in a tmpdir, not against the in-memory index alone. Failure of a CPU-side target blocks the phase; storage-bound rebuild time is monitored, not gated.

Session lifetime and deletion

This RFC treats sessions as immortal. There is no runtime delete_session primitive; storage grows unboundedly with agent activity. This is an explicit scope decision: garbage collection is a separate operational concern (retention policy, archival, export-and-prune) that warrants its own RFC once usage patterns reveal what the right policy is. In the meantime, operators who need to prune can do so at the filesystem layer — JSONL files in sessions/ are safe to delete offline; the index rebuilds from disk on next start.

When delete support lands, it needs to: (a) remove JSONL file, (b) remove postings from the BM25 index, (c) invalidate any in-memory SessionSummary cache. None of that is in scope here.

Auto-compaction as safety net (since removed by 0189; see the update note above)

Overflow compaction stays, because context-window overflow is a hard constraint the client layer can’t enforce. Two changes versus today: (a) compaction additionally populates ConversationMeta.summary so session search can boost it, and (b) compaction is no longer per-topic (there are no topics) — it fires per session, which is what a client would expect anyway.

The existing AgentConfig::compact_threshold continues to fire on token-budget pressure, not overflow-only; “overflow safety net” here is shorthand for “context-pressure-driven, not user-driven.” Discretionary compaction (“I want to clean up this old chat”) is a client concern — the runtime optionally exposes a compact(session_id) helper in a follow-up RFC if clients converge on needing one. Not required to ship this one.

Alternatives

Semantic retrieval via embeddings. Deferred. Lexical BM25 covers the 80% case at zero new deps and microsecond query time. A vector index adds an embedding model or API dependency, hundreds of MB of index storage, and a hybrid-search ranking story. Revisit when lexical recall demonstrably misses on a labeled test set — not before.

Keep topic as a tag. Rejected. With BM25 over messages, tag-based filtering is redundant with query-based retrieval at the cost of requiring disciplined agent tagging and introducing tag-name drift (“cron refactor” vs “cron cleanup”). The tag was the join key between memory and sessions; BM25 is the join key now.

Single unified recall() tool that queries memory and sessions together. Rejected. Two explicit tools are cheaper for the agent to reason about — it knows what it is paying for in each call, and the two stores have different payload-sizing rules (memory entries are small and returned whole; session hits are bounded excerpts). Composition in prompt-space is the right layer.

Agent-curated session-to-entry back-links (linked_entries). Considered and removed from this RFC. The primitive has a reference-rot problem (entry names change or are deleted; the link silently dangles) and its concrete benefit is a recall optimization whose cost — two new tools, a persisted Vec<String>, and a new agent behavior — isn’t justified until BM25 demonstrably misses a case it would have caught. If such a case shows up in practice, a follow-up RFC can propose it with reference-by-id semantics and a measured justification.

Keep read_session(id) as full-history load. Rejected. Unbounded reads are a context-window hazard and the functionality is better served by list_messages (paginated browse) plus windowed excerpts from search.

Migration

Phased implementation, one commit per phase per CLAUDE.md’s workflow rule. Order is deliberate: delete first, build on a clean foundation, then layer the search feature. This avoids the awkward intermediate state where the topic subsystem and the new primitives coexist.

Phase 1 — Delete the topic subsystem. Remove switch_topic, search_topics, TopicRouter, the tmp/promote gating, the entire crates/crabtalk/src/hooks/topic/ module, Runtime::switch_topic and its helpers, and ConversationMeta.topic (storage-side). Sessions now always persist. EntryKind::Topic is kept for now as a presentation label (see open questions). Commit should be heavily negative line-count — mostly subtraction.

Rollback: git revert. Every phase is one commit; revert is the rollback plan.

Phase 2 — ConversationMeta cleanup. Drop uptime_secs. Add updated_at and message_count, wired into append_message. Verify zero-touch read of existing session files via serde(default). Add nextest coverage for mixed-version reads.

Phase 3 — Session BM25 index + search_sessions tool. New index in the runtime (decide lift-vs-parallel with memory crate’s bm25::Index inside this phase). Incremental updates on append_message. New tool wired through the hook registry. Add a criterion bench verifying the performance budget (§ Performance budget and cold-start). If cold-start rebuild exceeds 500ms at 100k messages, this phase also adds on-disk checkpointing before merge.

Phase 4 — summary field + overflow compaction wiring. Populate ConversationMeta.summary during compaction. Thread it into search_sessions as the 3× boost field. Nextest coverage: session with a summary ranks above an otherwise-equivalent session without one for the same query.

Phase 5 — Documentation. Update CLAUDE.md / CONTRIBUTING.md on the runtime-vs-client boundary. Update hook examples that referenced topics. Move 0171 into superseded.md.

Open questions

  • EntryKind::Topic fate. Keep as a purely presentational label for long-form aggregated entries, or delete entirely and treat “wiki” entries as ordinary project entries? The label earns its keep only if a UI or search-ranking consumer branches on it. Current lean: delete in a follow-up once Phase 1–5 are stable and we can confirm no consumer actually reads the tag.
  • On-disk index checkpointing. Governed by the Phase 3 bench. If cold-start stays within budget, defer; if not, land it inline. Decision deferred to measurement, not debate.
  • Session BM25 field-weight calibration. Adopt community defaults as-is. A labeled test set of ≥50 queries with known-relevant sessions triggers a re-tuning pass if agent recall on that set falls below 80% top-3 hit rate. Until that set exists, the weights are frozen.
  • Discretionary compact(session_id) helper. Ship only when a client demands it. Not in this RFC.

0189 - Policy at the Edge

Summary

Mechanism belongs in the daemon; policy belongs at the edge. The daemon stops making decisions on the user’s behalf — it no longer auto-compacts on a token-count heuristic, no longer spawns title-generation calls in the background, no longer BM25-searches memory and injects synthetic <recall> user turns. Each of these is now an explicit RPC the client calls when (and if) it wants the behavior. A new AgentEvent::ContextUsage { usage } carries real per-step token counts so clients can pick their own pressure threshold. The Hook::on_before_run lifecycle method is removed.

Motivation

Three independent features had drifted toward the same anti-pattern: the daemon making policy decisions using its own heuristics, then mutating conversation state on the user’s behalf without being asked. RFC 0000 codified auto-compaction at a chars/4-derived threshold. RFC 0038 (then 0150) codified auto-recall as a per-turn before-run injection. The runtime grew a quiet spawn_title_generation call inside finalize_run. Each was useful in isolation. Together they shaped a daemon that thought it knew best.

The cost of that posture:

  • Bad heuristics. Token estimation as chars/4 is wrong for code, JSON tool outputs, and non-English prose. The threshold either trips early (destroying live context with an unwanted summary) or trips late (the request fails anyway). The daemon doesn’t have the inputs — model identity, real token counts, user intent — to pick a threshold. Clients do.
  • Synthetic events. Auto-compaction yielded AgentEvent::Compact followed by hand-forged TextStart/TextDelta/TextEnd events containing the literal string [context compacted]. Auto-recall injected <recall>...</recall> user turns flagged auto_injected: true. Both lied to the event stream — the model didn’t say those things, the daemon did. Downstream consumers had to filter them out.
  • Wasted tokens, opaque costs. Auto-titling spent an LLM call after every conversation that crossed two history entries, behind the user’s back. Auto-recall paid retrieval cost on every turn whether or not the model would have asked.
  • Race with the explicit API. All three behaviors had explicit-API counterparts (compact_conversation, the recall tool, a clearly-named title RPC if the client wanted one). The daemon was racing the client to call its own API.

RFC 0185 already drew the right line for sessions: “the runtime’s job is to provide mechanical primitives. UX decisions belong one layer up in the client.” This RFC carries that all the way through.

Design

Principle

Mechanism in the daemon, policy at the edge. Concretely:

  • Mechanism is what only the daemon can do: own conversation state, own storage, own the LLM connection, own MCP child processes, run summarization, write archives. These are inherently centralized.
  • Policy is everything else: when to compact, when to title, what to prepend to a user message, what counts as context pressure. These need information the daemon doesn’t have (which model, which UI, which user, which tradeoff matters today). Policy lives in the client — TUI, telegram, web app, headless automation — and is composed from primitives the daemon exposes.

Where this leaves heuristics: the daemon doesn’t run them. If the daemon would need to estimate something to decide, the answer is “don’t decide — surface the data and let the client decide.”

What was removed

Auto-compaction. The block in Agent::run that called self.compact(history) when estimate_tokens(history) > threshold is gone. The synthetic Compact/TextStart/TextDelta("[context compacted]")/TextEnd events are gone. AgentConfig::compact_threshold is gone (silently dropped from existing TOML via serde default). HistoryEntry::estimate_tokens and the chars/4 heuristic are gone.

Auto-titling. Runtime::spawn_title_generation and its finalize_run call site are gone. The title field on Conversation and ConversationMeta stays — existing data is still valid, the daemon just doesn’t generate fresh titles on its own.

Auto-recall. Memory::before_run (the BM25-search-and-inject helper) is gone. MemoryHook::on_before_run is gone. The recall tool is unchanged — model-driven recall continues to work.

Hook::on_before_run. The trait method is removed. OsHook previously used it to inject <environment>working_directory: ...</environment> per turn — that goes too. Bash dispatch still resolves the effective cwd at tool-call time, so commands run in the right directory; the model just doesn’t get a synthetic turn telling it where it is. Clients that want the model to see the cwd put it in their own user message (they supplied it via req.cwd in the first place). The peer-agents <agents> block that DaemonHook::on_before_run injected for delegation moves to DaemonHook::on_build_agent so it lands in the system prompt at agent-build time — registry mutations are visible after the next agent rebuild.

What was added

AgentEvent::ContextUsage { usage: Usage }. Emitted once per LLM call when the provider reports non-zero usage. Carries real prompt_tokens, completion_tokens, total_tokens, plus optional cache-hit/miss and reasoning counts. The corresponding wire event is ContextUsageEvent { usage: TokenUsage }. Clients track these and decide for themselves when to call compact_conversation.
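
As a sketch of the client side of this contract, assuming only the Usage fields described above; the PressureTracker type and the budget field are illustrative, not part of any daemon or SDK API:

// Sketch only: the event variant as described above, plus a hypothetical
// client-side tracker. The budget is a client choice; the daemon no longer guesses.
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
    // optional cache-hit/miss and reasoning counts elided
}

enum AgentEvent {
    ContextUsage { usage: Usage },
    // other lifecycle events elided
}

struct PressureTracker {
    budget: u64,      // client-chosen token budget for the model in use
    last_total: u64,  // most recent total_tokens reported by the daemon
}

impl PressureTracker {
    /// Returns true when this client's own threshold trips and it should
    /// call compact_conversation.
    fn observe(&mut self, event: &AgentEvent) -> bool {
        if let AgentEvent::ContextUsage { usage } = event {
            self.last_total = usage.total_tokens;
        }
        self.last_total > self.budget
    }
}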

Real compact_conversation. The runtime method previously returned the summary string and silently dropped the persistence work. It now does all four steps in order: summarize → write archive entry → write session compact marker → replace history with a single user message carrying the summary. Atomic from the client’s perspective.
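
A minimal sketch of that ordering, with a hypothetical Store trait and summarize closure standing in for what the runtime actually owns; only the four steps and their order come from this RFC:

// Sketch of the four ordered steps. Store and the summarize closure are
// hypothetical stand-ins; the real runtime owns storage and the LLM connection.
use anyhow::Result;

trait Store {
    fn load_history(&self, id: &str) -> Result<Vec<String>>;
    fn write_archive_entry(&self, id: &str, history: &[String]) -> Result<()>;
    fn write_compact_marker(&self, id: &str) -> Result<()>;
    fn replace_history_with_summary(&self, id: &str, summary: &str) -> Result<()>;
}

fn compact_conversation(
    store: &dyn Store,
    summarize: impl Fn(&[String]) -> Result<String>,
    id: &str,
) -> Result<String> {
    let history = store.load_history(id)?;
    let summary = summarize(&history)?;                  // 1. summarize
    store.write_archive_entry(id, &history)?;            // 2. write archive entry
    store.write_compact_marker(id)?;                     // 3. write session compact marker
    store.replace_history_with_summary(id, &summary)?;   // 4. history becomes one user message carrying the summary
    Ok(summary)
}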

Reference: explicit replacements

Each removed behavior maps to an existing or planned API:

  • Auto-compaction → compact_conversation(agent, sender) RPC, gated on client-tracked ContextUsage events
  • Auto-titling → a future generate_title(conversation_id) RPC; until then, clients can run their own summarization or leave titles blank
  • Auto-recall → the recall tool (model-driven), or a client-side recall + send composition before the user’s message

The opt-in client-side helpers for each of these are tracked in #188 as SDK sugars — a few dozen lines on top of the daemon client.
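
For example, the auto-recall replacement composes entirely client-side. A sketch assuming a hypothetical client trait with recall and send methods; the names, signatures, and the limit of 3 are illustrative, not the #188 API:

// Hypothetical client surface; an #188-style SDK sugar would wrap something like this.
trait DaemonClient {
    fn recall(&self, agent: &str, query: &str, limit: usize) -> anyhow::Result<Vec<String>>;
    fn send(&self, agent: &str, message: &str) -> anyhow::Result<()>;
}

fn send_with_recall(client: &dyn DaemonClient, agent: &str, user_msg: &str) -> anyhow::Result<()> {
    // Policy lives here: the client decides whether to retrieve at all.
    let hits = client.recall(agent, user_msg, 3)?;
    let message = if hits.is_empty() {
        user_msg.to_string()
    } else {
        // The recalled context rides inside an ordinary user message;
        // nothing synthetic lands in the event stream.
        format!("<recall>\n{}\n</recall>\n\n{}", hits.join("\n"), user_msg)
    };
    client.send(agent, &message)
}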

Migration

  • New conversations have empty title until a client asks for one. Existing titles on disk are unaffected.
  • The recall tool still works. Clients that previously relied on silent <recall> injection need to either let the model call recall itself (the intended path) or compose recall + send client-side.
  • No auto-compact. Clients should subscribe to ContextUsage events and call compact_conversation when their threshold trips. The model returns an explicit error if context is exceeded — the daemon no longer guesses.
  • compact_threshold in agent TOML is silently dropped via serde default. No errors, just ignored.

Alternatives considered

Keep auto-compact as a safety net. RFC 0185 took this position: “automatic compaction on overflow as a safety net” because clients can’t see overflow coming. Rejected here because the daemon can’t reliably detect overflow either — chars/4 is the wrong tool, and the model itself returns a clear error when context is exceeded. A bad safety net is worse than none, because clients build trust in it and stop watching.

Threshold-gated ContextPressure event. Emit only when over some threshold. Rejected because it recreates the policy problem in a smaller form — the daemon still picks a number, and is still wrong for whichever model and use case it didn’t anticipate. Always-emit ContextUsage lets clients pick.

Move policy to per-agent config knobs. “Auto-compact off by default; opt in via compact_threshold.” Rejected because the per-agent config is set by the client at create-time anyway — moving the decision a step earlier doesn’t change who decides, just makes the decision harder to update. A per-call decision (the client picks each turn) is more honest.

Out of scope

Two daemon-side per-turn injections in prepare_history survive this RFC: the <instructions> block from Crab.md discovery and the guest-agent-framing prose (“Messages wrapped in <from agent="...">…”). Same anti-pattern, deferred to a separate cleanup so this RFC stays focused.

Wire-protocol changes are limited to the new ContextUsageEvent and reservation of AgentInfo.compact_threshold (field 10). No breaking renumbering, no new RPCs.

0193 - Agent-Owned MCP

Summary

Agents own their MCP servers by value, not by name reference into a daemon-global registry. AgentConfig.mcps becomes Vec<McpServerConfig> — every agent carries the full configuration of every MCP it uses. The daemon’s job shrinks to “spawn what agents declare, dedup identical processes, route tool calls per agent.” Storage::{list,upsert,delete}_mcp and crabtalkd mcp go away. Forking an agent now means copying one config; the new owner gets a self-contained, runnable artifact.

Motivation

The current model treats MCPs as a daemon-level resource that agents reference by name. That made sense when crabtalk was a single-user CLI managing a fixed fleet of tools. It doesn’t fit where the runtime is going.

Forkability is broken. RFC 0135 framed agents as the unit users see and share — sessions are plumbing, agents are the artifact. Cloud workflows extend that: an agent should be a forkable thing, like a GitHub repo. Today, forking an agent’s TOML doesn’t fork its MCPs; the fork lands on a daemon that may or may not have a server registered under the same name, with the same args, with the same env. The agent reference is a dangling pointer until someone manually re-registers the missing pieces.

Namespace pollution is artificial. Two agents that want the same logical MCP with different env (e.g., one read-only token, one admin token) must register two differently-named entries in a global flat namespace. The bridge’s tool_cache: BTreeMap<String, Tool> then logs-and-skips conflicts when both expose web_search. None of that pollution is intrinsic to MCP; it’s a consequence of the registry shape.

The allowlist is a workaround for ownership. AgentConfig.mcps: Vec<String> (RFC 0082) gates which global entries an agent may dispatch to. It exists because the registry is shared. If agents own their MCPs, allowlists become tautological — the agent only dispatches to what it declared.

The cloud target makes this acute. Cloud will import crabtalk as a library and host one agent per tenant (or per agent instance). A daemon-global registry on a multi-tenant host either leaks configurations across tenants or forces the cloud layer to maintain its own per-tenant overlay on top of the registry. Either way the global registry is wrong — the right shape is “agent has its MCPs,” and the cloud’s secret/canonical layer can compose forkable templates above that.

Design

Data model

struct AgentConfig {
    // …
    mcps: Vec<McpServerConfig>,  // was Vec<String>
}

Embedded by value. No enum wrapper, no separate “decl” type. The agent’s TOML carries every field of every MCP it depends on.

Storage loses list_mcps, upsert_mcp, delete_mcp. The protocol RPCs ListMcps, UpsertMcp, DeleteMcp stay — they shift meaning from “manage the global registry” to “list MCPs declared by any registered agent” / “modify an agent’s MCPs in place” / “remove an MCP from an agent’s config.” Implemented by reading and writing through the agent’s config rather than a separate table.

Daemon-side dedup

The daemon never spawns the same MCP twice. Two agents declaring command="github-mcp", args=[...], env={TOKEN: "abc"} share one peer process. Different args or env → separate processes. Identity is structural, not by name.

McpHandler keys peers by fingerprint — a stable hash of (command, args, env, url). The state map becomes BTreeMap<Fingerprint, McpServerEntry> where each entry refcounts the agents that declared it. register_for_agent(agent, cfg) increments the refcount, spawning if first; unregister_for_agent(agent, fingerprint) decrements, tearing down at zero.
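
A sketch of that bookkeeping, assuming a minimal field set for McpServerConfig; the hash choice and the McpServerEntry contents are illustrative (a real fingerprint needs a hash that is stable across runs, which DefaultHasher is not):

// Sketch: structural identity plus refcounted peers, keyed by fingerprint.
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

#[derive(Clone, PartialEq, Eq, Hash)]
struct McpServerConfig {
    command: String,
    args: Vec<String>,
    env: BTreeMap<String, String>,
    url: Option<String>,
}

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Fingerprint(u64);

fn fingerprint(cfg: &McpServerConfig) -> Fingerprint {
    // Illustrative only: hashes (command, args, env, url) structurally.
    let mut h = std::collections::hash_map::DefaultHasher::new();
    cfg.hash(&mut h);
    Fingerprint(h.finish())
}

struct McpServerEntry {
    config: McpServerConfig,
    owners: BTreeSet<String>,  // agents holding a reference to this peer
    // the spawned peer process / connection is elided
}

#[derive(Default)]
struct McpHandler {
    peers: BTreeMap<Fingerprint, McpServerEntry>,
}

impl McpHandler {
    fn register_for_agent(&mut self, agent: &str, cfg: &McpServerConfig) -> Fingerprint {
        let fp = fingerprint(cfg);
        let entry = self.peers.entry(fp).or_insert_with(|| McpServerEntry {
            // A real handler would spawn the peer process here, on first declaration.
            config: cfg.clone(),
            owners: BTreeSet::new(),
        });
        entry.owners.insert(agent.to_string());
        fp
    }

    fn unregister_for_agent(&mut self, agent: &str, fp: Fingerprint) {
        if let Some(entry) = self.peers.get_mut(&fp) {
            entry.owners.remove(agent);
            if entry.owners.is_empty() {
                self.peers.remove(&fp);  // refcount hit zero: tear the peer down
            }
        }
    }
}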

The lifecycle event broadcast from RFC 0190 (PR #192) still applies: Connecting / Connected / Failed / Disconnected are emitted per fingerprint, not per name. The event payload identifies the server by fingerprint plus the set of agents that own a reference to it.

Per-agent tool namespace

The bridge stops sharing a flat tool_cache. Two agents declaring different MCPs that both expose a web_search tool no longer collide — the dispatcher resolves (agent, tool_name) to the right peer through the agent’s declared fingerprints.

Concretely: McpBridge keeps the per-fingerprint peer map but drops the global tool cache. Tool lookup walks the agent’s fingerprints in declaration order and returns the first match. McpHook::dispatch already has the agent context; it now uses the agent’s declared MCPs directly instead of consulting an AgentScope.mcps allowlist.
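
A sketch of that lookup, assuming the bridge caches each peer's tool names and the dispatcher passes the agent's fingerprints in declaration order; the types are illustrative:

// Sketch: resolve (agent, tool_name) through the agent's own declarations.
use std::collections::BTreeMap;

struct Peer {
    tools: Vec<String>,  // tool names exposed by this MCP server
}

impl Peer {
    fn has_tool(&self, name: &str) -> bool {
        self.tools.iter().any(|t| t == name)
    }
}

struct McpBridge {
    peers: BTreeMap<u64, Peer>,  // keyed by fingerprint (u64 stands in here)
}

impl McpBridge {
    /// Walk the agent's declared fingerprints in declaration order; first match wins.
    fn resolve<'a>(&'a self, agent_fingerprints: &[u64], tool: &str) -> Option<(u64, &'a Peer)> {
        agent_fingerprints
            .iter()
            .filter_map(|fp| self.peers.get(fp).map(|peer| (*fp, peer)))
            .find(|(_, peer)| peer.has_tool(tool))
    }
}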

Lifecycle interactions

  • Agent create / update. Runtime::create_agent and update_agent walk the config’s mcps list, calling McpHandler::register_for_agent(agent, cfg) for each. New fingerprints spawn; existing fingerprints just bump the refcount.
  • Agent delete. Walks the agent’s mcps, calls unregister_for_agent for each. Peers with refcount=0 are torn down. Disconnected events fire.
  • Agent rename. Refcounts move from old_name to new_name. No spawn/teardown.
  • Daemon startup. Storage rebuilds agents one by one; each register_for_agent call walks the same dedup path. No special “load global MCPs” phase.
  • Daemon reload. Already rebuilds agents (RFC 0189-era refactor). Same path. New configs trigger spawns; removed fingerprints trigger teardowns.

Where secrets are not

The daemon stores literal McpServerConfig values. There is no placeholder syntax, no resolver trait, no interpolation in this codebase. If a value looks like ${TAVILY_KEY}, the daemon spawns a process with that literal string in the environment.

The “canonical with placeholders / materialized with values” split lives in whatever sits above the daemon. Cloud’s control plane holds canonical agent configs (with ${TAVILY_KEY}), resolves against the tenant’s vault, and writes the resolved config to the daemon-as-library it owns for that tenant. Forks copy the canonical, never the resolved.

This keeps the forkability invariant — shareable artifacts carry structure, not values — while keeping the daemon secret-unaware.

Migration

AgentConfig.mcps is a breaking field type change (Vec<String> → Vec<McpServerConfig>). Existing configs on disk need a one-shot migration:

  1. On daemon startup, if any agent’s mcps is Vec<String> (detected via serde), look each name up in the existing mcps.toml (or whatever Storage held the global registry), inline the McpServerConfig, and rewrite the agent’s TOML.
  2. After every agent has been migrated, delete the global mcps.toml.

The migration runs once. After the first startup on the new code, configs are uniformly the new shape; the migration code path is dead and gets removed in a follow-up cleanup commit.
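
One way to implement the step-1 detection, assuming a serde(untagged) helper enum on the mcps field; the field set and the registry lookup are illustrative, and a real migration would surface missing names as errors rather than dropping them:

// Migration-only helper: detect whether mcps is the legacy name list or the
// new inline form, and inline legacy names from the old global registry.
use std::collections::BTreeMap;

use serde::Deserialize;

#[derive(Deserialize, Clone)]
struct McpServerConfig {
    command: String,
    #[serde(default)]
    args: Vec<String>,
    #[serde(default)]
    env: BTreeMap<String, String>,
    #[serde(default)]
    url: Option<String>,
}

#[derive(Deserialize)]
#[serde(untagged)]
enum McpsField {
    Names(Vec<String>),            // legacy: name references into the global registry
    Inline(Vec<McpServerConfig>),  // new: embedded by value
}

fn migrate(field: McpsField, registry: &BTreeMap<String, McpServerConfig>) -> Vec<McpServerConfig> {
    match field {
        McpsField::Inline(cfgs) => cfgs,  // already the new shape; nothing to do
        McpsField::Names(names) => names
            .into_iter()
            // Sketch drops unknown names; a real migration would error instead.
            .filter_map(|name| registry.get(&name).cloned())
            .collect(),
    }
}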

Storage::list_mcps / upsert_mcp / delete_mcp are removed from the trait. Implementations — FsStorage, MemStorage — drop the corresponding files/fields. The protocol RPCs ListMcps / UpsertMcp / DeleteMcp stay on the wire; their handlers are rewritten to operate on agent configs.

AgentScope.mcps (RFC 0082) is removed. The scoping struct still gates tools and skills; MCP scoping is now intrinsic to the agent’s declaration.

Alternatives considered

Keep the global registry, add per-agent overrides. Allow AgentConfig.mcps to carry inline overrides on top of name references. Rejected because it doubles the configuration surface — every consumer has to handle “which wins, the override or the registry?” — without solving forkability. Forking an agent still depends on the destination daemon having the right names registered.

SecretResolver trait in this repo. Earlier draft. Cut because the daemon can stay secret-unaware: cloud handles canonical-vs-resolved at its control plane and only writes resolved configs into the daemon. Adding a trait here for a default that just reads env vars is complexity for a problem we don’t have.

Generic on Daemon for the resolver. Even if a resolver lived in this repo, adding a second type parameter to Daemon<P> compounds complexity per the no-generics-for-future-use rule. Not worth it for a hypothetical hook.

Package-provided MCPs as agent templates. Package install/uninstall lives in crabup, not the daemon, so this alternative collapses: future package-like artifacts compose at the agent level rather than at a separate MCP-registry level.

Out of scope

  • Secret resolution, vaulting, or ${VAR} interpolation. Cloud’s problem, not the daemon’s.
  • Auto-restart behavior for failed peers. Lifecycle events from PR #192 surface failures; whether a client retries is a client decision.
  • Discovery of port-file MCPs. Today McpHandler auto-connects services that drop a *.port file under ~/.crabtalk/run/. That mechanism continues to work, but discovered servers now register against a synthetic per-process “discovery agent” (or are exposed only on the daemon-internal dispatch path) — the exact shape is a follow-up.
  • Package MCPs. Package install lives in crabup; no daemon-side migration needed.

Superseded RFCs

RFCs that have been replaced by newer designs. Kept for historical reference.