Introduction
This is the crabtalk development book — the knowledge base you check before building. It captures what crabtalk stands for, how the system is shaped, and the design decisions that govern its evolution.
For user-facing documentation (installation, configuration, commands), see crabtalk.ai.
How this book is organized
- Manifesto — What crabtalk is and what it stands for.
- RFCs — Design decisions and features.
RFCs
Code tells you what the system does. Git history tells you when it changed. RFCs tell you why — the problem, the alternatives considered, and the reasoning behind the choice. When you’re about to build something new, RFCs are where you check whether the problem has been thought through before.
Not every change needs an RFC. Bug fixes, refactors, and small improvements go through normal pull requests. RFCs are for decisions that establish rules, contracts, or interfaces that others need to know about before building.
Format
Each RFC is a markdown file with the following structure:
- Header — Feature name, start date, link to discussion, affected crates.
- Summary — One paragraph describing the decision.
- Motivation — What problem does this solve? What use cases does it enable?
- Design — The technical design. Contracts, responsibilities, interfaces.
- Alternatives — What else was considered and why it was rejected.
- Unresolved Questions — Open questions for future work.
Lifecycle
- Open an issue on GitHub describing the feature or design problem.
- Implement it. Iterate through PRs until it’s merged.
- Once merged, write the RFC documenting the decision and add it to SUMMARY.md.
The RFC number is the issue number or the PR number that introduced the feature. RFCs are written after implementation, not before — they record decisions that were made, not proposals for decisions to come.
Manifesto
Ownership is necessary for an open agent ecosystem.
Ownership is not configuration. A configured agent is one where you picked from someone else’s menu. An owned agent is one where you decided what’s on the menu. Ownership is the power to compose your own stack.
Every agent application today rebuilds session management, command dispatch, and event streaming from scratch — then bundles it alongside search, browser automation, PDF parsing, TTS, image processing, and dozens of tools you didn’t ask for into one process. If you want a Telegram bot with search, you carry nineteen other channels and every integration. If you want a coding agent, you carry TTS and image generation. The process is theirs. The choices are theirs. You run it.
This happens because the daemon layer is missing. Without it, every application must become the daemon. And a daemon that is also an application ships its opinion of what your agent should be.
CrabTalk is that daemon layer. It manages sessions, dispatches commands, and streams the full execution lifecycle to your client. It does not bundle search. It does not bundle gateways. It does not bundle tools. You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts. The daemon never loads them.
An agent daemon is not an agent application. An agent daemon empowers you to build the application you want — and only the application you want. This is the essence of ownership.
We cannot expect agent platforms to give us ownership out of their beneficence. It is to their advantage to bundle, to lock in, to ship their choices as yours. We should expect that they will bundle. The only way to preserve choice is to never take it away in the first place.
We don’t much care if you prefer a batteries-included experience. You could build an OpenClaw-like assistant or a Hermes-like agent on top of CrabTalk. You can’t build a CrabTalk underneath them. The daemon must come first. The architecture must be right. Everything else follows.
Let us proceed.
Conversations
A conversation is the unit of agent interaction. It holds the message history an agent uses as working context, together with the state associated with that history.
Identity
A conversation is identified by the pair (agent, sender).
- agent is the name of an agent configured in the daemon.
- sender is a client-provided string identifying the counterparty. Clients choose their own convention, such as "user", "tg:12345", or "delegate:42".
The pair is the conversation’s only externally addressable name. The wire protocol carries no conversation identifier.
Lifetime
A conversation is created on first reference to a pair (agent, sender) that does not yet exist, and persists across daemon restarts. Persistence is delegated to the configured Storage backend.
At most one conversation exists for any given (agent, sender) pair.
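The get-or-create semantics above can be sketched as a table keyed by the pair. A minimal sketch, assuming hypothetical type and field names (Conversation, ConversationTable are illustrative, not the daemon's real types):

```rust
use std::collections::HashMap;

// Illustrative stand-in for a conversation's state.
#[derive(Default)]
struct Conversation {
    history: Vec<String>,
}

// At most one conversation exists per (agent, sender) pair.
#[derive(Default)]
struct ConversationTable {
    map: HashMap<(String, String), Conversation>,
}

impl ConversationTable {
    // First reference creates the conversation; later references
    // resolve to the same one.
    fn resolve(&mut self, agent: &str, sender: &str) -> &mut Conversation {
        self.map
            .entry((agent.to_string(), sender.to_string()))
            .or_default()
    }
}
```

The pair is the only key; there is no separate conversation ID to mint or hand back to the client.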
Addressing
Protocol messages that operate on a conversation carry agent and sender fields. The pair resolves to the conversation on which the operation acts.
| Message | Effect |
|---|---|
| StreamMsg | Append user content, run the agent, stream the response. |
| KillMsg | Cancel the in-flight run, if any. |
| CompactMsg | Compact the current history into an archive (see Memory). |
| ReplyToAsk | Supply content for a pending ask_user call. |
StreamMsg.sender is optional. When omitted, the daemon resolves a default sender determined by the transport.
State
A conversation holds:
- History — an ordered sequence of history entries.
- Title — a short human-readable label assigned by the set_title tool.
- Working directory — the filesystem path used by OS-level tools during a run.
- Archives — compacted prefixes of the history (see Memory).
History ordering is total. New entries are appended; no entry is reordered or removed except through compaction.
Working directory
Each conversation has a default working directory. StreamMsg.cwd, when set, overrides the default for the duration of the resulting run. The override does not modify the conversation’s default.
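The override rule reduces to a one-line resolution. A minimal sketch with hypothetical argument names (the real resolution lives behind Env::effective_cwd):

```rust
// StreamMsg.cwd, when present, wins for this run only; the conversation's
// stored default is never mutated here.
fn effective_cwd(conversation_default: &str, stream_override: Option<&str>) -> String {
    stream_override.unwrap_or(conversation_default).to_string()
}
```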
Message attribution
Each assistant message in the history carries an agent field.
- An empty agent field denotes a message produced by the conversation’s primary agent, the one named by the conversation’s identity.
- A non-empty agent field denotes a guest turn (see Multi-agent).
Messages produced by the daemon for protocol framing are marked as auto-injected and stripped from the history before each run.
Dispatch
The daemon accepts client messages on its transports and produces a stream of server messages in response. Each message is handled independently, with no central event loop mediating between the transport and the operations.
Entry point
Every transport (UDS, TCP, future additions) feeds ClientMessage values into the same dispatch callback. The callback spawns a Tokio task per message and polls the resulting stream, forwarding each ServerMessage back to the transport’s reply channel. When the stream ends or the reply channel closes, the task terminates.
Concurrency is unbounded at this layer: nothing throttles or serializes incoming messages before they reach their handler.
Dispatch function
Server::dispatch(ClientMessage) -> Stream<ServerMessage> is the single entry into the daemon’s operations. It inspects the ClientMessage variant and routes to the corresponding method on the Server trait.
- Request-response operations (ping, kill_conversation, compact_conversation, administrative calls) yield exactly one ServerMessage.
- Streaming operations (stream, subscribe_events) yield many ServerMessage values over time.
- Unknown or empty messages yield a single error response.
The function is defined once in the core Server trait. Any implementor — the daemon, a test harness, a future alternative server — routes client messages the same way.
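The routing shape can be sketched with toy variants. Everything here is an illustrative stand-in: the real protocol has 15 client variants and yields streams of ServerMessage rather than vectors:

```rust
// Hypothetical miniature of the protocol's envelopes.
enum ClientMessage {
    Ping,
    Stream { agent: String, content: String },
    Unknown,
}

#[derive(Debug, PartialEq)]
enum ServerMessage {
    Pong,
    Chunk(String),
    End,
    Error(String),
}

// Request-response variants yield exactly one message; streaming variants
// yield many; unknown input yields a single error.
fn dispatch(msg: ClientMessage) -> Vec<ServerMessage> {
    match msg {
        ClientMessage::Ping => vec![ServerMessage::Pong],
        ClientMessage::Stream { content, .. } => {
            vec![ServerMessage::Chunk(content), ServerMessage::End]
        }
        ClientMessage::Unknown => {
            vec![ServerMessage::Error("unknown message".to_string())]
        }
    }
}
```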
No central event loop
There is no serializing queue, no DaemonEvent enum, and no actor that owns mutation. Operations reach into shared state directly and hold locks for the duration of the critical section.
Shared state is protected by parking_lot::Mutex or parking_lot::RwLock. Event bus subscriptions, conversation working-directory overrides, pending ask_user replies, and cron state each live behind their own lock. Locks are acquired, the work is done, and the lock is released. Ordering between operations is whatever Tokio’s scheduler produces.
Ordering guarantees
Within a single conversation, message ordering is total: StreamMsg values append to history in the order the daemon receives them. Clients that require strict ordering for a conversation are responsible for serializing their own sends.
Between conversations, no ordering is guaranteed. Two StreamMsg values addressed to different (agent, sender) pairs may run in either order regardless of arrival time.
Cancellation
KillMsg cancels the in-flight run for its (agent, sender) pair. Cancellation propagates through the runtime to the active agent step, interrupting tool calls and LLM requests at the next await point. Already-emitted ServerMessage values are not retracted.
A cancelled conversation remains valid. The next StreamMsg for the same pair resumes against the history as it existed at the point of cancellation.
Event bus
The event bus is a subscription table, not a router. publish(source, payload) iterates subscriptions, invokes the fire callback for each match inline, and removes any subscription marked once. The callback fires under the bus’s lock; implementations must not reacquire it.
The bus has no queue and no scheduler. Fan-out is as fast as the callback runs for each matching subscription.
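The fire-inline, remove-once behavior can be sketched in a few lines. A minimal sketch with illustrative types (the real bus also matches on richer source patterns and persists subscriptions to disk):

```rust
// Hypothetical subscription shape: a source filter, a one-shot flag,
// and a callback fired inline during publish.
struct Subscription {
    source: String,
    once: bool,
    fire: Box<dyn FnMut(&str)>,
}

#[derive(Default)]
struct EventBus {
    subs: Vec<Subscription>,
}

impl EventBus {
    // Iterate subscriptions, invoke matching callbacks inline, and drop
    // any subscription marked `once` after it fires. No queue, no scheduler.
    fn publish(&mut self, source: &str, payload: &str) {
        self.subs.retain_mut(|s| {
            if s.source == source {
                (s.fire)(payload);
                !s.once
            } else {
                true
            }
        });
    }
}
```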
Multi-agent
Multi-agent conversations let a second agent speak into an existing conversation as a guest. A guest turn is a first-class message from the guest agent; it is not a tool call, a delegation, or a paraphrase.
Guest turns
A guest turn runs a named guest agent against the primary conversation’s history and appends the guest’s response to that history. The primary agent of the conversation is unchanged.
A guest turn is requested by setting StreamMsg.guest to the name of the guest agent. The conversation is still addressed by the primary’s (agent, sender) pair; guest selects who speaks on this turn, not whose conversation it is.
Flow
When StreamMsg { agent: A, sender: S, guest: G, content: C } is dispatched:
- The conversation (A, S) is resolved, creating it if necessary.
- The user content C is appended to the history.
- The daemon runs agent G against the history using G’s system prompt and instructions.
- The response is appended to the history, tagged with agent: G.
The primary agent is not invoked on a guest turn. A subsequent StreamMsg without guest resumes normal operation with the primary agent against the updated history.
Tools on guest turns
A guest turn is text-only. The guest agent’s tool schemas are not attached to the request, and any tool call emitted by the guest is rejected.
Tool-using work belongs to the primary agent. A guest is a voice in the conversation, not a worker.
Attribution
Each message in the history carries an agent field.
- agent empty — the message originates from the conversation’s primary agent.
- agent non-empty — the message originates from a guest. The value is the guest agent’s name.
Attribution survives compaction: archive entries preserve the agent field of each archived message.
Framing
When building a request, the runtime auto-injects framing messages that are not persisted between runs. Two framings exist:
- Guest framing. Injected when a guest is running. It tells the guest that it is joining a conversation and explains the <from agent="..."> tag convention.
- Primary framing. Injected when the primary is running and the history contains at least one message with a non-empty agent. It tells the primary that some messages are from guest agents and it should continue responding as itself.
Framing messages are marked auto-injected. They are stripped from the history at the start of each run and re-injected for that run only. The history on disk never contains framing messages.
Tagging
Assistant messages with a non-empty agent field are prefixed with <from agent="{name}"> when they appear in an LLM request. The prefix makes the speaker visible to whichever agent is currently reading the history.
A message without an agent field carries no prefix.
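The prefixing rule reduces to a small function. A minimal sketch, with hypothetical names; the real rendering happens while building the LLM request:

```rust
// Guest messages are made visible to the reading agent via a prefix;
// primary messages (empty agent field) carry none.
fn render_for_request(agent: &str, content: &str) -> String {
    if agent.is_empty() {
        content.to_string()
    } else {
        format!("<from agent=\"{agent}\">{content}")
    }
}
```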
Cancellation
KillMsg addresses the conversation by (agent, sender). It cancels whichever run is in flight, whether that run is the primary or a guest. A cancelled guest turn leaves the user’s content appended to the history; the guest’s partial response is discarded.
Memory
Memory is a single-file entry store, shared by an agent across its conversations. It holds two kinds of content: notes that the agent writes deliberately, and archives that accumulate as conversations are compacted. Search is lexical (BM25); there are no embeddings.
Entries
An entry has:
- id — monotonic integer, assigned on insert.
- name — the entry’s primary identifier. Unique within the memory.
- aliases — alternative names that resolve to the same entry.
- content — the entry’s text.
- kind — Note or Archive.
- created_at — creation timestamp.
Entries are addressed by name or by any of their aliases. A name is rebindable through aliasing; the canonical name is whatever the agent most recently chose.
Kinds
Note entries are the agent’s long-term store. The agent adds, renames, aliases, and rewrites them through memory operations.
Archive entries are produced by compaction. Their content is the summary of a compacted conversation prefix. Archive entries are not rewritten after creation.
Both kinds share the same index and search path. A search over memory returns both, ranked by relevance.
Compaction
Compaction compresses a prefix of a conversation’s history into a summary and records a boundary in the history at the point of compression.
When a conversation is compacted:
- The daemon summarizes the history prefix.
- The summary is written to the memory as an Archive entry with a generated name.
- A compact marker is appended to the conversation’s history, carrying the archive_name and archived_at timestamp.
On the next run, the history is replayed from the latest compact marker. Entries before the marker are dropped from the working context; the archive remains available through memory search and by explicit name.
A conversation can be compacted any number of times. Each compaction leaves one additional marker and one additional archive entry.
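Replay-from-latest-marker is a slice operation over the history. A minimal sketch with an illustrative entry type (the real entries carry roles, agents, and timestamps):

```rust
// Hypothetical history entry: either a message or a compact marker.
enum Entry {
    Message(String),
    CompactMarker { archive_name: String },
}

// The working context is everything after the most recent compact marker;
// entries before it live on only as the named archive.
fn working_context(history: &[Entry]) -> &[Entry] {
    let start = history
        .iter()
        .rposition(|e| matches!(e, Entry::CompactMarker { .. }))
        .map(|i| i + 1)
        .unwrap_or(0);
    &history[start..]
}
```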
Persistence
The memory is a single file. The file holds all entries, all aliases, and the search index snapshot. A write operation mutates memory in RAM and writes an atomic snapshot of the file on each successful apply.
Opening an existing path reads the snapshot into RAM. Opening a non-existent path creates an empty memory; the file is written on the first successful apply.
Search
Search is BM25 over the tokenized content and name of each entry. Results include the entry and its score. The caller chooses the cutoff — the store does not filter by relevance.
The token set is the union of tokens from content and name; aliases do not contribute tokens. Aliases are resolution, not search.
Operations
Memory exposes a closed set of write operations:
| Operation | Effect |
|---|---|
| Add | Create a new entry with a given name, content, and kind. |
| Rename | Change an entry’s canonical name. |
| Alias | Bind an additional name to an existing entry. |
| Write | Replace an entry’s content. |
| Remove | Delete an entry and all its aliases. |
Operations on Archive entries are permitted but not expected; the agent works with Note entries.
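An illustrative in-RAM sketch of the operation set, under the assumption that names and aliases share one resolution map keyed to an entry id (the real store also persists atomic snapshots and maintains the search index):

```rust
use std::collections::HashMap;

// Hypothetical miniature of the memory store; real types live in the
// memory crate and carry kind, aliases, and timestamps.
#[derive(Default)]
struct Memory {
    entries: HashMap<u64, String>, // id -> content
    names: HashMap<String, u64>,   // canonical names and aliases -> id
    next_id: u64,
}

impl Memory {
    fn add(&mut self, name: &str, content: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.entries.insert(id, content.to_string());
        self.names.insert(name.to_string(), id);
        id
    }
    fn rename(&mut self, from: &str, to: &str) -> Option<()> {
        let id = self.names.remove(from)?;
        self.names.insert(to.to_string(), id);
        Some(())
    }
    fn alias(&mut self, existing: &str, alias: &str) -> Option<()> {
        let id = *self.names.get(existing)?;
        self.names.insert(alias.to_string(), id);
        Some(())
    }
    fn write(&mut self, name: &str, content: &str) -> Option<()> {
        let id = *self.names.get(name)?;
        self.entries.insert(id, content.to_string());
        Some(())
    }
    fn remove(&mut self, name: &str) -> Option<()> {
        let id = *self.names.get(name)?;
        self.entries.remove(&id);
        // Removing an entry drops its canonical name and every alias.
        self.names.retain(|_, v| *v != id);
        Some(())
    }
}
```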
Runtime
The runtime is the engine that drives agents. It owns conversations in memory, runs agent steps, dispatches tool calls, and applies compaction. It does not open sockets, accept connections, or schedule time. Capabilities that require I/O are provided to the runtime by its environment.
Composition
A runtime is parameterized by a Config that names three associated types:
| Type | Responsibility |
|---|---|
| Storage | Persistence of conversations, skills, and memory. |
| Provider | LLM request and streaming. |
| Env | Node-specific capabilities and tool dispatch. |
A binary supplies one Config. The daemon’s Config wires filesystem storage, a configured provider, and a node environment that owns hooks and event broadcasting. Tests supply a Config with in-memory storage, a stub provider, and () as the environment.
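The parameterization can be sketched as a trait with three associated types. All names here are illustrative; the real Config names trait-bounded types, not these stand-ins:

```rust
// Hypothetical shape of the runtime's Config parameterization.
trait Config {
    type Storage;
    type Provider;
    type Env;
}

// A test config wires in-memory storage, a stub provider, and `()` as Env.
struct TestConfig;
impl Config for TestConfig {
    type Storage = Vec<String>; // stand-in for in-memory storage
    type Provider = ();         // stub provider
    type Env = ();
}

// One runtime per Config; the binary picks the Config once.
struct Runtime<C: Config> {
    storage: C::Storage,
    provider: C::Provider,
    env: C::Env,
}
```

The design choice this encodes: swapping storage, provider, or environment is a compile-time decision made by the hosting binary, not a runtime plugin mechanism.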
Responsibilities
The runtime handles:
- Loading and saving conversations through Storage.
- Building an agent request from the current history, instructions, and tool schemas.
- Streaming responses from Provider and applying them to the conversation.
- Dispatching tool calls through Env.
- Emitting AgentEvent values for each step, tool call, and compaction.
- Producing compaction summaries and appending archive markers.
Boundary
The runtime does not:
- Bind listeners or accept transport connections.
- Spawn tasks for message routing or scheduling.
- Interpret protocol messages.
- Read the system clock for scheduling purposes.
- Manage process state such as PID files or signals.
These belong to the server that hosts the runtime.
Env
Env is the runtime’s only outward-facing capability surface. It provides:
- hook() — the composite Hook that exposes tool schemas, dispatches tool calls, and participates in lifecycle events.
- on_agent_event(agent, conversation_id, event) — hook point for side effects, such as event broadcasting or persistence of step traces.
- subscribe_events() — optional subscription to a cross-conversation event stream, for servers that expose agent events to external clients.
- discover_instructions(cwd) — collect instruction files applicable to a working directory.
- effective_cwd(conversation_id) — resolve the working directory for a run, honoring any per-conversation override.
Methods that the runtime does not need in a given context have default implementations. An Env implementation may leave event broadcasting, instruction discovery, or CWD management at their defaults.
Hook
Hook is the single point through which the runtime reaches node-specific tools. A hook:
- Advertises tool schemas for the LLM request.
- Dispatches tool calls by name, returning a future that yields the tool’s result.
- Participates in step lifecycle, observing starts, completions, and errors.
A hook is composite: the daemon’s hook owns sub-hooks (OS tools, ask_user, delegation, event subscription, memory). Order of sub-hooks is fixed by the composite; the runtime sees a single Hook.
Tool dispatch
A tool call from the agent carries the tool name, arguments, the originating agent and sender, and the conversation id. The runtime invokes Env::hook().dispatch(name, call). If no sub-hook claims the name, the dispatch yields an error result; the agent receives the error as the tool’s output.
Dispatch is asynchronous. The runtime awaits the tool future at the next step boundary and applies the result to the conversation before the following step.
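Name-based dispatch with an error fallback can be sketched as follows, synchronous here for brevity (the real dispatch is async and returns a future); the types are illustrative:

```rust
// Hypothetical tool handler: arguments in, text result or error text out.
type Tool = fn(&str) -> Result<String, String>;

// The composite hook owns its sub-hooks in a fixed order; the runtime
// only ever sees the composite.
struct CompositeHook {
    sub_hooks: Vec<(String, Tool)>, // (tool name, handler)
}

impl CompositeHook {
    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
        for (tool_name, handler) in &self.sub_hooks {
            if tool_name == name {
                return handler(args);
            }
        }
        // No sub-hook claims the name: the agent receives this error
        // as the tool's output.
        Err(format!("unknown tool: {name}"))
    }
}
```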
Daemon
The daemon is the long-lived process that hosts the runtime, owns transports, and persists state. Clients are transient; the daemon is not. A single daemon process serves all configured agents, all active conversations, and all connected clients.
Responsibilities
The daemon owns:
- Transports — UDS and TCP listeners. Listening endpoints belong to the daemon, not to individual clients or agents.
- Runtime — a single shared runtime instance behind RwLock. Agents share the runtime; the runtime is never cloned per conversation.
- Hooks — the composite Hook assembled from sub-hooks (OS tools, ask_user, delegation, event subscription, memory).
- Event bus — subscription table and fire callback. File-backed by events/subscriptions.toml under the config directory.
- MCP handler — connections to external MCP servers and routing to the tools they advertise.
- Configuration — current DaemonConfig, reloaded in place on explicit reload.
The daemon does not interpret tool semantics. Tool dispatch is the runtime’s responsibility, routed through the composite hook.
Process model
The daemon runs as a single OS process. All work happens on a single Tokio runtime. There is one listener task per configured transport, one reply task per connected client, and one task per in-flight dispatch. Shutdown is initiated by a broadcast channel; every long-lived task subscribes and exits when the channel fires.
A daemon process owns at most one configuration directory and at most one set of transport endpoints.
Config directory
The daemon is rooted at a configuration directory supplied at startup. The directory holds:
| Path | Contents |
|---|---|
| config.toml | Node configuration. |
| agents/ | Agent definitions. |
| sessions/ | Conversation JSONL logs, one file per conversation. |
| memory/ | Per-agent memory databases, one file per agent. |
| skills/ | Skill bundles loadable by agents. |
| events/subscriptions.toml | Event subscription recovery file. |
All paths are resolved relative to the configuration directory. The daemon writes nothing outside this directory.
Lifecycle
Startup. The daemon reads config.toml, constructs the provider, assembles hooks, opens storage, builds the shared runtime, loads event subscriptions from disk, binds transports, and begins accepting client messages.
Runtime. The daemon serves the Server trait. Each client message is dispatched into a spawned task that produces a stream of server messages.
Reload. A ReloadMsg causes the daemon to re-read config.toml and rebuild the shared runtime in place. Existing in-flight dispatches complete against the previous runtime; new dispatches see the reloaded runtime. Transports are not re-bound.
Shutdown. The daemon broadcasts a shutdown signal. Transport listeners stop accepting new connections. Active dispatches complete or cancel at the next await point. The daemon writes no final state on shutdown; state is persisted on each mutating operation, not at exit.
Persistence boundary
The daemon persists state through the Storage trait. Operations that mutate conversations, memory, or agent definitions write synchronously through storage before acknowledging the caller. Cron and event subscription files are written directly by the daemon.
A daemon restart recovers all state from the config directory. No state is held only in the process.
Client addressing
Clients do not address the daemon. Clients connect to a transport and send ClientMessage values. The transport’s reply channel delivers ServerMessage values back until the connection closes. A client that reconnects and addresses the same (agent, sender) pair resumes the same conversation; no client-side resume token is required.
Providers
Providers are the sole point of contact between the daemon and an LLM. The provider layer is external: its trait, types, and concrete implementations live upstream in crabllm. Crabtalk consumes providers but does not define them.
Boundary
The crabllm-core crate defines the Provider trait and the shared types that flow across it: ChatCompletionRequest, Message, Tool, ToolCall, Role, Usage, ApiError. These types are the contract between crabtalk and any LLM backend.
The crabllm-provider crate defines concrete provider implementations. ProviderRegistry assembles them and yields one Provider value constructed from the node configuration.
Crabtalk depends on both crates as external dependencies. It does not vendor provider code. Changes to provider internals — authentication, request formatting, streaming, error decoding, retry policy — are made upstream.
Usage
A runtime is parameterized by Config::Provider. The daemon’s default config resolves Provider by calling ProviderRegistry::build with the user’s configuration. The runtime holds a single provider instance for its lifetime and calls it once per agent step.
The provider is asked to produce:
- A non-streaming completion for synchronous operations.
- A streaming completion for StreamMsg operations, yielding chunks that the runtime accumulates into a Message.
The runtime does not interpret provider-specific errors. ApiError is surfaced to the client as a protocol error; the provider is responsible for mapping backend failures into ApiError values.
Tools across the boundary
Tool schemas are declared in crabllm-core::Tool. The runtime collects schemas from the composite hook, attaches them to the request, and lets the provider format them for the backend. Tool calls returned by the provider arrive as ToolCall values; the runtime dispatches each call through Env::hook().dispatch.
The shape of tool schemas is fixed by crabllm-core. A tool that cannot be expressed in that shape is not expressible to crabtalk.
Configuration
Provider configuration is read from the node’s config.toml and passed to ProviderRegistry. The daemon does not inspect provider-specific configuration; it forwards the relevant sections to the registry and accepts the resulting Provider.
Adding a new backend is a change to crabllm-provider. It is not a change to crabtalk.
Upstream
crabllm is maintained at crabtalk/crabllm. Bug fixes, new backends, and trait changes are filed there. Crabtalk upgrades its crabllm dependency on release.
0009 - Transport
- Feature Name: UDS and TCP Transport Layers
- Start Date: 2026-03-27
- Discussion: #9
- Crates: transport, core
Summary
A transport layer providing Unix domain socket (UDS) and TCP connectivity
between clients and the crabtalk daemon, built on a shared length-prefixed
protobuf codec defined in core.
Motivation
The daemon needs to accept connections from local CLI clients and remote clients (Telegram, web gateways). UDS is the natural choice for same-machine communication — no port management, filesystem-based access control. TCP is required for remote access and cross-platform support (Windows has no UDS).
Both transports share identical framing and message types. The codec and message definitions belong in core so that any transport can use them without the transports depending on one another. The transport crate provides the concrete connection machinery.
Design
Codec (core::protocol::codec)
Wire format: [u32 BE length][protobuf payload]. The length prefix counts
payload bytes only, excluding the 4-byte header itself.
Two generic async functions operate over any AsyncRead/AsyncWrite:
- write_message<W, T: Message>(writer, msg) — encode, length-prefix, flush.
- read_message<R, T: Message + Default>(reader) — read length, read payload, decode.
Maximum frame size is 16 MiB. Frames exceeding this limit produce a
FrameError::TooLarge. EOF during the length read produces
FrameError::ConnectionClosed (clean disconnect, not an error).
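A synchronous sketch of the framing over byte slices. The real codec is generic over AsyncRead/AsyncWrite and uses the FrameError type; this illustrative version substitutes plain string errors and shows only the wire layout:

```rust
// [u32 BE length][protobuf payload]; the prefix counts payload bytes only,
// excluding the 4-byte header itself.
const MAX_FRAME: usize = 16 * 1024 * 1024; // 16 MiB

fn write_frame(buf: &mut Vec<u8>, payload: &[u8]) {
    buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(payload);
}

// Returns (payload, remaining bytes) or an error string.
fn read_frame(buf: &[u8]) -> Result<(&[u8], &[u8]), &'static str> {
    if buf.is_empty() {
        // EOF during the length read: a clean disconnect, not a failure.
        return Err("connection closed");
    }
    if buf.len() < 4 {
        return Err("truncated length prefix");
    }
    let len = u32::from_be_bytes(buf[..4].try_into().unwrap()) as usize;
    if len > MAX_FRAME {
        return Err("frame too large");
    }
    if buf.len() < 4 + len {
        return Err("truncated payload");
    }
    Ok((&buf[4..4 + len], &buf[4 + len..]))
}
```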
Server accept loop
Both UDS and TCP servers share the same pattern:
accept_loop(listener, on_message, shutdown)
- listener — UnixListener or TcpListener.
- on_message: Fn(ClientMessage, Sender<ServerMessage>) — called for each decoded client message. The sender is per-connection; the callback can send multiple ServerMessages (streaming responses) or exactly one (request-response). The channel is unbounded because messages are small and flow-controlled by the protocol — the agent produces responses at LLM speed, far slower than socket drain speed.
- shutdown — oneshot::Receiver<()> for graceful stop.
Each accepted connection spawns two tasks: a read loop that decodes
ClientMessages and calls on_message, and a send task that drains the
UnboundedSender and writes ServerMessages back. When the read loop ends
(EOF or error), the sender is dropped, which terminates the send task.
TCP specifics
- Default port: 6688. If the port is in use, bind fails — another daemon may already be running.
- TCP_NODELAY is set on all connections (low-latency interactive protocol).
- bind() returns a std::net::TcpListener (non-blocking).
UDS specifics
- Unix-only (#[cfg(unix)]).
- Socket path is caller-provided (typically ~/.crabtalk/daemon.sock).
- No port management or collision handling — the filesystem path is the identity.
Client trait (core::protocol::api::Client)
Two required transport primitives:
- request(ClientMessage) -> Result<ServerMessage> — single round-trip.
- request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — send one message, read responses until the stream ends.
Both UDS Connection and TCP TcpConnection implement Client identically:
split the socket into owned read/write halves, write via codec, read via codec.
The request_stream implementation reads indefinitely; typed provided methods
on Client (e.g., stream()) handle sentinel detection (StreamEnd).
Connections are not Clone — one connection per session. The client struct
(CrabtalkClient / TcpClient) holds config and produces connections on
demand.
Alternatives
tokio-util LengthDelimitedCodec. Would save the manual length-prefix
code but adds a dependency for ~50 lines of straightforward framing. The
hand-rolled codec is simpler to audit and has no extra allocations.
gRPC / tonic. Full RPC framework with HTTP/2 transport. Heavyweight for a
local daemon protocol. The current design is simpler: raw protobuf over a
length-prefixed stream, no HTTP layer, no service definitions beyond the
Server trait.
Shared generic transport trait. UDS and TCP accept loops are nearly
identical but kept as separate modules. A generic Transport trait would save
~20 lines of duplication but add an abstraction with exactly two implementors.
Not worth it.
Unresolved Questions
- Should the transport support TLS for TCP connections in non-localhost deployments?
- Should there be a connection timeout or keepalive at the transport level, or is the protocol-level Ping/Pong sufficient?
0018 - Protocol
- Feature Name: Wire Protocol
- Start Date: 2026-03-27
- Discussion: #18
- Crates: core
Summary
A protobuf-based wire protocol defining all client-server communication for the
crabtalk daemon, with a Server trait for dispatch and a Client trait for
typed request methods.
Motivation
The daemon mediates between multiple clients (CLI, Telegram, web) and multiple
agents. A well-defined wire protocol decouples client and server implementations
and makes the contract explicit. Protobuf was chosen for compact binary
encoding, language-neutral schema, and generated code via prost.
Design
Wire messages (crabtalk.proto)
Two top-level envelopes using oneof:
ClientMessage — 15 variants:
| Variant | Purpose |
|---|---|
| Send | Run agent, return complete response |
| Stream | Run agent, stream response events |
| Ping | Keepalive |
| Sessions | List active sessions |
| Kill | Close a session |
| GetConfig | Read daemon config |
| SetConfig | Replace daemon config |
| Reload | Hot-reload runtime |
| SubscribeEvents | Stream agent events |
| ReplyToAsk | Answer a pending ask_user prompt |
| GetStats | Daemon stats |
| CreateCron | Create cron entry |
| DeleteCron | Delete cron entry |
| ListCrons | List cron entries |
| Compact | Compact session history |
ServerMessage — 11 variants:
| Variant | Purpose |
|---|---|
| Response | Complete agent response |
| Stream | Streaming event (see below) |
| Error | Error with code and message |
| Pong | Keepalive ack |
| Sessions | Session list |
| Config | Config JSON |
| AgentEvent | Agent event (for subscriptions) |
| Stats | Daemon stats |
| CronInfo | Created cron entry |
| CronList | All cron entries |
| Compact | Compaction summary |
Streaming events
StreamEvent is itself a oneof with 8 variants representing the lifecycle of
a streamed agent response:
- Start { agent, session } — stream opened.
- Chunk { content } — text delta.
- Thinking { content } — thinking/reasoning delta.
- ToolStart { calls[] } — tool invocations beginning.
- ToolResult { call_id, output, duration_ms, is_error } — single tool result. is_error signals the handler reported failure; output carries the text in either case so clients can render it. UIs use the flag to style errors distinctly; agents can use it for retry decisions without string-matching on error messages.
- ToolsComplete — all pending tool calls finished.
- AskUser { questions[] } — agent needs user input.
- End { agent, error } — stream closed (error is empty on success).
The client reads StreamEvents until it receives End, which is the terminal
sentinel.
Tool result ordering. When a single agent step produces N tool calls, the runtime dispatches them concurrently and emits ToolResult events in completion order — fast tools are reported as soon as they finish, slow siblings report later. The event stream is therefore not ordered by the call index in ToolStart.calls[]. Clients correlate by call_id, which is the primary key; do not assume positional alignment with the ToolStart call list.
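Correlation by call_id amounts to re-aligning completion-ordered results with the original call list. A minimal sketch with illustrative types:

```rust
use std::collections::HashMap;

// Hypothetical miniature of a ToolResult event.
struct ToolResult {
    call_id: String,
    output: String,
    is_error: bool, // carried alongside the output, never inferred from it
}

// `starts` is the call_id list from ToolStart, in call order; `results`
// arrive in completion order. Correlate by id, not by position.
fn correlate(starts: &[String], results: Vec<ToolResult>) -> Vec<Option<String>> {
    let by_id: HashMap<String, ToolResult> = results
        .into_iter()
        .map(|r| (r.call_id.clone(), r))
        .collect();
    starts
        .iter()
        .map(|id| by_id.get(id).map(|r| r.output.clone()))
        .collect()
}
```

A `None` slot means that call has not reported yet; a client UI can render it as pending.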
Agent events
AgentEventMsg carries a kind enum (TEXT_DELTA, THINKING_DELTA,
TOOL_START, TOOL_RESULT, TOOLS_COMPLETE, DONE) plus agent name, session
ID, content, and timestamp. Used by SubscribeEvents for live monitoring of all
agent activity across sessions. For TOOL_RESULT events, the tool_is_error field mirrors the streaming protocol’s is_error — monitoring clients use it to aggregate error rates per tool type without parsing output strings.
AgentEventMsg overlaps with StreamEvent — both represent the agent execution
lifecycle. StreamEvent is the per-request streaming format (rich, typed
variants). AgentEventMsg is the cross-session monitoring format (flat, single
struct with a kind tag). The duplication exists because monitoring clients need a
simpler, uniform shape to filter and display events from multiple agents.
Server trait
One async method per ClientMessage variant. Implementations receive typed
request structs and return typed responses:
#![allow(unused)]
fn main() {
trait Server: Sync {
fn send(&self, req: SendMsg) -> impl Future<Output = Result<SendResponse>>;
fn stream(&self, req: StreamMsg) -> impl Stream<Item = Result<StreamEvent>>;
fn ping(&self) -> impl Future<Output = Result<()>>;
// ... one method per operation
}
}
The provided dispatch(&self, msg: ClientMessage) -> Stream<Item = ServerMessage> method routes a raw ClientMessage to the correct handler.
Request-response operations yield exactly one ServerMessage; streaming
operations yield many. Errors are mapped to ErrorMsg { code, message } using HTTP status codes with
their standard semantics: 400 (bad request), 404 (not found), 500 (internal
error).
Client trait
Two required transport primitives:
- request(ClientMessage) -> Result<ServerMessage> — single round-trip.
- request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — raw streaming read.
Typed provided methods (send, stream, ping, get_config, set_config)
handle message construction, response unwrapping, and sentinel detection. The
stream() method consumes events via take_while until StreamEnd and maps
each frame through TryFrom<ServerMessage> for type-safe event extraction.
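The sentinel-detection idea can be sketched with plain iterators. The frame enum below is a stand-in with assumed variant names, and std::iter's take_while substitutes for the async Stream combinator the real client uses:

```rust
// Stand-in frame enum; variant names are assumptions, not the real wire types.
#[derive(Debug, PartialEq)]
enum Frame { Chunk(String), End }

// Consume frames up to (and excluding) the terminal sentinel, collecting
// the text deltas. Frames after End are never consumed.
fn collect_chunks(frames: Vec<Frame>) -> String {
    frames
        .into_iter()
        .take_while(|f| *f != Frame::End)
        .filter_map(|f| match f {
            Frame::Chunk(s) => Some(s),
            Frame::End => None, // unreachable past take_while, kept for exhaustiveness
        })
        .collect()
}

fn main() {
    let frames = vec![Frame::Chunk("hel".into()), Frame::Chunk("lo".into()), Frame::End];
    assert_eq!(collect_chunks(frames), "hello");
}
```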
Conversions (message::convert)
From impls wrap typed messages into envelopes (SendMsg -> ClientMessage,
SendResponse -> ServerMessage). TryFrom impls unwrap in the other direction,
returning an error for unexpected variants. This keeps call sites clean — no
manual enum construction.
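A minimal sketch of the pattern, using hypothetical simplified message types rather than the real generated ones:

```rust
// Hypothetical simplified message types illustrating the conversion pattern.
#[derive(Debug, PartialEq)]
struct SendMsg { content: String }

#[derive(Debug, PartialEq)]
enum ClientMessage { Send(SendMsg), Ping }

// From wraps a typed message into the envelope...
impl From<SendMsg> for ClientMessage {
    fn from(m: SendMsg) -> Self { ClientMessage::Send(m) }
}

// ...and TryFrom unwraps it, erroring on unexpected variants.
impl TryFrom<ClientMessage> for SendMsg {
    type Error = String;
    fn try_from(m: ClientMessage) -> Result<Self, Self::Error> {
        match m {
            ClientMessage::Send(s) => Ok(s),
            other => Err(format!("unexpected variant: {other:?}")),
        }
    }
}

fn main() {
    let envelope: ClientMessage = SendMsg { content: "hi".into() }.into();
    assert_eq!(SendMsg::try_from(envelope).unwrap().content, "hi");
    assert!(SendMsg::try_from(ClientMessage::Ping).is_err());
}
```

Call sites write `msg.into()` on the way out and `TryFrom` on the way in; the only place that names enum variants is the conversion module itself.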
Alternatives
JSON over WebSocket. Simpler to debug with curl, but larger payloads and
no schema enforcement. Protobuf catches schema mismatches at compile time.
gRPC service definitions. Would provide streaming and code generation out of the box, but brings HTTP/2, tower middleware, and tonic as dependencies. The current approach is lighter: raw protobuf frames over a length-prefixed stream, with hand-written trait dispatch.
Separate request/response ID correlation. The protocol is connection-scoped and sequential — one outstanding request per connection at a time. This is a fundamental design constraint: clients must wait for a response before sending the next request. No need for request IDs or multiplexing. If multiplexing is needed later, it belongs in the transport layer, not the protocol.
Unresolved Questions
- Should the protocol negotiate a version on connect to detect client/server mismatches?
- Should StreamEnd carry structured error information (code + message) instead of a plain string?
- Should there be a ClientMessage variant for subscribing to a specific session’s events rather than all events?
0027 - Model
- Feature Name: Model Abstraction Layer
- Start Date: 2026-01-25
- Discussion: #27
- Crates: model, core
Summary
A provider registry that wraps multiple LLM backends (OpenAI, Anthropic, Google,
Bedrock, Azure) behind a unified Model trait, with per-model provider
instances, runtime model switching, and retry logic with exponential backoff.
Motivation
The daemon talks to LLMs. Which LLM, from which provider, through which API —
that’s configuration, not architecture. The agent code should call model.send()
and not care whether it’s hitting Anthropic directly or an OpenAI-compatible
proxy.
This requires:
- A single trait that all providers implement.
- A registry that maps model names to provider instances.
- Runtime switching between models without restarting.
- Retry logic for transient failures (rate limits, timeouts).
- Type conversion between crabtalk’s message types and each provider’s wire format.
Design
Model trait (core)
Defined in wcore::model:
#![allow(unused)]
fn main() {
pub trait Model: Clone + Send + Sync {
async fn send(&self, request: &Request) -> Result<Response>;
fn stream(&self, request: Request) -> impl Stream<Item = Result<StreamChunk>>;
fn context_limit(&self, model: &str) -> usize;
}
}
The trait is in core because agents are generic over Model. The implementation
lives in the model crate.
Provider
Wraps crabllm_provider::Provider (the external multi-backend LLM library)
behind the Model trait. Each Provider instance is bound to a specific model
name and carries:
- The backend connection (OpenAI, Anthropic, Google, Bedrock, Azure).
- A shared HTTP client.
- Retry config: max_retries (default 2) and timeout (default 30s).
Base URL normalization strips endpoint suffixes (/chat/completions,
/messages) so both bare origins and full paths work in config.
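A sketch of what that normalization might look like. The suffix list comes from the text above; the trimming and function shape are assumptions:

```rust
// Strip known endpoint suffixes so both bare origins and full paths work
// in config. Suffix list from the RFC; other details are assumptions.
fn normalize_base_url(url: &str) -> String {
    let url = url.trim_end_matches('/');
    for suffix in ["/chat/completions", "/messages"] {
        if let Some(stripped) = url.strip_suffix(suffix) {
            return stripped.to_string();
        }
    }
    url.to_string()
}

fn main() {
    assert_eq!(
        normalize_base_url("https://api.example.com/v1/chat/completions"),
        "https://api.example.com/v1"
    );
    // A bare origin passes through unchanged (minus any trailing slash).
    assert_eq!(normalize_base_url("https://api.example.com/v1/"), "https://api.example.com/v1");
}
```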
ProviderRegistry
Implements Model by routing requests to the correct provider based on the
model name in the request.
ProviderRegistry
├── providers: BTreeMap<String, Provider> # keyed by model name
├── active: String # default model
└── client: reqwest::Client # shared across providers
- Construction: one ProviderDef can list multiple model names. Each gets its own Provider instance. Duplicate model names across definitions are rejected at validation time.
- Routing: send() and stream() look up the provider by request.model. Callers get a clone of the provider — the registry lock is not held during LLM calls.
- Switching: switch(model) changes the active default. Agents can still override per-request via the model field.
- Hot add/remove: providers can be added or removed at runtime without rebuilding the registry.
Retry logic
Non-streaming send() retries transient errors (rate limits, timeouts) with
exponential backoff and jitter:
- Initial backoff: 100ms, doubling each retry.
- Jitter: random duration in [backoff/2, backoff].
- Max retries: configurable per provider (default 2).
- Non-transient errors (auth failures, invalid requests) fail immediately.
Streaming does not retry — the connection is already established.
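The schedule above can be sketched as a pure bounds computation. The helper name is an assumption; the real code would draw a random point inside the returned range before sleeping:

```rust
use std::time::Duration;

// Backoff starts at 100ms and doubles per retry; the actual sleep is a
// random duration inside [backoff/2, backoff].
fn backoff_bounds(attempt: u32) -> (Duration, Duration) {
    let base = Duration::from_millis(100) * 2u32.pow(attempt);
    (base / 2, base)
}

fn main() {
    assert_eq!(backoff_bounds(0), (Duration::from_millis(50), Duration::from_millis(100)));
    assert_eq!(backoff_bounds(1), (Duration::from_millis(100), Duration::from_millis(200)));
    assert_eq!(backoff_bounds(2), (Duration::from_millis(200), Duration::from_millis(400)));
}
```

Keeping the lower bound at half the backoff (rather than zero) guarantees each retry waits at least as long as the previous attempt's minimum, while the randomness still spreads concurrent retries apart.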
Type conversion
A convert module translates between wcore::model types (Request, Response,
Message, StreamChunk) and crabllm_core types (ChatCompletionRequest,
ChatCompletionResponse). This isolates the external library’s types from the
rest of the codebase.
Alternatives
Direct provider calls without a registry. Each agent holds its own provider. Rejected because runtime model switching and centralized configuration require a shared registry.
Trait objects instead of enum dispatch. Box<dyn Model> instead of the
concrete Provider enum. Rejected because Model has generic return types
(impl Stream) that prevent object safety. The enum dispatch via
crabllm_provider::Provider handles this naturally.
Unresolved Questions
- Should the registry support fallback chains (try provider A, fall back to B)?
- Should streaming requests retry on connection failures before the first chunk?
0036 - Skill Loading
- Feature Name: Skill Loading
- Start Date: 2026-03-27
- Discussion: #36
- Crates: runtime
Summary
How crabtalk discovers, loads, dispatches, hot-reloads, and scopes skills. The skill format follows the agentskills.io convention — this RFC covers the loading mechanism, not the format.
Motivation
Agents need extensible behavior without recompilation. Skills are the simplest unit that works: a markdown file with a name, description, and a prompt body. No code generation, no plugin API, no runtime linking.
The format is defined by agentskills.io. What crabtalk needs to decide is how skills are found on disk, how they’re resolved at runtime, how they stay current without restarts, and how agents are restricted to subsets of available skills.
Design
Format
SKILL.md follows the agentskills.io convention.
Required fields: name, description. Optional: allowed-tools. The markdown
body is the skill prompt.
Discovery
SkillHandler::load(dirs) scans a list of directories (in config-defined order)
recursively for SKILL.md files. Each skill lives in its own directory:
skills/
check-feeds/
SKILL.md
summarize/
SKILL.md
Nested organization is supported (skills/category/my-skill/SKILL.md). Hidden
directories (.-prefixed) are skipped. Duplicate names across directories are
detected and skipped with a warning — first-loaded wins, in config-defined
directory order.
Registry
A Vec<Skill> wrapped in Mutex inside SkillHandler. Linear scan — the
registry is small enough that indexing is unnecessary. Supports add, upsert
(replace by name), contains, and skills (list all).
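The upsert semantics can be sketched like this, with a simplified Skill shape standing in for the real struct:

```rust
// Simplified Skill shape; the real registry wraps Vec<Skill> in a Mutex.
#[derive(Debug, Clone, PartialEq)]
struct Skill { name: String, body: String }

// Upsert: replace by name if present, append otherwise. A linear scan is
// enough because the registry stays small.
fn upsert(registry: &mut Vec<Skill>, skill: Skill) {
    match registry.iter_mut().find(|s| s.name == skill.name) {
        Some(slot) => *slot = skill,
        None => registry.push(skill),
    }
}

fn main() {
    let mut reg = Vec::new();
    upsert(&mut reg, Skill { name: "summarize".into(), body: "v1".into() });
    upsert(&mut reg, Skill { name: "summarize".into(), body: "v2".into() }); // replaced, not duplicated
    assert_eq!(reg.len(), 1);
    assert_eq!(reg[0].body, "v2");
}
```

This replace-by-name behavior is what makes the hot-reload path in the Dispatch section work: re-loading a skill from disk overwrites its registry entry in place.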
Dispatch
Exposed as a tool the agent can call. Input: { name: string }.
Resolution order:
- Scope check — if the agent has a skill scope and the name is not in it, reject.
- Path traversal guard — reject names containing .., /, or \.
- Exact load from disk — for each skill directory, check {dir}/{name}/SKILL.md. If found, parse it, upsert into the registry, return the body.
- Fuzzy fallback — if no exact match, substring search the registry by name and description. If input is empty, list all available skills (respecting scope).
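The first two guards can be sketched as a pure check. The function shape is an assumption, not the actual dispatch code:

```rust
// Sketch of the scope check and path-traversal guard for a skill request.
fn check_skill_request(scope: &[String], name: &str) -> Result<(), String> {
    // Scope check: a non-empty scope is an inclusive whitelist.
    if !scope.is_empty() && !scope.iter().any(|s| s == name) {
        return Err(format!("skill '{name}' not in agent scope"));
    }
    // Traversal guard: reject names that could escape the skill directories.
    if name.contains("..") || name.contains('/') || name.contains('\\') {
        return Err(format!("invalid skill name '{name}'"));
    }
    Ok(())
}

fn main() {
    let scope = vec!["check-feeds".to_string()];
    assert!(check_skill_request(&scope, "check-feeds").is_ok());
    assert!(check_skill_request(&scope, "summarize").is_err()); // out of scope
    assert!(check_skill_request(&[], "../etc/passwd").is_err()); // traversal attempt
}
```

Ordering matters: the scope check runs first, so a scoped agent gets a scope rejection rather than leaking whether a given name exists on disk.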
Hot reload
The upsert on exact load (step 3) is the hot-reload mechanism. When a skill is invoked, it’s always loaded fresh from disk and upserted into the registry. Skills can be updated on disk and picked up on next invocation without daemon restart.
Slash command resolution
Before a message reaches the agent, preprocess resolves leading /skill-name
commands. For each skill directory, it checks {dir}/{name}/SKILL.md. If found,
the skill body is wrapped in a <skill> tag and injected into the message. This
happens before tool dispatch — it’s prompt injection, not a tool call.
Scoping
Agents can be restricted to a subset of skills via AgentScope.skills. If
non-empty, only listed skills are available. Empty means unrestricted. Scoping
applies to exact load, fuzzy listing, and slash resolution.
Alternatives
Code-based plugins (dylib / WASM). Far more powerful but far more complex. Skills are prompt injection, not code execution. The simplicity of markdown files is the point.
Database-backed registry. Adds persistence complexity for a registry that rebuilds in milliseconds from disk. Not needed.
Unresolved Questions
- Should skills support arguments beyond the skill name (parameterized prompts)?
- Should allowed-tools be enforced at the runtime level? Currently it is not enforced — it exists in the format but has no runtime effect.
0043 - Component System
- Feature Name: Component System
- Start Date: 2026-02-15
- Discussion: #43
- Crates: command
Summary
Crabtalk components are independent binaries that install as system services and connect to the daemon via auto-discovery. They crash alone, swap without restarts, and the daemon never loads them. This is the manifesto’s composition model made concrete.
Motivation
The manifesto says: “You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts.”
This requires a system where components — search, gateways, tool servers — are not subprocesses of the daemon. They’re independent programs that run as system services. The daemon discovers them at runtime. A broken component cannot take the daemon down.
Other projects spawn MCP servers as child processes. If the child hangs or crashes, it can take the daemon with it: zombie processes, leaked file descriptors, blocked event loops. The subprocess model creates shared fate. The component model eliminates it.
Design
The contract
A component is a binary that:
- Installs itself as a system service (launchd, systemd, or schtasks).
- Writes a port file to ~/.crabtalk/run/{name}.port on startup.
- Serves an HTTP API (MCP protocol) on that port.
The daemon scans ~/.crabtalk/run/*.port at startup and discovers components
automatically. No configuration needed — drop a component on PATH, install it,
and the daemon finds it.
Service trait
#![allow(unused)]
fn main() {
pub trait Service {
fn name(&self) -> &str; // "search"
fn description(&self) -> &str; // human readable
fn label(&self) -> &str; // "ai.crabtalk.search"
}
}
The trait provides default start, stop, and logs methods:
- start — renders a platform-specific service template, installs and launches.
- stop — uninstalls the service and removes the port file.
- logs — tails ~/.crabtalk/logs/{name}.log.
MCP service
Components that expose tools to agents extend McpService:
#![allow(unused)]
fn main() {
pub trait McpService: Service {
fn router(&self) -> axum::Router;
}
}
run_mcp binds a TCP listener on 127.0.0.1:0, writes the port to the
run directory, and serves the router. The daemon discovers it on next scan.
Platform support
Service templates are platform-specific:
- macOS — launchd plist (~/Library/LaunchAgents/)
- Windows — schtasks with XML task definition
Auto-discovery
The daemon scans ~/.crabtalk/run/*.port for port files not already connected.
Each file contains a port number. The daemon connects via
http://127.0.0.1:{port}/mcp. No subprocess management, no shared fate.
Crash? The daemon doesn’t care — it was never the component’s parent process. Restart? New port file, the daemon picks it up on next reload. Update a component? Install the new version, restart the service — the daemon sees the new port on next scan.
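The per-file discovery step can be sketched as a pure function. Parsing details such as trimming are assumptions about the implementation:

```rust
// Turn a discovered port file into a component (name, endpoint) pair,
// following the contract: "{name}.port" whose body is the bound port.
fn endpoint_from_port_file(file_name: &str, contents: &str) -> Option<(String, String)> {
    let name = file_name.strip_suffix(".port")?; // "{name}.port" → component name
    let port: u16 = contents.trim().parse().ok()?; // file body is the bound port
    Some((name.to_string(), format!("http://127.0.0.1:{port}/mcp")))
}

fn main() {
    let (name, url) = endpoint_from_port_file("search.port", "49152\n").unwrap();
    assert_eq!(name, "search");
    assert_eq!(url, "http://127.0.0.1:49152/mcp");
    // Malformed bodies are skipped rather than failing the whole scan.
    assert!(endpoint_from_port_file("search.port", "not a port").is_none());
}
```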
Entry point
The run() function handles tracing init and tokio bootstrap for all component
binaries.
Alternatives
Subprocess management. The daemon spawns and manages components as child processes. Rejected because shared fate — a broken child can break the daemon. This is the approach we explicitly designed against.
Docker / containerization. Run components in containers. Rejected because crabtalk is local-first. System services are the right abstraction for a personal daemon on your machine.
Shell scripts for service management. Works on Unix, breaks on Windows, drifts across components. A shared Rust crate is portable and stays consistent.
Unresolved Questions
- Should the Service trait support health checks?
- Should the daemon watch the run directory for new port files instead of scanning only at startup/reload?
0075 - Hook
- Feature Name: Hook Lifecycle
- Start Date: 2026-03-15
- Discussion: #75
- Crates: core, runtime, daemon
Updated by 0162 (Hook-as-plugin) and 0189 (Policy at the Edge). Hooks now own their tools (per-hook schema() + dispatch()) rather than registering through a shared ToolRegistry. on_before_run was removed and replaced by on_register_agent / on_unregister_agent for state tracking. preprocess returns Option<String> (None = pass through).
Summary
The Hook trait is the central extensibility point for agent lifecycle. Each subsystem (skills, memory, MCP, scoping, OS tools) implements Hook to provide schemas, dispatch tool calls, contribute system-prompt fragments, observe events, preprocess messages, and track per-agent state. The runtime composes hooks behind a single facade and never reaches into a subsystem directly.
Motivation
When the runtime was split out of the daemon (#75), a clean interface was needed between the runtime (which executes agents) and the hook implementations (which customize them). The runtime must not know about skills, memory, MCP, or daemon infrastructure. It only knows it has a Hook and calls its methods at the right times.
This separation enables two modes: the daemon (full hook with skills, MCP, memory, event broadcasting) and embedded use (no hook, or a minimal one).
Design
The trait
#![allow(unused)]
fn main() {
pub trait Hook: Send + Sync {
fn schema(&self) -> Vec<Tool> { vec![] }
fn system_prompt(&self) -> Option<String> { None }
fn on_build_agent(&self, config: AgentConfig) -> AgentConfig { config }
fn on_register_agent(&self, name: &str, config: &AgentConfig) {}
fn on_unregister_agent(&self, name: &str) {}
fn on_event(&self, agent: &str, conversation_id: u64, event: &AgentEvent) {}
fn preprocess(&self, agent: &str, content: &str) -> Option<String> { None }
fn scoped_tools(&self, config: &AgentConfig) -> (Vec<String>, Option<String>);
fn dispatch<'a>(&'a self, name: &'a str, call: ToolDispatch) -> Option<ToolFuture<'a>> { None }
}
}
All methods have default no-op implementations. () implements Hook.
Lifecycle points
schema — the tools this hook owns. The composite hook unions every sub-hook’s schema() to expose the runtime-wide tool set. There is no shared ToolRegistry — each hook is the source of truth for its tools.
system_prompt — optional fragment appended to agent system prompts at build time. Used by hooks that always inject standing instructions (e.g. memory’s behavioural guidance).
on_build_agent — called when an agent is registered. Receives the agent config, returns a possibly-modified config. The composite implementation chains: environment block (OS, shell, platform), per-hook system_prompt() fragments, resource hints (available MCP servers, available skills), and a <scope> block when the agent restricts its tools/skills.
on_register_agent / on_unregister_agent — called when an agent is added to or removed from the runtime registry. Hooks that track per-agent state (scopes, descriptions, MCP fingerprint refcounts) record and clean up here. Symmetric: by the time Runtime::agent() returns the new agent, hook state is in place; by the time the agent is invisible, hook state has been dropped.
preprocess — called before a user message enters the conversation. Returns Some(modified) to transform, None to pass through. Slash-command resolution (/skill-name args → wrapped <skill> body) lives here.
scoped_tools — given an agent config, returns the subset of this hook’s tools the agent may call, plus an optional <scope> prompt line. Default: include every tool from schema() with no scope line. Hooks override to gate inclusion on AgentConfig fields (e.g. memory only when enabled, skill tool only when the agent has a skills list).
dispatch — called when an agent issues a tool call. Returns Some(future) if this hook owns the tool name, None otherwise. The composite walks hooks in order and dispatches to the first owner.
on_event — called after each agent step. Receives every AgentEvent (text deltas, tool calls, completions). DaemonHook uses this to broadcast events to subscribers.
Composition
DaemonHook is the daemon’s composite hook. It holds a map of named sub-hooks (skill, memory, mcp, os, delegate, ask_user) and orchestrates them: schema() unions, dispatch() walks the registered owners, on_build_agent chains the system-prompt fragments, on_register_agent/on_unregister_agent fan out, on_event broadcasts.
For embedded use, () implements Hook as a full no-op so the runtime works without any subsystems.
Tool dispatch
Dispatch is part of the Hook trait. When an agent produces a tool call, the runtime walks the composite hook and calls dispatch(name, call) until one returns Some(future). Each hook owns the tools it declared via schema(); nothing else can claim them. Scope enforcement happens at the composite layer before walking sub-hooks.
Dispatch returns Result<String, String>. Ok carries normal tool output; Err carries a handler-reported failure (invalid args, not found, scope rejection, operation error) or a dispatch-level failure (no tool sender, tool channel closed, reply dropped). The same convention applies to server-specific tools owned by the daemon (ask_user, delegate). The distinction propagates to the AgentEvent::ToolResult.output field and to the wire protocol’s is_error flag so UIs can render errors distinctly and agents can make retry decisions without string-matching error messages. HistoryEntry::tool still stores the inner string regardless of the arm — the LLM wire format has no is_error field, so the model sees the text either way.
When an agent step produces multiple tool calls, the runtime dispatches them concurrently via FuturesUnordered; tool results are appended to history in the original call order (positional pairing with tool_calls is load-bearing for providers that correlate by index), but the ToolResult events fire in completion order so UIs show fast tools immediately without waiting on slow siblings.
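The first-owner walk can be sketched synchronously. The trait below is a deliberately simplified stand-in: Option<Result<String, String>> replaces the real Option<ToolFuture>, and the hook and tool names are hypothetical:

```rust
// Simplified stand-in for the Hook dispatch surface.
trait MiniHook {
    fn dispatch(&self, name: &str, args: &str) -> Option<Result<String, String>>;
}

struct EchoHook; // owns a single "echo" tool
impl MiniHook for EchoHook {
    fn dispatch(&self, name: &str, args: &str) -> Option<Result<String, String>> {
        (name == "echo").then(|| Ok(args.to_string()))
    }
}

struct FailHook; // owns "fail", always reports a handler error
impl MiniHook for FailHook {
    fn dispatch(&self, name: &str, _args: &str) -> Option<Result<String, String>> {
        (name == "fail").then(|| Err("handler reported failure".to_string()))
    }
}

// The composite walks sub-hooks in order; the first owner claims the call.
// None from every hook becomes a dispatch-level Err.
fn composite_dispatch(hooks: &[&dyn MiniHook], name: &str, args: &str) -> Result<String, String> {
    hooks
        .iter()
        .find_map(|h| h.dispatch(name, args))
        .unwrap_or_else(|| Err(format!("no hook owns tool '{name}'")))
}

fn main() {
    let hooks: Vec<&dyn MiniHook> = vec![&EchoHook, &FailHook];
    assert_eq!(composite_dispatch(&hooks, "echo", "hi"), Ok("hi".to_string()));
    assert!(composite_dispatch(&hooks, "fail", "").is_err()); // handler-reported Err
    assert!(composite_dispatch(&hooks, "unknown", "").is_err()); // no owner: dispatch-level Err
}
```

Note the two Err sources the RFC distinguishes both flow through the same Result<String, String> arm: a hook returning Err and the composite finding no owner look identical to the history layer.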
Alternatives
Separate traits per concern. One trait for prompt building, one for tools, one for events. Rejected because they always compose together and the single trait is simpler to implement and reason about.
Closure-based hooks. Pass lambdas instead of a trait. Rejected because the hook needs shared state (skill registry, MCP connections, memory) that closures make awkward.
Unresolved Questions
- Should on_build_agent be async to support hooks that need I/O during agent construction?
- Should preprocess support returning multiple messages (e.g. for multi-skill invocation)?
0080 - Cron
Summary
Cron triggers skills into agents on a schedule. The scheduler runs as a standalone service outside the daemon and speaks the existing StreamMsg protocol — no cron-specific daemon knowledge, no cron-specific wire messages. The apps/cron crate is desktop-oriented; alternate consumers (e.g. multi-tenant cloud schedulers) model their own entry shape, storage, and time-zone semantics — the shared surface between them is just the daemon’s StreamMsg protocol.
Motivation
Agents need periodic behavior — checking feeds, running maintenance, sending reminders. Time-based triggering is one form of trigger; chat messages, webhooks, and file-watch events are others. All of them produce the same shape: something happens → an agent runs with a payload. Cron is the first concrete implementation of this trigger role and deliberately uses the same StreamMsg path that chat gateways use.
The session already carries the agent and sender. A cron entry needs the skill to fire, the agent to run, the sender to attribute it to, and the schedule expression — nothing else.
Design
Data model
[[cron]]
id = 1
schedule = "0 */2 * * * *"
skill = "check-feeds"
agent = "crab"
sender = "cron"
quiet_start = "23:00"
quiet_end = "07:00"
once = false
- id — auto-incremented on create.
- schedule — standard cron expression, validated on create and load.
- skill — fired as /{skill} content into the target conversation.
- agent — agent running the conversation.
- sender — sender attribution (default "cron").
- quiet_start / quiet_end — optional HH:MM window in local time. If the fire time falls inside, the tick is skipped silently. No queuing, no catch-up. Both must be set; otherwise quiet hours are ignored.
- once — fire once then delete.
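The quiet-window check is the only non-obvious field semantics: a 23:00–07:00 window crosses midnight. A sketch in minutes-since-midnight, with assumed helper names:

```rust
// Parse "HH:MM" into minutes since midnight.
fn minutes(hhmm: &str) -> Option<u32> {
    let (h, m) = hhmm.split_once(':')?;
    Some(h.parse::<u32>().ok()? * 60 + m.parse::<u32>().ok()?)
}

// Is `now` inside the [start, end) quiet window? Windows where start > end
// cross midnight (e.g. 23:00-07:00). Malformed or missing bounds mean quiet
// hours are ignored, matching the RFC's "both must be set" rule.
fn is_quiet(now: &str, start: &str, end: &str) -> bool {
    match (minutes(now), minutes(start), minutes(end)) {
        (Some(n), Some(s), Some(e)) if s <= e => s <= n && n < e, // same-day window
        (Some(n), Some(s), Some(e)) => n >= s || n < e,           // crosses midnight
        _ => false,
    }
}

fn main() {
    assert!(is_quiet("23:30", "23:00", "07:00")); // inside, before midnight
    assert!(is_quiet("03:00", "23:00", "07:00")); // inside, after midnight
    assert!(!is_quiet("12:00", "23:00", "07:00")); // daytime: fire normally
}
```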
Architecture
Cron is a separate binary (crabtalk-cron). It runs as a system service managed by crabup / launchd / systemd. The daemon has no cron code — no field, no handlers, no protocol messages.
The binary uses the standard #[command::command] macro — start, stop, run, logs, same as every other service. No admin subcommands; schedule edits are direct file edits.
Persistence
The scheduler reads $CRABTALK_HOME/config/crons.toml. To add, remove, or change a schedule, edit this file in place. The running service polls the mtime every 2 seconds and reconciles timers on change — abort removed schedules, start new ones. Atomic write (tmp + rename) keeps readers consistent.
For once schedules the service deletes the entry after firing — the only mutation the service itself makes.
Firing
On a scheduled tick the service calls ConnectionInfo::stream from the SDK with:
#![allow(unused)]
fn main() {
StreamMsg {
agent: "<from entry>",
content: "/<skill>",
sender: Some("<from entry>"),
..Default::default()
}
}
The reply stream is drained and discarded — output goes to conversation history through the daemon’s normal path. Failures surface as Err items on the receiver; the schedule continues on the next tick.
Alternatives
Keep cron inside the daemon. Rejected. Cron forced a P: Provider bound on the daemon struct for no reason other than that CronStore called runtime.send_to. Embedding also prevented alternate schedulers (e.g. multi-tenant cloud schedulers) from reusing the daemon without running cron.
Introduce an InvokeSkill protocol variant. Rejected. Cron already has everything it needs from SendMsg / StreamMsg — the content /{skill} pattern is what chat-driven slash commands use too. Adding a variant would fragment the trigger contract across consumers and force every downstream scheduler (e.g. external ones) to learn a new wire format.
Cron as a peer protocol endpoint with its own socket for admin. Rejected. The admin surface is thin and routing it through a dedicated socket multiplies client complexity (TUI would need to discover and connect to cron too). Instead, admin is direct file editing — the running service picks up changes via mtime polling.
Put CronEntry + validators in wcore for downstream consumers to reuse. Rejected. Multi-tenant cloud schedulers need different entry fields (tenant id, different trigger payload), different storage (database rows, not TOML), and different time-zone semantics (UTC, not local). The cron::Schedule::from_str “validator” is one line; reimplementing is_quiet for UTC is trivial. Sharing a struct would force alignment on details that shouldn’t align.
Introduce a Trigger trait / crates/trigger library. Premature. Cron is the only trigger today (chat gateways are structurally similar but each has its own domain-specific code — Telegram auth, WeChat sync, etc.). A common trait only becomes clear once a second non-chat trigger lands (webhook, file-watch).
Unresolved Questions
- Should cron support a time-zone override per entry (instead of local time for everything)?
- Should there be a max-concurrent-fires limit so a quiet window ending doesn’t burst?
0082 - Scoping
- Feature Name: Agent Scoping
- Start Date: 2026-03-22
- Discussion: #82
- Crates: runtime, core
Updated by 0193 (Agent-Owned MCP) (2026-04-28).
AgentScope.mcps was removed: agents now embed their MCP server configurations by value, so MCP scoping is intrinsic to the agent’s declaration and no separate allowlist is needed.
Summary
A whitelist-based scoping system that restricts what an agent can access: tools and skills. Enforced at dispatch time and advertised in the system prompt. This is a security boundary, not a hint. MCP scoping is no longer part of AgentScope — see 0193 for the replacement model.
Delegation is not scoped: crabtalk is a single-user runtime, and any
registered agent can delegate to any other. Multi-tenant identity-based access
control, if ever needed, belongs in a wrapper above the runtime, not inside
AgentConfig.
Motivation
In multi-agent setups, a delegated sub-agent should not have the same capabilities as the primary agent. A research agent doesn’t need bash. Without scoping, every agent has access to everything — which means a misbehaving or confused agent can call tools it was never intended to use.
Scoping solves this by letting agent configs declare exactly what resources are available. The runtime enforces it.
Design
AgentScope
#![allow(unused)]
fn main() {
pub struct AgentScope {
pub tools: Vec<String>, // empty = unrestricted
pub skills: Vec<String>, // empty = all skills
}
}
Empty list means unrestricted. Non-empty means only listed items are allowed. This is an inclusive whitelist, not a denylist. MCPs are not part of AgentScope: AgentConfig.mcps: Vec<McpServerConfig> makes the declaration itself the scope (RFC 0193).
Whitelist computation
When an agent has any scoping (non-empty skills), the runtime computes a tool whitelist during on_build_agent:
- Start with BASE_TOOLS: bash, ask_user, read, edit — always available.
- If memory is enabled: add recall, remember, memory, forget.
- If skills list is non-empty: add the skill tool.
- MCP tools the agent declared in AgentConfig.mcps are always included — declaration is the gate.
The computed whitelist replaces config.tools. Tools not on the list are invisible to the agent. The delegate tool is always available — delegation is not gated by scope.
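The computation can be sketched as a pure function. BASE_TOOLS and the memory/skill tool names come from the RFC; the function shape is an assumption:

```rust
// Base tools that are always available (from the RFC).
const BASE_TOOLS: &[&str] = &["bash", "ask_user", "read", "edit"];

// Build the tool whitelist from the agent's config flags. MCP tools are
// handled separately (declaration is the gate) and omitted here.
fn compute_whitelist(memory_enabled: bool, skills: &[String]) -> Vec<String> {
    let mut tools: Vec<String> = BASE_TOOLS.iter().map(|t| t.to_string()).collect();
    if memory_enabled {
        tools.extend(["recall", "remember", "memory", "forget"].map(String::from));
    }
    if !skills.is_empty() {
        tools.push("skill".to_string()); // skill tool only when the agent has a skills list
    }
    tools
}

fn main() {
    let wl = compute_whitelist(true, &["check-feeds".to_string()]);
    assert!(wl.contains(&"bash".to_string()));
    assert!(wl.contains(&"recall".to_string()));
    assert!(wl.contains(&"skill".to_string()));
    // No memory, no skills: just the four base tools.
    assert_eq!(compute_whitelist(false, &[]).len(), 4);
}
```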
Prompt injection
A <scope> block is appended to the system prompt listing the agent’s allowed
resources:
<scope>
skills: check-feeds, summarize
</scope>
This tells the agent what it can use. The agent doesn’t need to guess or discover — its boundaries are stated upfront. MCP servers are listed separately in the resource-hints block from the agent’s own mcps declaration.
Enforcement
Scoping is enforced at two dispatch points:
- Tool dispatch — rejects tool calls not in the agent’s tool whitelist.
- Skill dispatch — rejects skill names not in the agent’s skill list.
MCP dispatch needs no explicit gate: the agent only sees the MCPs it declared, so calls outside that set are structurally impossible.
Enforcement happens at runtime, not just at prompt time. Even if the LLM
ignores the <scope> block and tries to call a restricted tool, the dispatch
layer rejects it.
Sender restrictions
Not all base tools are available to all senders. bash is blocked for
non-CLI senders (gateway agents from Telegram, WeChat, etc.) because it
grants arbitrary shell access. read and edit have no sender
restriction — they are read-only or scoped mutations that are safe for
gateway agents. See #67.
Delegate CWD isolation
When delegating parallel tasks, the orchestrating agent can assign each
sub-agent a separate working directory via the cwd field on DelegateTask.
Tools resolve relative paths against the conversation CWD, so isolated CWDs
prevent concurrent sub-agents from trampling each other’s files. The edit
tool’s unique-match requirement provides a second layer: if another agent
changed the file between read and edit, old_string won’t match and the
edit fails — optimistic concurrency without locks.
Default agent
The default agent (primary) has no scope restrictions — empty lists on both dimensions. Scoping is for sub-agents that need constrained access.
Alternatives
Denylist instead of whitelist. List what’s forbidden instead of what’s allowed. Rejected because allowlists are safer by default — a new tool or server is inaccessible until explicitly granted. Denylists require updating every time a new resource is added.
Prompt-only scoping. Tell the agent its restrictions in the prompt but don’t enforce at dispatch. Rejected because LLMs don’t reliably follow instructions — a determined or confused model will call tools it was told not to. Enforcement must be at the dispatch layer.
Unresolved Questions
- Should scoping support wildcard patterns (e.g. mcp: search-*)?
- Should scope violations be logged as security events for monitoring?
0121 - Event Bus
- Feature Name: Unified Event Bus
- Start Date: 2026-04-04
- Discussion: #121
- Crates: daemon, core, runtime
- Updates: 0080 (Cron)
Summary
A daemon-level event bus that routes named events to target agents via exact-match subscriptions. Agent completion is the first built-in event source. The bus also enables non-blocking delegation and ad-hoc worker agents.
Motivation
The daemon can trigger agents on a schedule (cron) and run agents on user request (protocol). But there’s no way for one agent’s completion to trigger another agent. The Signal pipeline (crabtalk/app#59) needs exactly this:
RSS fetch → Scout classifies → Crab enriches → client notification
Each stage produces a result that the next stage consumes. Without an event system, this requires the client to orchestrate the chain — polling, waiting, re-sending. The daemon should own this.
Separately, delegate blocks the parent agent until all tasks complete. For
background research or parallel work, this is a limitation. If the daemon can
route agent completion events, non-blocking delegation falls out for free.
Design
Event bus
An in-memory subscription table that matches events by exact source string and fires target agents with the event payload as message content.
# events.toml
[[subscription]]
id = 1
source = "agent:scout:done"
target_agent = "crab"
once = false
Follows the CronStore pattern: HashMap-backed, TOML-persisted, auto-incrementing IDs, atomic writes (tmp + rename). Survives runtime reloads.
Event sources
Events are namespaced strings. Two source types exist today:
| Source | Example | Emitter |
|---|---|---|
| Agent completion | agent:scout:done | Daemon, via on_agent_event hook |
| External | rss:fetch, signal:classified | Client or adapter, via PublishEvent |
Agent completion events are emitted automatically when a conversation stream ends. The payload is the agent’s final text response.
External events are published via the PublishEvent protocol message — any
client, adapter, or webhook handler can fire events into the bus.
Routing
Event arrives (via DaemonEvent::PublishEvent)
→ event loop calls EventBus::publish() inline (no spawn)
→ exact match source against subscription table
→ for each match: fire target agent via SendMsg (fire-and-forget)
→ if once: remove subscription, persist
Events always start new work. There is no injection into active conversations — that’s a separate concern (#117).
Fired agents receive the payload as message content with sender
"event:{source}". This follows the established convention
("delegate:{id}", "cron") for non-user senders.
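The routing steps above can be sketched as a small in-memory subscription table. This is an illustrative reconstruction, not the daemon's actual code — `EventBus` and `Subscription` follow the RFC's names, while the `fire` callback stands in for the SendMsg dispatch:

```rust
use std::collections::HashMap;

#[derive(Clone)]
pub struct Subscription {
    pub id: u64,
    pub source: String,
    pub target_agent: String,
    pub once: bool,
}

pub struct EventBus {
    pub subs: HashMap<u64, Subscription>,
}

impl EventBus {
    pub fn new() -> Self {
        Self { subs: HashMap::new() }
    }

    pub fn subscribe(&mut self, id: u64, source: &str, target: &str, once: bool) {
        self.subs.insert(
            id,
            Subscription { id, source: source.into(), target_agent: target.into(), once },
        );
    }

    /// Exact-match routing: fire every matching target, then drop `once` subs.
    pub fn publish(&mut self, source: &str, payload: &str, fire: &mut dyn FnMut(&str, &str)) {
        let matched: Vec<u64> =
            self.subs.values().filter(|s| s.source == source).map(|s| s.id).collect();
        for id in matched {
            let sub = self.subs[&id].clone();
            fire(&sub.target_agent, payload); // fire-and-forget SendMsg stand-in
            if sub.once {
                self.subs.remove(&id); // the real bus would also persist here
            }
        }
    }
}
```

Note how `once = true` subscriptions self-destruct after the first delivery, while a non-matching source fires nothing — there is no pattern matching in v1.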
Protocol
Four new operations on the Server trait:
message SubscribeEventMsg {
string source = 1;
string target_agent = 2;
bool once = 3;
}
message UnsubscribeEventMsg { uint64 id = 1; }
message ListSubscriptionsMsg {}
message PublishEventMsg { string source = 1; string payload = 2; }
Responses: SubscriptionInfo for subscribe, Pong for unsubscribe/publish,
SubscriptionList for list.
DaemonEvent::PublishEvent
All publish paths route through a single DaemonEvent::PublishEvent variant
in the central event loop. This avoids lock-ordering issues — the event bus
mutex is only acquired inside the sequential event loop, never from the
protocol handler or hook callbacks directly.
DaemonEvent::PublishEvent { source, payload } => {
    self.events.lock().await.publish(&source, &payload);
}
Non-blocking delegation
The delegate tool gains a background: bool field. When true:
- Tasks are spawned via the existing spawn_agent_task mechanism
- dispatch_delegate returns immediately with task IDs
- The parent agent continues working
- When each task completes, the daemon emits agent:{name}:done
- Event bus routes the completion to any matching subscriptions
No new mechanism — just the existing spawn infrastructure plus the event bus.
Worker pseudo-agent
A built-in worker agent registered at daemon startup alongside crab.
Always available as a delegate target without pre-configuration:
- Inherits the system agent’s thinking setting
- Gets the full tool registry (no explicit filter)
- Ephemeral — sessions are killed after task completion (existing behavior)
- Always a valid delegate target (delegation is not scoped)
This eliminates the friction of configuring named agents for ad-hoc tasks like “read these files and summarize” or “search for X in the codebase.”
What this is NOT
- Not a message broker. No durability, no exactly-once delivery, no dead letter queues. Fire-and-forget with best-effort delivery.
- Not an orchestration DAG. No conditional routing, no fan-out/fan-in. Agents subscribe to events — that’s it.
- Not a replacement for delegate. Delegation is synchronous and returns results inline. Events are asynchronous and deliver results out-of-band. background: true bridges the two.
Updates
0080 - Cron
The cron system continues to work as-is. Cron entries fire skills via the
daemon event channel — this is unchanged. A future iteration may refactor cron
as an event source adapter, emitting cron:{id}:fired events into the bus, but
this is not in scope. The event bus is additive, not a cron replacement.
Alternatives
Agent completion triggers (no bus). A simpler design where completion of agent X directly triggers agent Y, without a general subscription mechanism. Rejected because the Signal pipeline needs external events (RSS fetch results) alongside agent completions — a bus handles both uniformly.
Glob matching on source patterns. The RFC originally proposed wildcard
subscriptions like "agent:*:done". Rejected for v1 — exact match covers all
current use cases. Glob matching can be added when a real consumer needs it.
Template interpolation. The RFC originally proposed {{payload}}
interpolation in a prompt_template field. Rejected — agents are the template
engine. The payload goes in as-is; the agent’s instructions handle
interpretation.
Unresolved Questions
- Should there be a max subscription count?
- Should the bus detect infinite loops (agent A triggers B triggers A)? Currently fire-and-forget prevents stack overflow but allows unbounded chains of spawned tasks.
0135 - Agent-First Protocol
- Feature Name: Agent-First Protocol
- Start Date: 2026-04-03
- Discussion: #135
- Crates: core, runtime, daemon, cli, gateway
- Supersedes: 0064 (Session), 0078 (Compact Session)
- Updates: 0018 (Protocol), 0038 (Memory)
Summary
Replace session-centric protocol addressing with agent-centric addressing. Users talk to agents, not sessions. Introduce guest turns for multi-agent conversations and compaction archives as the agent’s long-term memory.
Motivation
The original protocol was session-centric: clients managed session IDs to kill, reply, compact, and route messages. This leaked an implementation detail (the session ID) into every client and forced multi-agent interaction into either permanent agent switching or invisible delegation.
Problems with the session model:
- Session IDs leak everywhere. Every client (CLI, Telegram, WeChat, IDE) must track session IDs to route replies, kill conversations, and handle ask_user prompts. If a client loses the ID, the conversation is orphaned.
- Multi-agent is invisible. When agent A delegates to agent B, the result comes back as a tool result string. The user hears A’s summary of B’s answer, never B’s actual voice. There’s no multi-agent conversation.
- Session ≠ conversation. “Session” conflated device connections (CWD, transport state) with agent memory (message history, compaction). These are different lifecycles — connections are ephemeral, conversations persist.
Design
Core model
Each agent has one continuous conversation per user. Conversations are
keyed by (agent, sender) — no session IDs in the protocol.
Client: StreamMsg { agent: "crab", content: "hello", sender: "user" }
Daemon: resolves (crab, user) → internal conversation, runs agent, streams response
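The resolve step can be sketched as a map keyed by the `(agent, sender)` pair — an illustrative sketch only (`Conversations` and the string-based history are hypothetical stand-ins for the daemon's real types):

```rust
use std::collections::HashMap;

/// Hypothetical conversation registry: one continuous conversation
/// per (agent, sender) pair, created on first contact.
#[derive(Default)]
struct Conversations {
    map: HashMap<(String, String), Vec<String>>, // history simplified to strings
}

impl Conversations {
    /// Resolve-or-create: the protocol never sees an internal ID.
    fn resolve(&mut self, agent: &str, sender: &str) -> &mut Vec<String> {
        self.map
            .entry((agent.to_string(), sender.to_string()))
            .or_default()
    }
}
```

Because resolution is create-on-first-use, a client can never "lose" a conversation the way it could lose a session ID — re-sending the same `(agent, sender)` pair always lands in the same history.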
Conversation vs session
| | Session | Conversation |
|---|---|---|
| What | Device ↔ daemon connection | Agent’s memory with a user |
| Key | connection/device ID | (agent, sender) |
| Lifetime | ephemeral | persistent |
| State | CWD, transport | messages, title, JSONL, archives |
Sessions are daemon-internal. Conversations are the protocol-visible abstraction.
Protocol changes
Client messages address conversations by (agent, sender):
message StreamMsg {
string agent = 1;
string content = 2;
optional string sender = 4;
optional string cwd = 5;
optional string guest = 6; // guest turn
}
message KillMsg {
string agent = 1;
string sender = 2;
}
message ReplyToAsk {
string agent = 1;
string sender = 2;
string content = 3;
}
message CompactMsg {
string agent = 1;
string sender = 2;
}
Removed from the protocol: session (u64 ID), new_chat, resume_file.
Server responses no longer include session IDs:
message StreamStart {
string agent = 1; // no session field
}
Guest turns
The guest field on StreamMsg enables multi-agent conversations. When set,
the daemon runs the guest agent against the primary agent’s conversation
history — text-only, no tool dispatch.
Flow:
- Client sends StreamMsg { agent: "twin", content: "question", guest: "crab" }
- Daemon finds twin’s conversation
- Adds user message to twin’s history
- Injects guest framing (auto-injected system message)
- Runs crab against twin’s history with crab’s system prompt (no tools)
- Tags response with agent: "crab"
- Appends to twin’s history
The guest’s response appears as a first-class message in the conversation, attributed to the guest. No delegation, no tool results, no paraphrasing.
Bidirectional framing
Both guest and primary need context about multi-agent conversation:
- Guest framing (injected when a guest runs): “You are joining a conversation as a guest. Messages wrapped in <from agent="..."> tags are from other agents.”
- Primary framing (injected when the primary runs and guest messages exist in history): “Messages wrapped in <from agent="..."> tags are from guest agents. Continue responding as yourself.”
Both are auto_injected — stripped before each run, re-injected fresh. Zero
accumulation.
Message attribution
The Message struct gains an agent field:
#[serde(default, skip_serializing_if = "String::is_empty")]
pub agent: String,
Empty = the conversation’s primary agent. Non-empty = a guest. When building
LLM requests, assistant messages with non-empty agent are prefixed with
<from agent="..."> XML tags so every agent can distinguish speakers.
Message::with_agent_tag() handles the prefixing — one function, used by
both build_request and guest_stream_to.
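A minimal sketch of the tagging behavior described above — the function name follows the RFC, but the exact tag shape (here a full wrapping `<from>…</from>` pair, matching the "wrapped in tags" framing text) is an assumption about the implementation:

```rust
/// Sketch of Message::with_agent_tag: an empty `agent` means the
/// conversation's primary agent, so the content passes through untagged.
/// The wrapping-tag format is assumed, not taken from the source.
fn with_agent_tag(agent: &str, content: &str) -> String {
    if agent.is_empty() {
        content.to_string()
    } else {
        format!("<from agent=\"{}\">{}</from>", agent, content)
    }
}
```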
Compaction as memory
Compaction markers become archive boundaries. Each compact marker stores a title (first sentence of the summary, max 60 chars) and a timestamp:
{"compact":"Summary of pricing discussion...","title":"Pricing analysis for solo dev tools.","archived_at":"2026-04-03T10:00:00Z"}
The conversation is continuous — compaction doesn’t create a new conversation,
it archives a segment of the existing one. Archived segments are browsable
via Conversation::load_archives() and available to the recall tool as
long-term memory.
Crab's memory:
├── [active] Current conversation
├── "Pricing analysis for solo dev tools." — 2 days ago
├── "Auth module refactor plan." — 5 days ago
└── "HN competitor signal analysis." — last week
What dies
- Session IDs in the protocol — replaced by (agent, sender)
- new_chat — the conversation is continuous, compaction handles the window
- resume_file — one conversation per (agent, user), always active
- Client-side @mention logic (0078) — guest turns handle it daemon-side
- Session forking — agents are the abstraction, not sessions
Supersedes
0064 - Session
The session model is replaced by conversations. The JSONL file format is
preserved (backward compatible with added title and archived_at fields on
compact markers, and agent field on messages). The Session struct is renamed
to Conversation. Session IDs are removed from the protocol.
0078 - Compact Session
The compact-then-handoff pattern for @mentions is replaced by guest turns. The daemon handles multi-agent conversation natively — no client-side compact logic needed.
Updates
0018 - Protocol
Session-addressed messages are replaced with (agent, sender) addressing.
StreamMsg and SendMsg gain a guest field. SessionInfo becomes
ActiveConversationInfo. See protocol changes section above.
0038 - Memory
Compaction archives become the primary long-term memory mechanism. The recall tool searches across archived segments. See #101 (revised) for the pluggable memory provider aligned with this model.
0150 - Memory Store
- Feature Name: Memory Store
- Start Date: 2026-04-14
- Discussion: #38
- Crates: memory, crabtalk, runtime
- Supersedes: 0038 (Memory)
Updated by 0189 (2026-04-28). Auto-recall (Memory::before_run) was removed; recall is now strictly model-driven. See 0189 for the rationale.
Summary
A standalone crabtalk-memory crate backing agent memory with a single binary db file, atomic persistence, and BM25 recall. The markdown tree is a human-facing export — not the primary store. Entries come in two kinds: Note (agent-written via remember/forget) and Archive (compaction output). The agent’s system prompt is human-managed via Crab.md (existing layered-instructions mechanism) — the memory store has no opinion on it.
Motivation
RFC 0038 bet on file-per-entry markdown as the primary store. In practice that premise did not hold:
- Atomic writes don’t compose across many files. Every remember/forget touched an entry file plus a sidecar index; a crash mid-op left the tree inconsistent. A single-file db is atomic by rename+fsync.
- Compaction archives need a store. Agent-First (0135) made compaction archives first-class long-term memory. Archives share recall and lifecycle with notes, but aren’t user-editable text — they’re generated output. A kind-typed entry in the db is the right home.
- Aliases improve recall. Humans reach for an entry under several names (“release” / “ship” / “deploy”). BM25 needs them as indexable terms, which frontmatter had no slot for.
- Dump/load still matters for humans. Users want to read and edit memory with a text editor or mdbook. That’s solved by exporting the db as a markdown tree on demand, not by making the tree the source of truth.
A separate observation that shaped the API surface:
- The system prompt is not memory. 0038 carried a MEMORY.md curated overview that the agent could rewrite via a dedicated memory tool. That conflated two different things: persistent recall (the agent’s notes) and instructions (the human’s prompt). It also gave the agent a footgun — overwriting the whole thing in a single tool call with no diff. Killed: the memory tool, the Prompt entry kind, and the reserved global name. The system prompt now lives in Crab.md (already a file, already layered, already human-edited). If a human wants the agent to edit it, they grant that in prose inside Crab.md and the agent uses the standard file-edit tools.
Design
Crate layout
crabtalk-memory is a standalone crate. The crabtalk hooks own one Memory handle and expose a SharedStore = Arc<RwLock<Memory>> to the runtime so compaction can write archives and session resume can read them.
Binary file format (CRMEM v1)
All integers are little-endian. Strings are UTF-8, length-prefixed by a u32 byte count (no NUL terminator). The whole file is one contiguous blob — no sections, no index, no padding.
Header — 16 bytes:
offset size field value
------ ---- --------- -------------------------------------------------
0 6 magic "CRMEM\0"
6 4 version u32 (= 1)
10 2 flags u16 (= 0; unknown bits rejected on read)
12 4 reserved [u8; 4] (= 0)
Body:
size field notes
---- ----------- -----------------------------------------------------
8 next_id u64 monotonic EntryId allocator; persisted so
IDs stay stable across open/close
4 entry_count u32
* entries entry_count repetitions of the per-entry record
Per entry:
size field notes
---- ----------- -----------------------------------------------------
8 id u64
8 created_at u64 unix seconds
4 kind u32 0 = Note, 1 = Archive
4 name_len u32
* name utf8 bytes, name_len long
4 content_len u32
* content utf8 bytes, content_len long
4 alias_count u32
* aliases alias_count repetitions of { u32 len + utf8 bytes }
kind is u32 rather than u8 so the fixed entry prefix stays 4-byte aligned — cheap hygiene for any future on-disk index work. The inverted BM25 index is not persisted; it’s rebuilt from entries on load. Keeps the file small and the format boring.
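The length-prefixed string encoding above can be sketched in a few lines — a minimal reconstruction from the format description (u32 little-endian byte count, UTF-8 bytes, no NUL terminator), not the crate's actual reader:

```rust
use std::convert::TryInto;

/// Append one CRMEM string: u32 LE byte count, then raw UTF-8 bytes.
fn write_str(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// Read one CRMEM string at `pos`, advancing it. Truncation and
/// invalid UTF-8 both fail — mirroring the reader invariants below.
fn read_str(buf: &[u8], pos: &mut usize) -> Result<String, &'static str> {
    let len_bytes: [u8; 4] = buf
        .get(*pos..*pos + 4)
        .ok_or("truncated")?
        .try_into()
        .unwrap();
    let len = u32::from_le_bytes(len_bytes) as usize;
    *pos += 4;
    let bytes = buf.get(*pos..*pos + len).ok_or("truncated")?;
    *pos += len;
    String::from_utf8(bytes.to_vec()).map_err(|_| "invalid utf-8")
}
```

The same pattern repeats for every variable-length field in the entry record (name, content, each alias), which is what keeps the format a single contiguous blob with no index.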
Reader invariants: magic mismatch, wrong version, non-zero flags, truncated body, invalid UTF-8, or an unknown kind value all fail the open with BadFormat. A missing file opens an empty db (the file is created on the first successful write).
Persistence
Every apply(Op) mutates RAM then flushes atomically. The flush sequence is:
- Encode the entire db to an in-memory Vec<u8>.
- create_dir_all(parent) if needed.
- Write to a sibling temp file {name}.tmp and fsync it.
- rename(tmp, path) — atomic on POSIX when on the same filesystem.
- fsync the parent directory so the rename itself is durable.
A flush failure leaves RAM ahead of disk until the next successful op or the next open (which re-reads the file). WAL closes that window in v2. Memory::checkpoint() forces the same flush without a mutation.
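The flush sequence above maps directly onto std — a sketch under the assumption of a POSIX filesystem (the directory-fsync step via `File::open` on a directory is Unix-specific), not the crate's actual code:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Sketch of the atomic flush: tmp file + fsync + rename + directory fsync.
fn atomic_write(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let parent = path.parent().expect("db path has a parent directory");
    fs::create_dir_all(parent)?;
    let tmp = path.with_extension("tmp"); // sibling temp file
    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // fsync the temp file before the rename
    fs::rename(&tmp, path)?; // atomic on POSIX, same filesystem
    File::open(parent)?.sync_all() // fsync the directory entry (Unix only)
}
```

Readers either see the old file or the new one, never a partial write — which is exactly why the in-memory index can be rebuilt from entries on load without a consistency check.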
Entry model
enum EntryKind { Note, Archive }

struct Entry {
    id: u64,
    name: String,
    content: String,
    aliases: Vec<String>,
    created_at: u64,
    kind: EntryKind,
}
- Note — remember/forget entries.
- Archive — compaction output. Written by the runtime during compaction, surfaced by
recallas long-term memory (per 0135).
Kind is immutable per entry: Update rewrites content and aliases but keeps kind; use Remove + Add to change it.
Write ops
Writes go through an Op enum:
enum Op {
    Add { name, content, aliases, kind },
    Update { name, content, aliases },
    Alias { name, aliases },
    Remove { name },
}
Memory::apply(op) mutates + flushes. Callers never touch fs::write directly.
Recall
BM25 with Lucene-style IDF (ln((n - df + 0.5)/(df + 0.5) + 1.0)), k1=1.2, b=0.75. The index is an inverted index of tokens from entry content and aliases, keyed by EntryId. Search walks the posting lists for query terms instead of rescanning every entry on every query.
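The scoring formula spelled out above, as runnable code — a per-term sketch of the standard BM25 math with the stated constants, independent of the crate's actual index structures:

```rust
/// Lucene-style BM25 IDF: ln((n - df + 0.5) / (df + 0.5) + 1.0),
/// where n = total entries and df = entries containing the term.
fn idf(n: f64, df: f64) -> f64 {
    (((n - df + 0.5) / (df + 0.5)) + 1.0).ln()
}

/// One query term's BM25 contribution with k1 = 1.2, b = 0.75.
/// tf = term frequency in the entry, doc_len / avg_len = length normalization.
fn bm25_term(tf: f64, doc_len: f64, avg_len: f64, n: f64, df: f64) -> f64 {
    let (k1, b) = (1.2, 0.75);
    idf(n, df) * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_len))
}
```

An entry's score for a query is the sum of `bm25_term` over the query's tokens; the inverted index just makes it cheap to find which entries have `tf > 0` for each token.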
Recall is model-driven
There is no auto-recall. RFC 0189 removed the per-turn injection: the runtime never silently searches memory or prepends <recall> blocks. Recall happens only when the model calls the recall tool itself, or when a client explicitly searches memory before sending a user message. The Memory::before_run helper is gone; MemoryHook no longer participates in on_before_run.
System prompt
The hook contributes one <system_prompt> fragment: the contents of prompts/memory.md, which tells the agent when to use the memory tools (tool signatures come from each input struct’s /// doc comment via schemars). The agent’s identity / behavior prompt is not the memory store’s responsibility — it’s Crab.md, layered from <config_dir>/Crab.md and any project-local Crab.md walked up from CWD (see daemon::host::discover_instructions).
Tools
Three tools exposed to the agent:
- remember(name, content, aliases) — upsert a Note.
- forget(name) — delete a Note.
- recall(query, limit) — BM25 search, returns formatted results.
There is no memory tool. Editing the agent’s system prompt is a human action against Crab.md. If the human wants to delegate that authority, they say so in Crab.md and the agent uses the standard file-edit tools — no special-case tool, no reserved entry name, no parallel write path.
Dump / load
Memory::dump(dir) writes the db as an mdbook-ready tree for humans:
brain/
book.toml ← seeded on first dump; user edits survive re-dumps
SUMMARY.md ← mdbook ToC (ignored on load)
notes/{name}.md
archives/{name}.md
The seeded book.toml sets src = "." so mdbook serve brain/ works against the tree as-is — no shuffling into an src/ subdirectory. It’s only written when absent; any customizations survive later dumps.
Each entry file starts with an HTML metadata block, followed by pure markdown content:
<div id="meta">
<dl>
<dt>Created</dt>
<dd><time datetime="2026-04-14T10:23:45Z">2026-04-14T10:23:45Z</time></dd>
<dt>Aliases</dt>
<dd><ul><li>ship</li><li>release</li></ul></dd>
</dl>
</div>
prod rollout steps ...
Chosen for mdbook: <dl> / <dt> / <dd> is the semantic HTML for key-value metadata, renders as a labeled info card, and doesn’t pollute mdbook’s heading tree. <time datetime="..."> round-trips the exact unix timestamp. A file that doesn’t start with <div id="meta"> is treated as pure content with no metadata.
Memory::load(dir) reads the tree and replaces the db. It validates fully before mutating — a mid-load error leaves the current state untouched. Each kind’s subdirectory is cleared on dump so renames and deletes don’t leave orphan files behind; anything else in dir (e.g. a customized book.toml, a theme/ directory) is left alone.
Alternatives
Stay with file-per-entry (0038). Rejected — compaction archives need a real store, and atomic multi-file writes would require WAL anyway. A single file gets atomicity for free.
SQLite. Overkill for 10²–10³ entries, adds a dependency and schema migrations. A 200-line hand-rolled format is simpler and easier to inspect with xxd.
Embedding-based search. Still rejected for the same reasons as 0038: requires a vector store and embedding model. BM25 is fast, dependency-free, and works well at the entry sizes agents produce.
Unresolved Questions
- WAL for crash safety in the window between the RAM mutation and the atomic flush.
- Should load() merge instead of replace?
- Should archives expire or be garbage-collected past some age / count?
0184 - crabup
- Feature Name: crabup
- Start Date: 2026-04-24
- Discussion: #184
- Crates: new crabup binary; consumes command; shrinks crabtalkd
- Updates: 0043 (Component System)
Summary
crabup is a thin wrapper over cargo install that also owns launchd/systemd/schtasks lifecycle for every crabtalk binary. crabup install crabtalkd spawns cargo install crabtalkd. The value add is service management — the one thing cargo install doesn’t do — not distribution, not version coordination, not a registry.
Motivation
Two real problems today, both about service management, not about distribution:
- The daemon is its own installer. crabtalkd start generates and loads a platform unit for itself via the command crate; every other binary (crabtalk-telegram, crabtalk-wechat, …) does the same thing with the same code. A daemon shouldn’t install itself, and the install path shouldn’t live in three places.
- No one-stop service surface. ps, logs, start, stop are duplicated per binary and absent for most. Users need a single tool that knows about all crabtalk services on the machine, not one subcommand per binary.
Distribution is already handled: every crabtalk crate publishes to crates.io with version.workspace = true, so cargo install crabtalkd is the install story today. It will remain the install story under crabup — crabup just renames the command and wraps service management around it.
RFC 0043 defined how components talk to the daemon (port-file discovery, MCP contract). This RFC defines how they get installed and stay alive.
Design
Command surface
crabup pull <name> [--version X] # cargo install crabtalk-<name> (or crabtalkd)
crabup rm <name> # cargo uninstall
crabup update # bump every installed crabtalk-* crate to latest
crabup list # installed crabtalk-* crates
crabup ps # all crabtalk services, one view
crabup <name> start # install + load platform unit
crabup <name> stop
crabup <name> restart
crabup <name> logs [-f]
<name> is a short name from the resolution table below, so crabup daemon start, crabup telegram start, crabup search logs -f. Each short name is both a pull/rm target and a service-command namespace. pull/rm install and remove crabtalk binaries via cargo install; pkg add/pkg remove install and remove crabtalk packages (manifests + cached source repos), so the user has one tool for both install surfaces.
crabup update is always batch — it bumps every installed crabtalk-* crate to the latest version on crates.io, same shape as rustup update over its components. There is no per-component update verb: if you only want to change one crate, that’s crabup pull <name> --version <X>. This makes “keep the set aligned” the default behavior of the only tool users will reach for when they want newer bits, without needing atomic-set machinery to enforce it.
That’s it. No pin, no doctor, no component add vs pull split — cargo install already handles versions; a component is just a crate you can run as a service. No atomic-set enforcement; if a user mixes versions and breaks the wire, the fix is crabup pull <name> --version <matching> for the mismatched one or crabup update to bump everything.
pull is a pass-through
crabup pull <name>
↓ resolve name → crate ("tui" → "crabtalk-tui"; "daemon" → "crabtalkd")
↓ cargo install <crate> [--version X]
Name resolution is a small table compiled into crabup:
| Short name | crates.io crate | Role |
|---|---|---|
| daemon | crabtalkd | daemon |
| tui | crabtalk-tui | REPL client |
| telegram | crabtalk-telegram | Telegram gateway |
| wechat | crabtalk-wechat | WeChat gateway |
| search | crabtalk-search | meta-search service |
| cron | crabtalk-cron | scheduler |
crabup pull <short> resolves via the table; crabup pull <anything-else> passes through verbatim so crabup pull some-third-party-crabtalk-gateway still works without a table edit. New first-party binaries get a row added when they ship.
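The resolution rule — table hit or verbatim passthrough — is small enough to sketch directly; a hypothetical rendering of the compiled-in table, not crabup's source:

```rust
/// Short name → crates.io crate, with verbatim passthrough for
/// anything not in the table (third-party crates need no table edit).
fn resolve(name: &str) -> String {
    match name {
        "daemon" => "crabtalkd".to_string(),
        "tui" => "crabtalk-tui".to_string(),
        "telegram" => "crabtalk-telegram".to_string(),
        "wechat" => "crabtalk-wechat".to_string(),
        "search" => "crabtalk-search".to_string(),
        "cron" => "crabtalk-cron".to_string(),
        other => other.to_string(), // passthrough
    }
}
```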
Binaries land in ~/.cargo/bin, where cargo install has always put them. crabup list reads ~/.cargo/.crates.toml and filters for crabtalk*. There is no parallel state file; if .crates.toml is wrong, cargo is wrong, and crabup being wrong with it is the correct behavior.
Prerequisite: cargo on PATH. If missing, crabup prints one line pointing at https://rustup.rs and exits. No auto-install, no curl-pipe — the daemon doing that was part of what motivated this RFC.
Service management (the real content)
The command crate already renders launchd.plist, systemd.service, and schtasks.xml and exposes install/uninstall/log-tail helpers. It stays. What changes is the caller: today each binary calls command::install from its own CLI; after this RFC only crabup calls into command. crabtalkd start/stop/ps/logs are deleted; so are the mirrored flags in crabtalk-tui.
crabup <name> start is:
- Find the binary on PATH (fail fast if not installed).
- Look up service metadata in crabup’s name table — the same table that resolves short names to crates also carries label (mechanical: ai.crabtalk.<name>) and description. crabup is the package manager; it owns this metadata, the binaries don’t need to expose it.
- Render the platform unit via command and load it.
crabup ps is the one piece that needs more than wrapping: it scans ~/.crabtalk/run/*.port (the same directory RFC 0043 already defines) and checks each listener, then cross-references with whatever the platform’s service manager reports for ai.crabtalk.* labels. One view, all services.
Component model
RFC 0043 stands unchanged. A component is a binary that writes a port file on startup and serves MCP on that port. crabup doesn’t alter the contract — it just installs and service-manages those binaries the same way it does crabtalkd. “Install a component” and “install the daemon” are the same operation under different names.
crabllm as a managed service (optional, motivated)
Today crabllm-provider is a library linked into crabtalkd. Making crabllm a separate service is worth doing only if at least one of these is concrete:
- One set of provider credentials serves multiple daemons on the same machine.
- Central place for provider fallback, rate-limit smoothing, or caching.
- Swap models or provider SDKs without restarting
crabtalkd.
None of those are pressing yet. When one is, crabtalk-llmd becomes another crate crabup installs and service-manages, same as any gateway. The RFC doesn’t need to anticipate it.
Impact on crabtalkd
| Removed from crabtalkd | Replaced by |
|---|---|
| Command::Start { force } | crabup daemon start (first install: crabup pull daemon) |
| Command::Stop, Restart | crabup daemon stop / crabup daemon restart |
| Command::Ps | crabup ps (all services, one place) |
| Command::Logs | crabup daemon logs |
| ensure_config + attach::setup_llm on first start | crabup daemon start first-run flow |
| Duplicate forwarding in TUI (--start, --stop) | Removed |
After this, crabtalkd’s CLI is reduced to run (the long-running process the service unit invokes, equivalent to today’s --foreground), reload, and events. Package install/uninstall live in crabup as pkg add/pkg remove, not in the daemon CLI.
Alternatives
Plain cargo install, no crabup. Installs are one command, but users hand-write launchd/systemd units per binary, and ps/logs across services don’t exist. The service-management gap is the whole reason crabup is a separate tool.
A real package manager with its own manifest, signed pre-built binaries, version coordination, atomic-set installs. Previously drafted; cut. Infrastructure we don’t need — crates.io is the registry, workspace-version inheritance is the coordination, and the non-developer audience that would need pre-built binaries doesn’t exist yet. If that audience materializes, pre-built becomes a second crabup pull backend alongside the cargo install path.
Keep each binary’s start/stop/logs subcommand, just delete the cross-binary dispatcher. Leaves three copies of the same install code and no one-stop service view. Cuts nothing meaningful.
Dynamic plugin loading (shared objects). Rejected by RFC 0043 — shared fate with the daemon is the exact thing the component model avoids.
Unresolved Questions
- Windows service layer. schtasks is weaker than launchd/systemd (no restart-on-failure, limited log routing). Acceptable for v1, or not?
- rm scope. Should crabup rm daemon also remove ~/.crabtalk/config/? Leaning no (rm is binary-only; data stays); confirm.
- Multiple daemon instances. If two crabtalkd instances run on one machine, what owns ~/.crabtalk/? Out of scope for v1.
0185 - Session Search and Storage Primitives
- Feature Name: Session Search and Storage Primitives
- Start Date: 2026-04-25
- Discussion: #185
- Crates: core, memory, runtime, crabtalk
- Supersedes: 0171 (Topic Switching)
- Updates: 0135 (Agent-First), 0150 (Memory Store)
Updated by 0189 (2026-04-28). The “automatic compaction on overflow as a safety net” carve-out and auto-title generation were both removed; clients drive both via compact_conversation and a future generate_title RPC, gated on the new AgentEvent::ContextUsage events. See 0189 for the rationale.
Summary
Collapse the topic subsystem. Sessions persist unconditionally and carry a small runtime-managed meta blob. Recall gains a second BM25 index — this one over conversation messages — returning windowed excerpts with bounded size. The runtime exposes narrow session primitives and two search tools; client UX owns /clear, /new, /compact, titling, and session routing. The “topic” concept dissolves: content-derived session search (BM25) replaces tag-based grouping, and any curated grouping that survives is a client concern.
Motivation
RFC 0171 introduced topic switching to partition a single (agent, sender) pair into N parallel threads keyed by title, with tmp chats that skip persistence until the agent “promotes” them by entering a topic. In practice it conflated four independent concerns into one knot — routing (“which conversation does this message land in?”), persistence policy (“should this session hit storage?”), recall indexing (“how do we find related past work?”), and lifecycle UX (“when does a chat end and a new one begin?”). Each wanted a different home, and riding one mechanism for all of them produced the TopicRouter reservation/rollback dance, the tmp/promote split, and agent-upfront title commitment on what should have been retrospective categorization.
The reframe driving this RFC: a topic is not a thing. It was a name trying to be a routing key, a memory kind, a session tag, and a recall index simultaneously. With BM25 over session messages, content-derived recall eats the tag’s lunch — the agent searches “cron refactor” and gets back the conversations that actually discussed it, without any of them ever being classified upfront. What remains worth keeping is a summary field that boosts search ranking when one happens to exist (piggybacking on work the runtime already does during overflow compaction).
Design
Layering: runtime vs. client
The runtime’s job is to provide mechanical primitives. UX and policy decisions — when to clear, when to compact, when to title, when to recall, which session to route a message to, how to surface archival browsing — belong one layer up in the client. RFC 0189 finished the move: the runtime no longer auto-compacts on overflow, no longer spawns title generation, no longer auto-recalls. Clients drive compact_conversation, future generate_title, and explicit memory search themselves, gated on AgentEvent::ContextUsage events.
Runtime primitives (policy-free):
- new_session(agent, sender) -> id — always creates, always persists. No tmp, no deferred-persistence gate.
- append_message(id, msg) — writes to storage and incrementally updates the session BM25 index.
- list_sessions(filters?) -> [SessionSummary] — meta rows only, paginated.
- list_messages(id, offset, limit) -> [Message] — paginated browse for when a caller wants to walk a session linearly.
- get_session_meta(id) -> ConversationMeta — cheap lookup of current meta snapshot.
Search tools (agent-facing):
- `search_memory(query) -> [Entry]` — unchanged. BM25 over memory entries; returns whole entries because entries are small.
- `search_sessions(query, context_before=4, context_after=4, filters?) -> [SessionHit]` — new. BM25 over message text; returns bounded, windowed excerpts.
Auto-behaviors: none. Both auto-titling and overflow compaction were removed by RFC 0189. The summary field on ConversationMeta is still populated when a client triggers compact_conversation, and session search still boosts on it; the runtime just doesn’t initiate either step on its own.
Client-owned (explicit non-goals for the runtime):
- `/clear`, `/new`, `/compact`, "resume session by title", session picker UX — composed from the primitives above.
- Saved searches, archival browsing, "wiki view" — pure presentation.
- Routing decisions — the client tells the runtime which `session_id` to append to; the runtime does not infer this from topic state.
ConversationMeta
The target shape, replacing the current struct in crates/core/src/storage.rs:
```rust
pub struct ConversationMeta {
    pub agent: String,           // immutable, set at creation
    pub created_by: String,      // immutable, set at creation
    pub created_at: String,      // immutable, set at creation
    pub title: String,           // empty until a client sets one (no auto-title; no wire RPC yet)
    pub updated_at: String,      // bumped on every append_message
    pub message_count: u64,      // bumped on every append_message
    pub summary: Option<String>, // populated when a client calls compact_conversation
}
```
Removed: `topic` (subsumed by session search), `uptime_secs` (replaced by `updated_at`; uptime is derivable if a caller still needs it).
Writers:
| Field | Writer | When |
|---|---|---|
| `agent`, `created_by`, `created_at` | runtime | session creation |
| `title` | — | empty by default; client-driven titling is a follow-up (no wire RPC yet) |
| `updated_at`, `message_count` | runtime | every `append_message` |
| `summary` | runtime | when a client triggers `compact_conversation` |
Meta is not an agent-writable blob. The runtime owns every field. If a later RFC needs an agent-curated field (e.g., session-to-entry back-links to optimize resume hydration), it lands as a separate proposal with a measured recall-failure case justifying the code cost — not speculatively in this one.
Schema migration
Zero-touch upgrade. All meta fields added by this RFC use #[serde(default)]; removed fields (topic, uptime_secs) are silently ignored on deserialize. On the next meta rewrite for a given session (any append_message triggers one), the removed fields are dropped from disk. No migration pass, no version bump, no operator intervention. Old session JSONL files mix cleanly with new writes.
Serde config on ConversationMeta:
- `#[serde(default)]` on `updated_at`, `message_count`, `summary`.
- `#[serde(default, skip_serializing)]` on the removed fields during the transition window if a `Deserialize` derive would otherwise reject unknown keys — standard `#[serde(default)]` struct-level behavior covers this without explicit `skip`.
- No `deny_unknown_fields` anywhere on this struct.
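As a sketch, that serde configuration could look like the fragment below (attribute placement is illustrative; it leans on serde's default behavior of ignoring unknown keys on deserialize unless `deny_unknown_fields` is set):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct ConversationMeta {
    pub agent: String,
    pub created_by: String,
    pub created_at: String,
    pub title: String,
    #[serde(default)]
    pub updated_at: String,
    #[serde(default)]
    pub message_count: u64,
    #[serde(default)]
    pub summary: Option<String>,
    // No `topic` or `uptime_secs` fields: unknown keys in old rows are
    // silently ignored on deserialize, and the stale keys vanish from
    // disk on the next meta rewrite.
}
```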
Session search — BM25 over messages
The memory crate already ships a 157-line hand-rolled inverted BM25 index (crates/memory/src/bm25.rs, zero external deps). Session search reuses this primitive. Two choices, to be decided during implementation: (a) lift bm25::Index into a shared module used by both the memory crate and a new session index, or (b) instantiate a parallel index owned by the runtime. Either way, no new workspace deps.
Field weights, inherited from the community Claude Code conversation-search pattern (alexop.dev, raine/claude-history):
- `summary` — 3.0× (when present; skipped when absent)
- `title` — 2.0×
- user messages — 1.5×
- assistant messages — 1.0×
- tool-use turns — 1.3× (proxy for “a solution was applied”)
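The weights act as per-field multipliers on the raw BM25 score. A minimal sketch of how that combination might look (enum and function names are assumptions, not the crate's actual API):

```rust
#[derive(Clone, Copy)]
enum Field {
    Summary,
    Title,
    UserMessage,
    AssistantMessage,
    ToolUse,
}

// The field weights from the list above, as multipliers.
fn field_weight(f: Field) -> f64 {
    match f {
        Field::Summary => 3.0,  // only applies when a summary exists
        Field::Title => 2.0,
        Field::UserMessage => 1.5,
        Field::AssistantMessage => 1.0,
        Field::ToolUse => 1.3,  // proxy for "a solution was applied"
    }
}

// Combine per-field raw BM25 scores into one weighted session score.
fn weighted_score(per_field: &[(Field, f64)]) -> f64 {
    per_field.iter().map(|(f, bm25)| field_weight(*f) * bm25).sum()
}

fn main() {
    // Same raw BM25 scores; the session whose hit landed in its summary
    // ranks above the one whose hit landed in an assistant message.
    let with_summary = weighted_score(&[(Field::Summary, 1.0), (Field::UserMessage, 1.0)]);
    let without = weighted_score(&[(Field::AssistantMessage, 1.0), (Field::UserMessage, 1.0)]);
    assert!(with_summary > without);
}
```

This is also the property Phase 4's nextest coverage checks: a session with a summary outranks an otherwise-equivalent one without.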
Hit shape with explicit bounds. Messages can contain large tool results, blobs, or attachments. Returning raw Message objects in search windows would defeat the bounding the windowing was meant to provide. The hit type projects to a fixed small shape, not full messages:
```rust
pub struct SessionHit {
    pub session_id: u64,
    pub msg_idx: usize,
    pub score: f64,
    pub meta: SessionSummary,    // title, created_at, updated_at, message_count
    pub window: Vec<WindowItem>, // context_before + match + context_after
}

pub struct WindowItem {
    pub role: Role,
    pub msg_idx: usize,
    pub snippet: String,           // truncated to MAX_SNIPPET_BYTES
    pub truncated: bool,
    pub tool_name: Option<String>, // for tool-use turns
}
```
Hard limits:
- `MAX_SNIPPET_BYTES = 1024` per window item.
- `MAX_WINDOW_ITEMS = context_before + 1 + context_after`, capped at 16 regardless of caller request.
- `MAX_HITS_PER_QUERY = 20`.
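One subtlety the snippet bound implies: a byte-length cut must not split a multi-byte UTF-8 character. A sketch of how the truncation might be implemented (the function name is illustrative, only the constant comes from the list above):

```rust
const MAX_SNIPPET_BYTES: usize = 1024;

// Returns the snippet plus the `truncated` flag carried by WindowItem.
fn make_snippet(text: &str) -> (String, bool) {
    if text.len() <= MAX_SNIPPET_BYTES {
        return (text.to_string(), false);
    }
    // Walk back to the nearest char boundary so the cut stays valid UTF-8.
    let mut end = MAX_SNIPPET_BYTES;
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    (text[..end].to_string(), true)
}

fn main() {
    assert_eq!(make_snippet("fits"), ("fits".to_string(), false));
    let (snippet, truncated) = make_snippet(&"x".repeat(2000));
    assert_eq!(snippet.len(), MAX_SNIPPET_BYTES);
    assert!(truncated);
}
```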
A full-message read always goes through list_messages(session_id, offset, limit) — there is no “load entire session” primitive, by design.
Performance budget and cold-start
Concrete targets this RFC commits to:
- `search_sessions` query latency: p99 ≤ 50ms at 100k indexed messages; p99 ≤ 200ms at 1M. CPU-only — the index is in memory.
- `append_message` indexing overhead: ≤ 1ms added per append at any index size up to 1M messages. Pure CPU.
- Cold-start index rebuild: dominated by storage I/O, not BM25. The CPU portion is sub-second at 100k messages, but a real `FsStorage` rebuild does one `load_session` per persisted session — at 100k messages spread across 2k sessions, end-to-end rebuild is on the order of 10–20 seconds on local SSD. Rebuild runs in the background after daemon startup; live appends index immediately, so new work is always findable. Old sessions become searchable as the rebuild progresses. A future RFC can add on-disk index checkpointing if cold-rebuild latency becomes a felt operational concern.
These targets are verified by a criterion bench against FsStorage rooted in a tmpdir, not against the in-memory index alone. Failure of a CPU-side target blocks the phase; storage-bound rebuild time is monitored, not gated.
Session lifetime and deletion
This RFC treats sessions as immortal. There is no runtime delete_session primitive; storage grows unboundedly with agent activity. This is an explicit scope decision: garbage collection is a separate operational concern (retention policy, archival, export-and-prune) that warrants its own RFC once usage patterns reveal what the right policy is. In the meantime, operators who need to prune can do so at the filesystem layer — JSONL files in sessions/ are safe to delete offline; the index rebuilds from disk on next start.
When delete support lands, it needs to: (a) remove JSONL file, (b) remove postings from the BM25 index, (c) invalidate any in-memory SessionSummary cache. None of that is in scope here.
Auto-compaction as safety net
Overflow compaction stays, because context-window overflow is a hard constraint the client layer can’t enforce. Two changes versus today: (a) compaction additionally populates ConversationMeta.summary so session search can boost it, and (b) compaction is no longer per-topic (there are no topics) — it fires per session, which is what a client would expect anyway.
The existing `AgentConfig::compact_threshold` continues to trigger compaction on token-budget pressure, not only on overflow; "overflow safety net" here is shorthand for "context-pressure-driven, not user-driven." Discretionary compaction ("I want to clean up this old chat") is a client concern — the runtime can expose an optional `compact(session_id)` helper in a follow-up RFC if clients converge on needing one. Not required to ship this one.
Alternatives
Semantic retrieval via embeddings. Deferred. Lexical BM25 covers the 80% case at zero new deps and microsecond query time. A vector index adds an embedding model or API dependency, hundreds of MB of index storage, and a hybrid-search ranking story. Revisit when lexical recall demonstrably misses on a labeled test set — not before.
Keep topic as a tag. Rejected. With BM25 over messages, tag-based filtering is redundant with query-based retrieval at the cost of requiring disciplined agent tagging and introducing tag-name drift (“cron refactor” vs “cron cleanup”). The tag was the join key between memory and sessions; BM25 is the join key now.
Single unified recall() tool that queries memory and sessions together. Rejected. Two explicit tools are cheaper for the agent to reason about — it knows what it is paying for in each call, and the two stores have different payload-sizing rules (memory entries are small and returned whole; session hits are bounded excerpts). Composition in prompt-space is the right layer.
Agent-curated session-to-entry back-links (linked_entries). Considered and removed from this RFC. The primitive has a reference-rot problem (entry names change or are deleted; the link silently dangles) and its concrete benefit is a recall optimization whose cost — two new tools, a persisted Vec<String>, and a new agent behavior — isn’t justified until BM25 demonstrably misses a case it would have caught. If such a case shows up in practice, a follow-up RFC can propose it with reference-by-id semantics and a measured justification.
Keep read_session(id) as full-history load. Rejected. Unbounded reads are a context-window hazard and the functionality is better served by list_messages (paginated browse) plus windowed excerpts from search.
Migration
Phased implementation, one commit per phase per CLAUDE.md’s workflow rule. Order is deliberate: delete first, build on a clean foundation, then layer the search feature. This avoids the awkward intermediate state where the topic subsystem and the new primitives coexist.
Phase 1 — Delete the topic subsystem. Remove switch_topic, search_topics, TopicRouter, the tmp/promote gating, the entire crates/crabtalk/src/hooks/topic/ module, Runtime::switch_topic and its helpers, and ConversationMeta.topic (storage-side). Sessions now always persist. EntryKind::Topic is kept for now as a presentation label (see open questions). Commit should be heavily negative line-count — mostly subtraction.
Rollback: git revert. Every phase is one commit; revert is the rollback plan.
Phase 2 — ConversationMeta cleanup. Drop uptime_secs. Add updated_at and message_count, wired into append_message. Verify zero-touch read of existing session files via serde(default). Add nextest coverage for mixed-version reads.
Phase 3 — Session BM25 index + search_sessions tool. New index in the runtime (decide lift-vs-parallel with memory crate’s bm25::Index inside this phase). Incremental updates on append_message. New tool wired through the hook registry. Add a criterion bench verifying the performance budget (§ Performance budget and cold-start). If cold-start rebuild exceeds 500ms at 100k messages, this phase also adds on-disk checkpointing before merge.
Phase 4 — summary field + overflow compaction wiring. Populate ConversationMeta.summary during compaction. Thread it into search_sessions as the 3× boost field. Nextest coverage: session with a summary ranks above an otherwise-equivalent session without one for the same query.
Phase 5 — Documentation. Update CLAUDE.md / CONTRIBUTING.md on the runtime-vs-client boundary. Update hook examples that referenced topics. Move 0171 into superseded.md.
Open questions
- `EntryKind::Topic` fate. Keep as a purely presentational label for long-form aggregated entries, or delete entirely and treat "wiki" entries as ordinary `project` entries? The label earns its keep only if a UI or search-ranking consumer branches on it. Current lean: delete in a follow-up once Phases 1–5 are stable and we can confirm no consumer actually reads the tag.
- On-disk index checkpointing. Governed by the Phase 3 bench. If cold-start stays within budget, defer; if not, land it inline. Decision deferred to measurement, not debate.
- Session BM25 field-weight calibration. Adopt community defaults as-is. A labeled test set of ≥50 queries with known-relevant sessions triggers a re-tuning pass if agent recall on that set falls below 80% top-3 hit rate. Until that set exists, the weights are frozen.
- Discretionary `compact(session_id)` helper. Ship only when a client demands it. Not in this RFC.
0189 - Policy at the Edge
- Feature Name: Policy at the Edge
- Start Date: 2026-04-28
- Discussion: #188, #189
- Crates: core, runtime, crabtalk, sdk
- Supersedes: 0000 (Compaction)
- Updates: 0075 (Hook), 0150 (Memory Store), 0185 (Session Search)
Summary
Mechanism belongs in the daemon; policy belongs at the edge. The daemon stops making decisions on the user’s behalf — it no longer auto-compacts on a token-count heuristic, no longer spawns title-generation calls in the background, no longer BM25-searches memory and injects synthetic <recall> user turns. Each of these is now an explicit RPC the client calls when (and if) it wants the behavior. A new AgentEvent::ContextUsage { usage } carries real per-step token counts so clients can pick their own pressure threshold. The Hook::on_before_run lifecycle method is removed.
Motivation
Three independent features had drifted toward the same anti-pattern: the daemon making policy decisions using its own heuristics, then mutating conversation state on the user’s behalf without being asked. RFC 0000 codified auto-compaction at a chars/4-derived threshold. RFC 0038 (then 0150) codified auto-recall as a per-turn before-run injection. The runtime grew a quiet spawn_title_generation call inside finalize_run. Each was useful in isolation. Together they shaped a daemon that thought it knew best.
The cost of that posture:
- Bad heuristics. Token estimation as `chars/4` is wrong for code, JSON tool outputs, and non-English prose. The threshold either trips early (destroying live context with an unwanted summary) or trips late (the request fails anyway). The daemon doesn't have the inputs — model identity, real token counts, user intent — to pick a threshold. Clients do.
- Synthetic events. Auto-compaction yielded `AgentEvent::Compact` followed by hand-forged `TextStart`/`TextDelta`/`TextEnd` events containing the literal string `[context compacted]`. Auto-recall injected `<recall>...</recall>` user turns flagged `auto_injected: true`. Both lied to the event stream — the model didn't say those things, the daemon did. Downstream consumers had to filter them out.
- Wasted tokens, opaque costs. Auto-titling spent an LLM call after every conversation that crossed two history entries, behind the user's back. Auto-recall paid retrieval cost on every turn whether or not the model would have asked.
- Race with the explicit API. All three behaviors had explicit-API counterparts (`compact_conversation`, the `recall` tool, a clearly-named title RPC if the client wanted one). The daemon was racing the client to call its own API.
RFC 0185 already drew the right line for sessions: “the runtime’s job is to provide mechanical primitives. UX decisions belong one layer up in the client.” This RFC carries that all the way through.
Design
Principle
Mechanism in the daemon, policy at the edge. Concretely:
- Mechanism is what only the daemon can do: own conversation state, own storage, own the LLM connection, own MCP child processes, run summarization, write archives. These are inherently centralized.
- Policy is everything else: when to compact, when to title, what to prepend to a user message, what counts as context pressure. These need information the daemon doesn’t have (which model, which UI, which user, which tradeoff matters today). Policy lives in the client — TUI, telegram, web app, headless automation — and is composed from primitives the daemon exposes.
Where this leaves heuristics: the daemon doesn’t run them. If the daemon would need to estimate something to decide, the answer is “don’t decide — surface the data and let the client decide.”
What was removed
Auto-compaction. The block in `Agent::run` that called `self.compact(history)` when `estimate_tokens(history) > threshold` is gone. The synthetic `Compact`/`TextStart`/`TextDelta("[context compacted]")`/`TextEnd` events are gone. `AgentConfig::compact_threshold` is gone (silently dropped from existing TOML via serde default). `HistoryEntry::estimate_tokens` and the `chars/4` heuristic are gone.
Auto-titling. Runtime::spawn_title_generation and its finalize_run call site are gone. The title field on Conversation and ConversationMeta stays — existing data is still valid, the daemon just doesn’t generate fresh titles on its own.
Auto-recall. Memory::before_run (the BM25-search-and-inject helper) is gone. MemoryHook::on_before_run is gone. The recall tool is unchanged — model-driven recall continues to work.
Hook::on_before_run. The trait method is removed. OsHook previously used it to inject <environment>working_directory: ...</environment> per turn — that goes too. Bash dispatch still resolves the effective cwd at tool-call time, so commands run in the right directory; the model just doesn’t get a synthetic turn telling it where it is. Clients that want the model to see the cwd put it in their own user message (they supplied it via req.cwd in the first place). The peer-agents <agents> block that DaemonHook::on_before_run injected for delegation moves to DaemonHook::on_build_agent so it lands in the system prompt at agent-build time — registry mutations are visible after the next agent rebuild.
What was added
AgentEvent::ContextUsage { usage: Usage }. Emitted once per LLM call when the provider reports non-zero usage. Carries real prompt_tokens, completion_tokens, total_tokens, plus optional cache-hit/miss and reasoning counts. The corresponding wire event is ContextUsageEvent { usage: TokenUsage }. Clients track these and decide for themselves when to call compact_conversation.
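A client-side sketch of how that gating might be composed — accumulate the real token counts from `ContextUsage` events and decide locally when to call `compact_conversation`. The `ContextGate` type and the budget value are illustrative, not part of the wire protocol:

```rust
#[derive(Default, Clone, Copy)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
}

struct ContextGate {
    budget: u64,     // client-chosen, e.g. 80% of the model's context window
    last_total: u64, // most recent total reported by the daemon
}

impl ContextGate {
    fn new(budget: u64) -> Self {
        Self { budget, last_total: 0 }
    }

    // Called on every AgentEvent::ContextUsage; returns true when the
    // client should issue the compact_conversation RPC.
    fn observe(&mut self, usage: Usage) -> bool {
        self.last_total = usage.total_tokens;
        self.last_total > self.budget
    }
}

fn main() {
    let mut gate = ContextGate::new(100_000);
    assert!(!gate.observe(Usage { total_tokens: 60_000, ..Default::default() }));
    assert!(gate.observe(Usage { total_tokens: 120_000, ..Default::default() }));
}
```

The point of the design is visible in the type: the threshold lives in the client, so a TUI, a telegram bridge, and a headless automation can each pick a different budget for the same agent.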
Real compact_conversation. The runtime method previously returned the summary string and silently dropped the persistence work. It now does all four steps in order: summarize → write archive entry → write session compact marker → replace history with a single user message carrying the summary. Atomic from the client’s perspective.
Reference: explicit replacements
Each removed behavior maps to an existing or planned API:
| Removed | Explicit replacement |
|---|---|
| Auto-compaction | compact_conversation(agent, sender) RPC, gated on client-tracked ContextUsage events |
| Auto-titling | A future generate_title(conversation_id) RPC; until then, clients can run their own summarization or leave titles blank |
| Auto-recall | The recall tool (model-driven); or a client-side recall + send composition before the user’s message |
The opt-in client-side helpers for each of these are tracked in #188 as SDK sugars — a few dozen lines on top of the daemon client.
Migration
- New conversations have empty `title` until a client asks for one. Existing titles on disk are unaffected.
- The `recall` tool still works. Clients that previously relied on silent `<recall>` injection need to either let the model call `recall` itself (the intended path) or compose `recall + send` client-side.
- No auto-compact. Clients should subscribe to `ContextUsage` events and call `compact_conversation` when their threshold trips. The model returns an explicit error if context is exceeded — the daemon no longer guesses.
- `compact_threshold` in agent TOML is silently dropped via serde default. No errors, just ignored.
Alternatives considered
Keep auto-compact as a safety net. RFC 0185 took this position: “automatic compaction on overflow as a safety net” because clients can’t see overflow coming. Rejected here because the daemon can’t reliably detect overflow either — chars/4 is the wrong tool, and the model itself returns a clear error when context is exceeded. A bad safety net is worse than none, because clients build trust in it and stop watching.
Threshold-gated ContextPressure event. Emit only when over some threshold. Rejected because it recreates the policy problem in a smaller form — the daemon still picks a number, and is still wrong for whichever model and use case it didn’t anticipate. Always-emit ContextUsage lets clients pick.
Move policy to per-agent config knobs. “Auto-compact off by default; opt in via compact_threshold.” Rejected because the per-agent config is set by the client at create-time anyway — moving the decision a step earlier doesn’t change who decides, just makes the decision harder to update. A per-call decision (the client picks each turn) is more honest.
Out of scope
Two daemon-side per-turn injections in `prepare_history` survive this RFC: the `<instructions>` block from Crab.md discovery and the guest-agent-framing prose ("Messages wrapped in `<from agent="...">`…"). Same anti-pattern, deferred to a separate cleanup so this RFC stays focused.
Wire-protocol changes are limited to the new ContextUsageEvent and reservation of AgentInfo.compact_threshold (field 10). No breaking renumbering, no new RPCs.
0193 - Agent-Owned MCP
- Feature Name: Agent-Owned MCP
- Start Date: 2026-04-28
- Discussion: TBD
- Crates: core, mcp, crabtalk, runtime
- Updates: 0082 (Scoping), 0135 (Agent-First), 0190 (MCP Lifecycle)
Summary
Agents own their MCP servers by value, not by name reference into a daemon-global registry. AgentConfig.mcps becomes Vec<McpServerConfig> — every agent carries the full configuration of every MCP it uses. The daemon’s job shrinks to “spawn what agents declare, dedup identical processes, route tool calls per agent.” Storage::{list,upsert,delete}_mcp and crabtalkd mcp go away. Forking an agent now means copying one config; the new owner gets a self-contained, runnable artifact.
Motivation
The current model treats MCPs as a daemon-level resource that agents reference by name. That made sense when crabtalk was a single-user CLI managing a fixed fleet of tools. It doesn’t fit where the runtime is going.
Forkability is broken. RFC 0135 framed agents as the unit users see and share — sessions are plumbing, agents are the artifact. Cloud workflows extend that: an agent should be a forkable thing, like a GitHub repo. Today, forking an agent’s TOML doesn’t fork its MCPs; the fork lands on a daemon that may or may not have a server registered under the same name, with the same args, with the same env. The agent reference is a dangling pointer until someone manually re-registers the missing pieces.
Namespace pollution is artificial. Two agents that want the same logical MCP with different env (e.g., one read-only token, one admin token) must register two differently-named entries in a global flat namespace. The bridge’s tool_cache: BTreeMap<String, Tool> then logs-and-skips conflicts when both expose web_search. None of that pollution is intrinsic to MCP; it’s a consequence of the registry shape.
The allowlist is a workaround for ownership. AgentConfig.mcps: Vec<String> (RFC 0082) gates which global entries an agent may dispatch to. It exists because the registry is shared. If agents own their MCPs, allowlists become tautological — the agent only dispatches to what it declared.
The cloud target makes this acute. Cloud will import crabtalk as a library and host one agent per tenant (or per agent instance). A daemon-global registry on a multi-tenant host either leaks configurations across tenants or forces the cloud layer to maintain its own per-tenant overlay on top of the registry. Either way the global registry is wrong — the right shape is “agent has its MCPs,” and the cloud’s secret/canonical layer can compose forkable templates above that.
Design
Data model
```rust
struct AgentConfig {
    // …
    mcps: Vec<McpServerConfig>, // was Vec<String>
}
```
Embedded by value. No enum wrapper, no separate “decl” type. The agent’s TOML carries every field of every MCP it depends on.
Storage loses list_mcps, upsert_mcp, delete_mcp. The protocol RPCs ListMcps, UpsertMcp, DeleteMcp stay — they shift meaning from “manage the global registry” to “list MCPs declared by any registered agent” / “modify an agent’s MCPs in place” / “remove an MCP from an agent’s config.” Implemented by reading and writing through the agent’s config rather than a separate table.
Daemon-side dedup
The daemon never spawns the same MCP twice. Two agents declaring command="github-mcp", args=[...], env={TOKEN: "abc"} share one peer process. Different args or env → separate processes. Identity is structural, not by name.
McpHandler keys peers by fingerprint — a stable hash of (command, args, env, url). The state map becomes BTreeMap<Fingerprint, McpServerEntry> where each entry refcounts the agents that declared it. register_for_agent(agent, cfg) increments the refcount, spawning if first; unregister_for_agent(agent, fingerprint) decrements, tearing down at zero.
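A minimal sketch of the fingerprint-keyed, refcounted registry described above. Types and signatures approximate the RFC's prose; `DefaultHasher` stands in for whatever stable hash the real implementation picks (it is not stable across processes, so a real fingerprint would use a fixed algorithm):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

#[derive(Hash, Clone)]
struct McpServerConfig {
    command: String,
    args: Vec<String>,
    env: Vec<(String, String)>, // kept sorted so the hash is deterministic
    url: Option<String>,
}

type Fingerprint = u64;

// Identity is structural: same (command, args, env, url) => same peer.
fn fingerprint(cfg: &McpServerConfig) -> Fingerprint {
    let mut h = DefaultHasher::new();
    cfg.hash(&mut h);
    h.finish()
}

#[derive(Default)]
struct McpHandler {
    // Each entry refcounts the agents that declared it.
    peers: BTreeMap<Fingerprint, BTreeSet<String>>,
}

impl McpHandler {
    // Returns true when this call actually spawned a new peer process.
    fn register_for_agent(&mut self, agent: &str, cfg: &McpServerConfig) -> bool {
        let owners = self.peers.entry(fingerprint(cfg)).or_default();
        let spawned = owners.is_empty();
        owners.insert(agent.to_string());
        spawned
    }

    // Returns true when the last reference dropped and the peer tears down.
    fn unregister_for_agent(&mut self, agent: &str, fp: Fingerprint) -> bool {
        if let Some(owners) = self.peers.get_mut(&fp) {
            owners.remove(agent);
            if owners.is_empty() {
                self.peers.remove(&fp);
                return true;
            }
        }
        false
    }
}

fn main() {
    let cfg = McpServerConfig {
        command: "github-mcp".into(),
        args: vec![],
        env: vec![("TOKEN".into(), "abc".into())],
        url: None,
    };
    let mut h = McpHandler::default();
    assert!(h.register_for_agent("a", &cfg));  // first declaration spawns
    assert!(!h.register_for_agent("b", &cfg)); // identical config is shared
    let fp = fingerprint(&cfg);
    assert!(!h.unregister_for_agent("a", fp)); // still referenced by "b"
    assert!(h.unregister_for_agent("b", fp));  // refcount hits zero: teardown
}
```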
The lifecycle event broadcast from RFC 0190 (PR #192) still applies: Connecting / Connected / Failed / Disconnected are emitted per fingerprint, not per name. The event payload identifies the server by fingerprint plus the set of agents that own a reference to it.
Per-agent tool namespace
The bridge stops sharing a flat tool_cache. Two agents declaring different MCPs that both expose a web_search tool no longer collide — the dispatcher resolves (agent, tool_name) to the right peer through the agent’s declared fingerprints.
Concretely: McpBridge keeps the per-fingerprint peer map but drops the global tool cache. Tool lookup walks the agent’s fingerprints in declaration order and returns the first match. McpHook::dispatch already has the agent context; it now uses the agent’s declared MCPs directly instead of consulting an AgentScope.mcps allowlist.
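A first-match resolution sketch under those rules (type and field names are illustrative, not the bridge's real API): two agents can both expose `web_search` without colliding, because each lookup walks only that agent's declared fingerprints, in declaration order.

```rust
use std::collections::BTreeMap;

type Fingerprint = u64;

struct Bridge {
    // fingerprint -> tool names that peer exposes
    peer_tools: BTreeMap<Fingerprint, Vec<String>>,
    // agent -> declared fingerprints, in declaration order
    agent_mcps: BTreeMap<String, Vec<Fingerprint>>,
}

impl Bridge {
    // Resolve (agent, tool_name) to the first declared peer exposing it.
    fn resolve(&self, agent: &str, tool: &str) -> Option<Fingerprint> {
        self.agent_mcps
            .get(agent)?
            .iter()
            .find(|fp| {
                self.peer_tools
                    .get(fp)
                    .map_or(false, |tools| tools.iter().any(|t| t.as_str() == tool))
            })
            .copied()
    }
}

fn main() {
    let mut peer_tools = BTreeMap::new();
    peer_tools.insert(1, vec!["web_search".to_string()]);
    peer_tools.insert(2, vec!["web_search".to_string(), "fetch".to_string()]);
    let mut agent_mcps = BTreeMap::new();
    agent_mcps.insert("a".to_string(), vec![1, 2]);
    agent_mcps.insert("b".to_string(), vec![2]);
    let bridge = Bridge { peer_tools, agent_mcps };
    // Both agents see a web_search tool, but each resolves to its own peer.
    assert_eq!(bridge.resolve("a", "web_search"), Some(1));
    assert_eq!(bridge.resolve("b", "web_search"), Some(2));
    assert_eq!(bridge.resolve("b", "missing"), None);
}
```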
Lifecycle interactions
- Agent create / update. `Runtime::create_agent` and `update_agent` walk the config's `mcps` list, calling `McpHandler::register_for_agent(agent, cfg)` for each. New fingerprints spawn; existing fingerprints just bump the refcount.
- Agent delete. Walks the agent's `mcps`, calls `unregister_for_agent` for each. Peers with refcount 0 are torn down. `Disconnected` events fire.
- Agent rename. Refcounts move from `old_name` to `new_name`. No spawn/teardown.
- Daemon startup. Storage rebuilds agents one by one; each `register_for_agent` call walks the same dedup path. No special "load global MCPs" phase.
- Daemon reload. Already rebuilds agents (RFC 0189-era refactor). Same path. New configs trigger spawns; removed fingerprints trigger teardowns.
Where secrets are not
The daemon stores literal McpServerConfig values. There is no placeholder syntax, no resolver trait, no interpolation in this codebase. If a value looks like ${TAVILY_KEY}, the daemon spawns a process with that literal string in the environment.
The “canonical with placeholders / materialized with values” split lives in whatever sits above the daemon. Cloud’s control plane holds canonical agent configs (with ${TAVILY_KEY}), resolves against the tenant’s vault, and writes the resolved config to the daemon-as-library it owns for that tenant. Forks copy the canonical, never the resolved.
This keeps the forkability invariant — shareable artifacts carry structure, not values — while keeping the daemon secret-unaware.
Migration
AgentConfig.mcps is a breaking field type change (Vec<String> → Vec<McpServerConfig>). Existing configs on disk need a one-shot migration:
- On daemon startup, if any agent's `mcps` is `Vec<String>` (detected via serde), look each name up in the existing `mcps.toml` (or whatever Storage held the global registry), inline the `McpServerConfig`, and rewrite the agent's TOML.
- After every agent has been migrated, delete the global `mcps.toml`.
The migration runs once. After the first startup on the new code, configs are uniformly the new shape; the migration code path is dead and gets removed in a follow-up cleanup commit.
Storage::list_mcps / upsert_mcp / delete_mcp are removed from the trait. Implementations — FsStorage, MemStorage — drop the corresponding files/fields. The protocol RPCs ListMcps / UpsertMcp / DeleteMcp stay on the wire; their handlers are rewritten to operate on agent configs.
AgentScope.mcps (RFC 0082) is removed. The scoping struct still gates tools and skills; MCP scoping is now intrinsic to the agent’s declaration.
Alternatives considered
Keep the global registry, add per-agent overrides. Allow AgentConfig.mcps to carry inline overrides on top of name references. Rejected because it doubles the configuration surface — every consumer has to handle “which wins, the override or the registry?” — without solving forkability. Forking an agent still depends on the destination daemon having the right names registered.
SecretResolver trait in this repo. Earlier draft. Cut because the daemon can stay secret-unaware: cloud handles canonical-vs-resolved at its control plane and only writes resolved configs into the daemon. Adding a trait here for a default that just reads env vars is complexity for a problem we don’t have.
Generic on Daemon for the resolver. Even if a resolver lived in this repo, adding a second type parameter to Daemon<P> compounds complexity per the no-generics-for-future-use rule. Not worth it for a hypothetical hook.
Package-provided MCPs as agent templates. Package install/uninstall lives in crabup, not the daemon, so this collapses. Future package-like artifacts compose at the agent level rather than at a separate MCP-registry level.
Out of scope
- Secret resolution, vaulting, or `${VAR}` interpolation. Cloud's problem, not the daemon's.
- Auto-restart behavior for failed peers. Lifecycle events from PR #192 surface failures; whether a client retries is a client decision.
- Discovery of port-file MCPs. Today `McpHandler` auto-connects services that drop a `*.port` file under `~/.crabtalk/run/`. That mechanism continues to work, but discovered servers now register against a synthetic per-process "discovery agent" (or are exposed only on the daemon-internal dispatch path) — the exact shape is a follow-up.
- Package MCPs. Package install lives in crabup; no daemon-side migration needed.
Superseded RFCs
RFCs that have been replaced by newer designs. Kept for historical reference.
| RFC | Title | Superseded by |
|---|---|---|
| 0000 | Compaction | 0189 - Policy at the Edge |
| 0038 | Memory | 0150 - Memory Store |
| 0064 | Session | 0135 - Agent-First Protocol |
| 0078 | Compact Session | 0135 - Agent-First Protocol |
| 0171 | Topic Switching | 0185 - Session Search and Storage Primitives |