Introduction
This is the crabtalk development book — the knowledge base you check before building. It captures what crabtalk stands for, how the system is shaped, and the design decisions that govern its evolution.
For user-facing documentation (installation, configuration, commands), see crabtalk.ai.
How this book is organized
- Manifesto — What crabtalk is and what it stands for.
- RFCs — Design decisions and features.
RFCs
Code tells you what the system does. Git history tells you when it changed. RFCs tell you why — the problem, the alternatives considered, and the reasoning behind the choice. When you’re about to build something new, RFCs are where you check whether the problem has been thought through before.
Not every change needs an RFC. Bug fixes, refactors, and small improvements go through normal pull requests. RFCs are for decisions that establish rules, contracts, or interfaces that others need to know about before building.
Format
Each RFC is a markdown file with the following structure:
- Header — Feature name, start date, link to discussion, affected crates.
- Summary — One paragraph describing the decision.
- Motivation — What problem does this solve? What use cases does it enable?
- Design — The technical design. Contracts, responsibilities, interfaces.
- Alternatives — What else was considered and why it was rejected.
- Unresolved Questions — Open questions for future work.
Lifecycle
- Open an issue on GitHub describing the feature or design problem.
- Implement it. Iterate through PRs until it’s merged.
- Once merged, write the RFC documenting the decision and add it to
SUMMARY.md.
The RFC number is the issue number or the PR number that introduced the feature. RFCs are written after implementation, not before — they record decisions that were made, not proposals for decisions to come.
Manifesto
Ownership is necessary for an open agent ecosystem.
Ownership is not configuration. A configured agent is one where you picked from someone else’s menu. An owned agent is one where you decided what’s on the menu. Ownership is the power to compose your own stack.
Every agent application today rebuilds session management, command dispatch, and event streaming from scratch — then bundles it alongside search, browser automation, PDF parsing, TTS, image processing, and dozens of tools you didn’t ask for into one process. If you want a Telegram bot with search, you carry nineteen other channels and every integration. If you want a coding agent, you carry TTS and image generation. The process is theirs. The choices are theirs. You run it.
This happens because the daemon layer is missing. Without it, every application must become the daemon. And a daemon that is also an application ships its opinion of what your agent should be.
CrabTalk is that daemon layer. It manages sessions, dispatches commands, and streams the full execution lifecycle to your client. It does not bundle search. It does not bundle gateways. It does not bundle tools. You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts. The daemon never loads them.
An agent daemon is not an agent application. An agent daemon empowers you to build the application you want — and only the application you want. This is the essence of ownership.
We cannot expect agent platforms to give us ownership out of their beneficence. It is to their advantage to bundle, to lock in, to ship their choices as yours. We should expect that they will bundle. The only way to preserve choice is to never take it away in the first place.
We don’t much care if you prefer a batteries-included experience. You could build an OpenClaw-like assistant or a Hermes-like agent on top of CrabTalk. You can’t build a CrabTalk underneath them. The daemon must come first. The architecture must be right. Everything else follows.
Let us proceed.
0000 - Compaction
- Feature Name: Auto-Compaction
- Start Date: 2025-12-01
- Discussion: foundational design
- Crates: core
Summary
Automatic context management for conversations that outgrow the LLM’s context window. When history exceeds a token threshold, the agent uses the LLM itself to summarize the conversation into a compact briefing that replaces the full history. The conversation continues with no interruption.
Motivation
LLM context windows are finite. A conversation that runs long enough — multi-step tool use, research sessions, debugging loops — will exceed the model’s limit. When that happens, the request fails. The user loses their session.
Every LLM application has to solve this problem. The common approaches are:
- Truncation — drop old messages. Cheap but lossy. The agent forgets decisions, context, and user preferences from earlier in the conversation.
- Sliding window — keep the last N messages. Same problem: the agent loses the beginning of the conversation.
- Retrieval — embed messages and retrieve relevant ones. Heavyweight: requires a vector store, an embedding model, and a retrieval strategy.
Crabtalk’s approach: use the LLM to summarize itself. The same model that’s having the conversation produces a dense summary of everything important. The summary replaces the history. The conversation continues as if nothing happened.
Design
Trigger
After each agent step (LLM response + tool results), the runtime estimates the
token count of the current history. If it exceeds compact_threshold (default
100,000 tokens), compaction fires automatically.
Token estimation is a heuristic: ~4 characters per token, counting message content, reasoning content, and tool call arguments. It’s deliberately rough — the threshold is a safety margin, not a precise limit.
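The heuristic can be sketched in a few lines. The message shape and names below are illustrative, not crabtalk's actual types:

```rust
// Hypothetical message shapes, for illustration only.
struct ToolCall {
    arguments: String,
}

struct HistoryMessage {
    content: String,
    reasoning: String,
    tool_calls: Vec<ToolCall>,
}

/// Rough token estimate: total characters across message content,
/// reasoning content, and tool call arguments, divided by 4.
fn estimate_tokens(history: &[HistoryMessage]) -> usize {
    let chars: usize = history
        .iter()
        .map(|m| {
            m.content.len()
                + m.reasoning.len()
                + m.tool_calls.iter().map(|c| c.arguments.len()).sum::<usize>()
        })
        .sum();
    chars / 4
}

fn main() {
    let history = vec![HistoryMessage {
        content: "a".repeat(400),
        reasoning: String::new(),
        tool_calls: vec![],
    }];
    // 400 characters estimate to ~100 tokens under the 4-chars-per-token rule.
    assert_eq!(estimate_tokens(&history), 100);
    println!("ok");
}
```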
Compaction
The agent sends the full history to the LLM with a compaction system prompt that instructs it to:
Preserve:
- Agent identity (name, personality, relationship notes)
- User profile (name, preferences, context)
- Key decisions and their rationale
- Active tasks and their status
- Important facts, constraints, and preferences
- Tool results still relevant to ongoing work
Omit:
- Greetings, filler, acknowledgements
- Superseded plans or abandoned approaches
- Tool calls whose results have been incorporated
The compaction prompt also includes the agent’s system prompt, so the LLM
preserves identity and profile information from <self>, <identity>, and
<profile> blocks.
The output is dense prose, not bullet points — it becomes the new conversation context and must be self-contained.
Replacement
After compaction:
- The summary is yielded as an AgentEvent::Compact { summary }.
- The session history is replaced with a single user message containing the summary.
- A [context compacted] text delta is yielded so the user sees it happened.
- The agent loop continues — the next step sees the compact summary as its entire history.
On disk, a {"compact":"..."} marker is appended to the session JSONL. On
reload, load_context reads from the last compact marker forward. History
before the marker is archived in place — still in the file, never deleted.
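The reload rule — keep only the last compact marker and everything after it — can be sketched as below. A naive prefix check stands in for real JSON parsing, and the function name mirrors but is not necessarily crabtalk's actual load_context:

```rust
// Keep only the last compact marker line (if any) and everything after it.
// The `{"compact":` prefix check is a stand-in for proper JSON parsing.
fn load_context(lines: &[&str]) -> Vec<String> {
    let start = lines
        .iter()
        .rposition(|l| l.trim_start().starts_with("{\"compact\":"))
        .unwrap_or(0); // no marker: load the whole history
    lines[start..].iter().map(|l| l.to_string()).collect()
}

fn main() {
    let lines = [
        r#"{"role":"user","content":"hello"}"#,
        r#"{"role":"assistant","content":"hi"}"#,
        r#"{"compact":"summary so far"}"#,
        r#"{"role":"user","content":"next question"}"#,
    ];
    let ctx = load_context(&lines);
    // Only the compact marker and the message after it are loaded;
    // the two earlier messages stay archived in the file.
    assert_eq!(ctx.len(), 2);
    println!("ok");
}
```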
Interaction with other systems
- Memory auto-recall — runs fresh every turn via on_before_run. Compaction doesn’t affect recall — memories are separate from conversation history.
- Client-initiated compact (RFC 0078) — the same Agent::compact() method, but triggered by the client for @-mention handoff rather than by the token threshold.
- Session persistence — compact markers are append-only in the JSONL. The full history survives on disk even after in-memory replacement.
Configuration
Per-agent configurable. None disables auto-compaction. The default of 100,000
tokens leaves headroom below most model context limits (128K–200K) for the
system prompt, tool schemas, and injected context.
Alternatives
Truncation / sliding window. Cheap but the agent loses context. In a multi-step debugging session, forgetting the first half of the investigation means repeating work. Compaction preserves the substance while discarding the noise.
RAG over message history. Retrieve relevant messages via embeddings. More precise than compaction but requires infrastructure (vector store, embedding model) and adds latency to every turn. Compaction is zero-infrastructure — it uses the model already in the conversation.
No automatic compaction. Let the user manage context manually. Rejected because context overflow is invisible until the request fails. The user shouldn’t need to monitor token counts.
Unresolved Questions
- Should the compaction prompt be customizable per agent?
- Should the threshold adapt based on the model’s actual context limit rather than a fixed number?
0009 - Transport
- Feature Name: UDS and TCP Transport Layers
- Start Date: 2026-03-27
- Discussion: #9
- Crates: transport, core
Summary
A transport layer providing Unix domain socket (UDS) and TCP connectivity
between clients and the crabtalk daemon, built on a shared length-prefixed
protobuf codec defined in core.
Motivation
The daemon needs to accept connections from local CLI clients and remote clients (Telegram, web gateways). UDS is the natural choice for same-machine communication — no port management, filesystem-based access control. TCP is required for remote access and cross-platform support (Windows has no UDS).
Both transports share identical framing and message types. The codec and message
definitions belong in core so that every transport can use them without the
transports depending on one another. The transport crate provides the concrete
connection machinery.
Design
Codec (core::protocol::codec)
Wire format: [u32 BE length][protobuf payload]. The length prefix counts
payload bytes only, excluding the 4-byte header itself.
Two generic async functions operate over any AsyncRead/AsyncWrite:
- write_message<W, T: Message>(writer, msg) — encode, length-prefix, flush.
- read_message<R, T: Message + Default>(reader) — read length, read payload, decode.
Maximum frame size is 16 MiB. Frames exceeding this limit produce a
FrameError::TooLarge. EOF during the length read produces
FrameError::ConnectionClosed (clean disconnect, not an error).
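A synchronous sketch of the framing follows. The real codec is async and carries protobuf payloads; here the payload is raw bytes and errors are plain io::Error rather than FrameError:

```rust
use std::io::{self, Read, Write};

const MAX_FRAME: u32 = 16 * 1024 * 1024; // 16 MiB cap

/// Write one frame: [u32 BE length][payload]. The length excludes the header.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Read one frame; oversized lengths are rejected before allocating.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?; // EOF here = clean disconnect in the real codec
    let len = u32::from_be_bytes(len);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() {
    let mut wire = Vec::new();
    write_frame(&mut wire, b"hello").unwrap();
    // 4-byte length prefix + 5 payload bytes on the wire.
    assert_eq!(wire.len(), 9);
    assert_eq!(read_frame(&mut &wire[..]).unwrap(), b"hello");
    println!("ok");
}
```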
Server accept loop
Both UDS and TCP servers share the same pattern:
accept_loop(listener, on_message, shutdown)
- listener — UnixListener or TcpListener.
- on_message: Fn(ClientMessage, Sender<ServerMessage>) — called for each decoded client message. The sender is per-connection; the callback can send multiple ServerMessages (streaming responses) or exactly one (request-response). The channel is unbounded because messages are small and flow-controlled by the protocol — the agent produces responses at LLM speed, far slower than socket drain speed.
- shutdown — oneshot::Receiver<()> for graceful stop.
Each accepted connection spawns two tasks: a read loop that decodes
ClientMessages and calls on_message, and a send task that drains the
UnboundedSender and writes ServerMessages back. When the read loop ends
(EOF or error), the sender is dropped, which terminates the send task.
TCP specifics
- Default port: 6688. If the port is in use, bind fails — another daemon may already be running.
- TCP_NODELAY is set on all connections (low-latency interactive protocol).
- bind() returns a std::net::TcpListener (non-blocking).
UDS specifics
- Unix-only (#[cfg(unix)]).
- Socket path is caller-provided (typically ~/.crabtalk/daemon.sock).
- No port management or collision handling — the filesystem path is the identity.
Client trait (core::protocol::api::Client)
Two required transport primitives:
- request(ClientMessage) -> Result<ServerMessage> — single round-trip.
- request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — send one message, read responses until the stream ends.
Both UDS Connection and TCP TcpConnection implement Client identically:
split the socket into owned read/write halves, write via codec, read via codec.
The request_stream implementation reads indefinitely; typed provided methods
on Client (e.g., stream()) handle sentinel detection (StreamEnd).
Connections are not Clone — one connection per session. The client struct
(CrabtalkClient / TcpClient) holds config and produces connections on
demand.
Alternatives
tokio-util LengthDelimitedCodec. Would save the manual length-prefix
code but adds a dependency for ~50 lines of straightforward framing. The
hand-rolled codec is simpler to audit and has no extra allocations.
gRPC / tonic. Full RPC framework with HTTP/2 transport. Heavyweight for a
local daemon protocol. The current design is simpler: raw protobuf over a
length-prefixed stream, no HTTP layer, no service definitions beyond the
Server trait.
Shared generic transport trait. UDS and TCP accept loops are nearly
identical but kept as separate modules. A generic Transport trait would save
~20 lines of duplication but add an abstraction with exactly two implementors.
Not worth it.
Unresolved Questions
- Should the transport support TLS for TCP connections in non-localhost deployments?
- Should there be a connection timeout or keepalive at the transport level, or is the protocol-level Ping/Pong sufficient?
0018 - Protocol
- Feature Name: Wire Protocol
- Start Date: 2026-03-27
- Discussion: #18
- Crates: core
Summary
A protobuf-based wire protocol defining all client-server communication for the
crabtalk daemon, with a Server trait for dispatch and a Client trait for
typed request methods.
Motivation
The daemon mediates between multiple clients (CLI, Telegram, web) and multiple
agents. A well-defined wire protocol decouples client and server implementations
and makes the contract explicit. Protobuf was chosen for compact binary
encoding, language-neutral schema, and generated code via prost.
Design
Wire messages (crabtalk.proto)
Two top-level envelopes using oneof:
ClientMessage — 15 variants:
| Variant | Purpose |
|---|---|
| Send | Run agent, return complete response |
| Stream | Run agent, stream response events |
| Ping | Keepalive |
| Sessions | List active sessions |
| Kill | Close a session |
| GetConfig | Read daemon config |
| SetConfig | Replace daemon config |
| Reload | Hot-reload runtime |
| SubscribeEvents | Stream agent events |
| ReplyToAsk | Answer a pending ask_user prompt |
| GetStats | Daemon stats |
| CreateCron | Create cron entry |
| DeleteCron | Delete cron entry |
| ListCrons | List cron entries |
| Compact | Compact session history |
ServerMessage — 11 variants:
| Variant | Purpose |
|---|---|
| Response | Complete agent response |
| Stream | Streaming event (see below) |
| Error | Error with code and message |
| Pong | Keepalive ack |
| Sessions | Session list |
| Config | Config JSON |
| AgentEvent | Agent event (for subscriptions) |
| Stats | Daemon stats |
| CronInfo | Created cron entry |
| CronList | All cron entries |
| Compact | Compaction summary |
Streaming events
StreamEvent is itself a oneof with 8 variants representing the lifecycle of
a streamed agent response:
- Start { agent, session } — stream opened.
- Chunk { content } — text delta.
- Thinking { content } — thinking/reasoning delta.
- ToolStart { calls[] } — tool invocations beginning.
- ToolResult { call_id, output, duration_ms } — single tool result.
- ToolsComplete — all pending tool calls finished.
- AskUser { questions[] } — agent needs user input.
- End { agent, error } — stream closed (error is empty on success).
The client reads StreamEvents until it receives End, which is the terminal
sentinel.
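The consumption loop can be sketched with an abridged event enum. The real variants carry more fields and arrive over an async stream; this synchronous version shows only the sentinel logic:

```rust
// Abridged StreamEvent: only the variants needed to show the End sentinel.
enum StreamEvent {
    Start,
    Chunk(String),
    End { error: String },
}

/// Accumulate text deltas until End; a non-empty error string becomes an Err.
fn collect_stream(events: impl IntoIterator<Item = StreamEvent>) -> Result<String, String> {
    let mut text = String::new();
    for ev in events {
        match ev {
            StreamEvent::Start => {}
            StreamEvent::Chunk(delta) => text.push_str(&delta),
            StreamEvent::End { error } if error.is_empty() => return Ok(text),
            StreamEvent::End { error } => return Err(error),
        }
    }
    Err("stream ended without End sentinel".into())
}

fn main() {
    let events = vec![
        StreamEvent::Start,
        StreamEvent::Chunk("hel".into()),
        StreamEvent::Chunk("lo".into()),
        StreamEvent::End { error: String::new() },
    ];
    assert_eq!(collect_stream(events).unwrap(), "hello");
    println!("ok");
}
```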
Agent events
AgentEventMsg carries a kind enum (TEXT_DELTA, THINKING_DELTA,
TOOL_START, TOOL_RESULT, TOOLS_COMPLETE, DONE) plus agent name, session
ID, content, and timestamp. Used by SubscribeEvents for live monitoring of all
agent activity across sessions.
AgentEventMsg overlaps with StreamEvent — both represent the agent execution
lifecycle. StreamEvent is the per-request streaming format (rich, typed
variants). AgentEventMsg is the cross-session monitoring format (flat, single
struct with a kind tag). The duplication exists because monitoring clients need a
simpler, uniform shape to filter and display events from multiple agents.
Server trait
One async method per ClientMessage variant. Implementations receive typed
request structs and return typed responses:
trait Server: Sync {
    fn send(&self, req: SendMsg) -> Future<Output = Result<SendResponse>>;
    fn stream(&self, req: StreamMsg) -> Stream<Item = Result<StreamEvent>>;
    fn ping(&self) -> Future<Output = Result<()>>;
    // ... one method per operation
}
The provided dispatch(&self, msg: ClientMessage) -> Stream<Item = ServerMessage> method routes a raw ClientMessage to the correct handler.
Request-response operations yield exactly one ServerMessage; streaming
operations yield many. Errors are mapped to ErrorMsg { code, message } using HTTP status codes with
their standard semantics: 400 (bad request), 404 (not found), 500 (internal
error).
Client trait
Two required transport primitives:
- request(ClientMessage) -> Result<ServerMessage> — single round-trip.
- request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — raw streaming read.
Typed provided methods (send, stream, ping, get_config, set_config)
handle message construction, response unwrapping, and sentinel detection. The
stream() method consumes events via take_while until StreamEnd and maps
each frame through TryFrom<ServerMessage> for type-safe event extraction.
Conversions (message::convert)
From impls wrap typed messages into envelopes (SendMsg -> ClientMessage,
SendResponse -> ServerMessage). TryFrom impls unwrap in the other direction,
returning an error for unexpected variants. This keeps call sites clean — no
manual enum construction.
Alternatives
JSON over WebSocket. Simpler to debug with curl, but larger payloads and
no schema enforcement. Protobuf catches schema mismatches at compile time.
gRPC service definitions. Would provide streaming and code generation out of the box, but brings HTTP/2, tower middleware, and tonic as dependencies. The current approach is lighter: raw protobuf frames over a length-prefixed stream, with hand-written trait dispatch.
Separate request/response ID correlation. The protocol is connection-scoped and sequential — one outstanding request per connection at a time. This is a fundamental design constraint: clients must wait for a response before sending the next request. No need for request IDs or multiplexing. If multiplexing is needed later, it belongs in the transport layer, not the protocol.
Unresolved Questions
- Should the protocol negotiate a version on connect to detect client/server mismatches?
- Should StreamEnd carry structured error information (code + message) instead of a plain string?
- Should there be a ClientMessage variant for subscribing to a specific session’s events rather than all events?
0027 - Model
- Feature Name: Model Abstraction Layer
- Start Date: 2026-01-25
- Discussion: #27
- Crates: model, core
Summary
A provider registry that wraps multiple LLM backends (OpenAI, Anthropic, Google,
Bedrock, Azure) behind a unified Model trait, with per-model provider
instances, runtime model switching, and retry logic with exponential backoff.
Motivation
The daemon talks to LLMs. Which LLM, from which provider, through which API —
that’s configuration, not architecture. The agent code should call model.send()
and not care whether it’s hitting Anthropic directly or an OpenAI-compatible
proxy.
This requires:
- A single trait that all providers implement.
- A registry that maps model names to provider instances.
- Runtime switching between models without restarting.
- Retry logic for transient failures (rate limits, timeouts).
- Type conversion between crabtalk’s message types and each provider’s wire format.
Design
Model trait (core)
Defined in wcore::model:
pub trait Model: Clone + Send + Sync {
    async fn send(&self, request: &Request) -> Result<Response>;
    fn stream(&self, request: Request) -> impl Stream<Item = Result<StreamChunk>>;
    fn context_limit(&self, model: &str) -> usize;
    fn active_model(&self) -> String;
}
The trait is in core because agents are generic over Model. The implementation
lives in the model crate.
Provider
Wraps crabllm_provider::Provider (the external multi-backend LLM library)
behind the Model trait. Each Provider instance is bound to a specific model
name and carries:
- The backend connection (OpenAI, Anthropic, Google, Bedrock, Azure).
- A shared HTTP client.
- Retry config: max_retries (default 2) and timeout (default 30s).
Base URL normalization strips endpoint suffixes (/chat/completions,
/messages) so both bare origins and full paths work in config.
ProviderRegistry
Implements Model by routing requests to the correct provider based on the
model name in the request.
ProviderRegistry
├── providers: BTreeMap<String, Provider> # keyed by model name
├── active: String # default model
└── client: reqwest::Client # shared across providers
- Construction: one ProviderDef can list multiple model names. Each gets its own Provider instance. Duplicate model names across definitions are rejected at validation time.
- Routing: send() and stream() look up the provider by request.model. Callers get a clone of the provider — the registry lock is not held during LLM calls.
- Switching: switch(model) changes the active default. Agents can still override per-request via the model field.
- Hot add/remove: providers can be added or removed at runtime without rebuilding the registry.
Retry logic
Non-streaming send() retries transient errors (rate limits, timeouts) with
exponential backoff and full jitter:
- Initial backoff: 100ms, doubling each retry.
- Jitter: random duration in [backoff/2, backoff].
- Max retries: configurable per provider (default 2).
- Non-transient errors (auth failures, invalid requests) fail immediately.
Streaming does not retry — the connection is already established.
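The backoff schedule can be sketched as a pure function. The rng closure stands in for a real random source so the sketch stays deterministic; the actual implementation's structure may differ:

```rust
use std::time::Duration;

/// Backoff for the given retry attempt: 100ms base, doubling each attempt,
/// jittered into [backoff/2, backoff]. `rng(lo, hi)` picks a value in range.
fn backoff_for(attempt: u32, mut rng: impl FnMut(u64, u64) -> u64) -> Duration {
    let base = 100u64 * (1u64 << attempt); // 100ms, 200ms, 400ms, ...
    Duration::from_millis(rng(base / 2, base))
}

fn main() {
    // Deterministic "rng" for the sketch: always pick the upper bound.
    assert_eq!(backoff_for(2, |_, hi| hi), Duration::from_millis(400));
    // Lower bound of the jitter window for the first retry.
    assert_eq!(backoff_for(0, |lo, _| lo), Duration::from_millis(50));
    println!("ok");
}
```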
Type conversion
A convert module translates between wcore::model types (Request, Response,
Message, StreamChunk) and crabllm_core types (ChatCompletionRequest,
ChatCompletionResponse). This isolates the external library’s types from the
rest of the codebase.
Alternatives
Direct provider calls without a registry. Each agent holds its own provider. Rejected because runtime model switching and centralized configuration require a shared registry.
Trait objects instead of enum dispatch. Box<dyn Model> instead of the
concrete Provider enum. Rejected because Model has generic return types
(impl Stream) that prevent object safety. The enum dispatch provided by
crabllm_provider::Provider handles this naturally.
Unresolved Questions
- Should the registry support fallback chains (try provider A, fall back to B)?
- Should streaming requests retry on connection failures before the first chunk?
0036 - Skill Loading
- Feature Name: Skill Loading
- Start Date: 2026-03-27
- Discussion: #36
- Crates: runtime
Summary
How crabtalk discovers, loads, dispatches, hot-reloads, and scopes skills. The skill format follows the agentskills.io convention — this RFC covers the loading mechanism, not the format.
Motivation
Agents need extensible behavior without recompilation. Skills are the simplest unit that works: a markdown file with a name, description, and a prompt body. No code generation, no plugin API, no runtime linking.
The format is defined by agentskills.io. What crabtalk needs to decide is how skills are found on disk, how they’re resolved at runtime, how they stay current without restarts, and how agents are restricted to subsets of available skills.
Design
Format
SKILL.md follows the agentskills.io convention.
Required fields: name, description. Optional: allowed-tools. The markdown
body is the skill prompt.
Discovery
SkillHandler::load(dirs) scans a list of directories (in config-defined order)
recursively for SKILL.md files. Each skill lives in its own directory:
skills/
check-feeds/
SKILL.md
summarize/
SKILL.md
Nested organization is supported (skills/category/my-skill/SKILL.md). Hidden
directories (.-prefixed) are skipped. Duplicate names across directories are
detected and skipped with a warning — first-loaded wins, in config-defined
directory order.
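The first-loaded-wins rule amounts to order-preserving deduplication by name, sketched here with an illustrative Skill shape (the real loader also logs a warning per skipped duplicate):

```rust
use std::collections::HashSet;

// Illustrative shape: the dir field marks which skill directory it came from.
#[derive(Debug, PartialEq)]
struct Skill {
    name: String,
    dir: String,
}

/// Keep the first skill for each name, in scan order; drop later duplicates.
fn dedupe_first_wins(found: Vec<Skill>) -> Vec<Skill> {
    let mut seen = HashSet::new();
    found.into_iter().filter(|s| seen.insert(s.name.clone())).collect()
}

fn main() {
    // Directories are scanned in config order: "user" before "system".
    let found = vec![
        Skill { name: "summarize".into(), dir: "user".into() },
        Skill { name: "summarize".into(), dir: "system".into() },
        Skill { name: "check-feeds".into(), dir: "system".into() },
    ];
    let skills = dedupe_first_wins(found);
    assert_eq!(skills.len(), 2);
    assert_eq!(skills[0].dir, "user"); // first-loaded wins
    println!("ok");
}
```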
Registry
A Vec<Skill> wrapped in Mutex inside SkillHandler. Linear scan — the
registry is small enough that indexing is unnecessary. Supports add, upsert
(replace by name), contains, and skills (list all).
Dispatch
Exposed as a tool the agent can call. Input: { name: string }.
Resolution order:
- Scope check — if the agent has a skill scope and the name is not in it, reject.
- Path traversal guard — reject names containing .., /, or \.
- Exact load from disk — for each skill directory, check {dir}/{name}/SKILL.md. If found, parse it, upsert into the registry, return the body.
- Fuzzy fallback — if no exact match, substring search the registry by name and description. If input is empty, list all available skills (respecting scope).
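The first two steps are simple validation and can be sketched as one function. An empty scope slice stands in for "unrestricted"; the function name is illustrative:

```rust
/// Validate a requested skill name: scope check first, then traversal guard.
/// An empty `scope` means the agent is unrestricted.
fn check_skill_name(name: &str, scope: &[&str]) -> Result<(), String> {
    if !scope.is_empty() && !scope.contains(&name) {
        return Err(format!("skill '{name}' not in agent scope"));
    }
    if name.contains("..") || name.contains('/') || name.contains('\\') {
        return Err(format!("invalid skill name '{name}'"));
    }
    Ok(())
}

fn main() {
    assert!(check_skill_name("summarize", &[]).is_ok());
    // Traversal attempts are rejected before any disk access.
    assert!(check_skill_name("../etc/passwd", &[]).is_err());
    // Out-of-scope names are rejected even if they exist on disk.
    assert!(check_skill_name("summarize", &["check-feeds"]).is_err());
    println!("ok");
}
```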
Hot reload
The upsert on exact load (step 3) is the hot-reload mechanism. When a skill is invoked, it’s always loaded fresh from disk and upserted into the registry. Skills can be updated on disk and picked up on next invocation without daemon restart.
Slash command resolution
Before a message reaches the agent, preprocess resolves leading /skill-name
commands. For each skill directory, it checks {dir}/{name}/SKILL.md. If found,
the skill body is wrapped in a <skill> tag and injected into the message. This
happens before tool dispatch — it’s prompt injection, not a tool call.
Scoping
Agents can be restricted to a subset of skills via AgentScope.skills. If
non-empty, only listed skills are available. Empty means unrestricted. Scoping
applies to exact load, fuzzy listing, and slash resolution alike.
Alternatives
Code-based plugins (dylib / WASM). Far more powerful but far more complex. Skills are prompt injection, not code execution. The simplicity of markdown files is the point.
Database-backed registry. Adds persistence complexity for a registry that rebuilds in milliseconds from disk. Not needed.
Unresolved Questions
- Should skills support arguments beyond the skill name (parameterized prompts)?
- Should allowed-tools be enforced at the runtime level? Currently it is not enforced — it exists in the format but has no runtime effect.
0038 - Memory
- Feature Name: Memory System
- Start Date: 2026-02-10
- Discussion: #38
- Crates: runtime
Summary
File-per-entry memory with BM25-ranked recall, a curated index (MEMORY.md), and an identity file (Crab.md) for agent personality. No database — just files.
Motivation
Agents need persistent knowledge across sessions. The original approach used a graph memory backed by a database, but that added operational weight and complexity for what is fundamentally a collection of text entries that need to be searched.
The system must:
- Store entries as individual files (inspectable, editable by humans).
- Search by relevance, not just exact match.
- Inject relevant memories automatically before each agent turn.
- Support a curated overview (MEMORY.md) that is always present in context.
- Support an identity/soul file (Crab.md) for agent personality.
Design
Directory structure
~/.crabtalk/config/
├── Crab.md # identity file (one level above memory/)
└── memory/
├── entries/
│ ├── entry-name.md
│ └── ...
└── MEMORY.md
Crab.md lives one level above memory/ because it’s an agent-level identity
file, not a memory entry. It’s shared across the config, not scoped to memory.
Entry format
Frontmatter markdown. Each entry has a name, description (used for search), and content.
---
name: Entry Name
description: Short searchable description
---
Long-form content here.
Filenames are slugified from the entry name: entry-name.md.
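One plausible slugification is shown below — lowercase ASCII alphanumerics, with runs of everything else collapsed to single hyphens. The exact rules crabtalk uses may differ in edge cases:

```rust
/// Lowercase alphanumerics; collapse any other run of characters to one
/// hyphen; no leading or trailing hyphens.
fn slugify(name: &str) -> String {
    let mut slug = String::new();
    let mut prev_hyphen = true; // suppress a leading hyphen
    for c in name.chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c.to_ascii_lowercase());
            prev_hyphen = false;
        } else if !prev_hyphen {
            slug.push('-');
            prev_hyphen = true;
        }
    }
    slug.trim_end_matches('-').to_string()
}

fn main() {
    assert_eq!(slugify("Entry Name"), "entry-name");
    assert_eq!(slugify("  Hello, World!  "), "hello-world");
    println!("ok");
}
```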
Recall pipeline
BM25 scoring over all entries. The query is matched against the concatenation of
description + content. Results are ranked by relevance and capped at
recall_limit (configurable).
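For reference, BM25 scoring over a small corpus can be sketched as follows, with the conventional defaults k1 = 1.2 and b = 0.75, naive tokenization, and each document standing in for an entry's description + content. The real implementation and parameters may differ:

```rust
/// Naive tokenizer: split on non-alphanumerics, lowercase.
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// BM25 score of `query` against each document.
fn bm25_scores(query: &str, docs: &[&str]) -> Vec<f64> {
    let (k1, b) = (1.2_f64, 0.75_f64);
    let docs_tok: Vec<Vec<String>> = docs.iter().map(|d| tokens(d)).collect();
    let n = docs_tok.len() as f64;
    let avgdl = docs_tok.iter().map(|d| d.len()).sum::<usize>() as f64 / n;
    let query_tok = tokens(query);
    docs_tok
        .iter()
        .map(|doc| {
            query_tok
                .iter()
                .map(|term| {
                    let df = docs_tok.iter().filter(|d| d.contains(term)).count() as f64;
                    let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                    let tf = doc.iter().filter(|t| *t == term).count() as f64;
                    let dl = doc.len() as f64;
                    // Term frequency saturation (k1) and length normalization (b).
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * dl / avgdl))
                })
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    let docs = [
        "rust daemon architecture notes",
        "favorite pasta recipes",
        "daemon session persistence",
    ];
    let scores = bm25_scores("daemon session", &docs);
    // The entry matching both query terms ranks highest; the unrelated one scores 0.
    assert!(scores[2] > scores[0] && scores[0] > scores[1]);
    println!("ok");
}
```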
Auto-recall
Before each agent turn (on_before_run), the system extracts the first 8 words
of the last user message (an arbitrary cutoff — short enough to avoid noise,
long enough to carry intent), runs recall(), and injects matching results as
an auto-injected <recall> block. Auto-injected messages are not persisted and
are refreshed every turn.
System prompt injection
- MEMORY.md — injected as a <memory> block in the system prompt via build_prompt(). Always present if non-empty.
- Crab.md — the identity file. Injected via build_soul(). Writing is gated by the soul_editable config.
- Memory prompt — instructions for the agent on how to use memory tools, included from prompts/memory.md.
Tools
Four tools exposed to agents:
- remember(name, description, content) — create or overwrite an entry.
- forget(name) — delete an entry.
- recall(query, limit) — BM25 search, returns formatted results.
- memory(content) — overwrite MEMORY.md index.
Alternatives
Graph memory with database. The original system. Rejected for operational complexity. Files are simpler, inspectable, and sufficient for the use case.
Embedding-based search. Would require a vector store and embedding model. BM25 is fast, dependency-free, and works well enough for the entry sizes we deal with.
Single file storage. One big memory file instead of file-per-entry. Rejected because individual files are easier to inspect, edit, and version.
Unresolved Questions
- Should auto-recall use more than the first 8 words for the query?
- Should entries support tags or categories for non-BM25 filtering?
0043 - Component System
- Feature Name: Component System
- Start Date: 2026-02-15
- Discussion: #43
- Crates: command
Summary
Crabtalk components are independent binaries that install as system services and connect to the daemon via auto-discovery. They crash alone, swap without restarts, and the daemon never loads them. This is the manifesto’s composition model made concrete.
Motivation
The manifesto says: “You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts.”
This requires a system where components — search, gateways, tool servers — are not subprocesses of the daemon. They’re independent programs that run as system services. The daemon discovers them at runtime. A broken component cannot take the daemon down.
Other projects spawn MCP servers as child processes. If the child hangs or crashes, it can take the daemon with it: zombie processes, leaked file descriptors, blocked event loops. The subprocess model creates shared fate. The component model eliminates it.
Design
The contract
A component is a binary that:
- Installs itself as a system service (launchd, systemd, or schtasks).
- Writes a port file to ~/.crabtalk/run/{name}.port on startup.
- Serves an HTTP API (MCP protocol) on that port.
The daemon scans ~/.crabtalk/run/*.port at startup and discovers components
automatically. No configuration needed — drop a component on PATH, install it,
and the daemon finds it.
Service trait
pub trait Service {
    fn name(&self) -> &str;        // "search"
    fn description(&self) -> &str; // human readable
    fn label(&self) -> &str;       // "ai.crabtalk.search"
}
The trait provides default start, stop, and logs methods:
- start — renders a platform-specific service template, installs and launches.
- stop — uninstalls the service and removes the port file.
- logs — tails ~/.crabtalk/logs/{name}.log.
MCP service
Components that expose tools to agents extend McpService:
pub trait McpService: Service {
    fn router(&self) -> axum::Router;
}
run_mcp binds a TCP listener on 127.0.0.1:0, writes the port to the
run directory, and serves the router. The daemon discovers it on next scan.
Platform support
Service templates are platform-specific:
- macOS — launchd plist (~/Library/LaunchAgents/)
- Linux — systemd user unit
- Windows — schtasks with XML task definition
Auto-discovery
The daemon scans ~/.crabtalk/run/*.port for port files not already connected.
Each file contains a port number. The daemon connects via
http://127.0.0.1:{port}/mcp. No subprocess management, no shared fate.
Crash? The daemon doesn’t care — it was never the component’s parent process. Restart? New port file, the daemon picks it up on next reload. Update a component? Install the new version, restart the service — the daemon sees the new port on next scan.
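The scan itself is a small piece of filesystem code. This sketch reads every *.port file in a run directory and parses the port; error handling is simplified and the function name is illustrative:

```rust
use std::fs;
use std::path::Path;

/// Scan a run directory for `{name}.port` files; return (name, port) pairs.
/// Unreadable directories or malformed port files are silently skipped.
fn discover_ports(run_dir: &Path) -> Vec<(String, u16)> {
    let mut found = Vec::new();
    let Ok(entries) = fs::read_dir(run_dir) else {
        return found;
    };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.extension().and_then(|e| e.to_str()) != Some("port") {
            continue; // not a port file
        }
        let name = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or_default()
            .to_string();
        if let Some(port) = fs::read_to_string(&path)
            .ok()
            .and_then(|s| s.trim().parse::<u16>().ok())
        {
            found.push((name, port));
        }
    }
    found.sort(); // deterministic order for callers
    found
}

fn main() {
    let dir = std::env::temp_dir().join("crabtalk-run-demo");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("search.port"), "49152\n").unwrap();
    fs::write(dir.join("ignore.txt"), "not a port file").unwrap();
    let ports = discover_ports(&dir);
    assert_eq!(ports, vec![("search".to_string(), 49152)]);
    println!("ok");
}
```

Each discovered pair would then be dialed as an HTTP endpoint on the loopback interface, per the scheme above.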
Entry point
The run() function handles tracing init and tokio bootstrap for all component
binaries.
Alternatives
Subprocess management. The daemon spawns and manages components as child processes. Rejected because it creates shared fate — a broken child can break the daemon. This is the approach we explicitly designed against.
Docker / containerization. Run components in containers. Rejected because crabtalk is local-first. System services are the right abstraction for a personal daemon on your machine.
Shell scripts for service management. Works on Unix, breaks on Windows, drifts across components. A shared Rust crate is portable and stays consistent.
Unresolved Questions
- Should the Service trait support health checks?
- Should the daemon watch the run directory for new port files instead of scanning only at startup/reload?
0064 - Session
- Feature Name: Session System
- Start Date: 2026-02-25
- Discussion: #64
- Crates: core, daemon
Summary
Append-only JSONL session persistence with compact markers, identity-based file naming, and an auto-injected message lifecycle that separates ephemeral context from durable history.
Motivation
An agent daemon needs conversation persistence that is simple, inspectable, and crash-safe. Database-backed persistence adds operational weight for what is fundamentally a sequential log. The session format must support:
- Resuming conversations across daemon restarts.
- Compaction — summarizing long histories without losing them.
- Multiple identities — the same agent can talk to different users/platforms.
- Ephemeral context injection — memory recall, environment blocks, and agent descriptions must be fresh each run, never accumulating in history.
Design
File format
Each session is a JSONL file. Line 1 is metadata, subsequent lines are messages or compact markers.
```jsonl
{"agent":"crab","created_by":"user","created_at":"...","title":"","uptime_secs":0}
{"role":"user","content":"hello"}
{"role":"assistant","content":"hi there"}
{"compact":"Summary of conversation so far..."}
{"role":"user","content":"what were we talking about?"}
```
Naming
Files live in a flat sessions/ directory:
{agent}_{sender_slug}_{seq}.jsonl
- sender_slug — sanitized identity (e.g. user, tg-12345).
- seq — monotonically increasing per (agent, sender) pair.
- After set_title, the file is renamed to append a title slug.
Compact markers
When history exceeds a threshold, the agent compacts: the LLM summarizes the
conversation, and a {"compact":"..."} line is appended. On load,
load_context reads from the last compact marker forward. The compact
summary is injected as a {"role":"user"} message — the agent sees it as
context, not as a special marker.
History before the last compact marker is archived in place — still in the file, but not loaded. Nothing is deleted.
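The tail-reading behavior of load_context can be sketched as follows. This is simplified: the real implementation parses each line as JSON, while this sketch detects a compact marker by its leading key, which holds for an append-only writer that emits the compact field first:

```rust
// Sketch: skip the metadata line, then load from the last compact marker
// forward. If there is no marker, the whole message history is loaded.
fn load_context(lines: &[&str]) -> Vec<String> {
    let body = &lines[1..]; // line 1 is session metadata
    let start = body
        .iter()
        .rposition(|l| l.starts_with("{\"compact\""))
        .unwrap_or(0);
    body[start..].iter().map(|l| l.to_string()).collect()
}

fn main() {
    let file = [
        r#"{"agent":"crab","created_by":"user"}"#,
        r#"{"role":"user","content":"hello"}"#,
        r#"{"role":"assistant","content":"hi there"}"#,
        r#"{"compact":"Summary of conversation so far..."}"#,
        r#"{"role":"user","content":"what were we talking about?"}"#,
    ];
    let ctx = load_context(&file);
    assert_eq!(ctx.len(), 2); // the compact summary plus one live message
    println!("{} lines loaded", ctx.len());
}
```

Everything before the marker stays on disk untouched; only the working context shrinks.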
Auto-injected messages
Messages marked auto_injected: true are:
- Not persisted to JSONL (skipped in append_messages).
- Stripped before each run (prevents accumulation).
- Re-injected fresh via Hook::on_before_run() every execution.
This covers memory recall results, environment blocks, agent description lists, and working directory announcements. They must be current, not stale from a previous run.
Session identity
Sessions are bound to an (agent, sender) pair. find_latest_session scans the
directory for the matching prefix and returns the highest seq number. New chats
increment the seq.
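The seq lookup can be sketched over a list of filenames. The post-set_title name with a trailing title slug is a guessed format here; latest_seq is an illustrative name, not the real function:

```rust
// Sketch: filenames look like "{agent}_{sender}_{seq}.jsonl"; after
// set_title a slug may follow the seq, so we take only the leading digits.
fn latest_seq(filenames: &[&str], agent: &str, sender: &str) -> Option<u32> {
    let prefix = format!("{agent}_{sender}_");
    filenames
        .iter()
        .filter_map(|f| {
            let rest = f.strip_prefix(&prefix)?.strip_suffix(".jsonl")?;
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            digits.parse::<u32>().ok()
        })
        .max()
}

fn main() {
    let files = ["crab_user_1.jsonl", "crab_user_2-trip-planning.jsonl", "crab_tg-12345_7.jsonl"];
    assert_eq!(latest_seq(&files, "crab", "user"), Some(2));
    assert_eq!(latest_seq(&files, "crab", "tg-12345"), Some(7));
    println!("next seq for (crab, user): {}", latest_seq(&files, "crab", "user").unwrap() + 1);
}
```

A new chat takes the returned seq plus one; no counter file is needed because the directory listing is the source of truth.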
Uptime tracking
Each session tracks uptime_secs — accumulated active time, persisted to the
meta line. The meta line is rewritten by reading the full file and writing it
back with the updated first line. This is the one non-append operation — it
trades the append-only guarantee for keeping metadata current. Crash during
rewrite can lose the meta line but not the conversation history (messages are
append-only and survive).
Alternatives
SQLite. Adds a dependency and operational surface for what is a sequential append log. JSONL files are inspectable with standard tools and trivial to back up. Appends are crash-safe (a partial last line is just a truncated write).
One file per message. Too many files. The append-only JSONL approach gives one file per conversation with clear boundaries.
No compaction. Works for short conversations but becomes expensive as history grows. The compact marker approach keeps the file intact while bounding the working context.
Unresolved Questions
- Should session files be organized in date-based subdirectories for easier cleanup?
- Should compact threshold be per-agent configurable or global?
0075 - Hook
- Feature Name: Hook Lifecycle
- Start Date: 2026-03-15
- Discussion: #75
- Crates: core, runtime, daemon
Summary
The Hook trait is the central extensibility point for agent lifecycle. It defines five methods that the runtime calls at specific points: building an agent, registering tools, preprocessing input, injecting context before a run, and observing events. Everything that customizes agent behavior — skills, memory, MCP, scoping, prompt injection — composes through this trait.
Motivation
When the runtime was split out of the daemon (#75), a clean interface was needed
between the runtime (which executes agents) and the hook implementations (which
customize them). The runtime must not know about skills, memory, MCP, or daemon
infrastructure. It only knows it has a Hook and calls its methods at the right
times.
This separation enables two modes: the daemon (full hook with skills, MCP, memory, event broadcasting) and embedded use (no hook, or a minimal one).
Design
The trait
```rust
pub trait Hook: Send + Sync {
    fn on_build_agent(&self, config: AgentConfig) -> AgentConfig;
    fn on_register_tools(&self, tools: &mut ToolRegistry) -> impl Future<Output = ()>;
    fn preprocess(&self, agent: &str, content: &str) -> String;
    fn on_before_run(&self, agent: &str, session_id: u64, history: &[Message]) -> Vec<Message>;
    fn on_event(&self, agent: &str, session_id: u64, event: &AgentEvent);
}
```
All methods have default no-op implementations. () implements Hook.
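The default-method pattern can be shown with a trimmed stand-in for the real trait (the signatures below are simplified; the actual types come from crabtalk core):

```rust
// Sketch: every hook point has a no-op default, so the unit type () is a
// valid do-nothing Hook for embedded use.
trait Hook: Send + Sync {
    fn preprocess(&self, _agent: &str, content: &str) -> String {
        content.to_string() // default: pass input through unchanged
    }
    fn on_event(&self, _agent: &str, _session_id: u64, _event: &str) {
        // default: ignore events
    }
}

impl Hook for () {} // the no-op hook: all defaults

fn main() {
    let hook: &dyn Hook = &();
    assert_eq!(hook.preprocess("crab", "hello"), "hello");
    hook.on_event("crab", 1, "text_delta");
    println!("no-op hook works");
}
```

This is what makes embedded use cheap: a runtime constructed with () as its hook runs agents with no customization at all.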
Lifecycle points
on_build_agent — called when an agent is registered with the runtime.
Receives the agent config, returns a modified config. This is where the system
prompt is composed. The RuntimeHook implementation chains:
- Environment block (OS, shell, platform).
- Memory prompt (MEMORY.md content as <memory> block).
- Resource hints (available MCP servers, available skills).
- Scope block (if the agent has restricted skills/MCPs/members, appends a <scope> XML block listing allowed resources).
- Tool whitelist computation (restricts config.tools based on scope).
on_register_tools — called at runtime startup. Registers tool schemas
(name, description, JSON schema) into the ToolRegistry. No handlers — dispatch
is separate. RuntimeHook registers: OS tools, skill tool, task/delegate tool,
ask_user tool, memory tools (if enabled), and MCP-discovered tools.
preprocess — called before a user message enters the conversation. Used
for slash command resolution: /skill-name args is transformed into the skill
body wrapped in a <skill> tag. Happens before tool dispatch.
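The transformation can be sketched as a pure function over the message text. The exact tag shape and the lookup signature are illustrative assumptions; the real code resolves the skill body from the skill registry:

```rust
// Sketch: "/name args" becomes the skill body wrapped in a <skill> tag
// (assumed format), with the remaining args appended. Non-commands and
// unknown skills pass through unchanged.
fn resolve_slash(content: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    let Some(rest) = content.strip_prefix('/') else {
        return content.to_string(); // not a slash command
    };
    let (name, args) = rest.split_once(' ').unwrap_or((rest, ""));
    match lookup(name) {
        Some(body) => format!("<skill name=\"{name}\">\n{body}\n</skill>\n{args}"),
        None => content.to_string(), // unknown skill: leave the message as-is
    }
}

fn main() {
    let lookup = |name: &str| (name == "summarize").then(|| "Summarize the thread.".to_string());
    let out = resolve_slash("/summarize last 10 messages", lookup);
    assert!(out.starts_with("<skill name=\"summarize\">"));
    assert!(out.ends_with("last 10 messages"));
    println!("{out}");
}
```

Because this runs in preprocess, the model only ever sees the expanded skill body, never the raw slash syntax.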
on_before_run — called before each agent execution (send/stream). Returns
messages to inject into the conversation. RuntimeHook injects:
- Agent descriptions (if the agent has delegation members).
- Memory auto-recall (BM25 search on the last user message, as a <recall> block).
- Working directory announcement (as an <environment> block).
All injected messages are marked auto_injected: true — they’re ephemeral, not
persisted, stripped before each run, and refreshed.
on_event — called after each agent step. Receives every AgentEvent
(text deltas, tool calls, completions). DaemonBridge uses this to broadcast
events to console subscribers.
Composition
RuntimeHook<B: RuntimeBridge> is the engine hook. It composes SkillHandler,
McpHandler, Memory, and AgentScope maps. It implements Hook by
orchestrating all subsystems.
DaemonHook is a type alias: RuntimeHook<DaemonBridge>. The daemon bridge
adds ask_user dispatch, delegate dispatch, session CWD, and event broadcasting.
For embedded use, RuntimeHook<NoBridge> provides the full engine without
daemon infrastructure.
Tool dispatch
RuntimeHook::dispatch_tool is the central routing table — a match on tool
name. It’s not part of the Hook trait itself (the trait only registers
schemas). The runtime calls dispatch_tool when an agent produces a tool call.
Dispatch enforces scoping before routing.
Alternatives
Separate traits per concern. One trait for prompt building, one for tools, one for events. Rejected because they always compose together and the single trait is simpler to implement and reason about.
Closure-based hooks. Pass lambdas instead of a trait. Rejected because the hook needs shared state (skill registry, MCP connections, memory) that closures make awkward.
Unresolved Questions
- Should on_build_agent be async to support hooks that need I/O during agent construction?
- Should preprocess support returning multiple messages (e.g. for multi-skill invocation)?
0078 - Compact Session
- Feature Name: Compact Session Interface
- Start Date: 2026-03-25
- Discussion: #78
- Crates: core, daemon
Summary
Expose session compaction as a protocol operation so clients can request a concise context summary on demand, enabling cross-agent context handoff with custom @-mention logic.
Motivation
When a user @-mentions a different agent mid-conversation, the client needs to hand off context. The naive approaches don’t work:
- Raw history includes irrelevant tool results, thinking tokens, and the previous agent’s system prompt — expensive and noisy.
- No context means the target agent flies blind.
Compact produces a focused briefing: the LLM summarizes the conversation into essential context. The target agent gets its own system prompt (warm in token cache) plus the compact summary plus the user’s query — high quality context, minimal tokens.
The key insight: this belongs in the protocol, not the client. The daemon already has the session history and the LLM connection. The client just needs to say “compact session N” and get a summary back. But the mention logic itself stays in the client — the daemon doesn’t know about @-mentions, UI conventions, or which agent to route to. The client decides when and why to compact; the daemon does the summarization.
Design
A Compact message is added to the protobuf protocol:
- Request: CompactRequest { session: u64 } — the client asks the daemon to compact a specific session.
- Response: CompactResponse { summary: string } — the daemon returns the summarized context.
The Server trait gains a compact_session method. The daemon implementation
delegates to Agent::compact(), which sends the session history to the LLM
with a compaction prompt that preserves identity and profile information.
What the daemon does
- Accepts the compact request via the protocol.
- Loads the session history.
- Calls the agent’s compact method (LLM summarization).
- Returns the summary string.
What the client does
- Detects @-mentions (its own UI logic).
- Requests compact of the current session.
- Creates or selects the target agent’s session.
- Sends the compact summary + user query to the target agent.
Context selection alternatives
If compact is too slow for the use case:
- BM25 — already in the codebase for memory recall. Keyword-match messages against the query.
- Last N messages — simplest. Often sufficient for short conversations.
These are client-side decisions. The compact interface doesn’t preclude them.
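The last-N fallback mentioned above is a one-liner. The tuple representation of messages here is a simplification for illustration:

```rust
// Sketch: take the last n messages as handoff context. saturating_sub
// keeps the slice valid when the history is shorter than n.
fn last_n<'a>(history: &'a [(&'a str, &'a str)], n: usize) -> &'a [(&'a str, &'a str)] {
    &history[history.len().saturating_sub(n)..]
}

fn main() {
    let history = [("user", "hello"), ("assistant", "hi"), ("user", "plan a trip")];
    let ctx = last_n(&history, 2);
    assert_eq!(ctx.len(), 2);
    assert_eq!(ctx[1].1, "plan a trip");
    println!("{:?}", ctx);
}
```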
Alternatives
Client-side compaction. The client could do its own summarization, but it would need LLM access and session history — duplicating what the daemon already has.
Automatic compaction on mention. The daemon could detect @-mentions and compact automatically. Rejected because mention syntax is a client concern — different clients have different conventions.
Unresolved Questions
- Should compact accept parameters (max tokens, focus query) to guide summarization?
- Should the daemon cache compact results for repeated handoffs within the same conversation?
0080 - Cron
- Feature Name: Daemon-Level Cron Scheduler
- Start Date: 2026-03-20
- Discussion: #80
- Crates: daemon
Summary
A daemon-level cron system that triggers skills into sessions on a schedule, replacing the previous per-agent heartbeat mechanism.
Motivation
Agents need periodic behavior — checking feeds, running maintenance, sending reminders. The original approach was a per-agent heartbeat config, but this was dead code and wrong-shaped: heartbeats are uniform intervals, while scheduled tasks need cron-style flexibility (every Monday at 9am, every 2 hours, etc.).
The session already carries the agent and sender. A cron entry only needs to know which skill to fire and which session to fire it into.
Design
A cron entry triggers a skill into a session on a schedule.
Data model
```toml
[[cron]]
id = 1
schedule = "0 */2 * * *"
skill = "check-feeds"
session = 12345
quiet_start = "23:00"
quiet_end = "07:00"
once = false
```
- id — auto-incremented on create.
- schedule — standard cron expression, validated on create and load.
- skill — fired as a /{skill} slash command into the session.
- session — target session ID. The session determines the agent.
- quiet_start / quiet_end — optional HH:MM window in the daemon's local time. If a fire time falls inside the window, the fire is skipped silently. No queuing, no catch-up. Both must be set; if only one is provided, quiet hours are ignored.
- once — fire once, then delete the entry from memory and disk.
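The quiet-window check can be sketched as follows. Whether the endpoints are inclusive is not specified above; this sketch treats the start as inclusive and the end as exclusive, and handles windows like 23:00..07:00 that wrap past midnight:

```rust
// Sketch: times are minutes since midnight in the daemon's local time.
fn in_quiet_window(now: u16, start: u16, end: u16) -> bool {
    if start <= end {
        start <= now && now < end // same-day window, e.g. 13:00..15:00
    } else {
        now >= start || now < end // wrapping window, e.g. 23:00..07:00
    }
}

// Helper: parse "HH:MM" into minutes since midnight.
fn hhmm(s: &str) -> u16 {
    let (h, m) = s.split_once(':').expect("HH:MM");
    h.parse::<u16>().unwrap() * 60 + m.parse::<u16>().unwrap()
}

fn main() {
    let (start, end) = (hhmm("23:00"), hhmm("07:00"));
    assert!(in_quiet_window(hhmm("23:30"), start, end));  // late night: skip
    assert!(in_quiet_window(hhmm("06:59"), start, end));  // early morning: skip
    assert!(!in_quiet_window(hhmm("12:00"), start, end)); // midday: fire
    println!("quiet-window checks pass");
}
```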
Persistence
Memory is authoritative at runtime. Disk (crons.toml) is recovery for
restarts.
- Startup: load from disk, start timers. Invalid schedules are skipped with a warning.
- Create/Delete: mutate memory, start/stop timer, atomic write to disk (tmp + rename).
- Runtime reload: crons stay in memory — they survive runtime swaps.
- Daemon restart: reload from disk.
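The tmp + rename step is the standard atomic-write pattern; a minimal sketch, assuming same-filesystem rename semantics:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Sketch: write the full contents to a temporary sibling file, then rename
// over the target. rename is atomic on the same filesystem, so readers
// never observe a half-written crons.toml.
fn atomic_write(path: &Path, contents: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = fs::File::create(&tmp)?;
    f.write_all(contents.as_bytes())?;
    f.sync_all()?; // flush to disk before the rename makes it visible
    fs::rename(&tmp, path)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("crabtalk-cron-demo");
    fs::create_dir_all(&dir)?;
    let path = dir.join("crons.toml");
    atomic_write(&path, "[[cron]]\nid = 1\n")?;
    assert_eq!(fs::read_to_string(&path)?, "[[cron]]\nid = 1\n");
    println!("wrote {}", path.display());
    Ok(())
}
```

A crash between create and rename leaves only a stale .tmp file; the authoritative crons.toml is either the old version or the new one, never a mix.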
Firing
Fire-and-forget via the daemon event channel. The cron sends a ClientMessage
with content /{skill} and sender "cron". The reply channel is dropped —
output goes to session history only.
Protocol
Three protocol operations on the Server trait:
- CreateCron { schedule, skill, session, quiet_start?, quiet_end? } → CronInfo
- DeleteCron { id } → success / not found
- ListCrons → CronList
Crons are process-lifetime, not session-lifetime. They survive runtime reloads, fire via the daemon event channel, and the runtime has no notion of time-based scheduling. This is a daemon concern.
Alternatives
Per-agent heartbeat config. The original approach. Rejected because it coupled scheduling to agent definition, couldn’t express cron-style schedules, and was dead code.
Client-side polling. A client can send messages on its own timer. This works but requires the client to be running. Daemon crons fire regardless of client state.
Unresolved Questions
- Should crons support arguments beyond the skill name?
- Should there be a max cron count to prevent resource exhaustion?
0082 - Scoping
- Feature Name: Agent Scoping
- Start Date: 2026-03-22
- Discussion: #82
- Crates: runtime, core
Summary
A whitelist-based scoping system that restricts what an agent can access: tools, skills, MCP servers, and delegation targets. Enforced at dispatch time and advertised in the system prompt. This is a security boundary, not a hint.
Motivation
In multi-agent setups, a delegated sub-agent should not have the same capabilities as the primary agent. A research agent doesn’t need bash. A summarizer doesn’t need to delegate to other agents. Without scoping, every agent has access to everything — which means a misbehaving or confused agent can call tools it was never intended to use.
Scoping solves this by letting agent configs declare exactly what resources are available. The runtime enforces it.
Design
AgentScope
```rust
pub struct AgentScope {
    pub tools: Vec<String>,   // empty = unrestricted
    pub members: Vec<String>, // empty = no delegation
    pub skills: Vec<String>,  // empty = all skills
    pub mcps: Vec<String>,    // empty = all MCP servers
}
```
Empty list means unrestricted. Non-empty means only listed items are allowed. This is an inclusive whitelist, not a denylist.
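The rule reduces to a one-line check (illustrative helper name; note that members is the exception, where an empty list means no delegation rather than unrestricted):

```rust
// Sketch of the empty-means-unrestricted rule for tools, skills, and mcps:
// a non-empty list is an inclusive whitelist; an empty list imposes nothing.
fn allowed(whitelist: &[&str], item: &str) -> bool {
    whitelist.is_empty() || whitelist.contains(&item)
}

fn main() {
    assert!(allowed(&[], "bash")); // empty list: unrestricted
    assert!(allowed(&["check-feeds", "summarize"], "summarize"));
    assert!(!allowed(&["check-feeds", "summarize"], "bash")); // not listed: rejected
    println!("scope checks pass");
}
```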
Whitelist computation
When an agent has any scoping (non-empty skills, mcps, or members), the runtime
computes a tool whitelist during on_build_agent:
- Start with BASE_TOOLS: bash, ask_user — always available.
- If memory is enabled: add recall, remember, memory, forget.
- If the skills list is non-empty: add the skill tool.
- If the mcps list is non-empty: add the mcp tool.
- If the members list is non-empty: add the delegate tool.
The computed whitelist replaces config.tools. Tools not on the list are
invisible to the agent.
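The computation above can be sketched as a pure function. This is illustrative: the real code runs inside on_build_agent and also folds in the memory tools when memory is enabled, which is omitted here:

```rust
// Sketch: derive the tool whitelist from the scope lists. BASE_TOOLS are
// always present; gateway tools appear only when their list is non-empty.
fn compute_whitelist(skills: &[&str], mcps: &[&str], members: &[&str]) -> Vec<String> {
    let mut tools = vec!["bash".to_string(), "ask_user".to_string()];
    if !skills.is_empty() {
        tools.push("skill".to_string());
    }
    if !mcps.is_empty() {
        tools.push("mcp".to_string());
    }
    if !members.is_empty() {
        tools.push("delegate".to_string());
    }
    tools
}

fn main() {
    let tools = compute_whitelist(&["summarize"], &[], &["researcher"]);
    assert_eq!(tools, ["bash", "ask_user", "skill", "delegate"]);
    println!("{:?}", tools);
}
```

Note the shape of the guarantee: a scoped agent with no mcps never even sees an mcp tool, so there is nothing for it to misuse.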
Prompt injection
A <scope> block is appended to the system prompt listing the agent’s allowed
resources:
```
<scope>
skills: check-feeds, summarize
mcp servers: search
members: researcher, writer
</scope>
```
This tells the agent what it can use. The agent doesn’t need to guess or discover — its boundaries are stated upfront.
Enforcement
Scoping is enforced at four dispatch points:
- dispatch_tool — rejects tool calls not in the agent's tool whitelist.
- dispatch_skill — rejects skill names not in the agent's skill list.
- dispatch_mcp — filters the MCP server list to allowed servers.
- dispatch_delegate — rejects delegation to agents not in the members list.
Enforcement happens at runtime, not just at prompt time. Even if the LLM
ignores the <scope> block and tries to call a restricted tool, the dispatch
layer rejects it.
Default agent
The default agent (primary) has no scope restrictions — empty lists on all four dimensions. Scoping is for sub-agents that need constrained access.
Alternatives
Denylist instead of whitelist. List what’s forbidden instead of what’s allowed. Rejected because allowlists are safer by default — a new tool or server is inaccessible until explicitly granted. Denylists require updating every time a new resource is added.
Prompt-only scoping. Tell the agent its restrictions in the prompt but don’t enforce at dispatch. Rejected because LLMs don’t reliably follow instructions — a determined or confused model will call tools it was told not to. Enforcement must be at the dispatch layer.
Unresolved Questions
- Should scoping support wildcard patterns (e.g. mcp: search-*)?
- Should scope violations be logged as security events for monitoring?