
Introduction

This is the crabtalk development book — the knowledge base you check before building. It captures what crabtalk stands for, how the system is shaped, and the design decisions that govern its evolution.

For user-facing documentation (installation, configuration, commands), see crabtalk.ai.

How this book is organized

  • Manifesto — What crabtalk is and what it stands for.
  • RFCs — Design decisions and features.

RFCs

Code tells you what the system does. Git history tells you when it changed. RFCs tell you why — the problem, the alternatives considered, and the reasoning behind the choice. When you’re about to build something new, RFCs are where you check whether the problem has been thought through before.

Not every change needs an RFC. Bug fixes, refactors, and small improvements go through normal pull requests. RFCs are for decisions that establish rules, contracts, or interfaces that others need to know about before building.

Format

Each RFC is a markdown file with the following structure:

  • Header — Feature name, start date, link to discussion, affected crates.
  • Summary — One paragraph describing the decision.
  • Motivation — What problem does this solve? What use cases does it enable?
  • Design — The technical design. Contracts, responsibilities, interfaces.
  • Alternatives — What else was considered and why it was rejected.
  • Unresolved Questions — Open questions for future work.

Lifecycle

  1. Open an issue on GitHub describing the feature or design problem.
  2. Implement it. Iterate through PRs until it’s merged.
  3. Once merged, write the RFC documenting the decision and add it to SUMMARY.md.

The RFC number is the issue number or the PR number that introduced the feature. RFCs are written after implementation, not before — they record decisions that were made, not proposals for decisions to come.

Manifesto

Ownership is necessary for an open agent ecosystem.

Ownership is not configuration. A configured agent is one where you picked from someone else’s menu. An owned agent is one where you decided what’s on the menu. Ownership is the power to compose your own stack.

Every agent application today rebuilds session management, command dispatch, and event streaming from scratch — then bundles it alongside search, browser automation, PDF parsing, TTS, image processing, and dozens of tools you didn’t ask for into one process. If you want a Telegram bot with search, you carry nineteen other channels and every integration. If you want a coding agent, you carry TTS and image generation. The process is theirs. The choices are theirs. You run it.

This happens because the daemon layer is missing. Without it, every application must become the daemon. And a daemon that is also an application ships its opinion of what your agent should be.

CrabTalk is that daemon layer. It manages sessions, dispatches commands, and streams the full execution lifecycle to your client. It does not bundle search. It does not bundle gateways. It does not bundle tools. You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts. The daemon never loads them.

An agent daemon is not an agent application. An agent daemon empowers you to build the application you want — and only the application you want. This is the essence of ownership.

We cannot expect agent platforms to give us ownership out of their beneficence. It is to their advantage to bundle, to lock in, to ship their choices as yours. We should expect that they will bundle. The only way to preserve choice is to never take it away in the first place.

We don’t much care if you prefer a batteries-included experience. You could build an OpenClaw-like assistant or a Hermes-like agent on top of CrabTalk. You can’t build a CrabTalk underneath them. The daemon must come first. The architecture must be right. Everything else follows.

Let us proceed.

0000 - Compaction

  • Feature Name: Auto-Compaction
  • Start Date: 2025-12-01
  • Discussion: foundational design
  • Crates: core

Summary

Automatic context management for conversations that outgrow the LLM’s context window. When history exceeds a token threshold, the agent uses the LLM itself to summarize the conversation into a compact briefing that replaces the full history. The conversation continues with no interruption.

Motivation

LLM context windows are finite. A conversation that runs long enough — multi-step tool use, research sessions, debugging loops — will exceed the model’s limit. When that happens, the request fails. The user loses their session.

Every LLM application has to solve this problem. The common approaches are:

  • Truncation — drop old messages. Cheap but lossy. The agent forgets decisions, context, and user preferences from earlier in the conversation.
  • Sliding window — keep the last N messages. Same problem: the agent loses the beginning of the conversation.
  • Retrieval — embed messages and retrieve relevant ones. Heavyweight: requires a vector store, an embedding model, and a retrieval strategy.

Crabtalk’s approach: use the LLM to summarize itself. The same model that’s having the conversation produces a dense summary of everything important. The summary replaces the history. The conversation continues as if nothing happened.

Design

Trigger

After each agent step (LLM response + tool results), the runtime estimates the token count of the current history. If it exceeds compact_threshold (default 100,000 tokens), compaction fires automatically.

Token estimation is a heuristic: ~4 characters per token, counting message content, reasoning content, and tool call arguments. It’s deliberately rough — the threshold is a safety margin, not a precise limit.
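The heuristic can be sketched in a few lines. This is a minimal illustration, not crabtalk's actual implementation — the `Message` struct and field names here are simplified stand-ins:

```rust
// Simplified stand-in for crabtalk's message type; field names are
// assumptions for illustration.
struct Message {
    content: String,
    reasoning: String, // reasoning/thinking content
    tool_args: String, // serialized tool call arguments
}

/// Rough token estimate: total characters across counted fields, divided
/// by 4. Deliberately imprecise — a safety margin, not a precise limit.
fn estimate_tokens(history: &[Message]) -> usize {
    let chars: usize = history
        .iter()
        .map(|m| m.content.len() + m.reasoning.len() + m.tool_args.len())
        .sum();
    chars / 4
}

fn should_compact(history: &[Message], compact_threshold: usize) -> bool {
    estimate_tokens(history) > compact_threshold
}

fn main() {
    let msg = Message {
        content: "a".repeat(400),
        reasoning: String::new(),
        tool_args: String::new(),
    };
    assert_eq!(estimate_tokens(&[msg]), 100); // 400 chars / 4
}
```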

Compaction

The agent sends the full history to the LLM with a compaction system prompt that instructs it to:

Preserve:

  • Agent identity (name, personality, relationship notes)
  • User profile (name, preferences, context)
  • Key decisions and their rationale
  • Active tasks and their status
  • Important facts, constraints, and preferences
  • Tool results still relevant to ongoing work

Omit:

  • Greetings, filler, acknowledgements
  • Superseded plans or abandoned approaches
  • Tool calls whose results have been incorporated

The compaction prompt also includes the agent’s system prompt, so the LLM preserves identity and profile information from <self>, <identity>, and <profile> blocks.

The output is dense prose, not bullet points — it becomes the new conversation context and must be self-contained.

Replacement

After compaction:

  1. The summary is yielded as an AgentEvent::Compact { summary }.
  2. The session history is replaced with a single user message containing the summary.
  3. A [context compacted] text delta is yielded so the user sees it happened.
  4. The agent loop continues — the next step sees the compact summary as its entire history.

On disk, a {"compact":"..."} marker is appended to the session JSONL. On reload, load_context reads from the last compact marker forward. History before the marker is archived in place — still in the file, never deleted.
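The reload behavior can be sketched as follows. This is an illustration under assumptions: line shapes are simplified, and the real `load_context` presumably parses full JSON rather than matching prefixes:

```rust
// Sketch of reading a session JSONL from the last compact marker forward.
// Ordinary entries are arbitrary JSON lines; a compact marker is assumed
// to be a line of the form {"compact":"..."}.
fn load_context(jsonl: &str) -> Vec<&str> {
    let lines: Vec<&str> = jsonl.lines().filter(|l| !l.trim().is_empty()).collect();
    // History before the last marker stays on disk but is not loaded.
    let start = lines
        .iter()
        .rposition(|l| l.trim_start().starts_with("{\"compact\":"))
        .unwrap_or(0);
    lines[start..].to_vec()
}

fn main() {
    let file = "{\"user\":\"hi\"}\n{\"assistant\":\"hello\"}\n{\"compact\":\"summary\"}\n{\"user\":\"next\"}\n";
    let ctx = load_context(file);
    // The marker itself carries the summary, so it is kept.
    assert_eq!(ctx, vec!["{\"compact\":\"summary\"}", "{\"user\":\"next\"}"]);
}
```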

Interaction with other systems

  • Memory auto-recall — runs fresh every turn via on_before_run. Compaction doesn’t affect recall — memories are separate from conversation history.
  • Client-initiated compact (RFC 0078) — the same Agent::compact() method, but triggered by the client for @-mention handoff rather than by the token threshold.
  • Session persistence — compact markers are append-only in the JSONL. The full history survives on disk even after in-memory replacement.

Configuration

Per-agent configurable. None disables auto-compaction. The default of 100,000 tokens leaves headroom below most model context limits (128K–200K) for the system prompt, tool schemas, and injected context.

Alternatives

Truncation / sliding window. Cheap but the agent loses context. In a multi-step debugging session, forgetting the first half of the investigation means repeating work. Compaction preserves the substance while discarding the noise.

RAG over message history. Retrieve relevant messages via embeddings. More precise than compaction but requires infrastructure (vector store, embedding model) and adds latency to every turn. Compaction is zero-infrastructure — it uses the model already in the conversation.

No automatic compaction. Let the user manage context manually. Rejected because context overflow is invisible until the request fails. The user shouldn’t need to monitor token counts.

Unresolved Questions

  • Should the compaction prompt be customizable per agent?
  • Should the threshold adapt based on the model’s actual context limit rather than a fixed number?

0009 - Transport

  • Feature Name: UDS and TCP Transport Layers
  • Start Date: 2026-03-27
  • Discussion: #9
  • Crates: transport, core

Summary

A transport layer providing Unix domain socket (UDS) and TCP connectivity between clients and the crabtalk daemon, built on a shared length-prefixed protobuf codec defined in core.

Motivation

The daemon needs to accept connections from local CLI clients and remote clients (Telegram, web gateways). UDS is the natural choice for same-machine communication — no port management, filesystem-based access control. TCP is required for remote access and cross-platform support (Windows has no UDS).

Both transports share identical framing and message types. The codec and message definitions belong in core so that any transport can use them without the transports depending on one another. The transport crate provides the concrete connection machinery.

Design

Codec (core::protocol::codec)

Wire format: [u32 BE length][protobuf payload]. The length prefix counts payload bytes only, excluding the 4-byte header itself.

Two generic async functions operate over any AsyncRead/AsyncWrite:

  • write_message<W, T: Message>(writer, msg) — encode, length-prefix, flush.
  • read_message<R, T: Message + Default>(reader) — read length, read payload, decode.

Maximum frame size is 16 MiB. Frames exceeding this limit produce a FrameError::TooLarge. EOF during the length read produces FrameError::ConnectionClosed (clean disconnect, not an error).
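The framing can be sketched with synchronous std::io streams in place of the async AsyncRead/AsyncWrite, and raw bytes standing in for an encoded protobuf payload — a minimal illustration, not the real codec:

```rust
use std::io::{self, Read, Write};

const MAX_FRAME: usize = 16 * 1024 * 1024; // 16 MiB limit

fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    // The length prefix counts payload bytes only, excluding the header.
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?; // EOF here = clean disconnect in the real codec
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> io::Result<()> {
    let mut wire = Vec::new();
    write_frame(&mut wire, b"hello")?;
    assert_eq!(&wire[..4], &5u32.to_be_bytes()); // header excludes itself
    let mut cursor = io::Cursor::new(wire);
    assert_eq!(read_frame(&mut cursor)?, b"hello");
    Ok(())
}
```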

Server accept loop

Both UDS and TCP servers share the same pattern:

accept_loop(listener, on_message, shutdown)
  • listener: UnixListener or TcpListener.
  • on_message: Fn(ClientMessage, Sender<ServerMessage>) — called for each decoded client message. The sender is per-connection; the callback can send multiple ServerMessages (streaming responses) or exactly one (request-response). The channel is unbounded because messages are small and flow-controlled by the protocol — the agent produces responses at LLM speed, far slower than socket drain speed.
  • shutdown: oneshot::Receiver<()> for graceful stop.

Each accepted connection spawns two tasks: a read loop that decodes ClientMessages and calls on_message, and a send task that drains the UnboundedSender and writes ServerMessages back. When the read loop ends (EOF or error), the sender is dropped, which terminates the send task.

TCP specifics

  • Default port: 6688. If the port is in use, bind fails — another daemon may already be running.
  • TCP_NODELAY is set on all connections (low-latency interactive protocol).
  • bind() returns a std::net::TcpListener (non-blocking).

UDS specifics

  • Unix-only (#[cfg(unix)]).
  • Socket path is caller-provided (typically ~/.crabtalk/daemon.sock).
  • No port management or collision handling — the filesystem path is the identity.

Client trait (core::protocol::api::Client)

Two required transport primitives:

  • request(ClientMessage) -> Result<ServerMessage> — single round-trip.
  • request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — send one message, read responses until the stream ends.

The UDS Connection and the TCP TcpConnection implement Client identically: split the socket into owned read/write halves, write via the codec, read via the codec. The request_stream implementation reads indefinitely; typed provided methods on Client (e.g., stream()) handle sentinel detection (StreamEnd).

Connections are not Clone — one connection per session. The client struct (CrabtalkClient / TcpClient) holds config and produces connections on demand.

Alternatives

tokio-util LengthDelimitedCodec. Would save the manual length-prefix code but adds a dependency for ~50 lines of straightforward framing. The hand-rolled codec is simpler to audit and has no extra allocations.

gRPC / tonic. Full RPC framework with HTTP/2 transport. Heavyweight for a local daemon protocol. The current design is simpler: raw protobuf over a length-prefixed stream, no HTTP layer, no service definitions beyond the Server trait.

Shared generic transport trait. UDS and TCP accept loops are nearly identical but kept as separate modules. A generic Transport trait would save ~20 lines of duplication but add an abstraction with exactly two implementors. Not worth it.

Unresolved Questions

  • Should the transport support TLS for TCP connections in non-localhost deployments?
  • Should there be a connection timeout or keepalive at the transport level, or is the protocol-level Ping/Pong sufficient?

0018 - Protocol

  • Feature Name: Wire Protocol
  • Start Date: 2026-03-27
  • Discussion: #18
  • Crates: core

Summary

A protobuf-based wire protocol defining all client-server communication for the crabtalk daemon, with a Server trait for dispatch and a Client trait for typed request methods.

Motivation

The daemon mediates between multiple clients (CLI, Telegram, web) and multiple agents. A well-defined wire protocol decouples client and server implementations and makes the contract explicit. Protobuf was chosen for compact binary encoding, language-neutral schema, and generated code via prost.

Design

Wire messages (crabtalk.proto)

Two top-level envelopes using oneof:

ClientMessage — 15 variants:

  • Send — Run agent, return complete response
  • Stream — Run agent, stream response events
  • Ping — Keepalive
  • Sessions — List active sessions
  • Kill — Close a session
  • GetConfig — Read daemon config
  • SetConfig — Replace daemon config
  • Reload — Hot-reload runtime
  • SubscribeEvents — Stream agent events
  • ReplyToAsk — Answer a pending ask_user prompt
  • GetStats — Daemon stats
  • CreateCron — Create cron entry
  • DeleteCron — Delete cron entry
  • ListCrons — List cron entries
  • Compact — Compact session history

ServerMessage — 11 variants:

  • Response — Complete agent response
  • Stream — Streaming event (see below)
  • Error — Error with code and message
  • Pong — Keepalive ack
  • Sessions — Session list
  • Config — Config JSON
  • AgentEvent — Agent event (for subscriptions)
  • Stats — Daemon stats
  • CronInfo — Created cron entry
  • CronList — All cron entries
  • Compact — Compaction summary

Streaming events

StreamEvent is itself a oneof with 8 variants representing the lifecycle of a streamed agent response:

  • Start { agent, session } — stream opened.
  • Chunk { content } — text delta.
  • Thinking { content } — thinking/reasoning delta.
  • ToolStart { calls[] } — tool invocations beginning.
  • ToolResult { call_id, output, duration_ms } — single tool result.
  • ToolsComplete — all pending tool calls finished.
  • AskUser { questions[] } — agent needs user input.
  • End { agent, error } — stream closed (error is empty on success).

The client reads StreamEvents until it receives End, which is the terminal sentinel.
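The read-until-sentinel loop can be sketched with a trimmed-down event enum — only two of the eight variants, as an illustration of the pattern rather than the real client:

```rust
// Trimmed-down stand-in for the 8-variant StreamEvent oneof.
enum StreamEvent {
    Chunk { content: String },
    End { error: String }, // empty error = success
}

/// Collect text deltas until End; Err if End carried an error.
fn collect_stream(events: impl IntoIterator<Item = StreamEvent>) -> Result<String, String> {
    let mut text = String::new();
    for ev in events {
        match ev {
            StreamEvent::Chunk { content } => text.push_str(&content),
            StreamEvent::End { error } if error.is_empty() => return Ok(text),
            StreamEvent::End { error } => return Err(error),
        }
    }
    Err("stream ended without End sentinel".into())
}

fn main() {
    let events = vec![
        StreamEvent::Chunk { content: "Hel".into() },
        StreamEvent::Chunk { content: "lo".into() },
        StreamEvent::End { error: String::new() },
    ];
    assert_eq!(collect_stream(events), Ok("Hello".to_string()));
}
```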

Agent events

AgentEventMsg carries a kind enum (TEXT_DELTA, THINKING_DELTA, TOOL_START, TOOL_RESULT, TOOLS_COMPLETE, DONE) plus agent name, session ID, content, and timestamp. Used by SubscribeEvents for live monitoring of all agent activity across sessions.

AgentEventMsg overlaps with StreamEvent — both represent the agent execution lifecycle. StreamEvent is the per-request streaming format (rich, typed variants). AgentEventMsg is the cross-session monitoring format (flat, single struct with a kind tag). The duplication exists because monitoring clients need a simpler, uniform shape to filter and display events from multiple agents.

Server trait

One async method per ClientMessage variant. Implementations receive typed request structs and return typed responses:

trait Server: Sync {
    fn send(&self, req: SendMsg) -> Future<Output = Result<SendResponse>>;
    fn stream(&self, req: StreamMsg) -> Stream<Item = Result<StreamEvent>>;
    fn ping(&self) -> Future<Output = Result<()>>;
    // ... one method per operation
}

The provided dispatch(&self, msg: ClientMessage) -> Stream<Item = ServerMessage> method routes a raw ClientMessage to the correct handler. Request-response operations yield exactly one ServerMessage; streaming operations yield many. Errors are mapped to ErrorMsg { code, message } using HTTP status codes with their standard semantics: 400 (bad request), 404 (not found), 500 (internal error).

Client trait

Two required transport primitives:

  • request(ClientMessage) -> Result<ServerMessage> — single round-trip.
  • request_stream(ClientMessage) -> Stream<Item = Result<ServerMessage>> — raw streaming read.

Typed provided methods (send, stream, ping, get_config, set_config) handle message construction, response unwrapping, and sentinel detection. The stream() method consumes events via take_while until StreamEnd and maps each frame through TryFrom<ServerMessage> for type-safe event extraction.

Conversions (message::convert)

From impls wrap typed messages into envelopes (SendMsg -> ClientMessage, SendResponse -> ServerMessage). TryFrom impls unwrap in the other direction, returning an error for unexpected variants. This keeps call sites clean — no manual enum construction.
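The conversion pattern looks roughly like this — simplified types standing in for the generated protobuf structs:

```rust
// Simplified stand-ins for the generated message types.
struct SendMsg { agent: String }
struct PingMsg;

enum ClientMessage {
    Send(SendMsg),
    Ping(PingMsg),
}

// From wraps a typed message into the envelope...
impl From<SendMsg> for ClientMessage {
    fn from(m: SendMsg) -> Self { ClientMessage::Send(m) }
}

// ...and TryFrom unwraps it, erroring on unexpected variants.
impl TryFrom<ClientMessage> for SendMsg {
    type Error = String;
    fn try_from(msg: ClientMessage) -> Result<Self, Self::Error> {
        match msg {
            ClientMessage::Send(m) => Ok(m),
            _ => Err("unexpected variant".into()),
        }
    }
}

fn main() {
    // Call sites stay clean: no manual enum construction.
    let envelope: ClientMessage = SendMsg { agent: "crab".into() }.into();
    let back = SendMsg::try_from(envelope).unwrap();
    assert_eq!(back.agent, "crab");
    assert!(SendMsg::try_from(ClientMessage::Ping(PingMsg)).is_err());
}
```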

Alternatives

JSON over WebSocket. Simpler to debug with curl, but larger payloads and no schema enforcement. Protobuf catches schema mismatches at compile time.

gRPC service definitions. Would provide streaming and code generation out of the box, but brings HTTP/2, tower middleware, and tonic as dependencies. The current approach is lighter: raw protobuf frames over a length-prefixed stream, with hand-written trait dispatch.

Separate request/response ID correlation. The protocol is connection-scoped and sequential — one outstanding request per connection at a time. This is a fundamental design constraint: clients must wait for a response before sending the next request. No need for request IDs or multiplexing. If multiplexing is needed later, it belongs in the transport layer, not the protocol.

Unresolved Questions

  • Should the protocol negotiate a version on connect to detect client/server mismatches?
  • Should StreamEnd carry structured error information (code + message) instead of a plain string?
  • Should there be a ClientMessage variant for subscribing to a specific session’s events rather than all events?

0027 - Model

  • Feature Name: Model Abstraction Layer
  • Start Date: 2026-01-25
  • Discussion: #27
  • Crates: model, core

Summary

A provider registry that wraps multiple LLM backends (OpenAI, Anthropic, Google, Bedrock, Azure) behind a unified Model trait, with per-model provider instances, runtime model switching, and retry logic with exponential backoff.

Motivation

The daemon talks to LLMs. Which LLM, from which provider, through which API — that’s configuration, not architecture. The agent code should call model.send() and not care whether it’s hitting Anthropic directly or an OpenAI-compatible proxy.

This requires:

  • A single trait that all providers implement.
  • A registry that maps model names to provider instances.
  • Runtime switching between models without restarting.
  • Retry logic for transient failures (rate limits, timeouts).
  • Type conversion between crabtalk’s message types and each provider’s wire format.

Design

Model trait (core)

Defined in wcore::model:

pub trait Model: Clone + Send + Sync {
    async fn send(&self, request: &Request) -> Result<Response>;
    fn stream(&self, request: Request) -> impl Stream<Item = Result<StreamChunk>>;
    fn context_limit(&self, model: &str) -> usize;
    fn active_model(&self) -> String;
}

The trait is in core because agents are generic over Model. The implementation lives in the model crate.

Provider

Wraps crabllm_provider::Provider (the external multi-backend LLM library) behind the Model trait. Each Provider instance is bound to a specific model name and carries:

  • The backend connection (OpenAI, Anthropic, Google, Bedrock, Azure).
  • A shared HTTP client.
  • Retry config: max_retries (default 2) and timeout (default 30s).

Base URL normalization strips endpoint suffixes (/chat/completions, /messages) so both bare origins and full paths work in config.
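A sketch of the normalization, assuming those two suffixes are the complete list (the RFC names only these):

```rust
// Strip known endpoint suffixes so both bare origins and full paths
// work in config. The suffix list is taken from the RFC text.
fn normalize_base_url(url: &str) -> String {
    let url = url.trim_end_matches('/');
    for suffix in ["/chat/completions", "/messages"] {
        if let Some(stripped) = url.strip_suffix(suffix) {
            return stripped.to_string();
        }
    }
    url.to_string()
}

fn main() {
    assert_eq!(
        normalize_base_url("https://api.example.com/v1/chat/completions"),
        "https://api.example.com/v1"
    );
    // Bare origins pass through (trailing slash trimmed).
    assert_eq!(
        normalize_base_url("https://api.example.com/v1/"),
        "https://api.example.com/v1"
    );
}
```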

ProviderRegistry

Implements Model by routing requests to the correct provider based on the model name in the request.

ProviderRegistry
├── providers: BTreeMap<String, Provider>   # keyed by model name
├── active: String                          # default model
└── client: reqwest::Client                 # shared across providers

  • Construction: one ProviderDef can list multiple model names. Each gets its own Provider instance. Duplicate model names across definitions are rejected at validation time.
  • Routing: send() and stream() look up the provider by request.model. Callers get a clone of the provider — the registry lock is not held during LLM calls.
  • Switching: switch(model) changes the active default. Agents can still override per-request via the model field.
  • Hot add/remove: providers can be added or removed at runtime without rebuilding the registry.

Retry logic

Non-streaming send() retries transient errors (rate limits, timeouts) with exponential backoff and full jitter:

  • Initial backoff: 100ms, doubling each retry.
  • Jitter: random duration in [backoff/2, backoff].
  • Max retries: configurable per provider (default 2).
  • Non-transient errors (auth failures, invalid requests) fail immediately.

Streaming does not retry — the connection is already established.
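The backoff schedule can be sketched as below. The tiny xorshift PRNG is a stand-in for a real random source — an assumption for illustration, not what crabtalk uses:

```rust
// Minimal xorshift PRNG standing in for a real random source.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Backoff for the given retry attempt (0-based), in milliseconds:
/// base doubles from 100ms, with full jitter in [base/2, base].
fn backoff_ms(attempt: u32, rng: &mut XorShift) -> u64 {
    let base = 100u64 << attempt; // 100, 200, 400, ...
    let half = base / 2;
    half + rng.next() % (base - half + 1)
}

fn main() {
    let mut rng = XorShift(42);
    for attempt in 0..3 {
        let base = 100u64 << attempt;
        let d = backoff_ms(attempt, &mut rng);
        assert!(d >= base / 2 && d <= base); // always within the jitter band
    }
}
```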

Type conversion

A convert module translates between wcore::model types (Request, Response, Message, StreamChunk) and crabllm_core types (ChatCompletionRequest, ChatCompletionResponse). This isolates the external library’s types from the rest of the codebase.

Alternatives

Direct provider calls without a registry. Each agent holds its own provider. Rejected because runtime model switching and centralized configuration require a shared registry.

Trait objects instead of enum dispatch. Box<dyn Model> instead of the concrete Provider enum. Rejected because Model has generic return types (impl Stream) that prevent object safety. The enum dispatch via crabllm_provider::Provider handles this naturally.

Unresolved Questions

  • Should the registry support fallback chains (try provider A, fall back to B)?
  • Should streaming requests retry on connection failures before the first chunk?

0036 - Skill Loading

  • Feature Name: Skill Loading
  • Start Date: 2026-03-27
  • Discussion: #36
  • Crates: runtime

Summary

How crabtalk discovers, loads, dispatches, hot-reloads, and scopes skills. The skill format follows the agentskills.io convention — this RFC covers the loading mechanism, not the format.

Motivation

Agents need extensible behavior without recompilation. Skills are the simplest unit that works: a markdown file with a name, description, and a prompt body. No code generation, no plugin API, no runtime linking.

The format is defined by agentskills.io. What crabtalk needs to decide is how skills are found on disk, how they’re resolved at runtime, how they stay current without restarts, and how agents are restricted to subsets of available skills.

Design

Format

SKILL.md follows the agentskills.io convention. Required fields: name, description. Optional: allowed-tools. The markdown body is the skill prompt.

Discovery

SkillHandler::load(dirs) scans a list of directories (in config-defined order) recursively for SKILL.md files. Each skill lives in its own directory:

skills/
  check-feeds/
    SKILL.md
  summarize/
    SKILL.md

Nested organization is supported (skills/category/my-skill/SKILL.md). Hidden directories (.-prefixed) are skipped. Duplicate names across directories are detected and skipped with a warning — first-loaded wins, in config-defined directory order.

Registry

A Vec<Skill> wrapped in Mutex inside SkillHandler. Linear scan — the registry is small enough that indexing is unnecessary. Supports add, upsert (replace by name), contains, and skills (list all).

Dispatch

Exposed as a tool the agent can call. Input: { name: string }.

Resolution order:

  1. Scope check — if the agent has a skill scope and the name is not in it, reject.
  2. Path traversal guard — reject names containing .., /, or \.
  3. Exact load from disk — for each skill directory, check {dir}/{name}/SKILL.md. If found, parse it, upsert into the registry, return the body.
  4. Fuzzy fallback — if no exact match, substring search the registry by name and description. If input is empty, list all available skills (respecting scope).
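Steps 1, 2, and 4 can be sketched as pure functions — a simplified illustration with a stand-in Skill struct, omitting the disk load of step 3:

```rust
// Simplified stand-in for the registry's skill entries.
struct Skill { name: String, description: String }

// Step 1: empty scope means unrestricted.
fn scope_allows(scope: &[String], name: &str) -> bool {
    scope.is_empty() || scope.iter().any(|s| s == name)
}

// Step 2: reject names that could escape the skill directories.
fn is_safe_name(name: &str) -> bool {
    !name.contains("..") && !name.contains('/') && !name.contains('\\')
}

// Step 4: substring search over name and description.
fn fuzzy<'a>(registry: &'a [Skill], query: &str) -> Vec<&'a str> {
    let q = query.to_lowercase();
    registry
        .iter()
        .filter(|s| {
            s.name.to_lowercase().contains(&q) || s.description.to_lowercase().contains(&q)
        })
        .map(|s| s.name.as_str())
        .collect()
}

fn main() {
    assert!(!is_safe_name("../etc/passwd"));
    assert!(is_safe_name("check-feeds"));
    assert!(scope_allows(&[], "anything"));
    let reg = vec![Skill {
        name: "check-feeds".into(),
        description: "poll RSS feeds".into(),
    }];
    assert_eq!(fuzzy(&reg, "rss"), vec!["check-feeds"]);
}
```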

Hot reload

The upsert on exact load (step 3) is the hot-reload mechanism. When a skill is invoked, it’s always loaded fresh from disk and upserted into the registry. Skills can be updated on disk and picked up on next invocation without daemon restart.

Slash command resolution

Before a message reaches the agent, preprocess resolves leading /skill-name commands. For each skill directory, it checks {dir}/{name}/SKILL.md. If found, the skill body is wrapped in a <skill> tag and injected into the message. This happens before tool dispatch — it’s prompt injection, not a tool call.

Scoping

Agents can be restricted to a subset of skills via AgentScope.skills. If non-empty, only listed skills are available. Empty means unrestricted. Scoping applies to exact load, fuzzy listing, and slash resolution alike.

Alternatives

Code-based plugins (dylib / WASM). Far more powerful but far more complex. Skills are prompt injection, not code execution. The simplicity of markdown files is the point.

Database-backed registry. Adds persistence complexity for a registry that rebuilds in milliseconds from disk. Not needed.

Unresolved Questions

  • Should skills support arguments beyond the skill name (parameterized prompts)?
  • Should allowed-tools be enforced at the runtime level? Currently it is not enforced — it exists in the format but has no runtime effect.

0038 - Memory

  • Feature Name: Memory System
  • Start Date: 2026-02-10
  • Discussion: #38
  • Crates: runtime

Summary

File-per-entry memory with BM25-ranked recall, a curated index (MEMORY.md), and an identity file (Crab.md) for agent personality. No database — just files.

Motivation

Agents need persistent knowledge across sessions. The original approach used a graph memory backed by a database, but that added operational weight and complexity for what is fundamentally a collection of text entries that need to be searched.

The system must:

  • Store entries as individual files (inspectable, editable by humans).
  • Search by relevance, not just exact match.
  • Inject relevant memories automatically before each agent turn.
  • Support a curated overview (MEMORY.md) that is always present in context.
  • Support an identity/soul file (Crab.md) for agent personality.

Design

Directory structure

~/.crabtalk/config/
├── Crab.md                  # identity file (one level above memory/)
└── memory/
    ├── entries/
    │   ├── entry-name.md
    │   └── ...
    └── MEMORY.md

Crab.md lives one level above memory/ because it’s an agent-level identity file, not a memory entry. It’s shared across the config, not scoped to memory.

Entry format

Frontmatter markdown. Each entry has a name, description (used for search), and content.

---
name: Entry Name
description: Short searchable description
---

Long-form content here.

Filenames are slugified from the entry name: entry-name.md.
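A plausible slugification sketch — lowercase, non-alphanumeric runs collapsed to single hyphens. The exact rules crabtalk uses aren't specified here, so treat this as an assumption:

```rust
// Slugify an entry name into a filename stem: lowercase ASCII
// alphanumerics, with every other run of characters collapsed to "-".
fn slugify(name: &str) -> String {
    let mut slug = String::new();
    for c in name.to_lowercase().chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c);
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}

fn main() {
    assert_eq!(slugify("Entry Name"), "entry-name");
    assert_eq!(slugify("  My  Entry!  "), "my-entry");
}
```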

Recall pipeline

BM25 scoring over all entries. The query is matched against the concatenation of description + content. Results are ranked by relevance and capped at recall_limit (configurable).
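A compact BM25 sketch over (description, content) pairs. The parameters k1 = 1.2 and b = 0.75 are the textbook defaults — assumptions here, since the RFC doesn't state the values crabtalk uses:

```rust
/// Rank entries by BM25 score against the query; return indices of the
/// top `limit` entries. Each entry is a (description, content) pair and
/// is scored over the concatenation of both fields.
fn bm25_rank(entries: &[(String, String)], query: &str, limit: usize) -> Vec<usize> {
    let docs: Vec<Vec<String>> = entries
        .iter()
        .map(|(desc, content)| {
            format!("{desc} {content}")
                .to_lowercase()
                .split_whitespace()
                .map(str::to_string)
                .collect()
        })
        .collect();
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(|d| d.len() as f64).sum::<f64>() / n;
    let (k1, b) = (1.2f64, 0.75f64); // textbook defaults (assumption)
    let mut scored: Vec<(usize, f64)> = docs
        .iter()
        .enumerate()
        .map(|(i, doc)| {
            let score = query
                .to_lowercase()
                .split_whitespace()
                .map(|term| {
                    let df = docs.iter().filter(|d| d.iter().any(|w| w.as_str() == term)).count() as f64;
                    let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                    let tf = doc.iter().filter(|w| w.as_str() == term).count() as f64;
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc.len() as f64 / avg_len))
                })
                .sum::<f64>();
            (i, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(limit).map(|(i, _)| i).collect()
}

fn main() {
    let entries = vec![
        ("user preferences".to_string(), "prefers dark mode".to_string()),
        ("project notes".to_string(), "rust daemon design".to_string()),
    ];
    assert_eq!(bm25_rank(&entries, "dark mode", 1), vec![0]);
}
```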

Auto-recall

Before each agent turn (on_before_run), the system extracts the first 8 words of the last user message (an arbitrary cutoff — short enough to avoid noise, long enough to carry intent), runs recall(), and injects matching results as an auto-injected <recall> block. Auto-injected messages are not persisted and are refreshed every turn.
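The query extraction itself is trivial — a sketch:

```rust
/// The auto-recall query: the first 8 whitespace-separated words of the
/// last user message, per the cutoff described above.
fn recall_query(last_user_message: &str) -> String {
    last_user_message
        .split_whitespace()
        .take(8)
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let msg = "can you check the build failure from yesterday and summarize it";
    assert_eq!(recall_query(msg), "can you check the build failure from yesterday");
}
```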

System prompt injection

  • MEMORY.md — injected as a <memory> block in the system prompt via build_prompt(). Always present if non-empty.
  • Crab.md — the identity file. Injected via build_soul(). Writing is gated by soul_editable config.
  • Memory prompt — instructions for the agent on how to use memory tools, included from prompts/memory.md.

Tools

Four tools exposed to agents:

  • remember(name, description, content) — create or overwrite an entry.
  • forget(name) — delete an entry.
  • recall(query, limit) — BM25 search, returns formatted results.
  • memory(content) — overwrite MEMORY.md index.

Alternatives

Graph memory with database. The original system. Rejected for operational complexity. Files are simpler, inspectable, and sufficient for the use case.

Embedding-based search. Would require a vector store and embedding model. BM25 is fast, dependency-free, and works well enough for the entry sizes we deal with.

Single file storage. One big memory file instead of file-per-entry. Rejected because individual files are easier to inspect, edit, and version.

Unresolved Questions

  • Should auto-recall use more than the first 8 words for the query?
  • Should entries support tags or categories for non-BM25 filtering?

0043 - Component System

  • Feature Name: Component System
  • Start Date: 2026-02-15
  • Discussion: #43
  • Crates: command

Summary

Crabtalk components are independent binaries that install as system services and connect to the daemon via auto-discovery. They crash alone, swap without restarts, and the daemon never loads them. This is the manifesto’s composition model made concrete.

Motivation

The manifesto says: “You put what you need on your PATH. They connect as clients. They crash alone. They swap without restarts.”

This requires a system where components — search, gateways, tool servers — are not subprocesses of the daemon. They’re independent programs that run as system services. The daemon discovers them at runtime. A broken component cannot take the daemon down.

Other projects spawn MCP servers as child processes. If the child hangs or crashes, it can take the daemon with it: zombie processes, leaked file descriptors, blocked event loops. The subprocess model creates shared fate. The component model eliminates it.

Design

The contract

A component is a binary that:

  1. Installs itself as a system service (launchd, systemd, or schtasks).
  2. Writes a port file to ~/.crabtalk/run/{name}.port on startup.
  3. Serves an HTTP API (MCP protocol) on that port.

The daemon scans ~/.crabtalk/run/*.port at startup and discovers components automatically. No configuration needed — drop a component on PATH, install it, and the daemon finds it.

Service trait

pub trait Service {
    fn name(&self) -> &str;        // "search"
    fn description(&self) -> &str; // human readable
    fn label(&self) -> &str;       // "ai.crabtalk.search"
}

The trait provides default start, stop, and logs methods:

  • start — renders a platform-specific service template, installs and launches.
  • stop — uninstalls the service and removes the port file.
  • logs — tails ~/.crabtalk/logs/{name}.log.

MCP service

Components that expose tools to agents extend McpService:

pub trait McpService: Service {
    fn router(&self) -> axum::Router;
}

run_mcp binds a TCP listener on 127.0.0.1:0, writes the port to the run directory, and serves the router. The daemon discovers it on next scan.
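The port handshake at the core of run_mcp can be shown with the standard library alone. A minimal sketch, assuming a hypothetical `bind_and_publish` helper; the real run_mcp also serves the axum router on the bound listener.

```rust
use std::io::Write;
use std::net::TcpListener;
use std::path::Path;

// Sketch of the run_mcp handshake: bind an ephemeral loopback port, then
// publish it to the run directory so the daemon's scan can find it.
fn bind_and_publish(run_dir: &Path, name: &str) -> std::io::Result<(TcpListener, u16)> {
    // Port 0 asks the OS for any free port on 127.0.0.1.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    // Write {name}.port so the daemon discovers the component on next scan.
    std::fs::create_dir_all(run_dir)?;
    let mut f = std::fs::File::create(run_dir.join(format!("{name}.port")))?;
    write!(f, "{port}")?;
    Ok((listener, port))
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("crabtalk-demo-run");
    let (_listener, port) = bind_and_publish(&dir, "search")?;
    // The port file round-trips to the same port the OS assigned.
    let on_disk = std::fs::read_to_string(dir.join("search.port"))?;
    assert_eq!(on_disk.parse::<u16>().unwrap(), port);
    Ok(())
}
```

Binding port 0 is what makes components configuration-free: no port registry, no collisions, and the port file is the single source of truth for discovery.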

Platform support

Service templates are platform-specific:

  • macOS — launchd plist (~/Library/LaunchAgents/)
  • Linux — systemd user unit
  • Windows — schtasks with XML task definition

Auto-discovery

The daemon scans ~/.crabtalk/run/*.port for port files not already connected. Each file contains a port number. The daemon connects via http://127.0.0.1:{port}/mcp. No subprocess management, no shared fate.

Crash? The daemon doesn’t care — it was never the component’s parent process. Restart? New port file, the daemon picks it up on next reload. Update a component? Install the new version, restart the service — the daemon sees the new port on next scan.

Entry point

The run() function handles tracing init and tokio bootstrap for all component binaries.

Alternatives

Subprocess management. The daemon spawns and manages components as child processes. Rejected because shared fate — a broken child can break the daemon. This is the approach we explicitly designed against.

Docker / containerization. Run components in containers. Rejected because crabtalk is local-first. System services are the right abstraction for a personal daemon on your machine.

Shell scripts for service management. Works on Unix, breaks on Windows, drifts across components. A shared Rust crate is portable and stays consistent.

Unresolved Questions

  • Should the Service trait support health checks?
  • Should the daemon watch the run directory for new port files instead of scanning only at startup/reload?

0064 - Session

  • Feature Name: Session System
  • Start Date: 2026-02-25
  • Discussion: #64
  • Crates: core, daemon

Summary

Append-only JSONL session persistence with compact markers, identity-based file naming, and an auto-injected message lifecycle that separates ephemeral context from durable history.

Motivation

An agent daemon needs conversation persistence that is simple, inspectable, and crash-safe. Database-backed persistence adds operational weight for what is fundamentally a sequential log. The session format must support:

  • Resuming conversations across daemon restarts.
  • Compaction — summarizing long histories without losing them.
  • Multiple identities — the same agent can talk to different users/platforms.
  • Ephemeral context injection — memory recall, environment blocks, and agent descriptions must be fresh each run, never accumulating in history.

Design

File format

Each session is a JSONL file. Line 1 is metadata, subsequent lines are messages or compact markers.

{"agent":"crab","created_by":"user","created_at":"...","title":"","uptime_secs":0}
{"role":"user","content":"hello"}
{"role":"assistant","content":"hi there"}
{"compact":"Summary of conversation so far..."}
{"role":"user","content":"what were we talking about?"}

Naming

Files live in a flat sessions/ directory: {agent}_{sender_slug}_{seq}.jsonl

  • sender_slug — sanitized identity (e.g. user, tg-12345).
  • seq — monotonically increasing per (agent, sender) pair.
  • After set_title, the file is renamed to append a title slug.

Compact markers

When history exceeds a threshold, the agent compacts: the LLM summarizes the conversation, and a {"compact":"..."} line is appended. On load, load_context reads from the last compact marker forward. The compact summary is injected as a {"role":"user"} message — the agent sees it as context, not as a special marker.

History before the last compact marker is archived in place — still in the file, but not loaded. Nothing is deleted.
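The load path can be sketched as a scan for the last marker. This sketch cheats by matching the marker with a string prefix; the real `load_context` parses each JSONL line properly.

```rust
// Sketch of load_context: skip the meta line, find the last compact marker,
// and load from there. The compact summary is re-shaped into a user-role
// message so the agent sees it as plain context, not a special marker.
fn load_context(jsonl: &str) -> Vec<String> {
    let lines: Vec<&str> = jsonl.lines().skip(1).collect(); // line 1 is metadata
    // Naive marker detection for the sketch; real code parses the JSON.
    let start = lines
        .iter()
        .rposition(|l| l.starts_with(r#"{"compact":"#))
        .unwrap_or(0);
    lines[start..]
        .iter()
        .map(|l| {
            if let Some(rest) = l.strip_prefix(r#"{"compact":"#) {
                // Inject the summary as a {"role":"user"} message.
                format!(r#"{{"role":"user","content":{}"#, rest)
            } else {
                l.to_string()
            }
        })
        .collect()
}

fn main() {
    let file = r#"{"agent":"crab","created_by":"user","created_at":"...","title":"","uptime_secs":0}
{"role":"user","content":"hello"}
{"role":"assistant","content":"hi there"}
{"compact":"Summary of conversation so far..."}
{"role":"user","content":"what were we talking about?"}"#;
    let ctx = load_context(file);
    // Only the summary and what follows it reach the working context.
    assert_eq!(ctx.len(), 2);
    assert!(ctx[0].starts_with(r#"{"role":"user""#));
}
```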

Auto-injected messages

Messages marked auto_injected: true are:

  • Not persisted to JSONL (skipped in append_messages).
  • Stripped before each run (prevents accumulation).
  • Re-injected fresh via Hook::on_before_run() every execution.

This covers memory recall results, environment blocks, agent description lists, and working directory announcements. They must be current, not stale from a previous run.
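The strip-then-reinject cycle is small enough to show directly. A minimal sketch with a simplified `Message` struct; the field names are assumptions based on this RFC.

```rust
// Simplified message shape for the sketch (real type lives in core).
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    content: String,
    auto_injected: bool,
}

// Sketch of the auto-injected lifecycle: drop last run's ephemeral messages,
// then append the fresh ones produced by Hook::on_before_run().
fn refresh(history: &mut Vec<Message>, fresh: Vec<Message>) {
    // Stripped before each run: injections never accumulate in history.
    history.retain(|m| !m.auto_injected);
    history.extend(fresh);
}

fn main() {
    let mut history = vec![
        Message { role: "user".into(), content: "hello".into(), auto_injected: false },
        Message { role: "user".into(), content: "<recall>stale</recall>".into(), auto_injected: true },
    ];
    let fresh = vec![
        Message { role: "user".into(), content: "<recall>current</recall>".into(), auto_injected: true },
    ];
    refresh(&mut history, fresh);
    // Durable history survives; the stale recall is replaced by a fresh one.
    assert_eq!(history.len(), 2);
    assert_eq!(history[1].content, "<recall>current</recall>");
}
```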

Session identity

Sessions are bound to an (agent, sender) pair. find_latest_session scans the directory for the matching prefix and returns the highest seq number. New chats increment the seq.
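The prefix-and-seq scan reduces to a fold over filenames. A sketch under assumed naming rules; `find_latest_seq` is a hypothetical stand-in for the real `find_latest_session`, which also handles title slugs and directory I/O.

```rust
// Sketch: given a flat sessions/ listing, find the highest seq for an
// (agent, sender) pair by matching the "{agent}_{sender}_" prefix.
fn find_latest_seq(files: &[&str], agent: &str, sender: &str) -> Option<u64> {
    let prefix = format!("{agent}_{sender}_");
    files
        .iter()
        .filter_map(|f| f.strip_prefix(prefix.as_str()))
        .filter_map(|rest| {
            // seq is the digit run before ".jsonl" or before a "_title" slug.
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            digits.parse::<u64>().ok()
        })
        .max()
}

fn main() {
    let files = [
        "crab_user_1.jsonl",
        "crab_user_2_trip-plan.jsonl", // renamed after set_title
        "crab_tg-12345_1.jsonl",
    ];
    assert_eq!(find_latest_seq(&files, "crab", "user"), Some(2));
    // No match means no prior session: a new chat starts the sequence.
    assert_eq!(find_latest_seq(&files, "crab", "slack-9"), None);
}
```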

Uptime tracking

Each session tracks uptime_secs — accumulated active time, persisted to the meta line. The meta line is rewritten by reading the full file and writing it back with the updated first line. This is the one non-append operation — it trades the append-only guarantee for keeping metadata current. A crash during the rewrite can lose the meta line, but not the conversation history: messages are append-only and survive.

Alternatives

SQLite. Adds a dependency and operational surface for what is a sequential append log. JSONL files are inspectable with standard tools and trivial to back up. Appends are crash-safe (a partial last line is just a truncated write).

One file per message. Too many files. The append-only JSONL approach gives one file per conversation with clear boundaries.

No compaction. Works for short conversations but becomes expensive as history grows. The compact marker approach keeps the file intact while bounding the working context.

Unresolved Questions

  • Should session files be organized in date-based subdirectories for easier cleanup?
  • Should compact threshold be per-agent configurable or global?

0075 - Hook

  • Feature Name: Hook Lifecycle
  • Start Date: 2026-03-15
  • Discussion: #75
  • Crates: core, runtime, daemon

Summary

The Hook trait is the central extensibility point for agent lifecycle. It defines five methods that the runtime calls at specific points: building an agent, registering tools, preprocessing input, injecting context before a run, and observing events. Everything that customizes agent behavior — skills, memory, MCP, scoping, prompt injection — composes through this trait.

Motivation

When the runtime was split out of the daemon (#75), a clean interface was needed between the runtime (which executes agents) and the hook implementations (which customize them). The runtime must not know about skills, memory, MCP, or daemon infrastructure. It only knows it has a Hook and calls its methods at the right times.

This separation enables two modes: the daemon (full hook with skills, MCP, memory, event broadcasting) and embedded use (no hook, or a minimal one).

Design

The trait

pub trait Hook: Send + Sync {
    fn on_build_agent(&self, config: AgentConfig) -> AgentConfig;
    fn on_register_tools(&self, tools: &mut ToolRegistry) -> impl Future<Output = ()>;
    fn preprocess(&self, agent: &str, content: &str) -> String;
    fn on_before_run(&self, agent: &str, session_id: u64, history: &[Message]) -> Vec<Message>;
    fn on_event(&self, agent: &str, session_id: u64, event: &AgentEvent);
}

All methods have default no-op implementations. () implements Hook.
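The no-op-default pattern is what makes `()` a valid hook. A minimal sketch with the trait reduced to one method; `ShoutHook` is a made-up example, not a real crabtalk hook.

```rust
// Reduced Hook for the sketch: every method has a default body.
trait Hook {
    fn preprocess(&self, _agent: &str, content: &str) -> String {
        content.to_string() // default: pass input through unchanged
    }
}

// Because all methods have defaults, the unit type is a complete no-op hook.
impl Hook for () {}

// Custom hooks override only what they need.
struct ShoutHook; // hypothetical example
impl Hook for ShoutHook {
    fn preprocess(&self, _agent: &str, content: &str) -> String {
        content.to_uppercase()
    }
}

fn main() {
    assert_eq!(().preprocess("crab", "hi"), "hi");
    assert_eq!(ShoutHook.preprocess("crab", "hi"), "HI");
}
```

This is the embedded-use story in miniature: a runtime holding `()` executes agents with no customization at all.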

Lifecycle points

on_build_agent — called when an agent is registered with the runtime. Receives the agent config, returns a modified config. This is where the system prompt is composed. The RuntimeHook implementation chains:

  1. Environment block (OS, shell, platform).
  2. Memory prompt (MEMORY.md content as <memory> block).
  3. Resource hints (available MCP servers, available skills).
  4. Scope block (if agent has restricted skills/MCPs/members, appends a <scope> XML block listing allowed resources).
  5. Tool whitelist computation (restricts config.tools based on scope).

on_register_tools — called at runtime startup. Registers tool schemas (name, description, JSON schema) into the ToolRegistry. No handlers — dispatch is separate. RuntimeHook registers: OS tools, skill tool, task/delegate tool, ask_user tool, memory tools (if enabled), and MCP-discovered tools.

preprocess — called before a user message enters the conversation. Used for slash command resolution: /skill-name args is transformed into the skill body wrapped in a <skill> tag. Happens before tool dispatch.
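Slash resolution can be sketched as a pure string transform. The exact `<skill>` tag shape and the `lookup` signature here are assumptions; the real resolver lives in the RuntimeHook.

```rust
// Sketch of slash-command resolution: "/skill-name args" becomes the skill
// body wrapped in a <skill> tag, with the arguments appended.
fn resolve_slash(content: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    let Some(rest) = content.strip_prefix('/') else {
        return content.to_string(); // not a slash command: pass through
    };
    let (name, args) = rest.split_once(' ').unwrap_or((rest, ""));
    match lookup(name) {
        Some(body) => format!("<skill name=\"{name}\">\n{body}\n</skill>\n{args}"),
        None => content.to_string(), // unknown skill: leave untouched
    }
}

fn main() {
    let lookup = |name: &str| (name == "check-feeds").then(|| "Fetch each feed...".to_string());
    let out = resolve_slash("/check-feeds rss only", &lookup);
    assert!(out.starts_with("<skill name=\"check-feeds\">"));
    assert!(out.ends_with("rss only"));
    // Plain messages are untouched.
    assert_eq!(resolve_slash("hello", &lookup), "hello");
}
```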

on_before_run — called before each agent execution (send/stream). Returns messages to inject into the conversation. RuntimeHook injects:

  1. Agent descriptions (if the agent has delegation members).
  2. Memory auto-recall (BM25 search on last user message, as <recall> block).
  3. Working directory announcement (as <environment> block).

All injected messages are marked auto_injected: true — they’re ephemeral, not persisted, stripped before each run, and refreshed.

on_event — called after each agent step. Receives every AgentEvent (text deltas, tool calls, completions). DaemonBridge uses this to broadcast events to console subscribers.

Composition

RuntimeHook<B: RuntimeBridge> is the engine hook. It composes SkillHandler, McpHandler, Memory, and AgentScope maps. It implements Hook by orchestrating all subsystems.

DaemonHook is a type alias: RuntimeHook<DaemonBridge>. The daemon bridge adds ask_user dispatch, delegate dispatch, session CWD, and event broadcasting.

For embedded use, RuntimeHook<NoBridge> provides the full engine without daemon infrastructure.

Tool dispatch

RuntimeHook::dispatch_tool is the central routing table — a match on tool name. It’s not part of the Hook trait itself (the trait only registers schemas). The runtime calls dispatch_tool when an agent produces a tool call. Dispatch enforces scoping before routing.

Alternatives

Separate traits per concern. One trait for prompt building, one for tools, one for events. Rejected because they always compose together and the single trait is simpler to implement and reason about.

Closure-based hooks. Pass lambdas instead of a trait. Rejected because the hook needs shared state (skill registry, MCP connections, memory) that closures make awkward.

Unresolved Questions

  • Should on_build_agent be async to support hooks that need I/O during agent construction?
  • Should preprocess support returning multiple messages (e.g. for multi-skill invocation)?

0078 - Compact Session

  • Feature Name: Compact Session Interface
  • Start Date: 2026-03-25
  • Discussion: #78
  • Crates: core, daemon

Summary

Expose session compaction as a protocol operation so clients can request a concise context summary on demand, enabling cross-agent context handoff with custom @-mention logic.

Motivation

When a user @-mentions a different agent mid-conversation, the client needs to hand off context. The naive approaches don’t work:

  • Raw history includes irrelevant tool results, thinking tokens, and the previous agent’s system prompt — expensive and noisy.
  • No context means the target agent flies blind.

Compact produces a focused briefing: the LLM summarizes the conversation into essential context. The target agent gets its own system prompt (warm in token cache) plus the compact summary plus the user’s query — high quality context, minimal tokens.

The key insight: this belongs in the protocol, not the client. The daemon already has the session history and the LLM connection. The client just needs to say “compact session N” and get a summary back. But the mention logic itself stays in the client — the daemon doesn’t know about @-mentions, UI conventions, or which agent to route to. The client decides when and why to compact; the daemon does the summarization.

Design

A Compact message is added to the protobuf protocol:

  • Request: CompactRequest { session: u64 } — client asks the daemon to compact a specific session.
  • Response: CompactResponse { summary: string } — the daemon returns the summarized context.

The Server trait gains a compact_session method. The daemon implementation delegates to Agent::compact(), which sends the session history to the LLM with a compaction prompt that preserves identity and profile information.

What the daemon does

  • Accepts the compact request via the protocol.
  • Loads the session history.
  • Calls the agent’s compact method (LLM summarization).
  • Returns the summary string.

What the client does

  • Detects @-mentions (its own UI logic).
  • Requests compact of the current session.
  • Creates or selects the target agent’s session.
  • Sends the compact summary + user query to the target agent.
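The division of labor can be sketched from the client's side. The `Client` trait and `handoff` function below are hypothetical illustrations of the flow, not the real protocol bindings.

```rust
use std::cell::RefCell;

// Hypothetical client-side view of the daemon connection.
trait Client {
    fn compact(&self, session: u64) -> String; // daemon does LLM summarization
    fn send(&self, agent: &str, content: &str);
}

// The handoff itself: the client decides when/where, the daemon summarizes.
fn handoff(c: &impl Client, session: u64, target: &str, query: &str) {
    let summary = c.compact(session);
    // Target agent receives the compact summary plus the user's query.
    c.send(target, &format!("{summary}\n\n{query}"));
}

// Mock connection for the sketch: records what gets sent where.
struct Mock { sent: RefCell<Vec<(String, String)>> }

impl Client for Mock {
    fn compact(&self, _session: u64) -> String {
        "Summary of conversation so far...".to_string()
    }
    fn send(&self, agent: &str, content: &str) {
        self.sent.borrow_mut().push((agent.to_string(), content.to_string()));
    }
}

fn main() {
    let mock = Mock { sent: RefCell::new(Vec::new()) };
    handoff(&mock, 7, "researcher", "dig deeper into the feed results");
    let sent = mock.sent.borrow();
    assert_eq!(sent[0].0, "researcher");
    assert!(sent[0].1.starts_with("Summary"));
    assert!(sent[0].1.ends_with("feed results"));
}
```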

Context selection alternatives

If compact is too slow for the use case:

  • BM25 — already in the codebase for memory recall. Keyword-match messages against the query.
  • Last N messages — simplest. Often sufficient for short conversations.

These are client-side decisions. The compact interface doesn’t preclude them.

Alternatives

Client-side compaction. The client could do its own summarization, but it would need LLM access and session history — duplicating what the daemon already has.

Automatic compaction on mention. The daemon could detect @-mentions and compact automatically. Rejected because mention syntax is a client concern — different clients have different conventions.

Unresolved Questions

  • Should compact accept parameters (max tokens, focus query) to guide summarization?
  • Should the daemon cache compact results for repeated handoffs within the same conversation?

0080 - Cron

  • Feature Name: Daemon-Level Cron Scheduler
  • Start Date: 2026-03-20
  • Discussion: #80
  • Crates: daemon

Summary

A daemon-level cron system that triggers skills into sessions on a schedule, replacing the previous per-agent heartbeat mechanism.

Motivation

Agents need periodic behavior — checking feeds, running maintenance, sending reminders. The original approach was a per-agent heartbeat config, but it was dead code and the wrong shape: heartbeats are uniform intervals, while scheduled tasks need cron-style flexibility (every Monday at 9am, every two hours, and so on).

The session already carries the agent and sender. A cron entry only needs to know which skill to fire and which session to fire it into.

Design

A cron entry triggers a skill into a session on a schedule.

Data model

[[cron]]
id = 1
schedule = "0 */2 * * *"
skill = "check-feeds"
session = 12345
quiet_start = "23:00"
quiet_end = "07:00"
once = false

  • id — auto-incremented on create.
  • schedule — standard cron expression, validated on create and load.
  • skill — fired as /{skill} slash command into the session.
  • session — target session ID. The session determines the agent.
  • quiet_start/quiet_end — optional HH:MM window in the daemon’s local time. If fire time falls inside, skip silently. No queuing, no catch-up. Both must be set; if only one is provided, quiet hours are ignored.
  • once — fire once then delete from memory and disk.
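The quiet-hours check has one subtlety worth pinning down: the window can wrap midnight (as in the 23:00–07:00 example above). A sketch of the assumed semantics, operating on minutes-since-midnight:

```rust
// Sketch: is `now` inside the quiet window? Inside means skip silently.
// All values are minutes since midnight in the daemon's local time.
fn in_quiet_window(now: u16, start: u16, end: u16) -> bool {
    if start <= end {
        // Same-day window, e.g. 12:00-14:00.
        (start..end).contains(&now)
    } else {
        // Window wraps midnight, e.g. 23:00-07:00.
        now >= start || now < end
    }
}

// Helper for the sketch: parse "HH:MM" into minutes.
fn hhmm(s: &str) -> u16 {
    let (h, m) = s.split_once(':').unwrap();
    h.parse::<u16>().unwrap() * 60 + m.parse::<u16>().unwrap()
}

fn main() {
    let (start, end) = (hhmm("23:00"), hhmm("07:00"));
    assert!(in_quiet_window(hhmm("03:00"), start, end));  // skipped, no catch-up
    assert!(in_quiet_window(hhmm("23:30"), start, end));  // skipped
    assert!(!in_quiet_window(hhmm("12:00"), start, end)); // fires normally
}
```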

Persistence

Memory is authoritative at runtime. Disk (crons.toml) is recovery for restarts.

  • Startup: load from disk, start timers. Invalid schedules are skipped with a warning.
  • Create/Delete: mutate memory, start/stop timer, atomic write to disk (tmp + rename).
  • Runtime reload: crons stay in memory — they survive runtime swaps.
  • Daemon restart: reload from disk.

Firing

Fire-and-forget via the daemon event channel. The cron sends a ClientMessage with content /{skill} and sender "cron". The reply channel is dropped — output goes to session history only.

Protocol

Three protocol operations on the Server trait:

  • CreateCron { schedule, skill, session, quiet_start?, quiet_end? }CronInfo
  • DeleteCron { id } → success/not found
  • ListCronsCronList

Crons are process-lifetime, not session-lifetime. They survive runtime reloads, fire via the daemon event channel, and the runtime has no notion of time-based scheduling. This is a daemon concern.

Alternatives

Per-agent heartbeat config. The original approach. Rejected because it coupled scheduling to agent definition, couldn’t express cron-style schedules, and was dead code.

Client-side polling. A client can send messages on its own timer. This works but requires the client to be running. Daemon crons fire regardless of client state.

Unresolved Questions

  • Should crons support arguments beyond the skill name?
  • Should there be a max cron count to prevent resource exhaustion?

0082 - Scoping

  • Feature Name: Agent Scoping
  • Start Date: 2026-03-22
  • Discussion: #82
  • Crates: runtime, core

Summary

A whitelist-based scoping system that restricts what an agent can access: tools, skills, MCP servers, and delegation targets. Enforced at dispatch time and advertised in the system prompt. This is a security boundary, not a hint.

Motivation

In multi-agent setups, a delegated sub-agent should not have the same capabilities as the primary agent. A research agent doesn’t need bash. A summarizer doesn’t need to delegate to other agents. Without scoping, every agent has access to everything — which means a misbehaving or confused agent can call tools it was never intended to use.

Scoping solves this by letting agent configs declare exactly what resources are available. The runtime enforces it.

Design

AgentScope

pub struct AgentScope {
    pub tools: Vec<String>,     // empty = unrestricted
    pub members: Vec<String>,   // empty = no delegation
    pub skills: Vec<String>,    // empty = all skills
    pub mcps: Vec<String>,      // empty = all MCP servers
}

Empty list means unrestricted. Non-empty means only listed items are allowed. This is an inclusive whitelist, not a denylist.
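The empty-means-unrestricted rule reduces to a one-line predicate. A sketch; the real dispatch checks carry more context (which dimension failed, error reporting).

```rust
// Sketch of the scope check applied at every dispatch point:
// an empty whitelist permits everything, a non-empty one permits only
// its listed names.
fn allowed(whitelist: &[String], name: &str) -> bool {
    whitelist.is_empty() || whitelist.iter().any(|s| s == name)
}

fn main() {
    let unrestricted: Vec<String> = vec![];
    assert!(allowed(&unrestricted, "bash")); // empty = unrestricted

    let scoped = vec!["summarize".to_string()];
    assert!(allowed(&scoped, "summarize"));
    assert!(!allowed(&scoped, "bash")); // not listed: rejected at dispatch
}
```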

Whitelist computation

When an agent has any scoping (non-empty skills, mcps, or members), the runtime computes a tool whitelist during on_build_agent:

  1. Start with BASE_TOOLS: bash, ask_user — always available.
  2. If memory is enabled: add recall, remember, memory, forget.
  3. If skills list is non-empty: add skill tool.
  4. If mcps list is non-empty: add mcp tool.
  5. If members list is non-empty: add delegate tool.

The computed whitelist replaces config.tools. Tools not on the list are invisible to the agent.
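The five steps above can be sketched directly. Tool names come from this RFC; the `Scope` struct is a simplified stand-in for AgentScope and the function signature is an assumption.

```rust
// Simplified scope for the sketch (tools dimension omitted: this function
// computes it).
struct Scope {
    skills: Vec<String>,
    mcps: Vec<String>,
    members: Vec<String>,
}

// Sketch of the whitelist computation run during on_build_agent.
fn compute_whitelist(scope: &Scope, memory_enabled: bool) -> Vec<&'static str> {
    // 1. BASE_TOOLS: always available.
    let mut tools = vec!["bash", "ask_user"];
    // 2. Memory tools, only if memory is enabled.
    if memory_enabled {
        tools.extend(["recall", "remember", "memory", "forget"]);
    }
    // 3-5. Capability tools appear only when the matching list is non-empty.
    if !scope.skills.is_empty() { tools.push("skill"); }
    if !scope.mcps.is_empty() { tools.push("mcp"); }
    if !scope.members.is_empty() { tools.push("delegate"); }
    tools
}

fn main() {
    // A research sub-agent: one skill, one MCP server, no delegation.
    let researcher = Scope {
        skills: vec!["summarize".into()],
        mcps: vec!["search".into()],
        members: vec![],
    };
    let tools = compute_whitelist(&researcher, false);
    assert_eq!(tools, ["bash", "ask_user", "skill", "mcp"]);
    assert!(!tools.contains(&"delegate")); // no members, no delegate tool
}
```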

Prompt injection

A <scope> block is appended to the system prompt listing the agent’s allowed resources:

<scope>
skills: check-feeds, summarize
mcp servers: search
members: researcher, writer
</scope>

This tells the agent what it can use. The agent doesn’t need to guess or discover — its boundaries are stated upfront.

Enforcement

Scoping is enforced at four dispatch points:

  • dispatch_tool — rejects tool calls not in the agent’s tool whitelist.
  • dispatch_skill — rejects skill names not in the agent’s skill list.
  • dispatch_mcp — filters MCP server list to allowed servers.
  • dispatch_delegate — rejects delegation to agents not in the members list.

Enforcement happens at runtime, not just at prompt time. Even if the LLM ignores the <scope> block and tries to call a restricted tool, the dispatch layer rejects it.

Default agent

The default agent (primary) has no scope restrictions — empty lists on all four dimensions. Scoping is for sub-agents that need constrained access.

Alternatives

Denylist instead of whitelist. List what’s forbidden instead of what’s allowed. Rejected because whitelists are safer by default — a new tool or server is inaccessible until explicitly granted. Denylists require an update every time a new resource is added.

Prompt-only scoping. Tell the agent its restrictions in the prompt but don’t enforce at dispatch. Rejected because LLMs don’t reliably follow instructions — a determined or confused model will call tools it was told not to. Enforcement must be at the dispatch layer.

Unresolved Questions

  • Should scoping support wildcard patterns (e.g. mcp: search-*)?
  • Should scope violations be logged as security events for monitoring?