Memory and RAG

In Seshat, memory is not one subsystem. The runtime keeps a durable session record, compresses the active context window, learns reusable memory across sessions, exposes explicit long-term memory tools, and can also retrieve from indexed corpora with RAG. Those pieces work together, but they are not interchangeable.

Short version

Session persistence answers what happened. Compaction answers what still needs to stay in the current prompt. Durable memory answers what should still influence future sessions. RAG answers what external corpus should be retrieved on demand.

Four different problems, not one memory bucket

Persistence: Session Store (full history)
The full transcript, metadata, tool results, and resumability live in SQLite.

Working Window: Active Context (prompt-sized)
Only the provider-facing slice stays hot in the current turn. This is what compaction protects.

Durable Memory: Learned Memory (prompt support)
Project instructions, user preferences, tool patterns, and recent session summaries can be re-injected later.

External Recall: RAG Corpora (on-demand retrieval)
Indexed chunks live outside the transcript and are pulled only when retrieval is needed.

One Runtime Contract: Session + Loop + Prompt Assembly
The runtime uses all four layers, but it does not collapse them into one bag of text. Session state, compacted context, learned memory, and retrieval each solve a different problem.

Prompt Path: Compaction + Memory Context
The prompt builder receives a compact working window plus durable memory context.

Tool Path: Knowledge Graph + RAG Tools
Explicit tool calls can read or update longer-lived knowledge without stuffing it all into the conversation.

That separation matters because each layer has different latency, size, persistence, and retrieval behavior. If you mix them conceptually, the architecture becomes fuzzy very quickly. The code does not treat them as the same thing, and the docs should not either.

1. The full session is durable even when the prompt is not

The first layer is the session store. Seshat persists multi-turn sessions so they can survive restarts, be resumed later, and remain inspectable. This is the source of truth for full transcript history. Unlike the active prompt, it is not bounded by the provider context window.

CLI
seshat sessions list
seshat chat --resume <session-id>
seshat chat --continue

This is the first distinction many agent systems blur. The model only sees the active request window. The runtime still owns a larger durable session record containing messages, metadata, tool results, and resumability information.

2. Compaction protects the active working window

1. Estimate usage

The runtime computes current request tokens against the model window.

2. Micro-compact first

Messages are normalized and shortened before a summary model is used.

3. Summarize if still too large

A compact summary preserves active requests, decisions, tools, and unresolved work.

4. Keep the full transcript elsewhere

The active window shrinks, but the durable session record stays intact.

The compaction engine in internal/runtime/memory estimates request tokens, computes an effective context window after reserving summary output budget, and triggers auto-compaction once the configured threshold is crossed. It first tries micro-compaction, then escalates to summary compaction if the context is still too large.

  • AutoCompactThreshold is converted into an absolute token ceiling based on the current model window.
  • CompactTargetPercentage defines how far the runtime tries to shrink the active context after compaction.
  • MaxSummaryTokens reserves output space for the summarizer so compaction does not immediately recreate the same overflow.
  • A consecutive-failure limit acts like a circuit breaker so a broken summarizer does not trap the session in endless retries.

The key point is architectural, not cosmetic: compaction shrinks the provider-facing working set, but the earlier session history remains intact in the session store.

3. Durable learned memory is prompt support, not transcript storage

Separate from the transcript and compactor, the memory manager keeps scoped memory under <runtime-root>/memory. Its job is to surface stable context that should keep influencing future turns or future sessions.

Project memory

Instructions and preferences tied to one codebase or working directory.

User preferences

Response language, format, tone, or other stable user-level habits.

Learned tool usage

Patterns extracted from successful or failed tool execution over time.

Recent session history

Short summaries of past sessions for the same project, kept outside the live prompt loop.

The runtime also learns some of this automatically. User messages can be scanned for durable directives, tool usage can update pattern memory, and finished sessions can generate summaries that later re-enter the prompt as compact context. In catalog mode, the integration layer can also search memory entries instead of only formatting them into prompt text.

4. Explicit long-term memory tools are a second durable surface

There is another persistent memory path in the runtime: the explicit memory_* tool family. These tools do not just append notes to the prompt. They operate on a long-term knowledge graph with entities, observations, search, and node opening.

Built-in tools
memory_create_entities
memory_add_observations
memory_search_nodes
memory_open_nodes

rag_ingest
rag_search

In the CLI runtime, this graph is wired through LongTermMemory and backed by the same SQLite database path when available. This means Seshat currently has two durable-memory shapes: prompt-facing scoped memory for reuse in future prompts, and tool-facing graph memory for explicit structured recall.

5. RAG is retrieval from corpora, not memory of the session

Ingest

A document enters a named corpus and optionally gets an artifact reference.

Chunk

The default chunker splits by paragraph and hard-caps chunks around 800 characters.

Embed

Each chunk is embedded through an OpenAI-compatible or Ollama embedding endpoint.

Index

Vectors are upserted into the embedded HNSW store under a corpus namespace.

Search + rerank

Query embeddings retrieve candidates, hybrid weighting can blend keywords, and reranking is optional.

The RAG service in internal/rag ingests text into a named corpus, chunks it, embeds each chunk, and upserts the records into a vector store. At query time it embeds the question, searches the corpus namespace, optionally blends keyword scoring through HybridWeight, and can apply a reranker before returning the final results.
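One plausible reading of hybrid blending is a weighted mix of the vector similarity and a keyword overlap score. The sketch below assumes a linear blend, final = (1 - w) * vector + w * keyword, and a toy keyword score based on the fraction of query terms found in the chunk. Both are assumptions made for illustration: the document names HybridWeight but does not specify the formula, and the Candidate, Scored, keywordScore, and hybridRank names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Candidate is a retrieved chunk with its vector similarity score.
type Candidate struct {
	ID          string
	Text        string
	VectorScore float64
}

// Scored pairs a candidate ID with its blended score.
type Scored struct {
	ID    string
	Score float64
}

// keywordScore returns the fraction of query terms that occur in text.
func keywordScore(query, text string) float64 {
	terms := strings.Fields(strings.ToLower(query))
	if len(terms) == 0 {
		return 0
	}
	lower := strings.ToLower(text)
	hits := 0
	for _, t := range terms {
		if strings.Contains(lower, t) {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

// hybridRank blends both scores with an assumed linear formula and
// returns candidates ordered from best to worst.
func hybridRank(query string, cands []Candidate, hybridWeight float64) []Scored {
	out := make([]Scored, 0, len(cands))
	for _, c := range cands {
		score := (1-hybridWeight)*c.VectorScore + hybridWeight*keywordScore(query, c.Text)
		out = append(out, Scored{ID: c.ID, Score: score})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	cands := []Candidate{
		{ID: "a", Text: "general notes on context windows", VectorScore: 0.9},
		{ID: "b", Text: "compaction threshold settings", VectorScore: 0.6},
	}
	for _, w := range []float64{0, 0.8} {
		ranked := hybridRank("compaction threshold", cands, w)
		fmt.Printf("weight %.1f ranks %s first\n", w, ranked[0].ID)
	}
}
```

With a weight of zero the ranking is purely semantic. Raising the weight lets exact term matches pull a chunk ahead of one that is only similar in embedding space.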

  • The default chunker is local and simple: paragraph splitting with a hard cap around 800 characters per chunk.
  • The default embedded vector backend is the persistent HNSW store under <runtime-root>/data/hnsw.
  • Artifact storage is optional, so the index can still work even when raw document blobs are not persisted.
  • The CLI only enables RAG automatically when an embedding endpoint is configured.

Embeddings
export RAG_EMBEDDING_URL=http://localhost:11434
export RAG_EMBEDDING_MODEL=nomic-embed-text
export RAG_EMBEDDING_PROVIDER=ollama

seshat chat

SDK wiring keeps the distinction explicit

The SDK composition boundary makes the architecture obvious. Memory, RAG, and long-term graph memory are not one field or one switch. They are separate capabilities wired into the client and then exposed through the runtime.

Go SDK
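client, err := sdk.NewClient(&sdk.ClientConfig{
    EnableMemory:   true,       // scoped learned memory for prompt support
    RAGService:     ragSvc,     // retrieval over indexed corpora
    LongTermMemory: graphStore, // knowledge graph behind the memory_* tools
})
if err != nil {
    log.Fatal(err)
}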

Memory is not RAG

Memory: State the runtime wants to keep carrying
  • User preferences and stable instructions.
  • Project conventions and learned tool patterns.
  • Session summaries and graph observations.

RAG: External knowledge retrieved only when needed
  • Documents indexed into named corpora.
  • Query-time search over embedded chunks.
  • Best for large bodies of reference material.

Good systems usually need both. Memory helps the runtime stay coherent over time. RAG helps it pull in source material that is too large or too external to live inside that memory permanently.

What exists today versus what is still evolving

Already real

Session persistence, auto-compaction, scoped learned memory, HNSW-backed RAG, and explicit long-term memory tools all exist in the current codebase.

Still evolving

The retrieval stack can still grow with richer ingestion paths, better ranking, more memory policies, and tighter product-layer knowledge experiences in seshat-ai.

Next concept

Once memory boundaries are clear, the next important layer is capability distribution: how tools, skills, and MCP servers extend the runtime without breaking its core contract.