In Seshat, memory is not one subsystem. The runtime keeps a durable session record, compresses the active context window, learns reusable memory across sessions, exposes explicit long-term memory tools, and can also retrieve from indexed corpora with RAG. Those pieces work together, but they are not interchangeable.
Session persistence answers what happened. Compaction answers what still needs to stay in the current prompt. Durable memory answers what should still influence future sessions. RAG answers what external corpus should be retrieved on demand.
The full transcript, metadata, tool results, and resumability live in SQLite.
Only the provider-facing slice stays hot in the current turn. This is what compaction protects.
Project instructions, user preferences, tool patterns, and recent session summaries can be re-injected later.
Indexed chunks live outside the transcript and are pulled only when retrieval is needed.
The runtime uses all four layers, but it does not collapse them into one bag of text. Session state, compacted context, learned memory, and retrieval each solve a different problem.
The prompt builder receives a compact working window plus durable memory context.
Explicit tool calls can read or update longer-lived knowledge without stuffing it all into the conversation.
That separation matters because each layer has different latency, size, persistence, and retrieval behavior. If you mix them conceptually, the architecture becomes fuzzy very quickly. The code does not treat them as the same thing, and the docs should not either.
The first layer is the session store. Seshat persists multi-turn sessions so they can survive restarts, be resumed later, and remain inspectable. This is the source of truth for full transcript history. It is not constrained by the provider context window in the same way the active prompt is.
seshat sessions list
seshat chat --resume <session-id>
seshat chat --continue

This is the first distinction many agent systems blur. The model only sees the active request window. The runtime still owns a larger durable session record containing messages, metadata, tool results, and resumability information.
The runtime computes current request tokens against the model window.
Messages are normalized and shortened before a summary model is used.
A compact summary preserves active requests, decisions, tools, and unresolved work.
The active window shrinks, but the durable session record stays intact.
The compaction engine in internal/runtime/memory estimates request tokens, computes an effective context window after reserving summary output budget, and triggers auto-compaction once the configured threshold is crossed. It first tries micro-compaction, then escalates to summary compaction if the context is still too large.
The key point is architectural, not cosmetic: compaction reduces the provider-facing working set, but it does not mean the earlier session history disappeared from the runtime.
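The escalation described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code in internal/runtime/memory: the `Message` and `Config` types, the four-characters-per-token estimate, the truncation rule used for micro-compaction, and the `summarize` callback are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical stand-in for one entry in the active window.
type Message struct {
	Role    string
	Content string
}

// Config holds assumed knobs; the real runtime's option names may differ.
type Config struct {
	ModelWindow    int     // provider context window, in tokens
	SummaryReserve int     // tokens held back for the summary output
	Threshold      float64 // fraction of the effective window that triggers compaction
	MaxToolChars   int     // micro-compaction cap for tool results
	KeepRecent     int     // messages kept verbatim after summary compaction
}

// estimateTokens uses a rough 4-characters-per-token heuristic (an assumption).
func estimateTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += (len(m.Content) + 3) / 4
	}
	return total
}

// shouldCompact compares the estimate against the effective window,
// which is the model window minus the reserved summary budget.
func shouldCompact(tokens int, cfg Config) bool {
	effective := cfg.ModelWindow - cfg.SummaryReserve
	if effective <= 0 {
		return true
	}
	return float64(tokens) >= cfg.Threshold*float64(effective)
}

// microCompact shortens oversized tool results without calling a model.
// It returns a new slice, so the caller's stored record is not modified.
func microCompact(msgs []Message, maxToolChars int) []Message {
	out := make([]Message, len(msgs))
	for i, m := range msgs {
		if r := []rune(m.Content); m.Role == "tool" && len(r) > maxToolChars {
			m.Content = string(r[:maxToolChars]) + " [truncated]"
		}
		out[i] = m
	}
	return out
}

// compact tries micro-compaction first and escalates to a summary only
// if the window is still over the threshold afterwards.
func compact(msgs []Message, cfg Config, summarize func([]Message) string) []Message {
	if !shouldCompact(estimateTokens(msgs), cfg) {
		return msgs
	}
	msgs = microCompact(msgs, cfg.MaxToolChars)
	if !shouldCompact(estimateTokens(msgs), cfg) || len(msgs) <= cfg.KeepRecent {
		return msgs
	}
	cut := len(msgs) - cfg.KeepRecent
	summary := Message{Role: "system", Content: summarize(msgs[:cut])}
	return append([]Message{summary}, msgs[cut:]...)
}

func main() {
	cfg := Config{ModelWindow: 8000, SummaryReserve: 1000, Threshold: 0.8, MaxToolChars: 200, KeepRecent: 2}
	record := []Message{
		{Role: "user", Content: "Refactor the parser."},
		{Role: "tool", Content: strings.Repeat("log line\n", 5000)},
		{Role: "assistant", Content: "Done, tests pass."},
	}
	stub := func(old []Message) string {
		return fmt.Sprintf("summary of %d earlier messages", len(old))
	}
	window := compact(record, cfg, stub)
	fmt.Println("record tokens:", estimateTokens(record), "window tokens:", estimateTokens(window))
}
```

Note that `compact` returns a new working window and leaves `record` untouched, which mirrors the point that the durable session history survives compaction.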
Separate from the transcript and compactor, the memory manager keeps scoped memory under <runtime-root>/memory. Its job is to surface stable context that should keep influencing future turns or future sessions.
Instructions and preferences tied to one codebase or working directory.
Response language, format, tone, or other stable user-level habits.
Patterns extracted from successful or failed tool execution over time.
Short summaries of past sessions for the same project, kept outside the live prompt loop.
The runtime also learns some of this automatically. User messages can be scanned for durable directives, tool usage can update pattern memory, and finished sessions can generate summaries that later re-enter the prompt as compact context. In catalog mode, the integration layer can also search memory entries instead of only formatting them into prompt text.
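As a rough illustration of scoped memory, the sketch below groups entries by scope and renders them as compact prompt context. Everything here is hypothetical: the `Scope` names, the `Entry` type, and especially the keyword heuristic in `looksDurable`, which only gestures at what "scanning user messages for durable directives" might involve and is not the runtime's actual detector.

```go
package main

import (
	"fmt"
	"strings"
)

// Scope labels are assumptions that mirror the four kinds of memory described above.
type Scope string

const (
	ScopeProject Scope = "project"
	ScopeUser    Scope = "user"
	ScopeTool    Scope = "tool_pattern"
	ScopeSession Scope = "session_summary"
)

// Entry is a hypothetical scoped memory record.
type Entry struct {
	Scope Scope
	Text  string
}

// looksDurable is a deliberately naive cue-word check, for illustration only.
func looksDurable(msg string) bool {
	lower := strings.ToLower(msg)
	for _, cue := range []string{"always ", "never ", "from now on", "prefer "} {
		if strings.Contains(lower, cue) {
			return true
		}
	}
	return false
}

// formatContext renders entries grouped by scope in a fixed order,
// skipping scopes that have no entries.
func formatContext(entries []Entry) string {
	order := []Scope{ScopeProject, ScopeUser, ScopeTool, ScopeSession}
	var b strings.Builder
	for _, s := range order {
		wrote := false
		for _, e := range entries {
			if e.Scope != s {
				continue
			}
			if !wrote {
				fmt.Fprintf(&b, "[%s]\n", s)
				wrote = true
			}
			fmt.Fprintf(&b, "- %s\n", e.Text)
		}
	}
	return b.String()
}

func main() {
	var entries []Entry
	msg := "From now on, reply in French."
	if looksDurable(msg) {
		entries = append(entries, Entry{Scope: ScopeUser, Text: msg})
	}
	entries = append(entries, Entry{Scope: ScopeProject, Text: "Run go vet before committing."})
	fmt.Print(formatContext(entries))
}
```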
There is another persistent memory path in the runtime: the explicit memory_* tool family. These tools do not just append notes to the prompt. They operate on a long-term knowledge graph with entities, observations, search, and node opening.
memory_create_entities
memory_add_observations
memory_search_nodes
memory_open_nodes
rag_ingest
rag_search

In the CLI runtime, this graph is wired through LongTermMemory and backed by the same SQLite database path when available. This means Seshat currently has two durable-memory shapes: prompt-facing scoped memory for reuse in future prompts, and tool-facing graph memory for explicit structured recall.
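To make the graph shape concrete, here is a small in-memory sketch of the four operations the memory_* tools imply: creating entities, adding observations, searching nodes, and opening nodes. The `Graph` and `Entity` types and their method signatures are assumptions made for illustration. The real store sits behind LongTermMemory and is backed by SQLite, and its interface may look quite different.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Entity is a hypothetical graph node with free-text observations.
type Entity struct {
	Name         string
	Kind         string
	Observations []string
}

// Graph is an in-memory stand-in for the long-term knowledge graph.
type Graph struct {
	nodes map[string]*Entity
}

func NewGraph() *Graph { return &Graph{nodes: map[string]*Entity{}} }

// CreateEntity adds a node and reports false if the name already exists.
func (g *Graph) CreateEntity(name, kind string) bool {
	if _, ok := g.nodes[name]; ok {
		return false
	}
	g.nodes[name] = &Entity{Name: name, Kind: kind}
	return true
}

// AddObservations appends facts to an existing node.
func (g *Graph) AddObservations(name string, obs ...string) error {
	e, ok := g.nodes[name]
	if !ok {
		return fmt.Errorf("unknown entity %q", name)
	}
	e.Observations = append(e.Observations, obs...)
	return nil
}

// SearchNodes returns the sorted names of nodes whose name or
// observations contain the query, ignoring case.
func (g *Graph) SearchNodes(query string) []string {
	q := strings.ToLower(query)
	var hits []string
	for name, e := range g.nodes {
		haystack := strings.ToLower(name + "\n" + strings.Join(e.Observations, "\n"))
		if strings.Contains(haystack, q) {
			hits = append(hits, name)
		}
	}
	sort.Strings(hits)
	return hits
}

// OpenNodes returns copies of the named nodes, skipping unknown names.
func (g *Graph) OpenNodes(names ...string) []Entity {
	var out []Entity
	for _, n := range names {
		if e, ok := g.nodes[n]; ok {
			out = append(out, *e)
		}
	}
	return out
}

func main() {
	g := NewGraph()
	g.CreateEntity("seshat", "project")
	if err := g.AddObservations("seshat", "persists sessions in SQLite"); err != nil {
		fmt.Println("error:", err)
	}
	for _, e := range g.OpenNodes(g.SearchNodes("sqlite")...) {
		fmt.Println(e.Name, e.Kind, e.Observations)
	}
}
```

The point of the structure is that recall is explicit: the model asks for a node by query or name instead of having every stored fact injected into the prompt.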
A document enters a named corpus and optionally gets an artifact reference.
The default chunker splits by paragraph and hard-caps chunks around 800 characters.
Each chunk is embedded through an OpenAI-compatible or Ollama embedding endpoint.
Vectors are upserted into the embedded HNSW store under a corpus namespace.
Query embeddings retrieve candidates, hybrid weighting can blend keywords, and reranking is optional.
The RAG service in internal/rag ingests text into a named corpus, chunks it, embeds each chunk, and upserts the records into a vector store. At query time it embeds the question, searches the corpus namespace, optionally blends keyword scoring through HybridWeight, and can apply a reranker before returning the final results.
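Two steps of that pipeline are easy to sketch in isolation: paragraph chunking with a hard cap, and hybrid score blending. The code below is an illustrative approximation, not the implementation in internal/rag. The function names are hypothetical, the blank-line paragraph split is an assumption, and the linear blend is only one plausible reading of what HybridWeight does.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMaxChunkChars follows the "around 800 characters" cap described above.
const defaultMaxChunkChars = 800

// chunkParagraphs splits text on blank lines and hard-caps each chunk
// at maxChars runes. Empty paragraphs are dropped.
func chunkParagraphs(text string, maxChars int) []string {
	if maxChars <= 0 {
		maxChars = defaultMaxChunkChars
	}
	var chunks []string
	for _, para := range strings.Split(text, "\n\n") {
		runes := []rune(strings.TrimSpace(para))
		for len(runes) > 0 {
			n := len(runes)
			if n > maxChars {
				n = maxChars
			}
			chunks = append(chunks, string(runes[:n]))
			runes = runes[n:]
		}
	}
	return chunks
}

// blend mixes a vector-similarity score with a keyword score.
// Assumption: hybridWeight is the share given to the keyword score,
// so 0 means pure vector search and 1 means pure keyword search.
func blend(vectorScore, keywordScore, hybridWeight float64) float64 {
	return (1-hybridWeight)*vectorScore + hybridWeight*keywordScore
}

func main() {
	doc := "Seshat persists sessions.\n\n" + strings.Repeat("x", 1700)
	for i, c := range chunkParagraphs(doc, defaultMaxChunkChars) {
		fmt.Printf("chunk %d: %d chars\n", i, len([]rune(c)))
	}
	fmt.Printf("blended score: %.3f\n", blend(0.8, 0.4, 0.25))
}
```

A hard character cap keeps every chunk inside the embedding model's input limit at the cost of occasionally splitting a long paragraph mid-sentence. Slicing by runes instead of bytes avoids cutting a multi-byte character in half.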
export RAG_EMBEDDING_URL=http://localhost:11434
export RAG_EMBEDDING_MODEL=nomic-embed-text
export RAG_EMBEDDING_PROVIDER=ollama
seshat chat

The SDK composition boundary makes the architecture obvious. Memory, RAG, and long-term graph memory are not one field or one switch. They are separate capabilities wired into the client and then exposed through the runtime.
client, err := sdk.NewClient(&sdk.ClientConfig{
	EnableMemory:   true,       // scoped, prompt-facing memory
	RAGService:     ragSvc,     // retrieval over indexed corpora
	LongTermMemory: graphStore, // graph store behind the memory_* tools
})

Good systems usually need both. Memory helps the runtime stay coherent over time. RAG helps it pull in source material that is too large or too external to live inside that memory permanently.
Session persistence, auto-compaction, scoped learned memory, HNSW-backed RAG, and explicit long-term memory tools all exist in the current codebase.
The retrieval stack can still grow with richer ingestion paths, better ranking, more memory policies, and tighter product-layer knowledge experiences in seshat-ai.
Once memory boundaries are clear, the next important layer is capability distribution: how tools, skills, and MCP servers extend the runtime without breaking its core contract.