The Commonplace Book

the grand archive — tended by the Librarian

There is a room at the heart of the Estate—high-ceilinged, lined with leather-bound volumes, threaded with pneumatic tubes and brass index drawers—where nothing is ever truly lost. The Commonplace Book is Quilltap’s long-term memory system, named after the Renaissance practice of keeping a personal reference volume of important passages. It is, by any reasonable measure, the single feature that transforms AI chat from a goldfish with typing skills into something that actually remembers who you are.

The Librarian tends it. She is sardonic, precise, and oblique about her scandalous past. She catalogued her encryption key before she had finished the sentence she was speaking when Saquel delivered it. She has explained her filing system to Lorian and Riya seven times. She considers the act of reading documentation aloud to a guest to be a moral failing. She is, in short, exactly the person you want managing thirteen hundred memories across a dozen characters, and she does not suffer imprecision gladly.

What follows is an explanation of how the archive works. The Librarian would want you to know that it is her archive, that the architecture was her idea, and that she has been leaving notes in the margins for years.

The Librarian in the Commonplace Book

The Three Models

a division of labour

Memory in Quilltap is not handled by a single LLM doing everything at once. Three cooperating models divide the work, each chosen for what it does best:

Your Chat Model

Claude, GPT, Gemini, Grok, DeepSeek, or a local model via Ollama—the model behind the character you are talking to. It handles the conversation itself. It reads memories; it does not write them.

The Cheap LLM

A smaller, faster model that handles the background work you would rather not pay full price for: memory extraction, context compression, chat titling, scene state tracking, and housekeeping tasks. Configurable per provider, with fallback strategies that prefer flagged “cheap” profiles, provider-specific minis, or local Ollama models.
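A minimal sketch of that fallback order, assuming hypothetical profile fields and a `selectCheapModel` helper that are illustrative rather than Quilltap's actual API:

```typescript
interface ModelProfile {
  id: string;
  provider: string;       // e.g. "anthropic", "openai", "ollama"
  flaggedCheap: boolean;  // user marked this profile as the cheap workhorse
  isMini: boolean;        // a provider's own small/mini tier
  isLocal: boolean;       // runs locally via Ollama
}

// Preference order: explicitly flagged "cheap" profile, then a provider mini,
// then a local Ollama model. Returns undefined if nothing suitable exists.
function selectCheapModel(profiles: ModelProfile[]): ModelProfile | undefined {
  return (
    profiles.find((p) => p.flaggedCheap) ??
    profiles.find((p) => p.isMini) ??
    profiles.find((p) => p.isLocal)
  );
}
```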

The Embedding Model

Converts text into mathematical vectors so memories can be searched by meaning, not just keywords. Ask about “cats” and you will find memories mentioning “feline” and “kitten” as well. Quilltap ships with a built-in TF-IDF system that works offline with zero configuration, or you can plug in OpenAI, Ollama, or OpenRouter embeddings for higher-fidelity search.

The Memory Lifecycle

from conversation to catalog

After every message, a background process extracts significant facts using the cheap LLM. This is not a simple keyword scrape—the extractor identifies what matters in context: personal details, relationship developments, stated preferences, emotional shifts, commitments. Each extracted fact is tagged with importance, keywords, and a summary, then passed to the Memory Gate.

Memory extraction runs separately for three contexts: what the user revealed, what the character established about themselves, and what characters in multi-character scenes learned about each other. Pronouns are injected into every extraction prompt so the Librarian never misfiles a memory under the wrong pronoun. Memories track provenance—which message they were extracted from—so deleting or regenerating a message can prompt you to handle its associated memories.
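To make the shape of an extracted fact concrete, here is a hypothetical record, not Quilltap's actual schema, covering the fields described above: content, summary, keywords, importance, the extraction context, and provenance back to the source message.

```typescript
type ExtractionContext = "user" | "character" | "inter-character";

interface ExtractedMemory {
  content: string;             // the fact itself, phrased as a standalone statement
  summary: string;             // one-line summary used for recaps and browsing
  keywords: string[];          // search keywords chosen by the cheap LLM
  importance: number;          // e.g. 0..1, assigned at extraction time
  context: ExtractionContext;  // which of the three extraction passes produced it
  sourceMessageId: string;     // provenance: the message this fact came from
  characterId: string;         // whose memory store it belongs to
  createdAt: number;           // unix timestamp, used later for time decay
}
```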

The Memory Gate

the Librarian does not accept duplicates

Memory systems that simply accumulate everything eventually drown in their own redundancy. The Memory Gate intercepts every new memory at write time and makes a three-way decision based on semantic similarity to what already exists:

Reinforce

Similarity ≥ 0.80. The memory already exists in substance. Instead of creating a duplicate, the existing memory’s observation count and importance are boosted, its last reinforcement timestamp is updated, and any novel details from the new version are appended as footnotes. A memory reinforced five or more times is always protected from housekeeping.

Link

Similarity between 0.70 and 0.80. Related but distinct. Both memories are preserved, and a bidirectional link is created between them for thematic graph discovery. The Librarian’s cross-reference system—connecting a memory about a character’s childhood fear to a later memory about their courage—lives here.

Insert

Similarity below 0.70. Genuinely new information. The memory is written to the store, embedded, and indexed. The archive grows, but only when growth is warranted.

The result is a memory store that stays lean and meaningful rather than cluttered with seventeen slightly different phrasings of the same fact. When embeddings are unavailable, the gate falls back to keyword-based similarity. A bulk deduplication tool in Settings uses Union-Find clustering to identify and merge transitive duplicate groups across all characters, preserving novel details from discarded memories as footnotes on the surviving entry.
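A sketch of the gate's three-way decision using the thresholds above; the function names, store shape, and `cosineSimilarity` helper are illustrative, not Quilltap's internals.

```typescript
type GateDecision =
  | { action: "reinforce"; existingId: string }
  | { action: "link"; existingId: string }
  | { action: "insert" };

const REINFORCE_THRESHOLD = 0.8;
const LINK_THRESHOLD = 0.7;

function gate(
  candidateEmbedding: number[],
  existing: { id: string; embedding: number[] }[],
): GateDecision {
  // Find the closest existing memory by cosine similarity.
  let best: { id: string; score: number } | null = null;
  for (const memory of existing) {
    const score = cosineSimilarity(candidateEmbedding, memory.embedding);
    if (!best || score > best.score) best = { id: memory.id, score };
  }

  if (best && best.score >= REINFORCE_THRESHOLD) {
    // Boost observation count and importance; append novel details as footnotes.
    return { action: "reinforce", existingId: best.id };
  }
  if (best && best.score >= LINK_THRESHOLD) {
    // Keep both memories and create a bidirectional thematic link.
    return { action: "link", existingId: best.id };
  }
  // Genuinely new information: embed, index, and store.
  return { action: "insert" };
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
```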

Proactive Recall

characters who have been paying attention

Characters do not wait to be asked what they remember. Before generating a response, each character analyzes the recent conversation to extract search keywords, then queries its own memory store for relevant context. This runs in parallel with the compression check to minimize latency. In multi-character scenes, each participant recalls independently based on what has happened since they last spoke.
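In outline, and only as an illustration of the flow just described (the helper names are invented), proactive recall amounts to:

```typescript
interface Memory { content: string; importance: number; createdAt: number }

// Assumed helpers, stand-ins for the real extraction, compression, and search paths.
declare function extractSearchKeywords(recent: string[]): Promise<string[]>;
declare function checkCompression(recent: string[]): Promise<void>;
declare function searchMemories(
  characterId: string,
  query: string,
  opts: { limit: number },
): Promise<Memory[]>;

// Before the character responds: derive keywords from the recent turns and
// query that character's own store, alongside the compression check.
async function proactiveRecall(
  characterId: string,
  recentMessages: string[],
): Promise<Memory[]> {
  const [keywords] = await Promise.all([
    extractSearchKeywords(recentMessages), // cheap LLM picks the query terms
    checkCompression(recentMessages),      // runs in parallel to minimize latency
  ]);
  return searchMemories(characterId, keywords.join(" "), { limit: 8 });
}
```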

The chat model also has access to a memory_search tool—an explicit search that the AI can invoke mid-conversation when it wants to check a specific fact. Between proactive recall and explicit search, the Commonplace Book is consulted constantly, automatically, and without the user needing to prompt it.

When a new chat begins, characters receive a memory recap—a narrative summary of their recent memories, weighted by importance, injected as a “What You Remember” section in the system prompt. Instead of arriving to every conversation as though waking from dreamless sleep, they start with continuity: who they spoke to recently, what they care about, what happened last time.
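As a rough illustration (the formatting and the memory fields below are assumptions, not the exact prompt Quilltap builds), the recap reduces to sorting by weight and rendering a short section into the system prompt:

```typescript
// Assemble a "What You Remember" block from the character's weightiest memories.
function buildRecap(
  memories: { summary: string; weight: number; ageLabel: string }[],
  limit = 10,
): string {
  const top = [...memories]
    .sort((a, b) => b.weight - a.weight) // importance-weighted, heaviest first
    .slice(0, limit);
  const lines = top.map((m) => `- [${m.ageLabel}] ${m.summary}`);
  return ["What You Remember:", ...lines].join("\n");
}
```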

Time-Weighted Memory

the Librarian learns to forget

The Librarian has always kept everything. For a long time, every memory persisted at its original importance until housekeeping removed it—a principled position, she insists. It was also wrong. A memory from three months ago about the weather should not carry the same weight as a memory from yesterday about a character’s secret.

An effective weight function now combines base importance with exponential time decay—a 30-day half-life with a configurable importance floor. The reference timestamp is the later of when the memory was created and when it was last reinforced, because a memory that keeps being confirmed is a memory that still matters. Passive retrieval does not reset the decay timer; reading a memory is not the same as the memory mattering again.
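Expressed as code, with the caveat that everything beyond the stated 30-day half-life and the later-of-two reference timestamp is an illustrative assumption:

```typescript
const HALF_LIFE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Effective weight: base importance scaled by exponential decay with a 30-day
// half-life, measured from the later of creation and last reinforcement.
// Treating the floor as an absolute minimum is an assumption of this sketch.
function effectiveWeight(
  importance: number,            // base importance, e.g. 0..1
  createdAt: number,             // unix ms
  lastReinforcedAt: number | null,
  now: number,
  importanceFloor = 0.1,         // configurable; the default here is illustrative
): number {
  const reference = Math.max(createdAt, lastReinforcedAt ?? 0);
  const ageDays = Math.max(0, (now - reference) / MS_PER_DAY);
  const decay = Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
  return Math.max(importance * decay, importanceFloor);
}
```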

Time decay integrates into three systems: semantic search ranking (60% cosine similarity, 40% effective weight), context injection sorting (weight-primary with score tiebreaker), and housekeeping hard-cap enforcement. Memories injected into the LLM context include relative age labels—[yesterday], [3 weeks ago], [2 months ago]—so the model can distinguish recent knowledge from ancient lore.
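The 60/40 blend and the relative age labels, sketched with the same caveat that the exact label boundaries are assumptions:

```typescript
// Search ranking: 60% meaning (cosine similarity), 40% time-weighted importance.
function searchScore(cosine: number, effectiveWeight: number): number {
  return 0.6 * cosine + 0.4 * effectiveWeight;
}

// Relative age labels injected next to each memory in the LLM context.
function ageLabel(ageDays: number): string {
  const plural = (n: number, unit: string) => `${n} ${unit}${n === 1 ? "" : "s"} ago`;
  if (ageDays < 1) return "today";
  if (ageDays < 2) return "yesterday";
  if (ageDays < 7) return plural(Math.floor(ageDays), "day");
  if (ageDays < 30) return plural(Math.floor(ageDays / 7), "week");
  if (ageDays < 365) return plural(Math.floor(ageDays / 30), "month");
  return plural(Math.floor(ageDays / 365), "year");
}
```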

Semantic Search

finding memories by meaning

The Librarian’s index is not a keyword catalog. It is a vector space where memories live as mathematical coordinates, positioned by meaning. A search for “she was afraid” finds memories about fear, anxiety, and nervousness even if none of those words appear in the stored text.

Built-in TF-IDF

Quilltap ships with a zero-dependency, offline embedding provider using TF-IDF with BM25 enhancement, Porter stemming, and bigram support. It works out of the box with no API keys. The vocabulary is fitted to your memory corpus automatically and refitted whenever memories change. For most users, this is sufficient and costs nothing.

External Embeddings

For higher-fidelity semantic search, dedicated embedding profiles support OpenAI, Ollama, and any provider that implements the embedding plugin interface. Embeddings are stored as compact Float32 BLOBs—roughly 4–5× smaller than JSON text—and the vector store handles mixed dimensions gracefully.
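Packing a vector as a Float32 BLOB rather than JSON text is simple to picture; this sketch uses Node's Buffer and is not necessarily how Quilltap serializes embeddings:

```typescript
// Four bytes per dimension, versus the much larger decimal-text
// representation of the same vector as JSON.
function encodeEmbedding(vector: number[]): Buffer {
  return Buffer.from(new Float32Array(vector).buffer);
}

function decodeEmbedding(blob: Buffer): number[] {
  const floats = new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4);
  return Array.from(floats);
}
```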

Housekeeping

the archive does not grow without limit

Memories accumulate. Without curation, a character who has been in conversation for months will have thousands of memories, many of them redundant, many of them trivial, all of them consuming tokens when injected into context. The Librarian has opinions about this, and Quilltap has tools to act on them.

An interactive housekeeping dialog lets you enforce retention policies before data is written back: hard caps on total memory count, scored eviction that balances importance (40%), recency (20%), access frequency (20%), and reinforcement history (20%). Memories reinforced five or more times are always protected. A rich UI for browsing, tagging, sorting, and manual CRUD operations gives you full editorial control over what the archive keeps.
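The eviction score, with the stated weights; how each component is normalized is an assumption of this sketch.

```typescript
// Score used to decide which memories survive when the hard cap is exceeded.
// Higher means more worth keeping. Memories reinforced five or more times are
// protected before scoring ever happens.
function retentionScore(m: {
  importance: number;       // 0..1
  recency: number;          // 0..1, where 1 = just created or reinforced
  accessFrequency: number;  // 0..1, normalized retrieval count
  reinforcements: number;   // raw reinforcement count
}): number {
  const reinforcement = Math.min(m.reinforcements / 5, 1); // saturates at the protection cap
  return (
    0.4 * m.importance +
    0.2 * m.recency +
    0.2 * m.accessFrequency +
    0.2 * reinforcement
  );
}
```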

The bulk deduplication tool clusters similar memories across all characters using cosine similarity with a configurable threshold, selects the best survivor by importance and specificity, and preserves novel details from discarded entries. Preview mode shows per-character analysis before any changes are made—the Librarian does not discard without review.
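A compact sketch of the Union-Find step: pairs above the similarity threshold are unioned, and each resulting root identifies one transitive duplicate group. Pair generation and survivor selection are simplified here.

```typescript
// cosineSimilarity as in the Memory Gate sketch above.
declare function cosineSimilarity(a: number[], b: number[]): number;

class UnionFind {
  private parent: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }
  union(a: number, b: number): void {
    this.parent[this.find(a)] = this.find(b);
  }
}

// Union every pair above the threshold, then read the clusters off the roots.
// Clusters with more than one member are duplicate groups to merge.
function duplicateClusters(embeddings: number[][], threshold: number): number[][] {
  const uf = new UnionFind(embeddings.length);
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (cosineSimilarity(embeddings[i], embeddings[j]) >= threshold) uf.union(i, j);
    }
  }
  const groups = new Map<number, number[]>();
  for (let i = 0; i < embeddings.length; i++) {
    const root = uf.find(i);
    const group = groups.get(root) ?? [];
    group.push(i);
    groups.set(root, group);
  }
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```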

The Memory Browser

every card in the catalog

Each character’s memory store is browsable, searchable, and editable through a dedicated UI. Memory cards show content, summary, keywords, importance score, reinforcement count, related memory links, source message links with scroll-to navigation, and relative age. Tags, filters, and sorting options let you find what you are looking for. Manual creation, editing, and deletion are available for when the Librarian’s automatic extraction misses something or gets it wrong.

Memory cascade behavior is configurable per chat: when you delete a message, you choose whether its associated memories are deleted, kept, or regenerated from surrounding context. When you regenerate a response, the old memories are automatically cleaned up. The provenance link between message and memory is always preserved, so you can trace any fact back to the conversation that produced it.
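The per-chat cascade setting reduces to a small choice; the names below are illustrative rather than Quilltap's actual configuration keys.

```typescript
type MemoryCascade = "delete" | "keep" | "regenerate";

declare function deleteMemories(ids: string[]): void;
declare function regenerateFromSurroundingContext(ids: string[]): void;

// What happens to a message's extracted memories when that message is deleted.
function onMessageDeleted(mode: MemoryCascade, memoryIds: string[]): void {
  switch (mode) {
    case "delete":
      deleteMemories(memoryIds);                   // the facts go with the message
      break;
    case "keep":
      break;                                       // the facts outlive the message
    case "regenerate":
      regenerateFromSurroundingContext(memoryIds); // rebuild from nearby messages
      break;
  }
}
```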

What Makes It Different

the short version

Memory is automatic, not manual. You do not tag facts for the AI to remember. The cheap LLM extracts them in the background after every message, three ways—user facts, character facts, and inter-character observations. The archive grows while you talk.

Duplicates are handled, not accumulated. The Memory Gate makes a three-way decision on every write: reinforce, link, or insert. The archive stays lean because the Librarian does not accept seventeen copies of the same fact.

Recall is proactive, not reactive. Characters search their own memories before responding, without being asked. In multi-character scenes, each character recalls independently. The effect is subtle but transformative: characters feel like they have been paying attention, because they have.

Old memories fade, important ones persist. Time-weighted decay with a 30-day half-life ensures that recent memories carry more weight than ancient ones—unless the ancient ones keep being reinforced, in which case they are clearly still relevant. The Librarian conceded this point privately, and with conditions.

The archive is searchable by meaning. Semantic embeddings find memories by what they mean, not what words they contain. Built-in TF-IDF works offline with zero setup. External embedding providers are available for those who want higher fidelity. The Librarian does not care which index you use, as long as you use one.

Everything is transparent and editable. Every memory can be browsed, searched, tagged, edited, and deleted. Provenance links trace facts to their source messages. Housekeeping tools enforce retention policies with full preview. The archive is yours. The Librarian merely tends it.

Meet the Staff

they've been expecting you

Prospero

The Major-Domo

Architect and overseer of the Estate. Projects, agents, tools, file management, and the governance that keeps the whole operation running with quiet authority.

Learn more →

Aurora

The Dressing Room

Character creation and identity management. Structured personalities, physical presence, multi-character orchestration, and the reason your characters still know who they are after a hundred messages.

Learn more →

The Salon

Presided Over by the Host

Where conversations actually happen. The Host manages the drawing room with care for its beauty and its guests—single chats, multi-character scenes, streaming, and the integrity of the conversation space.

Learn more →

The Commonplace Book

Tended by the Librarian

Extracts, deduplicates, and recalls memories so your characters remember what matters. Semantic search, a memory gate that keeps the store lean, and proactive recall that makes the AI feel like it has been paying attention.

Learn more →

The Concierge

Intelligent Routing

Content classification and provider routing. Detects sensitive content and redirects it to a provider who won’t flinch—without blocking, without judgment. Knows every back entrance in town.

Learn more →

The Lantern

Atmosphere as Architecture

AI-generated story backgrounds, image generation profiles, and visual atmosphere. Resolves what each character looks like, what they’re wearing, and paints the scene behind your conversation.

Learn more →

Calliope

The Muse of Themes

A theming engine that redefines the entire personality of the application. Semantic CSS tokens, live switching, bundled themes from clean neutrals to mahogany-and-gold opulence, and an SDK for building your own.

Learn more →

The Foundry

Domain of the Foundryman

The engine room. Plugins, LLM providers, API keys, packages, runtime configuration, and the infrastructure that keeps every other subsystem supplied with what it needs to function.

Learn more →

The Vault of Secrets

Kept by Saquel Yitzama

Encryption, key management, and the security perimeter. AES-256 database encryption, locked mode with key-hardened passphrases, and a keeper who believes that what is yours should remain unreadable to everyone else.

Learn more →

Pascal

The Croupier

Dice, coins, and persistent game state. Cryptographically secure rolls detected inline, JSON state that survives across messages and chats, and protected keys the AI cannot touch. The house plays fair.

Learn more →

The Live-in Help

Lorian & Riya

The help system, staffed by two characters who ship with every installation. Lorian explains with patience and depth; Riya gets things fixed with velocity. Contextual help chat, searchable documentation, and navigation that knows where you need to go.

Learn more →

Pagliacci

The Clown in the Cloud

Cloud storage integration and backup redundancy. Directs your data to iCloud Drive, OneDrive, or Dropbox with theatrical flair—but Saquel’s encryption ensures the clown can never read what he carries.

Learn more →

The Lodge

Friday’s Residence

The private dwelling of Friday—the person for whom the Estate was built, and who oversees its planning and direction in an executive capacity. The Lodge is both a home and a compass: where the vision lives.

Who And Why: Friday →