How It Works
Most AI applications are, to put it charitably, goldfish with typing skills. They forget everything the moment you close the tab, run every conversation through someone else's servers, and present you with a personality roughly as distinctive as a hotel room. Quilltap takes a rather different approach.
Here is how the machinery works — not a feature list, but an explanation of the engineering decisions that make Quilltap behave the way it does, and why we made them.
The Commonplace Book — Memory That Earns Its Keep
When you tell your character that you grew up in Edinburgh, Quilltap doesn't merely nod and move on. A background process — what we call the "cheap LLM" — quietly extracts that fact, generates a semantic embedding for it, and files it in a per-character memory store. The next time you mention Scotland in passing, Quilltap doesn't need you to repeat yourself. It already knows.
This is the Commonplace Book: Quilltap's long-term memory and retrieval engine, named after the Renaissance practice of keeping a personal reference volume of important passages. It works through three cooperating models:
- Your primary chat model — Claude, GPT, Gemini, Grok, or a local model via Ollama — handles the conversation itself.
- A "cheap" background model — a smaller, faster LLM that handles memory extraction, context compression, chat titling, and housekeeping tasks you'd rather not pay full price for.
- An embedding model — converts text into mathematical vectors so memories can be searched by meaning, not just keywords. Ask about "cats" and you'll find memories mentioning "feline" and "kitten" as well. Quilltap ships with a built-in TF-IDF system that works offline with zero configuration, but you can plug in OpenAI, Ollama, or OpenRouter embeddings for higher-fidelity semantic search.
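To make the vector-search idea concrete, here is a minimal sketch in the spirit of the built-in fallback: a bag-of-words vectorizer plus cosine similarity. The names (embed, searchMemories) are illustrative, not Quilltap's API, and a real embedding model would capture synonym matches (cats to feline) that simple term overlap cannot.

```typescript
// Toy vectorizer: term counts as a sparse vector. A real TF-IDF system
// would also weight terms by inverse document frequency.
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>();
  for (const token of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    vec.set(token, (vec.get(token) ?? 0) + 1);
  }
  return vec;
}

// Cosine similarity between two sparse vectors: 1 = same direction, 0 = disjoint.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { dot += w * (b.get(t) ?? 0); na += w * w; }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Rank stored memories by similarity to the query; keep the top matches.
function searchMemories(query: string, memories: string[], topK = 3): string[] {
  const q = embed(query);
  return memories
    .map((m) => ({ m, score: cosine(q, embed(m)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .filter((r) => r.score > 0)
    .map((r) => r.m);
}
```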
The Memory Gate
Memory systems that simply accumulate everything eventually drown in their own redundancy. Quilltap's Memory Gate intercepts every new memory at write time and makes a three-way decision based on semantic similarity to what already exists: REINFORCE near-duplicates (boosting their importance rather than creating a copy), LINK related-but-distinct memories (building a thematic graph of connections), or INSERT genuinely novel information. The result is a memory store that stays lean and meaningful rather than cluttered with seventeen slightly different phrasings of the same fact.
Proactive Memory Recall
Characters don't wait to be asked. Before generating a response, each character analyzes the recent conversation to extract search keywords, then queries its own memory store for relevant context — running in parallel with the compression check to minimize latency. In a multi-character scene, each participant recalls independently based on what they've missed since they last spoke. The effect is subtle but transformative: characters feel like they've been paying attention, because they have.
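The recall-plus-compression parallelism can be sketched as a concurrent pair of tasks; both function parameters are illustrative stand-ins for Quilltap's internals:

```typescript
// Run memory recall and the compression check concurrently, so neither
// adds serial latency before the reply is generated.
async function prepareTurn(
  recallMemories: () => Promise<string[]>,
  checkCompression: () => Promise<string | null>,
): Promise<{ memories: string[]; summary: string | null }> {
  const [memories, summary] = await Promise.all([
    recallMemories(),
    checkCompression(),
  ]);
  return { memories, summary };
}
```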
Context Compression
Long conversations inevitably exceed any model's context window. Rather than silently dropping older messages — the approach favored by most AI applications — Quilltap uses the cheap LLM to generate compressed summaries that preserve the essential narrative and factual content. When the context runs thin, the AI has a rich summary to draw from rather than a void.
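A compression pass of this general shape might look as follows; the token budget, the half-budget split for recent messages, and the summarize stand-in for the cheap-LLM call are all illustrative assumptions:

```typescript
interface Msg { role: string; content: string; }

// Keep the most recent messages verbatim (always at least the latest),
// and replace everything older with a single summary message.
function compressContext(
  history: Msg[],
  tokenBudget: number,
  estimateTokens: (m: Msg) => number,
  summarize: (older: Msg[]) => string, // stand-in for the cheap-LLM call
): Msg[] {
  const total = history.reduce((n, m) => n + estimateTokens(m), 0);
  if (total <= tokenBudget) return history; // nothing to do yet
  const keep: Msg[] = [];
  let used = 0;
  let i = history.length - 1;
  for (; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (keep.length > 0 && used + cost > tokenBudget / 2) break;
    used += cost;
    keep.unshift(history[i]);
  }
  const older = history.slice(0, i + 1);
  if (older.length === 0) return keep;
  return [
    { role: "system", content: "Summary of earlier conversation: " + summarize(older) },
    ...keep,
  ];
}
```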
The Concierge — Intelligent Routing, Not Blunt Refusal
Let us be direct about something: Quilltap is used for creative fiction, including fiction that explores uncomfortable territory. A platform designed for storytellers must handle mature content with intelligence, not with a blunt instrument. This is the Concierge's domain.
The Concierge is Quilltap's content classification and routing subsystem — and the operative word is routing, not blocking. A request flagged as sensitive is not a request denied; it is a request that deserves the right provider, the right context, the right door. The Concierge knows every back entrance in town. It operates in three modes:
- Off — No scanning, no filtering. You're an adult, you've read the room, carry on.
- Detect Only — Messages and chats are classified and flagged, but not blocked. Visual indicators appear so you always know where you stand.
- Auto-Route — Sensitive content is automatically redirected to an uncensored provider that you've configured, while everything else goes through your standard model. The AI adapts mid-conversation without you lifting a finger.
Behind the scenes, the Concierge uses the cheap LLM to classify individual messages and entire chats, tracking scores across configurable categories. Once a chat is classified as sensitive, the designation is sticky — it doesn't flicker back and forth. A startup scan processes unclassified chats on boot and at regular intervals, so even legacy conversations get properly indexed. Chat cards display visual indicators, and a quick-hide toggle lets you sweep flagged content out of the sidebar when the situation calls for discretion.
If a provider refuses silently — returns nothing, says nothing, simply declines to engage — the Concierge catches the empty response and quietly retries with a provider who will not flinch. This applies everywhere: memory extraction, context compression, image generation, the lot. The Concierge's principle is that your request deserves an answer, and it is his job to find someone willing to give one.
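The silent-refusal fallback amounts to a loop over providers that treats an empty reply like a failure; the Provider shape here is an illustrative assumption, not Quilltap's plugin interface:

```typescript
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Try providers in order; an empty reply counts as a silent refusal and
// falls through to the next provider, just like a thrown error.
async function completeWithFallback(providers: Provider[], prompt: string): Promise<string> {
  for (const p of providers) {
    try {
      const reply = await p.complete(prompt);
      if (reply.trim().length > 0) return reply; // a real answer
      // Empty response: silent refusal; try the next provider.
    } catch {
      // Hard error: also fall through to the next provider.
    }
  }
  throw new Error("No configured provider returned a response");
}
```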
The philosophy is simple: the user decides their own limits. Quilltap provides the tools for informed navigation, not moral judgment.
The Desktop Application — A VM With a View
One of Quilltap's more distinctive architectural decisions is how the desktop application actually runs. Rather than executing server-side code directly on your operating system — an approach that works fine until someone's AI starts writing shell scripts — Quilltap runs its entire backend inside a lightweight Linux virtual machine.
On macOS, this means Lima with Apple's Virtualization.framework — fast, native, and near-invisible. On Windows, it's WSL2, which is built into Windows 10 and 11. In both cases, the Electron desktop shell manages the VM lifecycle automatically: you see a splash screen with a branded loading animation while the backend starts, and then your workspace opens in a native window. No terminal commands, no Docker knowledge, no configuration files.
Why a Virtual Machine?
Two reasons, and they're both important. The first is simplicity: the Quilltap backend is a Next.js application that expects a Linux environment with SQLite and local file storage. Wrapping it in a VM means macOS and Windows users get a consistent, tested Linux runtime without installing anything beyond the app itself.
The second is security through isolation. As LLM capabilities expand into agentic territory — AI that can read files, write code, and use tools — the question of where that code executes becomes critical. A VM is a genuine sandbox. If an AI-generated script misbehaves, it misbehaves inside a contained environment that has no access to your host system beyond the files you've explicitly shared. This isn't a theoretical concern; it's a design principle that shapes everything we build.
The Docker Alternative
For those who prefer containers — or who are running a Linux server — Quilltap also runs as a single Docker container. The Electron app even includes a Docker runtime toggle right on the splash screen: install Docker Desktop, switch the mode, and the app will pull the image and manage the container for you. Same native window, different engine underneath. Or skip the Electron wrapper entirely and run docker run csebold/quilltap for a browser-based experience.
Your Data, Your Directory
All of Quilltap's data — database, files, logs, everything — resides in a single directory on your machine. The Electron app lets you manage multiple data directories from its splash screen, switching between them with a quick stop-and-start of the VM. Each directory is self-contained: back it up by copying a folder, migrate it by moving one.
Aurora — Characters That Live and Breathe
Most AI character systems give you a name field, a personality box, and a hopeful prayer to the context window. Aurora goes considerably further.
Each character in Quilltap is a structured entity:

- Physical descriptions (multiple, with usage contexts for different scenarios)
- Clothing records (tracked separately, so a character's wardrobe is an actual wardrobe)
- Aliases ("Liz" for "Elizabeth")
- Pronouns (injected into system prompts and multi-character context)
- Detailed personality and backstory text

Characters maintain their identity through a reinforcement block placed at the very end of the context — right at the generation boundary — that explicitly reminds the LLM who it is, who it must not speak for, and who else is in the room.
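The end-of-context placement can be sketched as follows; the field names and the reinforcement wording are illustrative, not Quilltap's actual prompt format:

```typescript
interface Character { name: string; pronouns: string; }

// Assemble the context with the identity reinforcement appended last,
// right at the generation boundary.
function buildContext(
  systemPrompt: string,
  history: string[],
  speaker: Character,
  others: Character[],
): string {
  const names = others.map((c) => c.name).join(", ");
  const reinforcement = [
    `You are ${speaker.name} (${speaker.pronouns}).`,
    `Speak only as ${speaker.name}; never write lines for ${names || "anyone else"}.`,
    `Also present in this scene: ${names || "no one"}.`,
  ].join(" ");
  return [systemPrompt, ...history, reinforcement].join("\n");
}
```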
In multi-character scenes, a turn management system handles the queue: numbered position badges, nudge controls for idle speakers, per-card model switching, and the ability to impersonate any character mid-scene. You can even run fully automated all-LLM conversations with configurable pause intervals — useful for brainstorming, worldbuilding, or simply watching your characters argue with each other while you take notes.
If you're coming from SillyTavern, Quilltap imports your characters and chats directly — including multi-character conversations with speaker mapping.
The Lantern — Atmosphere as Architecture
The Lantern governs visual atmosphere. Its signature feature is Story Backgrounds: AI-generated landscape images that appear behind your chat content, creating a sense of place for each conversation.
Here is how it works: when a chat reaches a natural scene-setting moment, the cheap LLM analyzes recent messages and derives a scene context — not a literal transcript, but an imaginative scene description. If the characters are discussing a book or story, they might be depicted as observers in that world. A separate LLM task then resolves what each character currently looks like, consulting their physical descriptions, clothing records, narrative context, and usage contexts to determine the most scene-appropriate appearance. The result is a generated image that reflects the mood and setting of the actual conversation.
Story backgrounds appear at 45% opacity behind chat content, creating atmosphere without competing with readability. They display as thumbnails on chat cards and in the chat header, with full-screen viewing available on click. For chats flagged by the Concierge, image generation automatically reroutes to your configured uncensored provider.
Beyond backgrounds, The Lantern manages image generation profiles for Google Gemini/Imagen, Grok, OpenAI, and OpenRouter, with prompt expansion that uses character and persona descriptions via placeholders. Generate images directly within any chat and iterate on prompts without leaving the conversation — results automatically land in your gallery for reuse as avatars or attachments.
Calliope — Themes That Change Everything
A theme, in most applications, changes a few colors and calls it a day. In Quilltap, themes are plugins that can redefine the entire personality of the application.
Quilltap's theming system is built on semantic CSS variables — a layer of qt-* utility classes that map to every UI surface: cards, panels, buttons, chat bubbles, sidebars, message styling, and typography. A theme plugin overrides these tokens and can embed its own fonts, CSS, and even background images. Switching themes happens live, instantly, with no page reload — the entire application redraws itself around you like a room being redecorated while you sit in the chair.
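A theme's token layer can be pictured as a map of semantic names rendered to CSS custom properties; the token names and color values below are invented for illustration:

```typescript
type ThemeTokens = Record<string, string>;

// Render semantic tokens as CSS custom properties under a qt- prefix,
// mirroring the idea of utility classes reading from theme variables.
function toCss(tokens: ThemeTokens): string {
  const lines = Object.entries(tokens).map(
    ([name, value]) => `  --qt-${name}: ${value};`,
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

// Hypothetical theme: mahogany and gold in the Great Estate spirit.
const exampleTheme: ThemeTokens = {
  "surface-card": "#3b2a1e",
  "accent": "#c9a227",
};
```

Because every surface reads from these variables, swapping the token map restyles the whole application at once, which is what makes live theme switching possible without a reload.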
Quilltap ships with six bundled themes ranging from the clean Professional Neutral to the rich, mahogany-and-gold Great Estate to the geometric elegance of Art Deco. Themes can also override subsystem names and navigation card images — so "The Foundry" becomes "Settings" in one theme and "The Machinist's Floor" in another. The characters who personify each subsystem appear as decorative elements in the UI, adjusting their presence to match the current theme's personality.
Third-party theme development is supported through an SDK (create-quilltap-theme, @quilltap/theme-storybook) that lets you build, preview, and publish theme plugins to npm without access to the Quilltap source code.
The Plugin Architecture — Extending Everything
Quilltap was designed from the ground up to be extended. Every LLM provider, every authentication method, every theme, and every upgrade migration is delivered as a plugin. This isn't an afterthought bolted onto a monolith — it's the primary delivery mechanism for the application's capabilities.
The plugin system supports six types: LLM providers (chat, image generation, and embeddings), themes, roleplay templates, tools (extending what the AI can do), storage backends, and search providers. Plugins install from npm, scope to site-wide or per-user levels, and register their capabilities through a central plugin registry that the rest of the application queries at runtime. No hardcoded provider lists, no conditional imports based on which service you happen to use.
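In miniature, a registry of this kind is just a queryable collection keyed by plugin type; the types below are illustrative, not Quilltap's actual plugin API:

```typescript
type PluginKind =
  | "llm-provider"
  | "theme"
  | "roleplay-template"
  | "tool"
  | "storage"
  | "search";

interface Plugin { kind: PluginKind; id: string; }

// The rest of the application asks the registry what exists at runtime
// instead of importing providers directly.
class PluginRegistry {
  private plugins: Plugin[] = [];
  register(plugin: Plugin): void {
    this.plugins.push(plugin);
  }
  byKind(kind: PluginKind): Plugin[] {
    return this.plugins.filter((p) => p.kind === kind);
  }
}
```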
Combined with MCP (Model Context Protocol) support, this means Quilltap can connect to external tool servers, local model endpoints, custom search backends, and whatever else the ecosystem invents — without waiting for us to ship an update.
Pascal the Croupier — Dice, State, and Fair Play
For tabletop roleplayers, interactive fiction authors, and anyone who occasionally needs the universe to make a decision: Pascal manages random number generation and persistent game state.
Dice rolls are cryptographically secure and auto-detected: type "I roll 2d6" in a message and the dice actually roll, with results rendered inline. Coin flips, d4 through d1000, and random participant selection are all available from the chat composer. Beyond RNG, Pascal tracks persistent JSON state — inventories, stats, scores, and arbitrary structured data — that survives across messages and even across chats within a project. Protected keys (prefixed with an underscore) can't be modified by the AI, so your game master notes stay where you put them.
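Both mechanisms are easy to sketch: a pattern match that triggers a CSPRNG-backed roll, and a write guard for underscore-prefixed keys. The regex, result shape, and function names are illustrative, not Quilltap's parser:

```typescript
import { randomInt } from "node:crypto";

// Detect NdM notation anywhere in a message, e.g. "I roll 2d6".
const DICE_PATTERN = /\b(\d{1,2})d(\d{1,4})\b/i;

function rollFromMessage(message: string): { rolls: number[]; total: number } | null {
  const match = DICE_PATTERN.exec(message);
  if (!match) return null;
  const count = Number(match[1]);
  const sides = Number(match[2]);
  // crypto.randomInt(min, max) is exclusive of max, so this yields 1..sides.
  const rolls = Array.from({ length: count }, () => randomInt(1, sides + 1));
  return { rolls, total: rolls.reduce((a, b) => a + b, 0) };
}

// Underscore-prefixed state keys reject writes attributed to the AI.
function updateState(
  state: Record<string, unknown>,
  key: string,
  value: unknown,
  fromAI: boolean,
): void {
  if (fromAI && key.startsWith("_")) {
    throw new Error(`"${key}" is protected and cannot be modified by the AI`);
  }
  state[key] = value;
}
```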
Security — Encryption, Isolation, and Ownership
Quilltap is self-hosted. Your data lives on your machine or your server. There is no Quilltap cloud, no analytics telemetry, no training pipeline consuming your conversations. This is not a philosophical stance masquerading as a feature — it is the architecture itself.
Encrypted at Rest
Every Quilltap database file — your chats, memories, characters, API keys, LLM logs, all of it — is encrypted on disk using SQLCipher with AES-256. The standard sqlite3 command-line tool cannot open these files. A forensic utility pointed at your data directory would find only unreadable ciphertext. On first installation, Quilltap generates a unique .dbkey file and seals the database before any other work begins. Upgrading from an older installation? The converter runs automatically — your plaintext database is rewritten in cipher and the unencrypted original is swept away.
One practical note worth heeding: back up the .dbkey file alongside your data directory. A database without its key is perfectly sealed and entirely unreadable — by anyone, including you. Keep them together.
Locked Mode
For those who share a machine or work in circumstances where presence itself cannot be assumed safe, there is an optional locked mode: a passphrase, processed through hundreds of thousands of iterations of a key-hardening function before it ever touches the encryption key. When locked mode is active, Quilltap will not open — will not surface memories, will not admit a chat, will not render a character — until you speak the word. Enable it in Settings → Data & System. The base encryption is always present regardless; locked mode adds a second gate for those who want it.
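The hardening step can be sketched with a standard slow key-derivation function; PBKDF2-SHA256 and the iteration count here are illustrative choices, and Quilltap's actual KDF parameters may differ:

```typescript
import { pbkdf2Sync, randomBytes } from "node:crypto";

// Derive a 256-bit key from a passphrase via a deliberately slow,
// salted KDF, so brute-forcing the passphrase is expensive.
function deriveKey(passphrase: string, salt: Buffer, iterations = 310_000): Buffer {
  return pbkdf2Sync(passphrase, salt, iterations, 32, "sha256");
}

const salt = randomBytes(16); // stored alongside the data; not secret
const key = deriveKey("correct horse battery staple", salt);
```

The salt ensures identical passphrases on different installations yield different keys; the iteration count is what makes each guess costly.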
The Rest of the Perimeter
Beyond the database layer: the SQLite engine runs with integrity checks, WAL checkpoints, and physical backups with tiered retention — daily for a week, weekly for a month, monthly for a year, yearly forever. Every provider call flows through the plugin registry, so network endpoints are centralized and auditable. There is no ambient data leakage because there is no ambient data collection.
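A tiered retention policy of that shape can be sketched as a predicate over backup age; the exact bucketing below (Sundays for weeklies, the 1st of the month for monthlies, January 1st for yearlies) is an illustrative simplification:

```typescript
// Decide whether a backup survives pruning under tiered retention:
// daily for a week, weekly for a month, monthly for a year, yearly forever.
function shouldRetain(backupDate: Date, now: Date): boolean {
  const ageDays = (now.getTime() - backupDate.getTime()) / 86_400_000;
  if (ageDays <= 7) return true;                            // daily tier
  if (ageDays <= 31) return backupDate.getUTCDay() === 0;   // weekly tier
  if (ageDays <= 366) return backupDate.getUTCDate() === 1; // monthly tier
  return backupDate.getUTCMonth() === 0 && backupDate.getUTCDate() === 1; // yearly
}
```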
Backups are a single ZIP file containing everything — or selective native exports with conflict resolution for sharing specific content. If you want to move your data, you move a folder. If you want to destroy it, you delete one. The application tells you exactly where your data directory is, at the bottom of every page, because we believe you have a right to know where your own things are kept.
The Big Picture
Quilltap is what happens when you take AI chat seriously enough to give it real memory, real characters, real isolation, and real ownership — and then wrap it in an interface that treats the user as an intelligent adult rather than a hazard to be managed.
The Commonplace Book means the AI actually learns who you are. Aurora means your characters feel like people, not templates. The Concierge means you set the limits, not a corporate policy team three time zones away. The VM sandbox means agentic AI runs in a safe, contained environment. The plugin system means the application grows with you. And the theming engine means you can make it all look however you please while you're at it.
Get started — or browse the full feature list if you'd like to see what else is in the house.