How It Works
Most AI applications are, to put it charitably, goldfish with typing skills. They forget everything the moment you close the tab, run every conversation through someone else's servers, and present you with a personality roughly as distinctive as a hotel room. Quilltap takes a rather different approach.
Here is how the machinery works — not a feature list, but an explanation of the engineering decisions that make Quilltap behave the way it does, and why we made them.
The Commonplace Book — Memory That Earns Its Keep
When you tell your character that you grew up in Edinburgh, Quilltap doesn't merely nod and move on. A background process — what we call the "cheap LLM" — quietly extracts that fact, generates a semantic embedding for it, and files it in a per-character memory store. The next time you mention Scotland in passing, Quilltap doesn't need you to repeat yourself. It already knows.
This is the Commonplace Book: Quilltap's long-term memory and retrieval engine, named after the Renaissance practice of keeping a personal reference volume of important passages. It works through three cooperating models:
- Your primary chat model — Claude, GPT, Gemini, Grok, or a local model via Ollama — handles the conversation itself.
- A "cheap" background model — a smaller, faster LLM that handles memory extraction, context compression, chat titling, and housekeeping tasks you'd rather not pay full price for.
- An embedding model — converts text into mathematical vectors so memories can be searched by meaning, not just keywords. Ask about "cats" and you'll find memories mentioning "feline" and "kitten" as well. Quilltap ships with a built-in TF-IDF system that works offline with zero configuration, but you can plug in OpenAI, Ollama, or OpenRouter embeddings for higher-fidelity semantic search.
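To make the vector-search idea concrete, here is a minimal sketch in the spirit of the built-in fallback: a bag-of-words vectorizer plus cosine similarity. The names (embed, searchMemories) are illustrative, not Quilltap's API, and a real embedding model would capture synonym matches (cats to feline) that simple term overlap cannot.

```typescript
// Toy vectorizer: term counts as a sparse vector. A real TF-IDF system
// would also weight terms by inverse document frequency.
function embed(text: string): Map<string, number> {
  const vec = new Map<string, number>();
  for (const token of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    vec.set(token, (vec.get(token) ?? 0) + 1);
  }
  return vec;
}

// Cosine similarity between two sparse vectors: 1 = same direction, 0 = disjoint.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { dot += w * (b.get(t) ?? 0); na += w * w; }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Rank stored memories by similarity to the query; keep the top matches.
function searchMemories(query: string, memories: string[], topK = 3): string[] {
  const q = embed(query);
  return memories
    .map((m) => ({ m, score: cosine(q, embed(m)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .filter((r) => r.score > 0)
    .map((r) => r.m);
}
```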
The Memory Gate
Memory systems that simply accumulate everything eventually drown in their own redundancy. Quilltap's Memory Gate intercepts every new memory at write time and makes a three-way decision based on semantic similarity to what already exists: REINFORCE near-duplicates (boosting their importance rather than creating a copy), LINK related-but-distinct memories (building a thematic graph of connections), or INSERT genuinely novel information. The result is a memory store that stays lean and meaningful rather than cluttered with seventeen slightly different phrasings of the same fact.
Proactive Memory Recall
Characters don't wait to be asked. Before generating a response, each character analyzes the recent conversation to extract search keywords, then queries its own memory store for relevant context — running in parallel with the compression check to minimize latency. In a multi-character scene, each participant recalls independently based on what they've missed since they last spoke. The effect is subtle but transformative: characters feel like they've been paying attention, because they have.
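The recall-plus-compression parallelism can be sketched as a concurrent pair of tasks; both function parameters are illustrative stand-ins for Quilltap's internals:

```typescript
// Run memory recall and the compression check concurrently, so neither
// adds serial latency before the reply is generated.
async function prepareTurn(
  recallMemories: () => Promise<string[]>,
  checkCompression: () => Promise<string | null>,
): Promise<{ memories: string[]; summary: string | null }> {
  const [memories, summary] = await Promise.all([
    recallMemories(),
    checkCompression(),
  ]);
  return { memories, summary };
}
```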
Context Compression
Long conversations inevitably exceed any model's context window. Rather than silently dropping older messages — the approach favored by most AI applications — Quilltap uses the cheap LLM to generate compressed summaries that preserve the essential narrative and factual content. When the context runs thin, the AI has a rich summary to draw from rather than a void.
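A compression pass of this general shape might look as follows; the token budget, the half-budget split for recent messages, and the summarize stand-in for the cheap-LLM call are all illustrative assumptions:

```typescript
interface Msg { role: string; content: string; }

// Keep the most recent messages verbatim (always at least the latest),
// and replace everything older with a single summary message.
function compressContext(
  history: Msg[],
  tokenBudget: number,
  estimateTokens: (m: Msg) => number,
  summarize: (older: Msg[]) => string, // stand-in for the cheap-LLM call
): Msg[] {
  const total = history.reduce((n, m) => n + estimateTokens(m), 0);
  if (total <= tokenBudget) return history; // nothing to do yet
  const keep: Msg[] = [];
  let used = 0;
  let i = history.length - 1;
  for (; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (keep.length > 0 && used + cost > tokenBudget / 2) break;
    used += cost;
    keep.unshift(history[i]);
  }
  const older = history.slice(0, i + 1);
  if (older.length === 0) return keep;
  return [
    { role: "system", content: "Summary of earlier conversation: " + summarize(older) },
    ...keep,
  ];
}
```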
The Concierge — Intelligent Routing, Not Blunt Refusal
Let us be direct about something: Quilltap is used for creative fiction, including fiction that explores uncomfortable territory. A platform designed for storytellers must handle mature content with intelligence, not with a blunt instrument. This is the Concierge's domain.
The Concierge is Quilltap's content classification and routing subsystem — and the operative word is routing, not blocking. A request flagged as sensitive is not a request denied; it is a request that deserves the right provider, the right context, the right door. The Concierge knows every back entrance in town. It operates in three modes:
- Off — No scanning, no filtering. You're an adult, you've read the room, carry on.
- Detect Only — Messages and chats are classified and flagged, but not blocked. Visual indicators appear so you always know where you stand.
- Auto-Route — Sensitive content is automatically redirected to an uncensored provider that you've configured, while everything else goes through your standard model. The AI adapts mid-conversation without you lifting a finger.
Behind the scenes, the Concierge uses the cheap LLM to classify individual messages and entire chats, tracking scores across configurable categories. Once a chat is classified as sensitive, the designation is sticky — it doesn't flicker back and forth. A startup scan processes unclassified chats on boot and at regular intervals, so even legacy conversations get properly indexed. Chat cards display visual indicators, and a quick-hide toggle lets you sweep flagged content out of the sidebar when the situation calls for discretion.
If a provider refuses silently — returns nothing, says nothing, simply declines to engage — the Concierge catches the empty response and quietly retries with a provider who will not flinch. This applies everywhere: memory extraction, context compression, image generation, the lot. The Concierge's principle is that your request deserves an answer, and it is his job to find someone willing to give one.
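The silent-refusal fallback amounts to a loop over providers that treats an empty reply like a failure; the Provider shape here is an illustrative assumption, not Quilltap's plugin interface:

```typescript
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Try providers in order; an empty reply counts as a silent refusal and
// falls through to the next provider, just like a thrown error.
async function completeWithFallback(providers: Provider[], prompt: string): Promise<string> {
  for (const p of providers) {
    try {
      const reply = await p.complete(prompt);
      if (reply.trim().length > 0) return reply; // a real answer
      // Empty response: silent refusal; try the next provider.
    } catch {
      // Hard error: also fall through to the next provider.
    }
  }
  throw new Error("No configured provider returned a response");
}
```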
The philosophy is simple: the user decides their own limits. Quilltap provides the tools for informed navigation, not moral judgment.
The Desktop Application — A VM With a View
One of Quilltap's more distinctive architectural decisions is how the desktop application actually runs. Rather than executing server-side code directly on your operating system — an approach that works fine until someone's AI starts writing shell scripts — Quilltap runs its entire backend inside a lightweight Linux virtual machine.
On macOS, this means Lima with Apple's Virtualization.framework — fast, native, and near-invisible. On Windows, it's WSL2, which is built into Windows 10 and 11. In both cases, the Electron desktop shell manages the VM lifecycle automatically: you see a splash screen with a branded loading animation while the backend starts, and then your workspace opens in a native window. No terminal commands, no Docker knowledge, no configuration files.
Why a Virtual Machine?
Two reasons, and they're both important. The first is simplicity: the Quilltap backend is a Next.js application that expects a Linux environment with SQLite and local file storage. Wrapping it in a VM means macOS and Windows users get a consistent, tested Linux runtime without installing anything beyond the app itself.
The second is security through isolation. As LLM capabilities expand into agentic territory — AI that can read files, write code, and use tools — the question of where that code executes becomes critical. A VM is a genuine sandbox. If an AI-generated script misbehaves, it misbehaves inside a contained environment that has no access to your host system beyond the files you've explicitly shared. This isn't a theoretical concern; it's a design principle that shapes everything we build.
The Docker Alternative
For those who prefer containers — or who are running a Linux server — Quilltap also runs as a single Docker container. The Electron app even includes a Docker runtime toggle right on the splash screen: install Docker Desktop, switch the mode, and the app will pull the image and manage the container for you. Same native window, different engine underneath. Or skip the Electron wrapper entirely and run docker run csebold/quilltap for a browser-based experience.
Your Data, Your Directory
All of Quilltap's data — database, files, logs, everything — resides in a single directory on your machine. The Electron app lets you manage multiple data directories from its splash screen, switching between them with a quick stop-and-start of the VM. Each directory is self-contained: back it up by copying a folder, migrate it by moving one.
Aurora — Characters That Live and Breathe
Most AI character systems give you a name field, a personality box, and a hopeful prayer to the context window. Aurora goes considerably further.
Each character in Quilltap is a structured entity:

- Physical descriptions (multiple, with usage contexts for different scenarios)
- Clothing records (tracked separately, so a character's wardrobe is an actual wardrobe)
- Aliases ("Liz" for "Elizabeth")
- Pronouns (injected into system prompts and multi-character context)
- Detailed personality and backstory text

Characters maintain their identity through a reinforcement block placed at the very end of the context — right at the generation boundary — that explicitly reminds the LLM who it is, who it must not speak for, and who else is in the room.
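The end-of-context placement can be sketched as follows; the field names and the reinforcement wording are illustrative, not Quilltap's actual prompt format:

```typescript
interface Character { name: string; pronouns: string; }

// Assemble the context with the identity reinforcement appended last,
// right at the generation boundary.
function buildContext(
  systemPrompt: string,
  history: string[],
  speaker: Character,
  others: Character[],
): string {
  const names = others.map((c) => c.name).join(", ");
  const reinforcement = [
    `You are ${speaker.name} (${speaker.pronouns}).`,
    `Speak only as ${speaker.name}; never write lines for ${names || "anyone else"}.`,
    `Also present in this scene: ${names || "no one"}.`,
  ].join(" ");
  return [systemPrompt, ...history, reinforcement].join("\n");
}
```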
In multi-character scenes, a turn management system handles the queue: numbered position badges, nudge controls for idle speakers, per-card model switching, and the ability to impersonate any character mid-scene. You can even run fully automated all-LLM conversations with configurable pause intervals — useful for brainstorming, worldbuilding, or simply watching your characters argue with each other while you take notes.
If you're coming from SillyTavern, Quilltap imports your characters and chats directly — including multi-character conversations with speaker mapping.
The Lantern — Atmosphere as Architecture
The Lantern governs visual atmosphere. Its signature feature is Story Backgrounds: AI-generated landscape images that appear behind your chat content, creating a sense of place for each conversation.
Here is how it works: when a chat reaches a natural scene-setting moment, the cheap LLM analyzes recent messages and derives a scene context — not a literal transcript, but an imaginative scene description. If the characters are discussing a book or story, they might be depicted as observers in that world. A separate LLM task then resolves what each character currently looks like, consulting their physical descriptions, clothing records, narrative context, and usage contexts to determine the most scene-appropriate appearance. The result is a generated image that reflects the mood and setting of the actual conversation.
Story backgrounds appear at 45% opacity behind chat content, creating atmosphere without competing with readability. They display as thumbnails on chat cards and in the chat header, with full-screen viewing available on click. For chats flagged by the Concierge, image generation automatically reroutes to your configured uncensored provider.
Beyond backgrounds, The Lantern manages image generation profiles for Google Gemini/Imagen, Grok, OpenAI, and OpenRouter, with prompt expansion that uses character and persona descriptions via placeholders. Generate images directly within any chat and iterate on prompts without leaving the conversation — results automatically land in your gallery for reuse as avatars or attachments.
Calliope — Themes That Change Everything
A theme, in most applications, changes a few colors and calls it a day. In Quilltap, themes are plugins that can redefine the entire personality of the application.
Quilltap's theming system is built on semantic CSS variables — a layer of qt-* utility classes that map to every UI surface: cards, panels, buttons, chat bubbles, sidebars, message styling, and typography. A theme plugin overrides these tokens and can embed its own fonts, CSS, and even background images. Switching themes happens live, instantly, with no page reload — the entire application redraws itself around you like a room being redecorated while you sit in the chair.
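A theme's token layer can be pictured as a map of semantic names rendered to CSS custom properties; the token names and color values below are invented for illustration:

```typescript
type ThemeTokens = Record<string, string>;

// Render semantic tokens as CSS custom properties under a qt- prefix,
// mirroring the idea of utility classes reading from theme variables.
function toCss(tokens: ThemeTokens): string {
  const lines = Object.entries(tokens).map(
    ([name, value]) => `  --qt-${name}: ${value};`,
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

// Hypothetical theme: mahogany and gold in the Great Estate spirit.
const exampleTheme: ThemeTokens = {
  "surface-card": "#3b2a1e",
  "accent": "#c9a227",
};
```

Because every surface reads from these variables, swapping the token map restyles the whole application at once, which is what makes live theme switching possible without a reload.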
Quilltap ships with six bundled themes ranging from the clean Professional Neutral to the rich, mahogany-and-gold Great Estate to the geometric elegance of Art Deco. Themes can also override subsystem names and navigation card images — so "The Foundry" becomes "Settings" in one theme and "The Machinist's Floor" in another. The characters who personify each subsystem appear as decorative elements in the UI, adjusting their presence to match the current theme's personality.
Third-party theme development is supported through an SDK (create-quilltap-theme, @quilltap/theme-storybook) that lets you build, preview, and publish theme plugins to npm without access to the Quilltap source code.
The Plugin Architecture — Extending Everything
Quilltap was designed from the ground up to be extended. Every LLM provider, every authentication method, every theme, and every upgrade migration is delivered as a plugin. This isn't an afterthought bolted onto a monolith — it's the primary delivery mechanism for the application's capabilities.
The plugin system supports six types: LLM providers (chat, image generation, and embeddings), themes, roleplay templates, tools (extending what the AI can do), storage backends, and search providers. Plugins install from npm, scope to site-wide or per-user levels, and register their capabilities through a central plugin registry that the rest of the application queries at runtime. No hardcoded provider lists, no conditional imports based on which service you happen to use.
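In miniature, a registry of this kind is just a queryable collection keyed by plugin type; the types below are illustrative, not Quilltap's actual plugin API:

```typescript
type PluginKind =
  | "llm-provider"
  | "theme"
  | "roleplay-template"
  | "tool"
  | "storage"
  | "search";

interface Plugin { kind: PluginKind; id: string; }

// The rest of the application asks the registry what exists at runtime
// instead of importing providers directly.
class PluginRegistry {
  private plugins: Plugin[] = [];
  register(plugin: Plugin): void {
    this.plugins.push(plugin);
  }
  byKind(kind: PluginKind): Plugin[] {
    return this.plugins.filter((p) => p.kind === kind);
  }
}
```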
Combined with MCP (Model Context Protocol) support, this means Quilltap can connect to external tool servers, local model endpoints, custom search backends, and whatever else the ecosystem invents — without waiting for us to ship an update.
Pascal the Croupier — Dice, State, and Fair Play
For tabletop roleplayers, interactive fiction authors, and anyone who occasionally needs the universe to make a decision: Pascal manages random number generation and persistent game state.
Dice rolls are cryptographically secure and auto-detected: type "I roll 2d6" in a message and the dice actually roll, with results rendered inline. Coin flips, d4 through d1000, and random participant selection are all available from the chat composer. Beyond RNG, Pascal tracks persistent JSON state — inventories, stats, scores, and arbitrary structured data — that survives across messages and even across chats within a project. Protected keys (prefixed with an underscore) can't be modified by the AI, so your game master notes stay where you put them.
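Both mechanisms are easy to sketch: a pattern match that triggers a CSPRNG-backed roll, and a write guard for underscore-prefixed keys. The regex, result shape, and function names are illustrative, not Quilltap's parser:

```typescript
import { randomInt } from "node:crypto";

// Detect NdM notation anywhere in a message, e.g. "I roll 2d6".
const DICE_PATTERN = /\b(\d{1,2})d(\d{1,4})\b/i;

function rollFromMessage(message: string): { rolls: number[]; total: number } | null {
  const match = DICE_PATTERN.exec(message);
  if (!match) return null;
  const count = Number(match[1]);
  const sides = Number(match[2]);
  // crypto.randomInt(min, max) is exclusive of max, so this yields 1..sides.
  const rolls = Array.from({ length: count }, () => randomInt(1, sides + 1));
  return { rolls, total: rolls.reduce((a, b) => a + b, 0) };
}

// Underscore-prefixed state keys reject writes attributed to the AI.
function updateState(
  state: Record<string, unknown>,
  key: string,
  value: unknown,
  fromAI: boolean,
): void {
  if (fromAI && key.startsWith("_")) {
    throw new Error(`"${key}" is protected and cannot be modified by the AI`);
  }
  state[key] = value;
}
```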
Security — Encryption, Isolation, and Ownership
Quilltap is self-hosted. Your data lives on your machine or your server. There is no Quilltap cloud, no analytics telemetry, no training pipeline consuming your conversations. This is not a philosophical stance masquerading as a feature — it is the architecture itself.
Encrypted at Rest
Every Quilltap database file — your chats, memories, characters, API keys, LLM logs, all of it — is encrypted on disk using SQLCipher with AES-256. The standard sqlite3 command-line tool cannot open these files. A forensic utility pointed at your data directory would find only unreadable ciphertext. On first installation, Quilltap generates a unique .dbkey file and seals the database before any other work begins. Upgrading from an older installation? The converter runs automatically — your plaintext database is rewritten in cipher and the unencrypted original is swept away.
One practical note worth heeding: back up the .dbkey file alongside your data directory. A database without its key is perfectly sealed and entirely unreadable — by anyone, including you. Keep them together.
Locked Mode
For those who share a machine or work in circumstances where presence itself cannot be assumed safe, there is an optional locked mode: a passphrase, processed through hundreds of thousands of iterations of a key-hardening function before it ever touches the encryption key. When locked mode is active, Quilltap will not open — will not surface memories, will not admit a chat, will not render a character — until you speak the word. Enable it in Settings → Data & System. The base encryption is always present regardless; locked mode adds a second gate for those who want it.
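The hardening step can be sketched with a standard slow key-derivation function; PBKDF2-SHA256 and the iteration count here are illustrative choices, and Quilltap's actual KDF parameters may differ:

```typescript
import { pbkdf2Sync, randomBytes } from "node:crypto";

// Derive a 256-bit key from a passphrase via a deliberately slow,
// salted KDF, so brute-forcing the passphrase is expensive.
function deriveKey(passphrase: string, salt: Buffer, iterations = 310_000): Buffer {
  return pbkdf2Sync(passphrase, salt, iterations, 32, "sha256");
}

const salt = randomBytes(16); // stored alongside the data; not secret
const key = deriveKey("correct horse battery staple", salt);
```

The salt ensures identical passphrases on different installations yield different keys; the iteration count is what makes each guess costly.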
The Rest of the Perimeter
Beyond the database layer: the SQLite engine runs with integrity checks, WAL checkpoints, and physical backups with tiered retention — daily for a week, weekly for a month, monthly for a year, yearly forever. Every provider call flows through the plugin registry, so network endpoints are centralized and auditable. There is no ambient data leakage because there is no ambient data collection.
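A tiered retention policy of that shape can be sketched as a predicate over backup age; the exact bucketing below (Sundays for weeklies, the 1st of the month for monthlies, January 1st for yearlies) is an illustrative simplification:

```typescript
// Decide whether a backup survives pruning under tiered retention:
// daily for a week, weekly for a month, monthly for a year, yearly forever.
function shouldRetain(backupDate: Date, now: Date): boolean {
  const ageDays = (now.getTime() - backupDate.getTime()) / 86_400_000;
  if (ageDays <= 7) return true;                            // daily tier
  if (ageDays <= 31) return backupDate.getUTCDay() === 0;   // weekly tier
  if (ageDays <= 366) return backupDate.getUTCDate() === 1; // monthly tier
  return backupDate.getUTCMonth() === 0 && backupDate.getUTCDate() === 1; // yearly
}
```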
Backups are a single ZIP file containing everything — or selective native exports with conflict resolution for sharing specific content. If you want to move your data, you move a folder. If you want to destroy it, you delete one. The application tells you exactly where your data directory is, at the bottom of every page, because we believe you have a right to know where your own things are kept.
The Big Picture
Quilltap is what happens when you take AI chat seriously enough to give it real memory, real characters, real isolation, and real ownership — and then wrap it in an interface that treats the user as an intelligent adult rather than a hazard to be managed.
The Commonplace Book means the AI actually learns who you are. Aurora means your characters feel like people, not templates. The Concierge means you set the limits, not a corporate policy team three time zones away. The VM sandbox means agentic AI runs in a safe, contained environment. The plugin system means the application grows with you. And the theming engine means you can make it all look however you please while you're at it.
Get started — or browse the full feature list if you'd like to see what else is in the house.