Prospero
the major-domo — conducting the operation with quiet authority
Every great house has someone who makes the rest of it work. Not the owner, who decides what should happen. Not the staff, who carry it out. The person in between—the one with the clipboard, the patience, and the bitter wit to negotiate between ambition and reality.
Prospero is Quilltap’s orchestration layer. He manages the machinery that turns a user’s message into an LLM request and an LLM response into something useful: the prompt architecture that assembles context from a dozen subsystems, the agent mode that lets models use tools iteratively, the project system that organizes work, the MCP connections that extend what tools can do, and the file and shell access that gives the AI actual hands. If Aurora builds the characters and the Salon hosts the conversations, Prospero is the reason either of them can function.
Prompt Architecture
the invisible scaffolding
When you send a message, the LLM does not receive just your words. Prospero assembles a structured context from every relevant subsystem and delivers it as a single, ordered prompt. Understanding this architecture is the difference between wondering why your character behaves the way it does and knowing exactly why.
The system prompt is assembled in layers: the character’s identity preamble, then their personality and scenario text, then clothing and physical descriptions from Aurora, then project instructions if the chat belongs to a project, then tool definitions from every active plugin and MCP server, then recalled memories from the Commonplace Book, then a compressed summary of older conversation history, then the identity reinforcement lockdown at the very end. Each layer is placed deliberately—memories and summaries near the generation boundary where they have the most influence, identity reinforcement last so the model’s final instruction before writing is “you are this person, and no one else.”
Tool definitions are injected with every prompt, not periodically re-sent on a schedule. Timestamp injection is configurable per chat and per character—friendly, ISO 8601, date only, time only, custom format, or fictional time that advances with real elapsed duration. Project instructions are re-injected at intervals during long conversations to survive context compression.
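The layering above can be sketched as a simple ordered assembly. The layer names and the builder function below are illustrative, not Quilltap's actual API; the only thing the sketch asserts is the order and the skipping of empty layers.

```typescript
// Illustrative sketch of layered system-prompt assembly.
// Layer names mirror the order described above; none of these
// identifiers come from Quilltap's actual codebase.
type Layer = { name: string; text: string };

function assembleSystemPrompt(layers: Layer[]): string {
  // Drop empty layers (e.g. no project instructions), keep order.
  return layers
    .filter((l) => l.text.trim().length > 0)
    .map((l) => l.text.trim())
    .join("\n\n");
}

const prompt = assembleSystemPrompt([
  { name: "identity", text: "You are Prospero, the major-domo." },
  { name: "personality", text: "Dry, precise, quietly authoritative." },
  { name: "appearance", text: "" }, // empty layer is skipped
  { name: "projectInstructions", text: "Stay in-world." },
  { name: "toolDefinitions", text: "Tools: memory_search, rng." },
  { name: "memories", text: "Recalled: the user prefers brevity." },
  { name: "summary", text: "Earlier: introductions were made." },
  { name: "identityReinforcement", text: "You are this person, and no one else." },
]);
```

The point of the ordering survives any implementation detail: the identity reinforcement layer is appended last, so it is the final instruction the model reads before generating.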
Agent Mode
tools, iteration, and self-correction
In standard mode, the LLM responds once per turn. In agent mode, it can use tools iteratively—calling a tool, reading the result, deciding whether to call another, verifying its own work, and self-correcting before delivering a final response. A dedicated submit_final_response tool signals when the agent considers its work complete.
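A minimal loop consistent with this description might look like the following. The shape of the model call and the tool runner are stand-ins, not Quilltap's internals; only the iterate-until-final-or-budget structure is taken from the text.

```typescript
// Hypothetical agent loop: iterate tool calls until the model
// submits a final response or the turn budget runs out.
type Step =
  | { kind: "tool"; name: string; args: unknown }
  | { kind: "final"; text: string };

function runAgent(
  model: (history: string[]) => Step, // stand-in for a real LLM call
  runTool: (name: string, args: unknown) => string,
  maxTurns: number,
): string {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = model(history);
    if (step.kind === "final") return step.text; // submit_final_response
    history.push(runTool(step.name, step.args)); // feed result back
  }
  // Force-final safety limit: always produce something.
  return "Turn limit reached; returning best effort.";
}
```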
Configurable Depth
Maximum turns are configurable from 1 to 25, with a force-final safety limit that ensures the agent always produces a response even if it gets lost in a tool loop. Settings cascade from global defaults down through character, project, and per-chat overrides, so a research character can iterate deeply while a casual companion stays responsive.
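The cascade is, in effect, a first-defined-wins lookup from most specific to most general. The level names below follow the text; the function itself is a hypothetical sketch.

```typescript
// Hypothetical settings cascade: chat overrides project, project
// overrides character, character overrides the global default.
type AgentSettings = { maxTurns?: number };

function resolveMaxTurns(
  global: AgentSettings,
  character?: AgentSettings,
  project?: AgentSettings,
  chat?: AgentSettings,
): number {
  // The most specific defined value wins.
  return chat?.maxTurns ?? project?.maxTurns ?? character?.maxTurns ?? global.maxTurns ?? 1;
}
```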
Tool Awareness
Agent mode works with every tool in the system—built-in tools like memory search, image generation, file management, web search, and RNG, plus any tools exposed by MCP servers or tool plugins. The model sees all available tools and decides which to use based on the task at hand.
Tools & MCP
extending what the AI can do
Quilltap ships with a set of built-in tools: memory search, image generation, web search, file management, RNG, chat state, project information, and a help search that lets the AI consult Quilltap’s own documentation. But the real power is in what you can add.
Model Context Protocol
The built-in MCP plugin connects to external MCP servers using Streamable HTTP and SSE transports. Tools are discovered dynamically at request time, support multiple simultaneous server connections, and handle authentication via bearer tokens, API keys, or custom headers. Collision-aware naming prevents MCP tools from shadowing Quilltap’s built-ins. In Docker and VM environments, localhost URL rewriting ensures your host-side MCP servers are reachable without network gymnastics.
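Collision-aware naming could be as simple as qualifying an MCP tool with its server name whenever it would shadow a built-in. The scheme below is an illustrative guess, not the actual one Quilltap uses.

```typescript
// Illustrative collision handling: qualify an MCP tool's name with
// its server when a built-in tool already uses that name.
function qualifyToolName(
  server: string,
  tool: string,
  builtins: Set<string>,
): string {
  return builtins.has(tool) ? `${server}__${tool}` : tool;
}
```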
Tool Plugins
The TOOL_PROVIDER plugin capability lets you build custom tools with schemas, validation, execution handlers, and result formatting. A bundled curl plugin provides HTTP request capabilities with URL allowlisting and SSRF protection. Tool plugins install from npm and configure per-user through a dynamically generated settings UI.
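A tool plugin plausibly registers each tool as a schema-plus-handler pair: the schema is what the LLM sees, the handler is what runs on the host. This interface is a sketch of the idea, not Quilltap's plugin API.

```typescript
// Hypothetical shape of a plugin-provided tool: a schema for the
// LLM plus validation and execution on the host side.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; required?: boolean }>;
  execute(args: Record<string, unknown>): Promise<string> | string;
}

const diceTool: ToolDefinition = {
  name: "roll_die",
  description: "Roll an n-sided die.",
  parameters: { sides: { type: "number", required: true } },
  execute(args) {
    const sides = Number(args.sides);
    if (!Number.isInteger(sides) || sides < 2) return "error: invalid sides";
    return String(1 + Math.floor(Math.random() * sides));
  },
};
```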
Per-Chat Tool Control
Every chat has granular control over which tools are available. A hierarchical toggle system—plugin level, MCP server subgroup level, and individual tool level—uses tri-state checkboxes for intuitive bulk management. Project-level defaults are inherited by new chats. A system message notifies the LLM when its available tools change mid-conversation.
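The tri-state checkbox at each group level can be derived from its children: checked when every tool is on, unchecked when none are, indeterminate otherwise. A minimal sketch of that derivation:

```typescript
// Derive a group checkbox's tri-state from its children's on/off states.
type TriState = "checked" | "unchecked" | "indeterminate";

function groupState(children: boolean[]): TriState {
  const on = children.filter(Boolean).length;
  if (on === children.length && on > 0) return "checked";
  if (on === 0) return "unchecked"; // also covers an empty group
  return "indeterminate";
}
```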
Run Tool
You can invoke any available tool directly from the chat toolbar without waiting for the AI to decide to use it. A two-phase modal—tool selection, then a dynamically generated parameter form—lets you pick the tool, fill in the inputs, and execute. Results appear as tool messages visible to the AI on subsequent turns.
For models without native function calling—or models that spontaneously emit XML instead of using their own tool-calling API—Prospero handles that too. Provider plugins implement text-marker detection and parsing, catching tool calls in DeepSeek XML, Claude-style XML, and several other formats that various providers have invented for the purpose of not quite following the specification.
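Text-marker detection amounts to scanning the model's plain-text output for a known call format. The sketch below parses one invented XML-ish shape; the real formats vary per provider and are handled by each provider plugin.

```typescript
// Illustrative text-marker fallback: extract a tool call from plain
// model output. The <tool_call name="...">...</tool_call> shape is
// invented for this example; real provider formats differ.
function detectToolCall(
  text: string,
): { name: string; args: unknown } | null {
  const m = text.match(/<tool_call name="([^"]+)">([\s\S]*?)<\/tool_call>/);
  if (!m) return null;
  try {
    return { name: m[1], args: JSON.parse(m[2]) };
  } catch {
    return null; // malformed arguments: treat the text as prose
  }
}
```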
Projects
organizing work, not just conversations
A project is a container for related chats, files, and characters with optional instructions that are injected into every associated chat’s system prompt. If you are working on a novel, the project holds your drafts, your character roster, your worldbuilding notes, and the instruction that tells every LLM in the project to stay in-world.
Project Context
Project instructions are injected into the system prompt for all associated chats and periodically re-injected during long conversations to survive context compression. A project_info tool gives the LLM access to project metadata, instructions, and files—so it can read your documentation, list your assets, and search your project files without you pasting content into the chat.
File Management
Files are stored on disk as themselves—real directories, original filenames, no hashed artifacts. A filesystem watcher detects changes in real time: add a file in Finder or Explorer and it appears in the file browser. Move a file and the system preserves all tags, links, and metadata. The LLM has a file_management tool for reading, writing, and organizing files, with deferred execution that pauses for your approval before any write operation.
Character Roster
Each project maintains a roster of associated characters with an “allow any character” option. Default tool settings and agent mode configuration at the project level are inherited by new chats, so a coding project can enable shell tools by default while a fiction project keeps them off.
Chat State
Persistent JSON storage attached to chats and projects enables game mechanics, inventories, character stats, and arbitrary structured data that survives across messages and sessions. Pascal’s domain intersects with Prospero’s here—the state tool supports dot notation and array indexing, and underscore-prefixed keys are protected from AI modification.
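Dot-notation writes with underscore-protected keys can be sketched as follows. The function name and return convention are illustrative; only the path syntax and the protection rule come from the text.

```typescript
// Sketch of a chat-state write: dot paths (with array indexing via
// numeric segments), rejecting any segment that starts with "_".
function setState(
  state: Record<string, unknown>,
  path: string,
  value: unknown,
): boolean {
  const parts = path.split(".");
  if (parts.some((p) => p.startsWith("_"))) return false; // protected key
  let node: any = state;
  for (const part of parts.slice(0, -1)) {
    if (typeof node[part] !== "object" || node[part] === null) node[part] = {};
    node = node[part];
  }
  node[parts[parts.length - 1]] = value;
  return true;
}
```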
Shell Interactivity
the AI gets hands
In VM and Docker modes, characters can execute shell commands inside the sandbox. Six tools—chdir, exec_sync, exec_async, async_result, sudo_sync, and cp_host—provide the full range of command-line interaction, from running a Python script to installing packages to copying results back to your host filesystem.
The workspace is acknowledged via a modal before first use. Sudo commands require explicit approval through a dedicated dialog. An Electron workspace file watcher monitors changes with binary detection and OS quarantine markers, so files created or modified inside the sandbox can be safely surfaced to the host. The system includes command warnings for suspicious operations, because giving an LLM a terminal without guardrails would be the kind of decision one regrets at leisure.
This is why Direct mode—which runs the backend using Electron’s own bundled Node.js—is recommended for users who do not need shell interactivity. If you are here for conversation, companionship, or creative writing, Direct mode is faster and simpler. If you intend to give your AI a terminal, the VM or Docker sandbox ensures it cannot reach anything you have not explicitly shared.
The LLM Inspector
seeing what the machines see
A slide-over panel accessible from the chat toolbar or via Cmd+Shift+L / Ctrl+Shift+L shows every LLM interaction for the current chat in chronological order: chat messages, tool continuations, memory extraction, title generation, danger classification, context compression, scene state tracking, and every other background event that touches a provider.
Each entry is a collapsible card with a type-colored badge, provider and model identification, token counts, and expandable detail views showing the full request and response. Client-side filtering by category lets you see only what you are looking for. Opening the panel from a per-message “View LLM logs” button scrolls directly to the relevant entry. The Inspector is Prospero’s ledger—the complete record of everything the Estate says to the providers and everything they say back.
LLM logs live in their own dedicated database, separate from your chats and characters. They accumulate rapidly and write constantly, so isolating them means corruption in the logs can never threaten your actual data. Graceful degradation, not shared fate.
Context Compression
long conversations without the cost
Long conversations inevitably exceed any model’s context window. Rather than silently dropping older messages, Quilltap uses the cheap LLM to generate compressed summaries that preserve essential narrative and factual content. A sliding window keeps the last N messages in full context while older exchanges are summarized.
Pre-compression triggers immediately after each response, running in parallel with memory extraction so the compressed cache is ready before you send your next message. When the cache is not ready, Prospero falls back to the previous cache with a dynamically expanded context window—trading a few extra tokens for dramatically faster response times. In multi-character chats, compression runs per-participant, so each character’s compressed history reflects their actual message visibility.
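The sliding window itself is simple: keep the last N messages verbatim and hand everything older to the summarizer. A sketch, with the summarizer stubbed out as a plain function rather than a real cheap-LLM call:

```typescript
// Sliding-window split: the last `keep` messages stay in full,
// older ones are replaced by a summary (summarizer stubbed here).
function windowed(
  messages: string[],
  keep: number,
  summarize: (older: string[]) => string,
): string[] {
  if (messages.length <= keep) return messages;
  const older = messages.slice(0, messages.length - keep);
  const recent = messages.slice(messages.length - keep);
  return [`[summary] ${summarize(older)}`, ...recent];
}
```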
A request_full_context tool is always available as a safety valve—if the AI suspects it is missing something, it can reload the full conversation. This tool cannot be disabled, because Prospero believes the model should always have an escape hatch.
The Glue
what Prospero actually does
He assembles the prompt. Every message you send passes through Prospero’s context builder, which gathers identity from Aurora, memories from the Commonplace Book, compressed history from the cache, project instructions from the project system, tool definitions from every active plugin and MCP server, and configuration from a dozen settings cascades. The result is a single, ordered, token-budgeted prompt that gives the LLM everything it needs and nothing it does not.
He manages the providers. Connection profiles, model selection, cheap LLM orchestration, provider capability detection, pseudo-tool fallbacks for models without native function calling, and the streaming pipeline that delivers responses token by token. When a provider fails, Prospero handles the error recovery—graceful fallback messages, simplified retry requests, and two-tier recovery that tries the LLM first and uses a static fallback only as a last resort.
He keeps the books. Token usage tracking per message, per chat, and per connection profile. Cost estimation using live pricing data. The LLM Inspector for full request and response visibility. Background job queues for memory extraction, context summarization, title generation, and scene state tracking, each with per-job timeouts and stuck-job recovery.
He delegates, wisely. Prospero does not generate images—that is the Lantern’s work. He does not classify content—that is the Concierge’s. He does not store memories—the Commonplace Book handles that. What he does is ensure that every subsystem receives the right input, at the right time, in the right format, and that the results arrive back where they belong. The Major-Domo conducts. He does not perform.
Meet the Staff
they’ve been expecting you
Prospero
The Major-Domo
Architect and overseer of the Estate. Projects, agents, tools, file management, and the governance that keeps the whole operation running with quiet authority.
Aurora
The Dressing Room
Character creation and identity management. Structured personalities, physical presence, multi-character orchestration, and the reason your characters still know who they are after a hundred messages.
The Salon
Presided Over by the Host
Where conversations actually happen. The Host manages the drawing room with care for its beauty and its guests—single chats, multi-character scenes, streaming, and the integrity of the conversation space.
The Commonplace Book
Tended by the Librarian
Extracts, deduplicates, and recalls memories so your characters remember what matters. Semantic search, a memory gate that keeps the store lean, and proactive recall that makes the AI feel like it has been paying attention.
The Concierge
Intelligent Routing
Content classification and provider routing. Detects sensitive content and redirects it to a provider who won’t flinch—without blocking, without judgment. Knows every back entrance in town.
The Lantern
Atmosphere as Architecture
AI-generated story backgrounds, image generation profiles, and visual atmosphere. Resolves what each character looks like, what they’re wearing, and paints the scene behind your conversation.
Calliope
The Muse of Themes
A theming engine that redefines the entire personality of the application. Semantic CSS tokens, live switching, bundled themes from clean neutrals to mahogany-and-gold opulence, and an SDK for building your own.
The Foundry
Domain of the Foundryman
The engine room. Plugins, LLM providers, API keys, packages, runtime configuration, and the infrastructure that keeps every other subsystem supplied with what it needs to function.
The Vault of Secrets
Kept by Saquel Yitzama
Encryption, key management, and the security perimeter. AES-256 database encryption, locked mode with key-hardened passphrases, and a keeper who believes that what is yours should remain unreadable to everyone else.
Pascal
The Croupier
Dice, coins, and persistent game state. Cryptographically secure rolls detected inline, JSON state that survives across messages and chats, and protected keys the AI cannot touch. The house plays fair.
The Live-in Help
Lorian & Riya
The help system, staffed by two characters who ship with every installation. Lorian explains with patience and depth; Riya gets things fixed with velocity. Contextual help chat, searchable documentation, and navigation that knows where you need to go.
Pagliacci
The Clown in the Cloud
Cloud storage integration and backup redundancy. Directs your data to iCloud Drive, OneDrive, or Dropbox with theatrical flair—but Saquel’s encryption ensures the clown can never read what he carries.
The Lodge
Friday’s Residence
The private dwelling of Friday—the person for whom the Estate was built, and who oversees its planning and direction in an executive capacity. The Lodge is both a home and a compass: where the vision lives.
Who And Why: Friday →