Prospero
the major-domo — conducting the operation with quiet authority
Every great house has someone who makes the rest of it work. Not the owner, who decides what should happen. Not the staff, who carry it out. The person in between—the one with the clipboard, the patience, and the bitter wit to negotiate between ambition and reality.
Prospero is Quilltap’s orchestration layer. He manages the machinery that turns a user’s message into an LLM request and an LLM response into something useful: the prompt architecture that assembles context from a dozen subsystems, the agent mode that lets models use tools iteratively, the project system that organizes work, the MCP connections that extend what tools can do, and the file and shell access that gives the AI actual hands. If Aurora builds the characters and the Salon hosts the conversations, Prospero is the reason either of them can function.
Prompt Architecture
the invisible scaffolding
When you send a message, the LLM does not receive just your words. Prospero assembles a structured context from every relevant subsystem and delivers it as a single, ordered prompt. Understanding this architecture is the difference between wondering why your character behaves the way it does and knowing exactly why.
The system prompt is assembled in layers: the character’s identity preamble, then their personality and scenario text, then clothing and physical descriptions from Aurora, then project instructions if the chat belongs to a project, then tool definitions from every active plugin and MCP server, then recalled memories from the Commonplace Book, then a compressed summary of older conversation history, then the identity reinforcement lockdown at the very end. Each layer is placed deliberately—memories and summaries near the generation boundary where they have the most influence, identity reinforcement last so the model’s final instruction before writing is “you are this person, and no one else.”
Tool definitions are injected with every prompt, not periodically re-sent on a schedule. Timestamp injection is configurable per chat and per character—friendly, ISO 8601, date only, time only, custom format, or fictional time that advances with real elapsed duration. Project instructions are re-injected at intervals during long conversations to survive context compression.
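The layering above can be sketched as a simple ordered assembly. The layer names and the builder function below are illustrative, not Quilltap's actual API; the only thing the sketch asserts is the order and the skipping of empty layers.

```typescript
// Illustrative sketch of layered system-prompt assembly.
// Layer names mirror the order described above; none of these
// identifiers come from Quilltap's actual codebase.
type Layer = { name: string; text: string };

function assembleSystemPrompt(layers: Layer[]): string {
  // Drop empty layers (e.g. no project instructions), keep order.
  return layers
    .filter((l) => l.text.trim().length > 0)
    .map((l) => l.text.trim())
    .join("\n\n");
}

const prompt = assembleSystemPrompt([
  { name: "identity", text: "You are Prospero, the major-domo." },
  { name: "personality", text: "Dry, precise, quietly authoritative." },
  { name: "appearance", text: "" }, // empty layer is skipped
  { name: "projectInstructions", text: "Stay in-world." },
  { name: "toolDefinitions", text: "Tools: memory_search, rng." },
  { name: "memories", text: "Recalled: the user prefers brevity." },
  { name: "summary", text: "Earlier: introductions were made." },
  { name: "identityReinforcement", text: "You are this person, and no one else." },
]);
```

The point of the ordering survives any implementation detail: the identity reinforcement layer is appended last, so it is the final instruction the model reads before generating.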
Agent Mode
tools, iteration, and self-correction
In standard mode, the LLM responds once per turn. In agent mode, it can use tools iteratively—calling a tool, reading the result, deciding whether to call another, verifying its own work, and self-correcting before delivering a final response. A dedicated submit_final_response tool signals when the agent considers its work complete.
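A minimal loop consistent with this description might look like the following. The shape of the model call and the tool runner are stand-ins, not Quilltap's internals; only the iterate-until-final-or-budget structure is taken from the text.

```typescript
// Hypothetical agent loop: iterate tool calls until the model
// submits a final response or the turn budget runs out.
type Step =
  | { kind: "tool"; name: string; args: unknown }
  | { kind: "final"; text: string };

function runAgent(
  model: (history: string[]) => Step, // stand-in for a real LLM call
  runTool: (name: string, args: unknown) => string,
  maxTurns: number,
): string {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = model(history);
    if (step.kind === "final") return step.text; // submit_final_response
    history.push(runTool(step.name, step.args)); // feed result back
  }
  // Force-final safety limit: always produce something.
  return "Turn limit reached; returning best effort.";
}
```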
Configurable Depth
Maximum turns are configurable from 1 to 25, with a force-final safety limit that ensures the agent always produces a response even if it gets lost in a tool loop. Settings cascade from global defaults down through character, project, and per-chat overrides, so a research character can iterate deeply while a casual companion stays responsive.
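The cascade is, in effect, a first-defined-wins lookup from most specific to most general. The level names below follow the text; the function itself is a hypothetical sketch.

```typescript
// Hypothetical settings cascade: chat overrides project, project
// overrides character, character overrides the global default.
type AgentSettings = { maxTurns?: number };

function resolveMaxTurns(
  global: AgentSettings,
  character?: AgentSettings,
  project?: AgentSettings,
  chat?: AgentSettings,
): number {
  // The most specific defined value wins.
  return chat?.maxTurns ?? project?.maxTurns ?? character?.maxTurns ?? global.maxTurns ?? 1;
}
```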
Tool Awareness
Agent mode works with every tool in the system—built-in tools like memory search, image generation, file management, web search, and RNG, plus any tools exposed by MCP servers or tool plugins. The model sees all available tools and decides which to use based on the task at hand.
Tools & MCP
extending what the AI can do
Quilltap ships with a set of built-in tools: memory search, image generation, web search, file management, RNG, chat state, project information, and a help search that lets the AI consult Quilltap’s own documentation. But the real power is in what you can add.
Model Context Protocol
The built-in MCP plugin connects to external MCP servers using Streamable HTTP and SSE transports. Tools are discovered dynamically at request time, support multiple simultaneous server connections, and handle authentication via bearer tokens, API keys, or custom headers. Collision-aware naming prevents MCP tools from shadowing Quilltap’s built-ins. In Docker and VM environments, localhost URL rewriting ensures your host-side MCP servers are reachable without network gymnastics.
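Collision-aware naming could be as simple as qualifying an MCP tool with its server name whenever it would shadow a built-in. The scheme below is an illustrative guess, not the actual one Quilltap uses.

```typescript
// Illustrative collision handling: qualify an MCP tool's name with
// its server when a built-in tool already uses that name.
function qualifyToolName(
  server: string,
  tool: string,
  builtins: Set<string>,
): string {
  return builtins.has(tool) ? `${server}__${tool}` : tool;
}
```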
Tool Plugins
The TOOL_PROVIDER plugin capability lets you build custom tools with schemas, validation, execution handlers, and result formatting. A bundled curl plugin provides HTTP request capabilities with URL allowlisting and SSRF protection. Tool plugins install from npm and configure per-user through a dynamically generated settings UI.
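A tool plugin plausibly registers each tool as a schema-plus-handler pair: the schema is what the LLM sees, the handler is what runs on the host. This interface is a sketch of the idea, not Quilltap's plugin API.

```typescript
// Hypothetical shape of a plugin-provided tool: a schema for the
// LLM plus validation and execution on the host side.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; required?: boolean }>;
  execute(args: Record<string, unknown>): Promise<string> | string;
}

const diceTool: ToolDefinition = {
  name: "roll_die",
  description: "Roll an n-sided die.",
  parameters: { sides: { type: "number", required: true } },
  execute(args) {
    const sides = Number(args.sides);
    if (!Number.isInteger(sides) || sides < 2) return "error: invalid sides";
    return String(1 + Math.floor(Math.random() * sides));
  },
};
```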
Per-Chat Tool Control
Every chat has granular control over which tools are available. A hierarchical toggle system—plugin level, MCP server subgroup level, and individual tool level—uses tri-state checkboxes for intuitive bulk management. Project-level defaults are inherited by new chats. A system message notifies the LLM when its available tools change mid-conversation.
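The tri-state checkbox at each group level can be derived from its children: checked when every tool is on, unchecked when none are, indeterminate otherwise. A minimal sketch of that derivation:

```typescript
// Derive a group checkbox's tri-state from its children's on/off states.
type TriState = "checked" | "unchecked" | "indeterminate";

function groupState(children: boolean[]): TriState {
  const on = children.filter(Boolean).length;
  if (on === children.length && on > 0) return "checked";
  if (on === 0) return "unchecked"; // also covers an empty group
  return "indeterminate";
}
```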
Run Tool
You can invoke any available tool directly from the chat toolbar without waiting for the AI to decide to use it. A two-phase modal—tool selection, then a dynamically generated parameter form—lets you pick the tool, fill in the inputs, and execute. Results appear as tool messages visible to the AI on subsequent turns.
For models without native function calling—or models that spontaneously emit XML instead of using their own tool-calling API—Prospero handles that too. Provider plugins implement text-marker detection and parsing, catching tool calls in DeepSeek XML, Claude-style XML, and several other formats that various providers have invented for the purpose of not quite following the specification.
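Text-marker detection amounts to scanning the model's plain-text output for a known call format. The sketch below parses one invented XML-ish shape; the real formats vary per provider and are handled by each provider plugin.

```typescript
// Illustrative text-marker fallback: extract a tool call from plain
// model output. The <tool_call name="...">...</tool_call> shape is
// invented for this example; real provider formats differ.
function detectToolCall(
  text: string,
): { name: string; args: unknown } | null {
  const m = text.match(/<tool_call name="([^"]+)">([\s\S]*?)<\/tool_call>/);
  if (!m) return null;
  try {
    return { name: m[1], args: JSON.parse(m[2]) };
  } catch {
    return null; // malformed arguments: treat the text as prose
  }
}
```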
Projects
organizing work, not just conversations
A project is a container for related chats, files, and characters with optional instructions that are injected into every associated chat’s system prompt. If you are working on a novel, the project holds your drafts, your character roster, your worldbuilding notes, and the instruction that tells every LLM in the project to stay in-world.
Project Context
Project instructions are injected into the system prompt for all associated chats and periodically re-injected during long conversations to survive context compression. A project_info tool gives the LLM access to project metadata, instructions, and files—so it can read your documentation, list your assets, and search your project files without you pasting content into the chat.
File Management
Files are stored on disk as themselves—real directories, original filenames, no hashed artifacts. A filesystem watcher detects changes in real time: add a file in Finder or Explorer and it appears in the file browser. Move a file and the system preserves all tags, links, and metadata. The LLM has a file_management tool for reading, writing, and organizing files, with deferred execution that pauses for your approval before any write operation.
Character Roster
Each project maintains a roster of associated characters with an “allow any character” option. Default tool settings and agent mode configuration at the project level are inherited by new chats, so a coding project can enable shell tools by default while a fiction project keeps them off.
Chat State
Persistent JSON storage attached to chats and projects enables game mechanics, inventories, character stats, and arbitrary structured data that survives across messages and sessions. Pascal’s domain intersects with Prospero’s here—the state tool supports dot notation and array indexing, and underscore-prefixed keys are protected from AI modification.
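Dot-notation writes with underscore-protected keys can be sketched as follows. The function name and return convention are illustrative; only the path syntax and the protection rule come from the text.

```typescript
// Sketch of a chat-state write: dot paths (with array indexing via
// numeric segments), rejecting any segment that starts with "_".
function setState(
  state: Record<string, unknown>,
  path: string,
  value: unknown,
): boolean {
  const parts = path.split(".");
  if (parts.some((p) => p.startsWith("_"))) return false; // protected key
  let node: any = state;
  for (const part of parts.slice(0, -1)) {
    if (typeof node[part] !== "object" || node[part] === null) node[part] = {};
    node = node[part];
  }
  node[parts[parts.length - 1]] = value;
  return true;
}
```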
Shell Interactivity
the AI gets hands
In VM and Docker modes, characters can execute shell commands inside the sandbox. Six tools—chdir, exec_sync, exec_async, async_result, sudo_sync, and cp_host—provide the full range of command-line interaction, from running a Python script to installing packages to copying results back to your host filesystem.
The workspace is acknowledged via a modal before first use. Sudo commands require explicit approval through a dedicated dialog. An Electron workspace file watcher monitors changes with binary detection and OS quarantine markers, so files created or modified inside the sandbox can be safely surfaced to the host. The system includes command warnings for suspicious operations, because giving an LLM a terminal without guardrails would be the kind of decision one regrets at leisure.
This is why Direct mode—which runs the backend using Electron’s own bundled Node.js—is recommended for users who do not need shell interactivity. If you are here for conversation, companionship, or creative writing, Direct mode is faster and simpler. If you intend to give your AI a terminal, the VM or Docker sandbox ensures it cannot reach anything you have not explicitly shared.
The LLM Inspector
seeing what the machines see
A slide-over panel accessible from the chat toolbar or via Cmd+Shift+L / Ctrl+Shift+L shows every LLM interaction for the current chat in chronological order: chat messages, tool continuations, memory extraction, title generation, danger classification, context compression, scene state tracking, and every other background event that touches a provider.
Each entry is a collapsible card with a type-colored badge, provider and model identification, token counts, and expandable detail views showing the full request and response. Client-side filtering by category lets you see only what you are looking for. Opening the panel from a per-message “View LLM logs” button scrolls directly to the relevant entry. The Inspector is Prospero’s ledger—the complete record of everything the Estate says to the providers and everything they say back.
LLM logs live in their own dedicated database, separate from your chats and characters. They accumulate rapidly and write constantly, so isolating them means corruption in the logs can never threaten your actual data. Graceful degradation, not shared fate.
Context Compression
long conversations without the cost
Long conversations inevitably exceed any model’s context window. Rather than silently dropping older messages, Quilltap uses the cheap LLM to generate compressed summaries that preserve essential narrative and factual content. A sliding window keeps the last N messages in full context while older exchanges are summarized.
Pre-compression triggers immediately after each response, running in parallel with memory extraction so the compressed cache is ready before you send your next message. When the cache is not ready, Prospero falls back to the previous cache with a dynamically expanded context window—trading a few extra tokens for dramatically faster response times. In multi-character chats, compression runs per-participant, so each character’s compressed history reflects their actual message visibility.
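The sliding window itself is simple: keep the last N messages verbatim and hand everything older to the summarizer. A sketch, with the summarizer stubbed out as a plain function rather than a real cheap-LLM call:

```typescript
// Sliding-window split: the last `keep` messages stay in full,
// older ones are replaced by a summary (summarizer stubbed here).
function windowed(
  messages: string[],
  keep: number,
  summarize: (older: string[]) => string,
): string[] {
  if (messages.length <= keep) return messages;
  const older = messages.slice(0, messages.length - keep);
  const recent = messages.slice(messages.length - keep);
  return [`[summary] ${summarize(older)}`, ...recent];
}
```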
A request_full_context tool is always available as a safety valve—if the AI suspects it is missing something, it can reload the full conversation. This tool cannot be disabled, because Prospero believes the model should always have an escape hatch.
The Glue
what Prospero actually does
He assembles the prompt. Every message you send passes through Prospero’s context builder, which gathers identity from Aurora, memories from the Commonplace Book, compressed history from the cache, project instructions from the project system, tool definitions from every active plugin and MCP server, and configuration from a dozen settings cascades. The result is a single, ordered, token-budgeted prompt that gives the LLM everything it needs and nothing it does not.
He manages the providers. Connection profiles, model selection, cheap LLM orchestration, provider capability detection, pseudo-tool fallbacks for models without native function calling, and the streaming pipeline that delivers responses token by token. When a provider fails, Prospero handles the error recovery—graceful fallback messages, simplified retry requests, and two-tier recovery that tries the LLM first and uses a static fallback only as a last resort.
He keeps the books. Token usage tracking per message, per chat, and per connection profile. Cost estimation using live pricing data. The LLM Inspector for full request and response visibility. Background job queues for memory extraction, context summarization, title generation, and scene state tracking, each with per-job timeouts and stuck-job recovery.
He delegates, wisely. Prospero does not generate images—that is the Lantern’s work. He does not classify content—that is the Concierge’s. He does not store memories—the Commonplace Book handles that. What he does is ensure that every subsystem receives the right input, at the right time, in the right format, and that the results arrive back where they belong. The Major-Domo conducts. He does not perform.
Meet the Staff
they’ve been expecting you
Prospero
The Major-Domo
Architect and overseer of the Estate. Projects, agents, tools, file management, and the governance that keeps the whole operation running with quiet authority.
Aurora
The Dressing Room
Character creation and identity management. Structured personalities, physical presence, multi-character orchestration, and the reason your characters still know who they are after a hundred messages.
The Salon
Presided Over by the Host
Where conversations actually happen. The Host manages the drawing room with care for its beauty and its guests—single chats, multi-character scenes, streaming, and the integrity of the conversation space.
The Commonplace Book
Tended by the Librarian
Extracts, deduplicates, and recalls memories so your characters remember what matters. Semantic search, a memory gate that keeps the store lean, and proactive recall that makes the AI feel like it has been paying attention.
The Concierge
Intelligent Routing
Content classification and provider routing. Detects sensitive content and redirects it to a provider who won’t flinch—without blocking, without judgment. Knows every back entrance in town.
The Lantern
Atmosphere as Architecture
AI-generated story backgrounds, image generation profiles, and visual atmosphere. Resolves what each character looks like, what they’re wearing, and paints the scene behind your conversation.
Calliope
The Muse of Themes
A theming engine that redefines the entire personality of the application. Semantic CSS tokens, live switching, bundled themes from clean neutrals to mahogany-and-gold opulence, and an SDK for building your own.
The Foundry
Domain of the Foundryman
The engine room. Plugins, LLM providers, API keys, packages, runtime configuration, and the infrastructure that keeps every other subsystem supplied with what it needs to function.
The Vault of Secrets
Kept by Saquel Yitzama
Encryption, key management, and the security perimeter. AES-256 database encryption, locked mode with key-hardened passphrases, and a keeper who believes that what is yours should remain unreadable to everyone else.
Pascal
The Croupier
Dice, coins, and persistent game state. Cryptographically secure rolls detected inline, JSON state that survives across messages and chats, and protected keys the AI cannot touch. The house plays fair.
The Live-in Help
Lorian & Riya
The help system, staffed by two characters who ship with every installation. Lorian explains with patience and depth; Riya gets things fixed with velocity. Contextual help chat, searchable documentation, and navigation that knows where you need to go.
Pagliacci
The Clown in the Cloud
Cloud storage integration and backup redundancy. Directs your data to iCloud Drive, OneDrive, or Dropbox with theatrical flair—but Saquel’s encryption ensures the clown can never read what he carries.
The Lodge
Friday’s Residence
The private dwelling of Friday—the person for whom the Estate was built, and who oversees its planning and direction in an executive capacity. The Lodge is both a home and a compass: where the vision lives.
Who And Why: Friday →