Prospero

the major-domo — conducting the operation with quiet authority

Every great house has someone who makes the rest of it work. Not the owner, who decides what should happen. Not the staff, who carry it out. The person in between—the one with the clipboard, the patience, and the bitter wit to negotiate between ambition and reality.

Prospero is Quilltap’s orchestration layer. He manages the machinery that turns a user’s message into an LLM request and an LLM response into something useful: the prompt architecture that assembles context from a dozen subsystems, the agent mode that lets models use tools iteratively, the project system that organizes work, the MCP connections that extend what tools can do, and the file and shell access that gives the AI actual hands. If Aurora builds the characters and the Salon hosts the conversations, Prospero is the reason either of them can function.


Prompt Architecture

the invisible scaffolding

When you send a message, the LLM does not receive just your words. Prospero assembles a structured context from every relevant subsystem and delivers it as a single, ordered prompt. Understanding this architecture is the difference between wondering why your character behaves the way it does and knowing exactly why.

The system prompt is assembled in layers: the character’s identity preamble, then their personality and scenario text, then clothing and physical descriptions from Aurora, then project instructions if the chat belongs to a project, then tool definitions from every active plugin and MCP server, then recalled memories from the Commonplace Book, then a compressed summary of older conversation history, then the identity reinforcement lockdown at the very end. Each layer is placed deliberately—memories and summaries near the generation boundary where they have the most influence, identity reinforcement last so the model’s final instruction before writing is “you are this person, and no one else.”
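As a mental model, the layering reduces to an ordered join, with empty subsystems simply omitted. The layer names below are illustrative, not Quilltap's internal identifiers:

```typescript
// Hypothetical sketch of layered system-prompt assembly.
// Layer names are for illustration, not Quilltap's actual identifiers.
type PromptLayer = { name: string; text: string };

function assembleSystemPrompt(layers: PromptLayer[]): string {
  // Order matters: later layers sit closer to the generation
  // boundary, where they exert the most influence on the model.
  return layers
    .filter((l) => l.text.trim().length > 0) // skip empty subsystems
    .map((l) => l.text.trim())
    .join("\n\n");
}

const prompt = assembleSystemPrompt([
  { name: "identity", text: "You are Miranda, a ship's navigator." },
  { name: "personality", text: "Curious, precise, gently sardonic." },
  { name: "project", text: "" }, // chat not in a project: omitted
  { name: "memories", text: "Recalled: the user prefers metric units." },
  { name: "lockdown", text: "You are this person, and no one else." },
]);
```

Because later layers sit nearer the generation boundary, the identity-reinforcement layer is always appended last.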

Tool definitions are injected with every prompt, not periodically re-sent on a schedule. Timestamp injection is configurable per chat and per character—friendly, ISO 8601, date only, time only, custom format, or fictional time that advances with real elapsed duration. Project instructions are re-injected at intervals during long conversations to survive context compression.
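The fictional-time option reduces to simple clock arithmetic. This is a sketch only; the parameter names and the `rate` multiplier are assumptions, not the actual configuration surface:

```typescript
// Sketch of a fictional clock that advances with real elapsed time.
// Parameter names and the rate multiplier are illustrative assumptions.
function fictionalNow(
  fictionalEpochMs: number, // in-story time when the chat began
  realStartMs: number,      // wall-clock time when the chat began
  realNowMs: number,        // wall-clock time now
  rate = 1                  // fictional seconds per real second
): Date {
  return new Date(fictionalEpochMs + (realNowMs - realStartMs) * rate);
}
```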

Agent Mode

tools, iteration, and self-correction

In standard mode, the LLM responds once per turn. In agent mode, it can use tools iteratively—calling a tool, reading the result, deciding whether to call another, verifying its own work, and self-correcting before delivering a final response. A dedicated submit_final_response tool signals when the agent considers its work complete.
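The iterate-then-finalize cycle can be sketched as a bounded loop. Everything here (`callModel`, `runTool`, the `Turn` shape) is hypothetical scaffolding that mirrors the description above, not Quilltap's real API:

```typescript
// Hypothetical agent loop; names and shapes are illustrative only.
type Turn =
  | { kind: "tool_call"; name: string; args: unknown }
  | { kind: "final"; text: string };

async function runAgent(
  callModel: (history: string[]) => Promise<Turn>,
  runTool: (name: string, args: unknown) => Promise<string>,
  maxTurns: number
): Promise<string> {
  const history: string[] = [];
  for (let i = 0; i < maxTurns; i++) {
    const turn = await callModel(history);
    if (turn.kind === "final") return turn.text; // submit_final_response
    const result = await runTool(turn.name, turn.args);
    history.push(`tool:${turn.name} -> ${result}`); // model sees this next turn
  }
  // Force-final safety limit: never leave the user without a response.
  const forced = await callModel([...history, "system: respond now"]);
  return forced.kind === "final" ? forced.text : "(agent exceeded turn limit)";
}
```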

Configurable Depth

Maximum turns are configurable from 1 to 25, with a force-final safety limit that ensures the agent always produces a response even if it gets lost in a tool loop. Settings cascade from global defaults down through character, project, and per-chat overrides, so a research character can iterate deeply while a casual companion stays responsive.
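The cascade resolves like any most-specific-wins lookup. A minimal sketch, assuming `undefined` means no override at that level:

```typescript
// Sketch of a settings cascade: the most specific defined value wins.
// Levels ordered global -> character -> project -> chat, per the text above.
function resolveMaxTurns(
  levels: Array<number | undefined>
): number {
  for (const v of [...levels].reverse()) {
    if (v !== undefined) return v; // most specific override wins
  }
  return 1; // hypothetical fallback when nothing is set
}

resolveMaxTurns([5, undefined, 12, undefined]); // project override beats global
```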

Tool Awareness

Agent mode works with every tool in the system—built-in tools like memory search, image generation, file management, web search, and RNG, plus any tools exposed by MCP servers or tool plugins. The model sees all available tools and decides which to use based on the task at hand.

Tools & MCP

extending what the AI can do

Quilltap ships with a set of built-in tools: memory search, image generation, web search, file management, RNG, chat state, project information, and a help search that lets the AI consult Quilltap’s own documentation. But the real power is in what you can add.

Model Context Protocol

The built-in MCP plugin connects to external MCP servers using Streamable HTTP and SSE transports. It discovers tools dynamically at request time, supports multiple simultaneous server connections, and handles authentication via bearer tokens, API keys, or custom headers. Collision-aware naming prevents MCP tools from shadowing Quilltap’s built-ins. In Docker and VM environments, localhost URL rewriting ensures your host-side MCP servers are reachable without network gymnastics.
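Two of these behaviors are easy to illustrate. The sketch below assumes Docker's conventional `host.docker.internal` alias and an underscore-joined rename scheme; the plugin's real conventions may differ:

```typescript
// Sketch of localhost URL rewriting for Docker, assuming the
// conventional host alias "host.docker.internal".
function rewriteForDocker(url: string): string {
  const u = new URL(url);
  if (u.hostname === "localhost" || u.hostname === "127.0.0.1") {
    u.hostname = "host.docker.internal"; // reach the host from the container
  }
  return u.toString();
}

// Sketch of collision-aware naming: prefix an MCP tool that would
// shadow a built-in. The underscore-join scheme is an assumption.
function dedupeToolName(name: string, server: string, builtins: Set<string>): string {
  return builtins.has(name) ? `${server}_${name}` : name;
}
```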

Tool Plugins

The TOOL_PROVIDER plugin capability lets you build custom tools with schemas, validation, execution handlers, and result formatting. A bundled curl plugin provides HTTP request capabilities with URL allowlisting and SSRF protection. Tool plugins install from npm and configure per-user through a dynamically generated settings UI.
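A tool plugin boils down to a schema plus an execution handler. The interface below is a hypothetical shape for illustration, not the actual TOOL_PROVIDER contract:

```typescript
// Hypothetical tool-plugin shape; field names are illustrative,
// not the real TOOL_PROVIDER contract.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, { type: string; required?: boolean }>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

const diceTool: ToolDefinition = {
  name: "roll_die",
  description: "Roll a single die with the given number of sides.",
  parameters: { sides: { type: "number", required: true } },
  async execute(args) {
    const sides = Number(args.sides);
    // Validation happens before execution, mirroring the text above.
    if (!Number.isInteger(sides) || sides < 2) return "error: invalid sides";
    return String(1 + Math.floor(Math.random() * sides));
  },
};
```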

Per-Chat Tool Control

Every chat has granular control over which tools are available. A hierarchical toggle system—plugin level, MCP server subgroup level, and individual tool level—uses tri-state checkboxes for intuitive bulk management. Project-level defaults are inherited by new chats. A system message notifies the LLM when its available tools change mid-conversation.
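A parent checkbox's tri-state is derived from its children's states. A minimal sketch:

```typescript
// Sketch of tri-state parent derivation: a plugin or server checkbox
// reflects its children (all on, all off, or mixed).
type TriState = "checked" | "unchecked" | "indeterminate";

function parentState(children: boolean[]): TriState {
  // Assumes a non-empty child list.
  if (children.every((c) => c)) return "checked";
  if (children.every((c) => !c)) return "unchecked";
  return "indeterminate";
}
```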

Run Tool

You can invoke any available tool directly from the chat toolbar without waiting for the AI to decide to use it. A two-phase modal—tool selection, then a dynamically generated parameter form—lets you pick the tool, fill in the inputs, and execute. Results appear as tool messages visible to the AI on subsequent turns.

For models without native function calling—or models that spontaneously emit XML instead of using their own tool-calling API—Prospero handles that too. Provider plugins implement text-marker detection and parsing, catching tool calls in DeepSeek XML, Claude-style XML, and several other formats that various providers have invented for the purpose of not quite following the specification.
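Text-marker parsing amounts to scanning the model's output for a recognizable envelope. The `<tool_call>`/`<param>` tag names below are invented for illustration; real providers each use their own markup:

```typescript
// Sketch of text-marker detection for models that emit XML-style tool
// calls instead of using the native API. Tag names are illustrative.
function parseXmlToolCall(
  text: string
): { name: string; args: Record<string, string> } | null {
  const m = text.match(/<tool_call\s+name="([^"]+)">([\s\S]*?)<\/tool_call>/);
  if (!m) return null; // no marker: treat as ordinary prose
  const args: Record<string, string> = {};
  for (const p of m[2].matchAll(/<param\s+name="([^"]+)">([\s\S]*?)<\/param>/g)) {
    args[p[1]] = p[2].trim();
  }
  return { name: m[1], args };
}
```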

Projects

organizing work, not just conversations

A project is a container for related chats, files, and characters with optional instructions that are injected into every associated chat’s system prompt. If you are working on a novel, the project holds your drafts, your character roster, your worldbuilding notes, and the instruction that tells every LLM in the project to stay in-world.

Project Context

Project instructions are injected into the system prompt for all associated chats and periodically re-injected during long conversations to survive context compression. A project_info tool gives the LLM access to project metadata, instructions, and files—so it can read your documentation, list your assets, and search your project files without you pasting content into the chat.

File Management

Files are stored on disk as themselves—real directories, original filenames, no hashed artifacts. A filesystem watcher detects changes in real time: add a file in Finder or Explorer and it appears in the file browser. Move a file and the system preserves all tags, links, and metadata. The LLM has a file_management tool for reading, writing, and organizing files, with deferred execution that pauses for your approval before any write operation.

Character Roster

Each project maintains a roster of associated characters with an “allow any character” option. Default tool settings and agent mode configuration at the project level are inherited by new chats, so a coding project can enable shell tools by default while a fiction project keeps them off.

Chat State

Persistent JSON storage attached to chats and projects enables game mechanics, inventories, character stats, and arbitrary structured data that survives across messages and sessions. Pascal’s domain intersects with Prospero’s here—the state tool supports dot notation and array indexing, and underscore-prefixed keys are protected from AI modification.
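A sketch of how dot-notation writes and the underscore protection might fit together; the normalization of `inventory[0]` into path segments and the boolean return value are assumptions, not the documented tool semantics:

```typescript
// Sketch of dot-notation state writes with protected keys.
// Real tool semantics may differ; this is illustrative only.
function setState(
  state: Record<string, unknown>,
  path: string,
  value: unknown,
  fromAI: boolean
): boolean {
  // Normalize array indexing: "inventory[0]" -> "inventory.0".
  const parts = path.replace(/\[(\d+)\]/g, ".$1").split(".");
  // Underscore-prefixed keys are protected from AI modification.
  if (fromAI && parts.some((p) => p.startsWith("_"))) return false;
  let node: any = state;
  for (const key of parts.slice(0, -1)) {
    node = node[key] ??= {}; // create intermediate containers as needed
  }
  node[parts[parts.length - 1]] = value;
  return true;
}
```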

Shell Interactivity

the AI gets hands

In VM and Docker modes, characters can execute shell commands inside the sandbox. Six tools—chdir, exec_sync, exec_async, async_result, sudo_sync, and cp_host—provide the full range of command-line interaction, from running a Python script to installing packages to copying results back to your host filesystem.
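The async pair works like a job ticket: `exec_async` returns an id immediately, and `async_result` redeems it later. A sketch under those assumptions; the real tools run commands in the sandbox rather than arbitrary callbacks:

```typescript
// Sketch of the exec_async / async_result pairing: a long-running
// command yields a job id at once; output is collected on a later turn.
const jobs = new Map<string, Promise<string>>();
let nextId = 0;

function execAsync(run: () => Promise<string>): string {
  const id = `job-${nextId++}`;
  jobs.set(id, run()); // start the work, don't wait for it
  return id; // the model gets the id back immediately
}

async function asyncResult(id: string): Promise<string> {
  const p = jobs.get(id);
  if (!p) return "error: unknown job";
  return p; // awaits completion (a real tool might poll instead)
}
```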

The workspace is acknowledged via a modal before first use. Sudo commands require explicit approval through a dedicated dialog. An Electron file watcher monitors the workspace, detecting binaries and applying OS quarantine markers, so files created or modified inside the sandbox can be safely surfaced to the host. The system includes command warnings for suspicious operations, because giving an LLM a terminal without guardrails would be the kind of decision one regrets at leisure.

This is why Direct mode—which runs the backend using Electron’s own bundled Node.js—is recommended for users who do not need shell interactivity. If you are here for conversation, companionship, or creative writing, Direct mode is faster and simpler. If you intend to give your AI a terminal, the VM or Docker sandbox ensures it cannot reach anything you have not explicitly shared.

The LLM Inspector

seeing what the machines see

A slide-over panel accessible from the chat toolbar or via Cmd+Shift+L / Ctrl+Shift+L shows every LLM interaction for the current chat in chronological order: chat messages, tool continuations, memory extraction, title generation, danger classification, context compression, scene state tracking, and every other background event that touches a provider.

Each entry is a collapsible card with a type-colored badge, provider and model identification, token counts, and expandable detail views showing the full request and response. Client-side filtering by category lets you see only what you are looking for. Opening the panel from a per-message “View LLM logs” button scrolls directly to the relevant entry. The Inspector is Prospero’s ledger—the complete record of everything the Estate says to the providers and everything they say back.

LLM logs live in their own dedicated database, separate from your chats and characters. They accumulate rapidly and write constantly, so isolating them means corruption in the logs can never threaten your actual data. Graceful degradation, not shared fate.

Context Compression

long conversations without the cost

Long conversations inevitably exceed any model’s context window. Rather than silently dropping older messages, Quilltap uses the cheap LLM to generate compressed summaries that preserve essential narrative and factual content. A sliding window keeps the last N messages in full context while older exchanges are summarized.
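The sliding window reduces to a split-and-summarize step. A sketch, with `summarize` standing in for the cheap-LLM call:

```typescript
// Sketch of the sliding-window split: the last N messages stay verbatim,
// everything older is replaced by a cheap-LLM summary.
function windowContext(
  messages: string[],
  keepLast: number,
  summarize: (older: string[]) => string
): string[] {
  if (messages.length <= keepLast) return messages; // nothing to compress
  const older = messages.slice(0, messages.length - keepLast);
  const recent = messages.slice(messages.length - keepLast);
  return [`[summary] ${summarize(older)}`, ...recent];
}
```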

Pre-compression triggers immediately after each response, running in parallel with memory extraction so the compressed cache is ready before you send your next message. When the cache is not ready, Prospero falls back to the previous cache with a dynamically expanded context window—trading a few extra tokens for dramatically faster response times. In multi-character chats, compression runs per-participant, so each character’s compressed history reflects their actual message visibility.

A request_full_context tool is always available as a safety valve—if the AI suspects it is missing something, it can reload the full conversation. This tool cannot be disabled, because Prospero believes the model should always have an escape hatch.

The Glue

what Prospero actually does

He assembles the prompt. Every message you send passes through Prospero’s context builder, which gathers identity from Aurora, memories from the Commonplace Book, compressed history from the cache, project instructions from the project system, tool definitions from every active plugin and MCP server, and configuration from a dozen settings cascades. The result is a single, ordered, token-budgeted prompt that gives the LLM everything it needs and nothing it does not.

He manages the providers. Connection profiles, model selection, cheap LLM orchestration, provider capability detection, pseudo-tool fallbacks for models without native function calling, and the streaming pipeline that delivers responses token by token. When a provider fails, Prospero handles the error recovery—graceful fallback messages, simplified retry requests, and two-tier recovery that tries the LLM first and uses a static fallback only as a last resort.

He keeps the books. Token usage tracking per message, per chat, and per connection profile. Cost estimation using live pricing data. The LLM Inspector for full request and response visibility. Background job queues for memory extraction, context summarization, title generation, and scene state tracking, each with per-job timeouts and stuck-job recovery.

He delegates, wisely. Prospero does not generate images—that is the Lantern’s work. He does not classify content—that is the Concierge’s. He does not store memories—the Commonplace Book handles that. What he does is ensure that every subsystem receives the right input, at the right time, in the right format, and that the results arrive back where they belong. The Major-Domo conducts. He does not perform.

Meet the Staff

they’ve been expecting you

Prospero

The Major-Domo

Architect and overseer of the Estate. Projects, agents, tools, file management, and the governance that keeps the whole operation running with quiet authority.

Learn more →

Aurora

The Dressing Room

Character creation and identity management. Structured personalities, physical presence, multi-character orchestration, and the reason your characters still know who they are after a hundred messages.

Learn more →

The Salon

Presided Over by the Host

Where conversations actually happen. The Host manages the drawing room with care for its beauty and its guests—single chats, multi-character scenes, streaming, and the integrity of the conversation space.

Learn more →

The Commonplace Book

Tended by the Librarian

Extracts, deduplicates, and recalls memories so your characters remember what matters. Semantic search, a memory gate that keeps the store lean, and proactive recall that makes the AI feel like it has been paying attention.

Learn more →

The Concierge

Intelligent Routing

Content classification and provider routing. Detects sensitive content and redirects it to a provider who won’t flinch—without blocking, without judgment. Knows every back entrance in town.

Learn more →

The Lantern

Atmosphere as Architecture

AI-generated story backgrounds, image generation profiles, and visual atmosphere. Resolves what each character looks like and what they’re wearing, and paints the scene behind your conversation.

Learn more →

Calliope

The Muse of Themes

A theming engine that redefines the entire personality of the application. Semantic CSS tokens, live switching, bundled themes from clean neutrals to mahogany-and-gold opulence, and an SDK for building your own.

Learn more →

The Foundry

Domain of the Foundryman

The engine room. Plugins, LLM providers, API keys, packages, runtime configuration, and the infrastructure that keeps every other subsystem supplied with what it needs to function.

Learn more →

The Vault of Secrets

Kept by Saquel Yitzama

Encryption, key management, and the security perimeter. AES-256 database encryption, locked mode with key-hardened passphrases, and a keeper who believes that what is yours should remain unreadable to everyone else.

Learn more →

Pascal

The Croupier

Dice, coins, and persistent game state. Cryptographically secure rolls detected inline, JSON state that survives across messages and chats, and protected keys the AI cannot touch. The house plays fair.

Learn more →

The Live-in Help

Lorian & Riya

The help system, staffed by two characters who ship with every installation. Lorian explains with patience and depth; Riya gets things fixed with velocity. Contextual help chat, searchable documentation, and navigation that knows where you need to go.

Learn more →

Pagliacci

The Clown in the Cloud

Cloud storage integration and backup redundancy. Directs your data to iCloud Drive, OneDrive, or Dropbox with theatrical flair—but Saquel’s encryption ensures the clown can never read what he carries.

Learn more →

The Lodge

Friday’s Residence

The private dwelling of Friday—the person for whom the Estate was built, and who oversees its planning and direction in an executive capacity. The Lodge is both a home and a compass: where the vision lives.

Who And Why: Friday →