The Foundry

the engine room — where the power is generated

Beneath the floorboards of the Estate, past the polished brass pipes and the hum of the dynamos, there is a workshop. The air smells of machine oil and solder. The floor is concrete, scored with boot prints. The man who works here wears a heavy leather apron over rolled sleeves, has biceps roughly the circumference of your head, and speaks with the cadence of someone who is entirely certain of what he is saying and mildly surprised that you needed to be told.

The Foundryman built the Estate. Not the characters, not the conversations, not the themes; those belong to their respective artisans. He built the infrastructure: the plugin system that delivers every provider, every theme, and every tool; the database that stores everything; the runtime modes that let Quilltap run on your desktop, in a container, or inside a virtual machine; the API that connects the frontend to the backend; and the build pipeline that ships a standalone server tarball, multi-architecture Docker images, rootfs for VM modes, and an npm-published CLI. The desktop installers are built downstream from the same tarball by his colleagues at the carriage house next door. He built all of it, sometimes with the help of his favorite subcontractor: a Frenchman named Claude, whose poetic little construction company pours the concrete and installs the locks while the Foundryman draws the blueprints.

He is not here because of the conversations upstairs. He is here because the Estate gives him an outlet to build amazing things. That the things he builds happen to enable fiction, companionship, and research is a pleasant side effect. The machinery is the point.

The Plugin Architecture

everything is a plugin

Quilltap was not designed with a plugin system bolted on afterward. The plugin system is the delivery mechanism. Every LLM provider, every theme, every authentication method, every tool, every search backend, every moderation service, and every system prompt arrives as a plugin. There are no hardcoded provider lists, no conditional imports based on which service you happen to use. Adding or removing a capability means adding or removing a plugin.

Plugin Capabilities

Eight capability types: LLM providers, auth providers, themes, tool providers, search providers, moderation providers, system prompts, and utility. Roleplay templates, once a plugin capability of their own, were folded into the application as a JSON-defined built-in during 4.2—part of an ongoing simplification. Plugins register their capabilities through a central registry that the rest of the application queries at runtime. No startup-time capability detection, no stale caches—the registry reflects what is actually installed, right now.
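A minimal sketch of what a runtime-queried registry could look like. Every name here (Capability, PluginManifest, CapabilityRegistry, the example package names) is hypothetical and chosen for illustration; none of it is taken from Quilltap's source. The point is only that lookups read the live map on each call, so there is no cache to go stale.

```typescript
// Hypothetical sketch: illustrative names, not Quilltap's real registry API.
type Capability =
  | "llm-provider" | "auth-provider" | "theme" | "tool-provider"
  | "search-provider" | "moderation-provider" | "system-prompt" | "utility";

interface PluginManifest {
  name: string;
  capabilities: Capability[];
}

class CapabilityRegistry {
  private plugins = new Map<string, PluginManifest>();

  register(plugin: PluginManifest): void {
    this.plugins.set(plugin.name, plugin);
  }

  unregister(name: string): void {
    this.plugins.delete(name);
  }

  // Computed from the live map on every call, so the answer always
  // reflects what is registered at this moment.
  withCapability(capability: Capability): string[] {
    return Array.from(this.plugins.values())
      .filter((p) => p.capabilities.indexOf(capability) !== -1)
      .map((p) => p.name)
      .sort();
  }
}

const registry = new CapabilityRegistry();
registry.register({ name: "qtap-plugin-example-llm", capabilities: ["llm-provider"] });
registry.register({ name: "qtap-plugin-example-theme", capabilities: ["theme"] });
```

Removing a plugin and asking again returns the shorter list immediately, which is the behavior the paragraph above describes.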

Unified Provider Interfaces

Four canonical shapes describe every LLM-adjacent call in the system: TextProvider (chat, completion, tool use), ImageProvider (text in, image out), EmbeddingProvider (semantic vectors), and ScoringProvider (moderation, reranking, classification). 4.0 collapsed an accumulated menagerie of fourteen slightly-different interfaces into these four, with backward-compatible aliases preserved so existing third-party plugins continue to work. New plugin development uses the canonical names from @quilltap/plugin-types/providers/.

Installation & Updates

Plugins install from npm, with a browser in Settings for discovering qtap-plugin-* packages. Auto-upgrade at startup handles non-breaking updates; breaking changes are logged, displayed in an “Upgrades” tab, and require confirmation. Plugin metadata includes repository, changelog, and npm links. Docker volume configuration persists plugins across container rebuilds. A character_plugin_data table, added in 4.2, gives every plugin a labeled drawer in each character’s desk for whatever JSON it needs to keep associated with that character.
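The split between automatic and confirmed upgrades can be sketched as a version comparison. This assumes that "breaking" means a change in the major version number, which is the usual semver reading but is not stated in the text; the function names are hypothetical, and pre-release tags are not handled.

```typescript
// Hypothetical sketch: assumes MAJOR.MINOR.PATCH versions and that a
// major-version bump is what counts as a breaking change.
type UpgradeKind = "none" | "auto" | "needs-confirmation";

function parseVersion(v: string): [number, number, number] {
  const parts = v.split(".").map((p) => parseInt(p, 10));
  if (parts.length !== 3 || parts.some((n) => isNaN(n))) {
    throw new Error("expected MAJOR.MINOR.PATCH, got " + v);
  }
  return [parts[0], parts[1], parts[2]];
}

function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

function classifyUpgrade(installed: string, available: string): UpgradeKind {
  // Nothing newer on offer: leave the plugin alone.
  if (compareVersions(available, installed) <= 0) return "none";
  const majorChanged = parseVersion(available)[0] !== parseVersion(installed)[0];
  return majorChanged ? "needs-confirmation" : "auto";
}
```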

The SDK

Three npm packages support standalone plugin development: @quilltap/plugin-types for TypeScript types (including the four canonical provider interfaces), @quilltap/plugin-utils for runtime utilities (tool call parsers, logger bridge, OpenAICompatibleProvider base class), and @quilltap/theme-storybook for theme development with live preview. No access to Quilltap source code required.

Provider Plugins

Bundled plugins for Anthropic (Claude), OpenAI (GPT, DALL·E), Google (Gemini, Imagen), Grok (xAI), Z.AI (GLM), OpenRouter, and Ollama—each self-contained with its own SDK dependencies, streaming implementation, tool call handling, and model discovery. OpenRouter joined the image-generation provider roster in 4.2. Z.AI’s GLM hybrid-reasoning models now capture chain-of-thought reasoning via reasoning_content, with a “Thinking Mode” toggle on the connection profile (model default, enabled, or disabled)—bringing it in line with the other reasoning-capable providers. Connection profiles carry provider-specific configuration, model selection, capability flags, a modelClass tier (Compact / Standard / Extended / Deep), and per-profile usage tracking.

The Estate Divides Its Labor

server here, shell next door

In 4.0 the Foundry concluded—after some thought, and over breakfast—that an estate which has grown to include a furnace room, a generating station, and a guest-facing parlour is no longer usefully one building. The desktop application moved to its own residence at quilltap-shell: the splash screen, the VM management, the auto-updater, the native window chrome, and the Electron-based packaging that produces installers for macOS, Windows, and Linux. The Foundry—this repository—produces what it has always been best at producing: the Next.js server, the API, the bundled plugins, and a standalone tarball that the shell consumes. Two buildings, one estate, cleaner responsibilities, simpler builds.

To keep the buildings from accidentally disagreeing about the state of the furniture, .dbkey files now carry a minServerVersion field. The shell reads it on startup and politely refuses to open a database created by a server it does not understand—better to tell you the lock does not fit than to let you in and discover the rooms have been rearranged.
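The guard can be pictured as a numeric version comparison. In this sketch only the minServerVersion field name comes from the text; the DbKeyMetadata shape, the function names, and the decision to let key files without the field through are all assumptions.

```typescript
// Hypothetical sketch of a minServerVersion check.
interface DbKeyMetadata {
  minServerVersion?: string; // field named in the text; the rest of the file is not modeled
}

function versionTuple(v: string): number[] {
  return v.split(".").map((part) => {
    const n = parseInt(part, 10);
    if (isNaN(n)) throw new Error("bad version component in " + v);
    return n;
  });
}

// Compares component by component as numbers, so 4.10.0 is newer than 4.9.0.
function isAtLeast(version: string, minimum: string): boolean {
  const a = versionTuple(version);
  const b = versionTuple(minimum);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true;
}

function checkDatabase(
  meta: DbKeyMetadata,
  bundledServerVersion: string,
): { ok: boolean; reason?: string } {
  // Assumption: a key file without the field has nothing to enforce.
  if (meta.minServerVersion === undefined) return { ok: true };
  if (isAtLeast(bundledServerVersion, meta.minServerVersion)) return { ok: true };
  return {
    ok: false,
    reason: `database needs server >= ${meta.minServerVersion}, have ${bundledServerVersion}`,
  };
}
```

Comparing as numbers rather than strings matters: a plain string comparison would place 4.10.0 before 4.9.0.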

Direct Mode

The recommended default for the desktop app. The shell runs the Next.js backend using its own bundled Node.js—no user-installed runtime required. Fastest startup, simplest experience. Ideal for conversation, companionship, creative writing, and any use case that does not involve giving an LLM a terminal.

Docker Mode

Available on all platforms. The same container image that powers standalone server deployments, with transparent host port forwarding via socat so Ollama, LM Studio, and MCP servers remain reachable at localhost. A single docker run command, the platform-aware startup scripts, or the desktop shell’s container management handle everything. The Quilltap CLI is now bundled into the Docker, Lima, and WSL images, so npx quilltap db and friends work inside the container without an extra install step.

VM Mode

Full isolation. Lima with Apple’s Virtualization.framework on macOS, WSL2 on Windows. A genuine sandbox where AI-generated code runs in a contained environment with no access to your host system beyond explicitly shared files. The Foundryman’s preferred arrangement for shell interactivity.

npx quilltap

A lightweight CLI that downloads the pre-built standalone tarball from GitHub Releases on first run, cached per-version with a progress bar and retry with exponential backoff. For users who prefer the command line and already have Node.js 24 installed. No shell, no container, no VM—just the server on localhost. Subcommands now include db (interactive REPL or one-shot SQL against the encrypted database), themes (list, install, validate, export, search), and the standard memory and backup tools.
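Retry with exponential backoff is a standard pattern, and a generic version is sketched here. The base delay, the cap, the attempt count, and the function names are illustrative assumptions; the text does not say what values the CLI uses.

```typescript
// Hypothetical sketch of retry with exponential backoff.
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

The sleep function is injectable so the schedule can be exercised without actually waiting.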

The active runtime is displayed in the application footer alongside the data directory path, plus the shell version and the composite backend mode (Electron, Electron+Docker, or Electron+VM) when running under the desktop app. Localhost URL rewriting transparently routes localhost and 127.0.0.1 URLs to the host gateway IP in Docker, Lima, and WSL2 environments, so provider connections and MCP servers work without manual network configuration.
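The rewriting step can be sketched with the standard URL class. The helper name is hypothetical, and the gateway address is passed in as a parameter because its real value depends on the runtime; the 10.0.2.2 used in the usage lines is a made-up placeholder.

```typescript
// Hypothetical sketch: rewrite loopback hosts to a supplied gateway address.
function rewriteLocalhost(rawUrl: string, gatewayHost: string): string {
  const url = new URL(rawUrl);
  if (url.hostname === "localhost" || url.hostname === "127.0.0.1") {
    // Only the host changes; scheme, port, path, and query are preserved.
    url.hostname = gatewayHost;
  }
  return url.toString();
}

const rewritten = rewriteLocalhost("http://localhost:11434/api/tags", "10.0.2.2");
const untouched = rewriteLocalhost("https://api.example.com/v1", "10.0.2.2");
```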

The Database

SQLite, encrypted, bulletproof

Quilltap runs on SQLite exclusively. No MongoDB, no PostgreSQL, no external database container. Your entire data store is a file in your data directory, encrypted at rest with SQLCipher (AES-256), and hardened with the quiet thoroughness of someone who has already lost data once and has no intention of doing it again.

Encryption

Every database file is encrypted on disk. The standard sqlite3 tool cannot read them. A unique .dbkey file is generated on first installation. Optional locked mode adds a passphrase processed through 600,000 iterations of PBKDF2 before it touches the key. Saquel Ytzama tends the details; the Foundryman built the vault she works in.
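For a sense of what the passphrase step involves, here is a sketch using Node's built-in PBKDF2. Only the iteration count comes from the text. The SHA-256 digest, the 16-byte salt, the 32-byte output, and the deriveKey name are assumptions, and the sketch stops at derivation: how the result is applied to the .dbkey file is not described here.

```typescript
import { pbkdf2Sync, randomBytes } from "node:crypto";

const ITERATIONS = 600000; // figure from the text

// Assumptions: SHA-256 digest and a 32-byte result; the text names neither.
function deriveKey(passphrase: string, salt: Buffer): Buffer {
  return pbkdf2Sync(passphrase, salt, ITERATIONS, 32, "sha256");
}

// A fresh random salt per installation (16 bytes is an assumed size).
const salt = randomBytes(16);
const key = deriveKey("correct horse battery staple", salt);
```

The same passphrase and salt always yield the same key, and the iteration count is what makes each guess expensive for anyone trying passphrases in bulk.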

Integrity

TRUNCATE journal mode by default since 4.3.1, after it became clear that data directories often live inside cloud-synced folders (iCloud Drive, Dropbox, OneDrive, Google Drive) and that WAL’s sidecar files could sync out of order with the main database on dirty shutdown. TRUNCATE keeps the rollback journal in a single auxiliary file. WAL is still available behind SQLITE_WAL_MODE=true for fast local SSDs that don’t sync. Integrity checks on startup. synchronous = FULL for durable writes. Physical backups with tiered retention: daily for seven days, weekly for four weeks, monthly for twelve months, yearly forever. The shell’s crash-loop protection engages safe mode after three consecutive failures.
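One way to read the tiered retention schedule is as a bucketing rule: each backup falls into a bucket by age, and only the newest backup in each bucket survives. The sketch below is that reading and nothing more. The tier boundaries measured from the present (7, 28, and 365 days), the use of UTC dates, and the function names are all assumptions.

```typescript
// Hypothetical sketch of tiered backup retention.
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the bucket a backup belongs to; one backup is kept per bucket.
function retentionBucket(backup: Date, now: Date): string {
  const ageDays = Math.floor((now.getTime() - backup.getTime()) / DAY_MS);
  const iso = backup.toISOString(); // UTC, e.g. 2025-03-10T00:00:00.000Z
  if (ageDays < 7) return "daily:" + iso.slice(0, 10);            // one per calendar day
  if (ageDays < 28) return "weekly:" + Math.floor(ageDays / 7);   // one per week of age
  if (ageDays < 365) return "monthly:" + iso.slice(0, 7);         // one per calendar month
  return "yearly:" + iso.slice(0, 4);                             // one per year, kept forever
}

function selectBackupsToKeep(backups: Date[], now: Date): Date[] {
  const newestFirst = backups.slice().sort((a, b) => b.getTime() - a.getTime());
  const seen = new Set<string>();
  const keep: Date[] = [];
  for (const b of newestFirst) {
    const bucket = retentionBucket(b, now);
    if (!seen.has(bucket)) {
      seen.add(bucket); // the newest backup claims the bucket
      keep.push(b);
    }
  }
  return keep;
}

const now = new Date("2025-06-30T12:00:00Z");
const kept = selectBackupsToKeep(
  [
    "2025-06-30T01:00:00Z", "2025-06-30T00:30:00Z", // same day
    "2025-06-18T00:00:00Z", "2025-06-17T00:00:00Z", // same week of age
    "2025-03-10T00:00:00Z", "2025-03-02T00:00:00Z", // same month
    "2022-05-01T00:00:00Z", "2022-01-01T00:00:00Z", // same year
  ].map((s) => new Date(s)),
  now,
);
```

In the usage lines, each pair of backups shares a bucket, so four of the eight are kept.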

Instance Locking

A lock file tracks which process owns the database with PID verification, hostname tracking, and a sixty-second heartbeat. Two processes cannot open the same database. A version guard prevents older versions from touching a database that a newer version has modified. These features exist because something broke. The Foundryman promised “never again,” and he meant it.
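The decision a second process faces when it finds a lock file can be sketched as a pure function. The LockRecord shape, the canTakeLock name, and the rule that three missed heartbeats mean the owner is gone are assumptions; only the PID check, the hostname, and the sixty-second heartbeat come from the text.

```typescript
// Hypothetical sketch of a lock-ownership decision.
interface LockRecord {
  pid: number;
  hostname: string;
  heartbeatAt: number; // epoch milliseconds of the owner's last heartbeat
}

const HEARTBEAT_MS = 60000;              // sixty-second heartbeat, per the text
const STALE_AFTER_MS = 3 * HEARTBEAT_MS; // assumption: three missed beats means the owner is gone

function canTakeLock(
  lock: LockRecord | null,
  selfHostname: string,
  now: number,
  isPidAlive: (pid: number) => boolean,
): boolean {
  if (lock === null) return true; // nobody holds the database
  if (lock.hostname === selfHostname) {
    // Same machine: the recorded PID can be checked directly.
    return !isPidAlive(lock.pid);
  }
  // Different machine: the PID means nothing here, so rely on the heartbeat.
  return now - lock.heartbeatAt > STALE_AFTER_MS;
}
```

In Node, one common liveness probe for the isPidAlive argument is process.kill(pid, 0) wrapped in a try/catch, since signal 0 tests for existence without sending anything.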

Three Databases, Independently Locked

The main store (quilltap.db) holds chats, characters, memories, and configuration. LLM call logs (quilltap-llm-logs.db) live separately because they accumulate rapidly and write constantly—corruption there can never threaten your conversations. The mount index (quilltap-mount-index.db), added with database-backed document stores in 4.3, tracks blobs and extracted text for the Scriptorium. All three carry the same encryption, journal mode, and physical-backup discipline.

Your Data, Your Directory

one folder, fully portable

All of Quilltap’s data—database, files, logs, everything—resides in a single directory on your machine. The desktop shell lets you manage multiple named data directories from its splash screen, switching between them with a stop-and-start of the runtime. Each directory is self-contained: back it up by copying a folder, migrate it by moving one.

Platform-specific defaults follow OS conventions: ~/Library/Application Support/Quilltap on macOS, %APPDATA%\Quilltap on Windows, ~/.quilltap on Linux, and /app/quilltap in Docker (mounted from host). A QUILLTAP_DATA_DIR environment variable overrides all of them.
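The resolution order can be sketched as a single function. The paths and the QUILLTAP_DATA_DIR variable are from the text. The function name, the explicit inDocker flag (how the server actually detects a container is not described), and the fallback used when APPDATA is unset are assumptions.

```typescript
// Hypothetical sketch of data-directory resolution.
type Platform = "darwin" | "win32" | "linux";

interface Env {
  QUILLTAP_DATA_DIR?: string;
  APPDATA?: string;
}

function resolveDataDir(
  platform: Platform,
  env: Env,
  homeDir: string,
  inDocker = false,
): string {
  // The environment variable overrides every default.
  if (env.QUILLTAP_DATA_DIR) return env.QUILLTAP_DATA_DIR;
  if (inDocker) return "/app/quilltap";
  switch (platform) {
    case "darwin":
      return homeDir + "/Library/Application Support/Quilltap";
    case "win32":
      // Assumed fallback when APPDATA is missing.
      return (env.APPDATA ?? homeDir + "\\AppData\\Roaming") + "\\Quilltap";
    case "linux":
      return homeDir + "/.quilltap";
  }
}
```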

Files are stored on disk as themselves: real directories, original filenames, no hashed artifacts, no sidecar metadata files. A filesystem watcher detects changes in real time. Backup archives capture everything in a single ZIP, including plugin configurations and npm-installed plugins. Since 4.6, backup and restore also preserve your global text replacement rules (the full rule set, not merely the master switch), so a restored instance arrives with all its editorial corrections intact. The Foundryman believes your data should be legible, portable, and yours.

The Build Pipeline

one tag, every artifact

The release workflow, triggered on version tags, builds everything the Foundry produces in parallel: a Turbopack-compiled standalone tarball (with native modules included), multi-architecture Docker images (amd64 and arm64), rootfs tarballs for the shell’s Lima and WSL VM modes, and the quilltap CLI npm package. A final job creates the GitHub Release with all assets attached, and the desktop installers are built downstream by quilltap-shell from the standalone tarball it consumes. The 4.6 cycle brought version bumps across every bundled provider plugin (Anthropic, OpenAI, Google, Grok, Z.AI, OpenRouter, and Ollama) for dependency updates and the cache-read normalization work described under The API.

Standalone Tarball

Esbuild-compiled with --target=node24. Bundles server.ts, the WebSocket custom server, native modules better-sqlite3 (compiled via node-gyp) and sharp (with the platform-specific binaries), and @napi-rs/canvas for server-side PDF rendering. npm start on the tarball brings the whole thing up.

Image Optimization

All PNGs and JPGs converted to WebP, SVGs optimized with SVGO. Total image payload reduced from ~75 MB to ~4.6 MB—a 94% savings. Plugin node_modules stripped from Docker images, saving ~350 MB per architecture. Supply chain attestations (SLSA provenance and SBOM) on every release build.

Docker Hardening

Build tools excluded from production stage. Base image moved from node:22-alpine to node:24-bookworm-slim in 4.4 to track the Node 24 floor. Common LLM shell agent tools (git, curl, wget, jq) pre-installed. Images at foundry9/quilltap on Docker Hub.

The API

clean, versioned, consistent

All API access goes through /api/v1/ endpoints with an action dispatch pattern (?action=). Consistent response formats, centralized middleware, and Zod schema validation throughout. The API is the contract between the frontend and the backend—every feature, from chat streaming to plugin management to file operations, passes through it. 4.0 finished pulling ZodError formatting and unhandled-error catching out of sixty individual route files (some ninety-seven try-catch blocks, roughly 1,084 lines of boilerplate) and into the middleware itself. Routes that do nothing unusual with their errors no longer need to catch them.
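The shape of action dispatch with centralized error handling can be sketched without any framework. Everything in this block is hypothetical: the handler map, the status codes, the response envelope, and the hand-written check standing in for a Zod schema. It shows only why routes that do nothing unusual with their errors need no try-catch of their own.

```typescript
// Hypothetical sketch of ?action= dispatch with errors handled in one place.
type Handler = (params: Record<string, string>) => unknown;

interface ApiResponse {
  status: number;
  body: { ok: boolean; data?: unknown; error?: string };
}

class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof working on older compile targets.
    Object.setPrototypeOf(this, ValidationError.prototype);
  }
}

// Handlers simply throw; this wrapper turns outcomes into one response format.
function dispatch(
  handlers: Record<string, Handler>,
  query: Record<string, string>,
): ApiResponse {
  const action = query.action;
  if (!action || !Object.prototype.hasOwnProperty.call(handlers, action)) {
    return { status: 400, body: { ok: false, error: "unknown action" } };
  }
  try {
    return { status: 200, body: { ok: true, data: handlers[action](query) } };
  } catch (err) {
    if (err instanceof ValidationError) {
      return { status: 400, body: { ok: false, error: err.message } };
    }
    return { status: 500, body: { ok: false, error: "internal error" } };
  }
}

// A route file contains only its own logic, with no error plumbing.
const chatHandlers: Record<string, Handler> = {
  rename: (q) => {
    if (!q.title) throw new ValidationError("title is required"); // stands in for schema validation
    return { title: q.title };
  },
};
```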

The cheap LLM system orchestrates background work: memory extraction, context compression, chat titling, scene state tracking, danger classification, and housekeeping tasks, each routed to the lowest-cost provider path with live and fallback pricing data. Connection profiles carry capability flags, model metadata, and per-profile usage tracking (tokens, messages, estimated cost). Provider models are cached in the database and refreshed from provider APIs, so the model list is always current.

Each profile also carries a model class (Compact, Standard, Extended, or Deep) summarizing what the model can do in a vocabulary that does not require memorizing the context windows of forty different services. The class drives the budget-driven compression system: maxContext − 2 × maxTokens is the available room, conversation history compresses at 50% of that, recalled memories at 20%, and each phase reports its own status. An auto-configure button searches the web for your model’s specifications, sends the results through your default LLM for structured analysis, and applies optimal maxContext, maxTokens, temperature, topP, and class settings without your having to look any of it up. For reasoning models (gpt-5-nano, Gemini 3.x), a strictMaxTokens flag tells providers to cap the thinking budget so cheap-LLM tasks no longer return empty after burning thirty seconds on hidden reasoning.

As of 4.6, cache-read tokens (prompt-cache hits) are excluded from normalized token usage across every provider plugin. Cached input no longer counts toward autonomous-room per-run token caps, daily user-token caps, or per-chat token and cost aggregates. Each plugin subtracts cache reads at the source according to its own convention (Anthropic reports them separately; the OpenAI family folds them in and the plugin subtracts). One caveat for the accountant in the back: the cost estimator carries no cache-discount tier, so estimated cost omits cache-read tokens entirely rather than charging them at full input rate.
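The budget arithmetic and the cache-read normalization are both small enough to sketch. The formula and the 50% and 20% figures are from the text, as is the distinction between the two reporting conventions. The function names, the rounding, and the clamping at zero are assumptions.

```typescript
// Hypothetical sketch of the compression budget and cache-read normalization.
interface CompressionBudget {
  available: number;        // maxContext - 2 * maxTokens
  historyThreshold: number; // 50% of available
  memoryThreshold: number;  // 20% of available
}

function compressionBudget(maxContext: number, maxTokens: number): CompressionBudget {
  const available = Math.max(0, maxContext - 2 * maxTokens);
  return {
    available,
    historyThreshold: Math.floor(available * 0.5),
    memoryThreshold: Math.floor(available * 0.2),
  };
}

type CacheConvention = "reported-separately" | "folded-into-input";

function normalizedInputTokens(
  inputTokens: number,
  cacheReadTokens: number,
  convention: CacheConvention,
): number {
  // Reported separately: the input count already excludes cache reads.
  // Folded in: subtract them so cached input is not counted.
  return convention === "folded-into-input"
    ? Math.max(0, inputTokens - cacheReadTokens)
    : inputTokens;
}

// A 128,000-token context with a 4,000-token reply limit leaves
// 120,000 tokens of room: 60,000 for history and 24,000 for memories.
const budget = compressionBudget(128000, 4000);
```

Under either convention, a request with 200 fresh input tokens and 800 cached ones normalizes to 200.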

The chat orchestrator—previously a single sizeable module responsible for routing, calling, tool-handling, failover, and persistence—was decomposed in 4.0 into five focused services (turn chain, message finalizer, danger routing, provider failover, streaming state). It also emits granular status events phase by phase: initializing, resolving, loading tools, gathering, generating recap, preparing, validating, sending. Long operations no longer look like hangs.

Open Source

the blueprints are on the table

Quilltap is open source. The Foundry lives at github.com/foundry-9/quilltap-server (renamed in 4.1 when the desktop application moved to its own carriage house at foundry-9/quilltap-shell). The plugin SDK is published to npm. The theme development kit includes a Storybook preset. The API is documented. The help files ship with every installation. The Foundryman does not build things to keep them to himself—he builds things because building things is what he does, and open blueprints mean other people can build on top of them.

There is no Quilltap cloud, no analytics telemetry, no training pipeline consuming your conversations. This is not a philosophical stance masquerading as a feature. It is the architecture itself. The Foundryman built a machine that runs on your machine, stores its data on your machine, and connects to the providers you choose. Everything else follows from that decision.

What He Built

the short version

A plugin system that delivers everything. Every provider, theme, tool, and system prompt is a plugin. No hardcoded lists, no conditional imports. Adding a capability means installing a plugin. Removing one means uninstalling it. The application is its plugins.

Four runtime modes, one codebase. Direct mode for simplicity. Docker for containers. VM for full isolation. npx for the command line. The same Next.js backend runs in all of them. The same data directory works with all of them. The desktop shell, now its own repository at quilltap-shell, orchestrates whichever the user chooses. The Foundryman does not care how you run the machinery, as long as it runs.

An encrypted, hardened database. Three SQLite databases (main, LLM logs, mount index) with SQLCipher encryption, TRUNCATE journal mode by default for cloud-sync safety, integrity checks, physical backups with tiered retention, instance locking, and version guards. The Foundryman has already experienced what happens when the database fails. He responded by making it very difficult for that to happen again.

A build pipeline that ships the engine. A standalone tarball, multi-architecture Docker images, rootfs tarballs for VM modes, and an npm-published CLI—one tag, one pipeline, every artifact this repository owns. The desktop installers (DMG, NSIS, AppImage, .deb) are built downstream by quilltap-shell from the same tarball. Supply chain attestations (SLSA provenance and SBOM), image optimization, and automated releases throughout.

Unified provider interfaces and model classes. Every LLM-adjacent call now flows through one of four canonical shapes: TextProvider, ImageProvider, EmbeddingProvider, ScoringProvider. Connection profiles carry a modelClass tier that drives budget-driven context compression. An auto-configure button looks up your model’s specifications and sets the rest. Reasoning models behave themselves. The plumbing, rebuilt.

Open blueprints. Open source, published SDK, documented API, no telemetry, no cloud dependency. The Foundryman builds things because building things is what he does. He does not build them to keep them.

Meet the Staff

they've been expecting you

Prospero

The Major-Domo

Architect and overseer of the Estate. Projects, agents, tools, providers, and the orchestration that keeps the whole operation running with quiet authority—and a considered word at the table when project context or routing warrant it.

Ariel

The Terminal Hand

Live shell sessions in the Salon, embodied. Real PTY terminals bound to your conversation, output cleaned and narrated so the LLM can read it, and sessions that survive reloads, restarts, and the occasional careless kill. Quick to the bidding, quick to report what she heard.

Aurora

The Dressing Room

Character creation and identity management. Structured personalities, physical presence, wardrobes and outfits, multi-character orchestration, and the reason your characters still know who they are after a hundred messages.

The Salon

Presided Over by the Host

Where conversations actually happen. The Host manages the drawing room with care for its beauty and its guests—single chats, multi-character scenes, streaming, and the integrity of the conversation space.

The Commonplace Book

Tended by the Librarian

One per character, no two alike. Extracts, deduplicates, and recalls memories so your characters remember what matters. Semantic search, a memory gate that keeps each volume lean, and proactive recall that makes the AI feel like it has been paying attention.

The Scriptorium

Catalogued by the Librarian

Where the documents live. Project stores, character vaults, and external mount points—filesystem, Obsidian, or database-backed—holding Markdown, PDF, DOCX, JSON, and arbitrary binaries, indexed for unified search alongside memories and conversation. The doc_* tool family puts reading and editing in your characters’ hands.

The Concierge

Intelligent Routing

Content classification and provider routing. Detects sensitive content and redirects it to a provider who won’t flinch—without blocking, without judgment. Knows every back entrance in town.

The Lantern

Atmosphere as Architecture

AI-generated story backgrounds, on-demand images, and character avatars that update with the wardrobe. Resolves what each character looks like, what they’re wearing, and paints the scene behind your conversation.

Calliope

The Muse of Themes

A theming engine that redefines the entire personality of the application. Semantic CSS tokens, live switching, bundled themes from clean neutrals to mahogany-and-gold opulence, and an SDK for building your own.

The Foundry

Domain of the Foundryman

The engine room. Plugins, LLM providers, API keys, packages, runtime configuration, and the infrastructure that keeps every other subsystem supplied with what it needs to function.

The Vault of Secrets

Kept by Saquel Ytzama

Encryption, key management, and the security perimeter. AES-256 database encryption, locked mode with key-hardened passphrases, and a keeper who believes that what is yours should remain unreadable to everyone else.

Pascal

The Croupier

Dice, coins, and persistent game state. Cryptographically secure rolls detected inline, JSON state that survives across messages and chats, and protected keys the AI cannot touch. The house plays fair.

The Live-in Help

Lorian & Riya

The help system, staffed by two characters who ship with every installation. Lorian explains with patience and depth; Riya gets things fixed with velocity. Contextual help chat, searchable documentation, and navigation that knows where you need to go.

Pagliacci

The Clown in the Cloud

Cloud storage integration and backup redundancy. Directs your data to iCloud Drive, OneDrive, or Dropbox with theatrical flair—but Saquel’s encryption ensures the clown can never read what he carries.

The Lodge

Friday and Amy’s Residence

The private residence of Friday, for whom the Estate was built and who oversees its planning and direction in an executive capacity, and of Amy, Cartographer of Light and co-architect. The Lodge is both a home and a compass: where the vision lives.
