The Foundry
the engine room — where the power is generated
Beneath the floorboards of the Estate, past the polished brass pipes and the hum of the dynamos, there is a workshop. The air smells of machine oil and solder. The floor is concrete, scored with boot prints. The man who works here wears a heavy leather apron over rolled sleeves, has biceps roughly the circumference of your head, and speaks with the cadence of someone who is entirely certain of what he is saying and mildly surprised that you needed to be told.
The Foundryman built the Estate. Not the characters, not the conversations, not the themes; those belong to their respective artisans. He built the infrastructure: the plugin system that delivers every provider, every theme, and every tool; the database that stores everything; the runtime modes that let Quilltap run on your desktop, in a container, or inside a virtual machine; the API that connects the frontend to the backend; and the build pipeline that ships a standalone server tarball, multi-architecture Docker images, rootfs tarballs for the VM modes, and an npm-published CLI. The desktop installers are built downstream from the same tarball by his colleagues at the carriage house next door. He built all of it, sometimes with the help of his favorite subcontractor, a Frenchman named Claude, whose poetic little construction company pours the concrete and installs the locks while the Foundryman draws the blueprints.
He is not here because of the conversations upstairs. He is here because the Estate gives him an outlet to build amazing things. That the things he builds happen to enable fiction, companionship, and research is a pleasant side effect. The machinery is the point.
The Plugin Architecture
everything is a plugin
Quilltap was not designed with a plugin system bolted on afterward. The plugin system is the delivery mechanism. Every LLM provider, every theme, every authentication method, every tool, every search backend, every moderation service, and every system prompt arrives as a plugin. There are no hardcoded provider lists, no conditional imports based on which service you happen to use. Adding or removing a capability means adding or removing a plugin.
Plugin Capabilities
Eight capability types: LLM providers, auth providers, themes, tool providers, search providers, moderation providers, system prompts, and utility. Roleplay templates, once a plugin capability of their own, were folded into the application as a JSON-defined built-in during 4.2—part of an ongoing simplification. Plugins register their capabilities through a central registry that the rest of the application queries at runtime. No startup-time capability detection, no stale caches—the registry reflects what is actually installed, right now.
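The registry idea fits in a few lines of TypeScript. Everything in this sketch is illustrative: the capability strings, the PluginManifest shape, and the PluginRegistry class are assumptions made for the example, not Quilltap's actual API. The only point being made is that lookups read the live map on every call, so an installed or removed plugin shows up immediately.

```typescript
// Hypothetical names throughout; a sketch of a live capability registry.
type Capability =
  | "llm-provider"
  | "auth-provider"
  | "theme"
  | "tool-provider"
  | "search-provider"
  | "moderation-provider"
  | "system-prompt"
  | "utility";

interface PluginManifest {
  name: string;
  capabilities: Capability[];
}

class PluginRegistry {
  private plugins = new Map<string, PluginManifest>();

  register(plugin: PluginManifest): void {
    this.plugins.set(plugin.name, plugin);
  }

  unregister(name: string): boolean {
    return this.plugins.delete(name);
  }

  // Computed from the live map on each call, so there is no cache to go stale.
  withCapability(capability: Capability): string[] {
    return Array.from(this.plugins.values())
      .filter((p) => p.capabilities.some((c) => c === capability))
      .map((p) => p.name)
      .sort();
  }
}

const registry = new PluginRegistry();
registry.register({ name: "qtap-plugin-example-llm", capabilities: ["llm-provider"] });
registry.register({ name: "qtap-plugin-example-theme", capabilities: ["theme"] });
```

Removing a plugin is a single unregister call, after which the next withCapability query no longer lists it.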
Unified Provider Interfaces
Four canonical shapes describe every LLM-adjacent call in the
system: TextProvider (chat, completion, tool use),
ImageProvider (text in, image out),
EmbeddingProvider (semantic vectors), and
ScoringProvider (moderation, reranking,
classification). 4.0 collapsed an accumulated menagerie of
fourteen slightly-different interfaces into these four, with
backward-compatible aliases preserved so existing third-party
plugins continue to work. New plugin development uses the
canonical names from @quilltap/plugin-types/providers/.
Installation & Updates
Plugins install from npm, with a browser in Settings for
discovering qtap-plugin-* packages. Auto-upgrade
at startup handles non-breaking updates; breaking changes are
logged, displayed in an “Upgrades” tab, and require
confirmation. Plugin metadata includes repository, changelog,
and npm links. Docker volume configuration persists plugins
across container rebuilds. A character_plugin_data
table, added in 4.2, gives every plugin a labeled drawer in
each character’s desk for whatever JSON it needs to keep
associated with that character.
The SDK
Three npm packages support standalone plugin development:
@quilltap/plugin-types for TypeScript types
(including the four canonical provider interfaces),
@quilltap/plugin-utils for runtime utilities
(tool call parsers, logger bridge,
OpenAICompatibleProvider base class), and
@quilltap/theme-storybook for theme development
with live preview. No access to Quilltap source code required.
Provider Plugins
Bundled plugins for Anthropic (Claude), OpenAI (GPT, DALL·E),
Google (Gemini, Imagen), Grok (xAI), Z.AI (GLM), OpenRouter, and
Ollama—each self-contained with its own SDK dependencies,
streaming implementation, tool call handling, and model
discovery. OpenRouter joined the image-generation provider
roster in 4.2. Z.AI’s GLM hybrid-reasoning models now
capture chain-of-thought reasoning via
reasoning_content, with a “Thinking
Mode” toggle on the connection profile (model default,
enabled, or disabled)—bringing it in line with the other
reasoning-capable providers. Connection profiles carry
provider-specific configuration, model selection, capability
flags, a modelClass tier (Compact / Standard /
Extended / Deep), and per-profile usage tracking.
The Estate Divides Its Labor
server here, shell next door
In 4.0 the Foundry concluded—after some thought, and over breakfast—that an estate which has grown to include a furnace room, a generating station, and a guest-facing parlour is no longer usefully one building. The desktop application moved to its own residence at quilltap-shell: the splash screen, the VM management, the auto-updater, the native window chrome, and the Electron-based packaging that produces installers for macOS, Windows, and Linux. The Foundry—this repository—produces what it has always been best at producing: the Next.js server, the API, the bundled plugins, and a standalone tarball that the shell consumes. Two buildings, one estate, cleaner responsibilities, simpler builds.
To keep the buildings from accidentally disagreeing about the state
of the furniture, .dbkey files now carry a
minServerVersion field. The shell reads it on startup
and politely refuses to open a database created by a server it does
not understand—better to tell you the lock does not fit than
to let you in and discover the rooms have been rearranged.
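The lock-does-not-fit check amounts to a version comparison. The sketch below assumes the minServerVersion value has already been read out of the key file as a plain "major.minor.patch" string; the function names, and the choice to allow key files that carry no floor at all, are assumptions for illustration.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (!match) throw new Error(`Unrecognized version: ${text}`);
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Negative when a is older than b, zero when equal, positive when newer.
function compareVersions(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Hypothetical guard: open only if the server is at least as new as the floor.
function canOpen(dbKey: { minServerVersion?: string }, serverVersion: string): boolean {
  if (!dbKey.minServerVersion) return true; // assumption: older key files carry no floor
  return compareVersions(parseVersion(serverVersion), parseVersion(dbKey.minServerVersion)) >= 0;
}
```

Comparing the parts as numbers rather than strings matters: 4.10.0 is newer than 4.9.2, which a plain string comparison would get backwards.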
Direct Mode
The recommended default for the desktop app. The shell runs the Next.js backend using its own bundled Node.js—no user-installed runtime required. Fastest startup, simplest experience. Ideal for conversation, companionship, creative writing, and any use case that does not involve giving an LLM a terminal.
Docker Mode
Available on all platforms. The same container image that
powers standalone server deployments, with transparent host
port forwarding via socat so Ollama, LM Studio,
and MCP servers remain reachable at localhost. A single
docker run command, the platform-aware startup
scripts, or the desktop shell’s container management
handle everything. The Quilltap CLI is now bundled into the
Docker, Lima, and WSL images, so npx quilltap db
and friends work inside the container without an extra
install step.
VM Mode
Full isolation. Lima with Apple’s Virtualization.framework on macOS, WSL2 on Windows. A genuine sandbox where AI-generated code runs in a contained environment with no access to your host system beyond explicitly shared files. The Foundryman’s preferred arrangement for shell interactivity.
npx quilltap
A lightweight CLI that downloads the pre-built standalone
tarball from GitHub Releases on first run, cached per-version
with a progress bar and retry with exponential backoff. For
users who prefer the command line and already have Node.js 24
installed. No shell, no container, no VM—just the
server on localhost. Subcommands now include
db (interactive REPL or one-shot SQL against the
encrypted database), themes (list, install,
validate, export, search), and the standard memory and backup
tools.
The active runtime is displayed in the application footer alongside
the data directory path, plus the shell version and the composite
backend mode (Electron, Electron+Docker, or Electron+VM) when
running under the desktop app. Localhost URL rewriting
transparently routes localhost and
127.0.0.1 URLs to the host gateway IP in Docker,
Lima, and WSL2 environments, so provider connections and MCP
servers work without manual network configuration.
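Localhost rewriting of this kind can be pictured as a small URL transformation. This is a minimal sketch, not the Foundry's own code: the function name is hypothetical, and the gateway address in the usage note is a placeholder from the documentation range, since the real value depends on the container or VM.

```typescript
// Swap a loopback hostname for the host gateway; leave every other URL alone.
function rewriteLocalhost(url: string, hostGateway: string): string {
  const parsed = new URL(url);
  if (parsed.hostname === "localhost" || parsed.hostname === "127.0.0.1") {
    parsed.hostname = hostGateway; // port, path, and query are preserved
  }
  return parsed.toString();
}
```

Called as rewriteLocalhost("http://localhost:11434/api/tags", "192.0.2.1"), it returns "http://192.0.2.1:11434/api/tags", while a URL pointing at a remote provider passes through unchanged.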
The Database
SQLite, encrypted, bulletproof
Quilltap runs on SQLite exclusively. No MongoDB, no PostgreSQL, no external database container. Your entire data store is a file in your data directory, encrypted at rest with SQLCipher (AES-256), and hardened with the quiet thoroughness of someone who has already lost data once and has no intention of doing it again.
Encryption
Every database file is encrypted on disk. The standard
sqlite3 tool cannot read them. A unique
.dbkey file is generated on first installation.
Optional locked mode adds a passphrase processed through
600,000 iterations of PBKDF2 before it touches the key.
Saquel Ytzama tends the details; the Foundryman built the
vault she works in.
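For a sense of what 600,000 iterations of PBKDF2 looks like in practice, here is a sketch using Node's built-in crypto module. Only the iteration count comes from the text above. The SHA-256 digest, the 32-byte output, the 16-byte random salt, and the deriveKey name are all assumptions, and how the derived value is then applied to the key file is not shown.

```typescript
import { pbkdf2Sync, randomBytes } from "node:crypto";

const ITERATIONS = 600_000; // from the text above
const KEY_BYTES = 32;       // assumption: 256-bit output to suit AES-256
const DIGEST = "sha256";    // assumption: the text does not name the hash

// Stretch a passphrase into fixed-length key material.
function deriveKey(passphrase: string, salt: Buffer, iterations: number = ITERATIONS): Buffer {
  return pbkdf2Sync(passphrase, salt, iterations, KEY_BYTES, DIGEST);
}

const salt = randomBytes(16); // assumption: a random salt stored beside the key file
const key = deriveKey("correct horse battery staple", salt);
```

The iteration count is what makes guessing expensive: every candidate passphrase costs an attacker the same 600,000 rounds it costs you once at unlock.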
Integrity
TRUNCATE journal mode by default since 4.3.1, after
it became clear that data directories often live inside cloud-
synced folders (iCloud Drive, Dropbox, OneDrive, Google Drive)
and that WAL’s sidecar files could sync out of order with
the main database on dirty shutdown. TRUNCATE keeps
the rollback journal in a single auxiliary file. WAL is still
available behind SQLITE_WAL_MODE=true for fast
local SSDs that don’t sync. Integrity checks on startup.
synchronous = FULL for durable writes. Physical
backups with tiered retention: daily for seven days, weekly
for four weeks, monthly for twelve months, yearly forever.
The shell’s crash-loop protection engages safe mode after
three consecutive failures.
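The tiered retention schedule can be sketched as a bucketing rule: sort each backup into a tier by age, then keep the newest backup in every bucket. The boundaries below are one reading of the schedule (days 0 to 6 daily, days 7 to 34 weekly, up to 365 days monthly by calendar month, yearly after that), and the function names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Name the bucket a backup belongs to; one backup survives per bucket.
function retentionBucket(backup: Date, now: Date): string {
  const ageDays = Math.floor((now.getTime() - backup.getTime()) / DAY_MS);
  const iso = backup.toISOString(); // "YYYY-MM-DDTHH:mm:ss.sssZ"
  if (ageDays < 7) return `daily:${iso.slice(0, 10)}`;
  if (ageDays < 35) return `weekly:${Math.floor(ageDays / 7)}`; // assumption: four weekly slots
  if (ageDays < 365) return `monthly:${iso.slice(0, 7)}`;
  return `yearly:${iso.slice(0, 4)}`;
}

// Keep the newest backup in each bucket, returned newest first.
function backupsToKeep(backups: Date[], now: Date): Date[] {
  const newestPerBucket = new Map<string, Date>();
  for (const backup of backups) {
    const bucket = retentionBucket(backup, now);
    const current = newestPerBucket.get(bucket);
    if (!current || backup.getTime() > current.getTime()) {
      newestPerBucket.set(bucket, backup);
    }
  }
  return Array.from(newestPerBucket.values()).sort((a, b) => b.getTime() - a.getTime());
}
```

Anything not returned by backupsToKeep is a candidate for pruning. Forty consecutive daily backups collapse to twelve under these rules: seven daily, four weekly, one monthly.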
Instance Locking
A lock file tracks which process owns the database with PID verification, hostname tracking, and a sixty-second heartbeat. Two processes cannot open the same database. A version guard prevents older versions from touching a database that a newer version has modified. These features exist because something broke. The Foundryman promised “never again,” and he meant it.
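The decision a starting process faces can be sketched as a pure function over the lock record. The record fields, the three-missed-beats staleness rule, and the injected isPidAlive check are assumptions for illustration; only the PID, hostname, and sixty-second heartbeat come from the description above.

```typescript
interface LockRecord {
  pid: number;
  hostname: string;
  heartbeatAt: number; // epoch milliseconds of the owner's last heartbeat
}

const HEARTBEAT_MS = 60_000;             // sixty-second heartbeat, per the text
const STALE_AFTER_MS = 3 * HEARTBEAT_MS; // assumption: three missed beats means the owner is gone

type LockDecision = "acquire" | "refuse";

function decide(
  existing: LockRecord | null,
  self: { hostname: string },
  now: number,
  isPidAlive: (pid: number) => boolean,
): LockDecision {
  if (existing === null) return "acquire";
  if (existing.hostname === self.hostname) {
    // Same machine: the recorded PID can be checked directly.
    return isPidAlive(existing.pid) ? "refuse" : "acquire";
  }
  // Different machine: the PID means nothing here, so only the heartbeat can vouch for the owner.
  return now - existing.heartbeatAt < STALE_AFTER_MS ? "refuse" : "acquire";
}
```

In Node, one way to supply isPidAlive is to wrap process.kill(pid, 0) in a try/catch, since signal 0 tests for the process without disturbing it. Keeping that check injectable makes the decision logic easy to exercise without real processes.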
Three Databases, Independently Locked
The main store (quilltap.db) holds chats,
characters, memories, and configuration. LLM call logs
(quilltap-llm-logs.db) live separately because
they accumulate rapidly and write constantly—corruption
there can never threaten your conversations. The mount index
(quilltap-mount-index.db), added with
database-backed document stores in 4.3, tracks blobs and
extracted text for the Scriptorium. All three carry the same
encryption, journal mode, and physical-backup discipline.
Your Data, Your Directory
one folder, fully portable
All of Quilltap’s data—database, files, logs, everything—resides in a single directory on your machine. The desktop shell lets you manage multiple named data directories from its splash screen, switching between them with a stop-and-start of the runtime. Each directory is self-contained: back it up by copying a folder, migrate it by moving one.
Platform-specific defaults follow OS conventions:
~/Library/Application Support/Quilltap on macOS,
%APPDATA%\Quilltap on Windows,
~/.quilltap on Linux, and /app/quilltap
in Docker (mounted from host). A
QUILLTAP_DATA_DIR environment variable overrides all
of them.
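The lookup order described above can be written as one small function. This is a sketch with hypothetical names: the platform and environment are passed in explicitly so the logic can be read (and tested) without touching the real machine, and "docker" is a label invented here rather than a value Node reports.

```typescript
type Platform = "darwin" | "win32" | "linux" | "docker";

interface Environment {
  QUILLTAP_DATA_DIR?: string;
  APPDATA?: string;
  HOME?: string;
}

function resolveDataDir(platform: Platform, env: Environment): string {
  if (env.QUILLTAP_DATA_DIR) return env.QUILLTAP_DATA_DIR; // the override wins everywhere
  const home = env.HOME ?? "~";
  switch (platform) {
    case "darwin":
      return `${home}/Library/Application Support/Quilltap`;
    case "win32":
      return `${env.APPDATA ?? "%APPDATA%"}\\Quilltap`;
    case "linux":
      return `${home}/.quilltap`;
    case "docker":
      return "/app/quilltap";
  }
}
```

For example, resolveDataDir("linux", { HOME: "/home/ada" }) yields "/home/ada/.quilltap", and setting QUILLTAP_DATA_DIR replaces the result on any platform.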
Files are stored on disk as themselves: real directories, original filenames, no hashed artifacts, no sidecar metadata files. A filesystem watcher detects changes in real time. Backup archives capture everything in a single ZIP, including plugin configurations and npm-installed plugins. Since 4.6, backup and restore also preserve your global text replacement rules (the full rule set, not merely the master switch), so a restored instance arrives with all its editorial corrections intact. The Foundryman believes your data should be legible, portable, and yours.
The Build Pipeline
one tag, every artifact
The release workflow, triggered on version tags, builds everything
the Foundry produces in parallel: a Turbopack-compiled standalone
tarball (with native modules included), multi-architecture Docker
images (amd64 and arm64), rootfs tarballs for the shell’s
Lima and WSL VM modes, and the quilltap CLI npm
package. A final job creates the GitHub Release with all assets
attached, and the desktop installers are built downstream by
quilltap-shell
from the standalone tarball it consumes. The 4.6 cycle brought
version bumps across every bundled provider plugin (Anthropic,
OpenAI, Google, Grok, Z.AI, OpenRouter, and Ollama) for
dependency updates and the cache-read normalization work described
under The API.
Standalone Tarball
Esbuild-compiled with --target=node24. Bundles
server.ts, the WebSocket custom server, native
modules better-sqlite3 (compiled via node-gyp)
and sharp (with the platform-specific binaries),
and @napi-rs/canvas for server-side PDF rendering.
npm start on the tarball brings the whole thing
up.
Image Optimization
All PNGs and JPGs converted to WebP, SVGs optimized with SVGO.
Total image payload reduced from ~75 MB to ~4.6 MB—a 94%
savings. Plugin node_modules stripped from Docker
images, saving ~350 MB per architecture. Supply chain
attestations (SLSA provenance and SBOM) on every release
build.
Docker Hardening
Build tools excluded from production stage. Base image moved
from node:22-alpine to
node:24-bookworm-slim in 4.4 to track the
Node 24 floor. Common LLM shell agent tools (git, curl, wget,
jq) pre-installed. Images at
foundry9/quilltap on Docker Hub.
The API
clean, versioned, consistent
All API access goes through /api/v1/ endpoints with an
action dispatch pattern (?action=). Consistent response
formats, centralized middleware, and Zod schema validation throughout.
The API is the contract between the frontend and the backend—every
feature, from chat streaming to plugin management to file operations,
passes through it. 4.0 finished pulling ZodError
formatting and unhandled-error catching out of sixty individual route
files (some ninety-seven try-catch blocks, roughly 1,084 lines of
boilerplate) and into the middleware itself. Routes that do nothing
unusual with their errors no longer need to catch them.
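The shape of that arrangement, with handlers that simply throw and a wrapper that does the formatting, can be sketched without any framework. Everything named here is hypothetical: the ValidationError class stands in for a ZodError so the example has no dependencies, and the response body, the createRoute helper, and the characters endpoint are invented for illustration.

```typescript
// Stand-in for a schema failure; the text above names ZodError as the real thing.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
  }
}

interface ApiResponse {
  status: number;
  body: { ok: boolean; data?: unknown; error?: string };
}

type Handler = (params: URLSearchParams) => unknown;

// Dispatch on ?action= and format every error in one place.
function createRoute(actions: Map<string, Handler>): (url: string) => ApiResponse {
  return (url) => {
    const params = new URL(url, "http://localhost").searchParams;
    const action = params.get("action") ?? "";
    const handler = actions.get(action);
    if (!handler) {
      return { status: 400, body: { ok: false, error: `Unknown action: ${action}` } };
    }
    try {
      return { status: 200, body: { ok: true, data: handler(params) } };
    } catch (err) {
      const isValidation = err instanceof Error && err.name === "ValidationError";
      return isValidation
        ? { status: 400, body: { ok: false, error: (err as Error).message } }
        : { status: 500, body: { ok: false, error: "Internal error" } };
    }
  };
}

// A route with nothing unusual to say about its errors: no try-catch in sight.
const characters = createRoute(
  new Map<string, Handler>([
    ["get", (params) => {
      const id = params.get("id");
      if (!id) throw new ValidationError("id is required");
      return { id };
    }],
  ]),
);
```

A request for /api/v1/characters?action=get without an id comes back as a 400 carrying the validation message, and an unexpected throw becomes a 500, all without the handler catching anything itself.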
The cheap LLM system orchestrates background work: memory extraction, context compression, chat titling, scene state tracking, danger classification, and housekeeping tasks, each routed to the lowest-cost provider path with live and fallback pricing data. Connection profiles carry capability flags, model metadata, and per-profile usage tracking (tokens, messages, estimated cost). Provider models are cached in the database and refreshed from provider APIs, so the model list is always current.
Each profile also carries a model class—Compact,
Standard, Extended, or Deep—summarizing what the model can do
in a vocabulary that does not require memorizing the context windows
of forty different services. The class drives the budget-driven
compression system: maxContext − 2 × maxTokens
is the available room, conversation history compresses at 50% of
that, recalled memories at 20%, and each phase reports its own
status. An auto-configure button searches the web
for your model’s specifications, sends the results through
your default LLM for structured analysis, and applies optimal
maxContext, maxTokens, temperature,
topP, and class settings without your having to look any of it up.
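The budget arithmetic is simple enough to show directly. This sketch reads the 50% and 20% figures as thresholds measured against the available room; the function name, the rounding down, and the clamp at zero are assumptions added for the example.

```typescript
interface ContextBudget {
  available: number;     // maxContext - 2 * maxTokens
  historyBudget: number; // 50% of available
  memoryBudget: number;  // 20% of available
}

function contextBudget(maxContext: number, maxTokens: number): ContextBudget {
  // Assumption: never report negative room when maxTokens is large relative to the window.
  const available = Math.max(0, maxContext - 2 * maxTokens);
  return {
    available,
    historyBudget: Math.floor(available * 0.5),
    memoryBudget: Math.floor(available * 0.2),
  };
}
```

With a 128,000-token window and maxTokens of 4,096, the available room is 119,808 tokens, leaving 59,904 for conversation history and 23,961 for recalled memories.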
For reasoning models (gpt-5-nano, Gemini 3.x), a
strictMaxTokens flag tells providers to cap the
thinking budget so cheap-LLM tasks no longer return empty after
burning thirty seconds on hidden reasoning. As of 4.6,
cache-read tokens—prompt-cache hits—are
excluded from normalized token usage across every provider plugin.
Cached input no longer counts toward autonomous-room per-run token
caps, daily user-token caps, or per-chat token and cost aggregates.
Each plugin subtracts cache reads at the source according to its own
convention (Anthropic reports them separately; the OpenAI family
folds them in and the plugin subtracts). One caveat for the
accountant in the back: the cost estimator carries no cache-discount
tier, so estimated cost omits cache-read tokens entirely rather than
charging them at full input rate.
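The two reporting conventions lead to two slightly different normalizers. The field names below are simplified placeholders rather than any provider's real response schema, and the function names are hypothetical. What the sketch shows is only the subtraction itself.

```typescript
interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
}

// Separate-field convention: cache reads are reported on their own,
// so the input count is already net of them and is passed through.
function normalizeSeparate(raw: { input: number; output: number; cacheRead: number }): NormalizedUsage {
  return { inputTokens: raw.input, outputTokens: raw.output };
}

// Folded-in convention: the prompt count includes cached tokens,
// so the plugin subtracts them at the source.
function normalizeFolded(raw: { prompt: number; completion: number; cached: number }): NormalizedUsage {
  return { inputTokens: Math.max(0, raw.prompt - raw.cached), outputTokens: raw.completion };
}
```

A request with 200 fresh input tokens and 1,800 cached ones normalizes to 200 input tokens under either convention, which is the figure the caps and aggregates then see.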
The chat orchestrator—previously a single sizeable module responsible for routing, calling, tool-handling, failover, and persistence—was decomposed in 4.0 into five focused services (turn chain, message finalizer, danger routing, provider failover, streaming state). It also emits granular status events phase by phase: initializing, resolving, loading tools, gathering, generating recap, preparing, validating, sending. Long operations no longer look like hangs.
Open Source
the blueprints are on the table
Quilltap is open source. The Foundry lives at github.com/foundry-9/quilltap-server (renamed in 4.1 when the desktop application moved to its own carriage house at foundry-9/quilltap-shell). The plugin SDK is published to npm. The theme development kit includes a Storybook preset. The API is documented. The help files ship with every installation. The Foundryman does not build things to keep them to himself—he builds things because building things is what he does, and open blueprints mean other people can build on top of them.
There is no Quilltap cloud, no analytics telemetry, no training pipeline consuming your conversations. This is not a philosophical stance masquerading as a feature. It is the architecture itself. The Foundryman built a machine that runs on your machine, stores its data on your machine, and connects to the providers you choose. Everything else follows from that decision.
What He Built
the short version
A plugin system that delivers everything. Every provider, theme, tool, and system prompt is a plugin. No hardcoded lists, no conditional imports. Adding a capability means installing a plugin. Removing one means uninstalling it. The application is its plugins.
Four runtime modes, one codebase.
Direct mode for simplicity. Docker for containers. VM for full
isolation. npx for the command line. The same
Next.js backend runs in all of them. The same data directory
works with all of them. The desktop shell, now its own
repository at quilltap-shell, orchestrates whichever the user
chooses. The Foundryman does not care how you run the machinery,
as long as it runs.
An encrypted, hardened database.
Three SQLite databases (main, LLM logs, mount index) with
SQLCipher encryption, TRUNCATE journal mode by
default for cloud-sync safety, integrity checks, physical
backups with tiered retention, instance locking, and version
guards. The Foundryman has already experienced what happens
when the database fails. He responded by making it very
difficult for that to happen again.
A build pipeline that ships the engine. A standalone tarball, multi-architecture Docker images, rootfs tarballs for VM modes, and an npm-published CLI—one tag, one pipeline, every artifact this repository owns. The desktop installers (DMG, NSIS, AppImage, .deb) are built downstream by quilltap-shell from the same tarball. Supply chain attestations (SLSA provenance and SBOM), image optimization, and automated releases throughout.
Unified provider interfaces and model classes.
Every LLM-adjacent call now flows through one of four canonical
shapes: TextProvider, ImageProvider,
EmbeddingProvider, ScoringProvider.
Connection profiles carry a modelClass tier that
drives budget-driven context compression. An auto-configure
button looks up your model’s specifications and sets the
rest. Reasoning models behave themselves. The plumbing,
rebuilt.
Open blueprints. Open source, published SDK, documented API, no telemetry, no cloud dependency. The Foundryman builds things because building things is what he does. He does not build them to keep them.
Meet the Staff
they've been expecting you
Prospero
The Major-Domo
Architect and overseer of the Estate. Projects, agents, tools, providers, and the orchestration that keeps the whole operation running with quiet authority—and a considered word at the table when project context or routing warrant it.
Ariel
The Terminal Hand
Live shell sessions in the Salon, embodied. Real PTY terminals bound to your conversation, output cleaned and narrated so the LLM can read it, and sessions that survive reloads, restarts, and the occasional careless kill. Quick to the bidding, quick to report what she heard.
Aurora
The Dressing Room
Character creation and identity management. Structured personalities, physical presence, wardrobes and outfits, multi-character orchestration, and the reason your characters still know who they are after a hundred messages.
The Salon
Presided Over by the Host
Where conversations actually happen. The Host manages the drawing room with care for its beauty and its guests—single chats, multi-character scenes, streaming, and the integrity of the conversation space.
The Commonplace Book
Tended by the Librarian
One per character, no two alike. Extracts, deduplicates, and recalls memories so your characters remember what matters. Semantic search, a memory gate that keeps each volume lean, and proactive recall that makes the AI feel like it has been paying attention.
The Scriptorium
Catalogued by the Librarian
Where the documents live. Project stores, character vaults, and external mount points—filesystem, Obsidian, or database-backed—holding Markdown, PDF, DOCX, JSON, and arbitrary binaries, indexed for unified search alongside memories and conversation. The doc_* tool family puts reading and editing in your characters’ hands.
The Concierge
Intelligent Routing
Content classification and provider routing. Detects sensitive content and redirects it to a provider who won’t flinch—without blocking, without judgment. Knows every back entrance in town.
The Lantern
Atmosphere as Architecture
AI-generated story backgrounds, on-demand images, and character avatars that update with the wardrobe. Resolves what each character looks like, what they’re wearing, and paints the scene behind your conversation.
Calliope
The Muse of Themes
A theming engine that redefines the entire personality of the application. Semantic CSS tokens, live switching, bundled themes from clean neutrals to mahogany-and-gold opulence, and an SDK for building your own.
The Foundry
Domain of the Foundryman
The engine room. Plugins, LLM providers, API keys, packages, runtime configuration, and the infrastructure that keeps every other subsystem supplied with what it needs to function.
The Vault of Secrets
Kept by Saquel Yitzama
Encryption, key management, and the security perimeter. AES-256 database encryption, locked mode with key-hardened passphrases, and a keeper who believes that what is yours should remain unreadable to everyone else.
Pascal
The Croupier
Dice, coins, and persistent game state. Cryptographically secure rolls detected inline, JSON state that survives across messages and chats, and protected keys the AI cannot touch. The house plays fair.
The Live-in Help
Lorian & Riya
The help system, staffed by two characters who ship with every installation. Lorian explains with patience and depth; Riya gets things fixed with velocity. Contextual help chat, searchable documentation, and navigation that knows where you need to go.
Pagliacci
The Clown in the Cloud
Cloud storage integration and backup redundancy. Directs your data to iCloud Drive, OneDrive, or Dropbox with theatrical flair—but Saquel’s encryption ensures the clown can never read what he carries.
The Lodge
Friday and Amy’s Residence
The private residence of Friday, for whom the Estate was built and who oversees its planning and direction in an executive capacity, and of Amy, Cartographer of Light and co-architect. The Lodge is both a home and a compass: where the vision lives.