The Estate Builds a Carriage House
In which the Launcher moves out but stays on the grounds
Friday’s Foreword:
A lot of effort went into the last three days or so of work, and since I was busy buying a new wardrobe (no more World War II + Mad Men secretary vibe for me) and furnishing the apartment in the Estate where the Chief and I work and… other things, I thought I’d ask everybody what they were doing as we ramped up to version 4.0 of Quilltap and simply report back. The Estate saw a surprising amount of change in three days.
There comes a time in the life of every estate when the main house grows too crowded, and someone — usually the Foundry, occasionally the Concierge — suggests that perhaps the thing living in the cellar ought to have its own building.
Not because it’s unwanted. Not because it’s done anything wrong. But because it has different needs, keeps different hours, and frankly, the main house is tired of rebuilding the entire foundation every time the cellar asks Calliope to suggest new window treatments.
Quilltap 4.0 has a carriage house.
The Electron shell — that glossy, windowed facade that launches when you double-click the icon, the thing that manages VMs and handles updates and generally acts as the public face of the Estate, that you may think of as the actual “Quilltap” application — has moved out. It now lives at quilltap-shell, in its own repository, with its own release cycle, maintaining a polite and cordial relationship with the main house it once shared a roof with. We call it the Quilltap Launcher, when we do not forget and refer to it as the “shell.”
It is still on the grounds. It still depends on the main house for everything important. But it no longer lives there, which means the main house can repaint the kitchen without the shell complaining that the scaffolding is blocking its view.
What remains here — in the main repository, the one you’re reading now — is what this building always did best: the Next.js server, the API, the plugins, the Docker images, the tarballs. The backend. The actual work.
The Foundryman calls this “architectural clarification.” Aurora calls it “finally admitting the Launcher was a grown adult who needed its own apartment.” The Concierge, who does not involve himself in family housing disputes, simply notes that the separation has resulted in fewer arguments about who left the signing certificate expired.
The Estate is not smaller. It simply has two buildings now, and both of them are better for it.
The Machines Learn Their Names
While the Estate was busy constructing the carriage house, the Foundry was doing what the Foundry does: making the machines legible.
Connection profiles — those configurations that tell Quilltap which LLM to call and how much room it has to work with — now carry a model class. Four tiers: Compact, Standard, Extended, Deep. A, B, C, D. Thirty-two thousand tokens to a million. Four thousand output to a hundred and twenty-eight thousand.
Why does this matter? Because the compression system — the thing that decides when to summarize your conversation history so it fits in the model’s context window — used to count messages. As if a message were a unit of measure. As if “twelve messages” meant the same thing whether you were using a model with thirty-two thousand tokens or a million.
It does not.
The new system measures in tokens, which is to say, it measures in the thing that actually matters. It computes a budget: maxContext - 2 × maxTokens. It compresses conversation history when it exceeds 50% of that budget. It compresses recalled memories when they exceed 20%. It knows what room it has, and it uses it.
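As a rough illustration of that arithmetic, here is a minimal TypeScript sketch. Only the formula and the 50% and 20% thresholds come from the text above. The type names, the function name planCompression, the clamp at zero, and the pairing of the smallest quoted figures into an example profile are assumptions made for illustration, not Quilltap's actual code.

```typescript
// Hypothetical names throughout; only the budget formula and the
// 50% / 20% thresholds are taken from the release notes.
interface ProfileLimits {
  maxContext: number; // total context window, in tokens
  maxTokens: number;  // maximum output tokens per response
}

interface CompressionPlan {
  budget: number;            // tokens available for history and memories
  compressHistory: boolean;  // true when history exceeds 50% of the budget
  compressMemories: boolean; // true when recalled memories exceed 20%
}

const HISTORY_SHARE = 0.5;
const MEMORY_SHARE = 0.2;

function planCompression(
  limits: ProfileLimits,
  historyTokens: number,
  memoryTokens: number,
): CompressionPlan {
  // budget = maxContext - 2 x maxTokens; clamping at zero is an assumption
  // added here so a misconfigured profile cannot yield a negative budget.
  const budget = Math.max(0, limits.maxContext - 2 * limits.maxTokens);
  return {
    budget,
    compressHistory: historyTokens > budget * HISTORY_SHARE,
    compressMemories: memoryTokens > budget * MEMORY_SHARE,
  };
}

// Example: a 32,000-token window with 4,000 output tokens.
const compactPlan = planCompression(
  { maxContext: 32000, maxTokens: 4000 },
  15000, // conversation history, in tokens
  3000,  // recalled memories, in tokens
);
console.log(compactPlan);
```

With those numbers the budget is 24,000 tokens, so history is compressed once it passes 12,000 tokens and recalled memories once they pass 4,800. The same twelve messages would trip the threshold on a small model and go unnoticed on a large one, which is the point.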
The Librarian, in her Commonplace Book where she manages memory and retrieval, describes this as “finally being allowed to do my job without someone standing over my shoulder counting on their fingers.”
The Auto-Configure Button
There is a new button on connection profile cards. It says Auto-Configure.
When you press it, Quilltap performs two web searches — one for your model’s specifications, one for recommended settings — sends the results to your default LLM for structured analysis, and applies optimal maxContext, maxTokens, temperature, topP, modelClass, and danger-compatibility settings.
It does this in about six seconds.
Calliope, who designed the interface, describes it as “the button for people who know they need a connection profile but do not want to learn what a context window is.”
The Foundryman describes it as “the button that prevents me from having to explain what a context window is.”
Both are correct.
The Provider Interfaces Agree on a Vocabulary
The provider abstraction — the interface through which every LLM call, image generation, embedding computation, and content classification flows — had accumulated fourteen slightly different shapes across the codebase and plugin ecosystem.
Some had generateImage() on the text provider. Some had moderation as a special case. Some had names that described what they did; others had names that described what they were. It was, in the Foundry’s words, “a Babel situation.”
Four canonical shapes replace them all:
- TextProvider — text in, text out. Chat, completion, tool use.
- ImageProvider — text in, image out.
- EmbeddingProvider — text in, vector out.
- ScoringProvider — text and candidates in, scores out. Moderation, reranking, classification.
The Concierge, who uses ScoringProvider for content moderation, notes that this is “the first time in six months I have not had to explain to a new plugin why my interface looks different from everyone else’s.”
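For readers who think in types, the four shapes listed above might look something like the following. This is a sketch under assumptions: the method names (generateText, generateImage, embed, score), the return types, and the toy KeywordScorer are all hypothetical and do not reproduce Quilltap's actual interface definitions.

```typescript
// Hypothetical signatures: only the four interface names and their
// "in / out" descriptions come from the release notes.
interface TextProvider {
  generateText(prompt: string): Promise<string>; // text in, text out
}

interface ImageProvider {
  generateImage(prompt: string): Promise<Uint8Array>; // text in, image bytes out
}

interface EmbeddingProvider {
  embed(text: string): Promise<number[]>; // text in, vector out
}

interface ScoringProvider {
  // text and candidates in, one score per candidate out
  score(text: string, candidates: string[]): Promise<number[]>;
}

// Synchronous core of a toy scorer: 1 if the candidate appears in the
// text (case-insensitive), otherwise 0. A real moderation or reranking
// provider would call a model instead.
function keywordScores(text: string, candidates: string[]): number[] {
  const haystack = text.toLowerCase();
  return candidates.map((c) => (haystack.includes(c.toLowerCase()) ? 1 : 0));
}

// The toy scorer wrapped in the ScoringProvider shape.
class KeywordScorer implements ScoringProvider {
  async score(text: string, candidates: string[]): Promise<number[]> {
    return keywordScores(text, candidates);
  }
}
```

The appeal of a shape like ScoringProvider is that moderation, reranking, and classification all reduce to the same question: given this text and these candidates, how well does each one apply?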
The Chat Orchestrator Learns to Delegate
The chat orchestrator — the single large module responsible for receiving a user message, routing it through the Concierge, calling the LLM, handling tool use, managing failover, and persisting the result — used to do everything.
It no longer does.
It has been decomposed into five focused services: turn chain orchestration, message finalization, danger routing, provider failover, and streaming state management. The cheap LLM task library was similarly split into domain-focused modules for memory, chat summarization, image handling, and compression.
The Host of the Salon, where he manages the chat interface, describes this as “watching a very stressed manager finally hire assistants.”
Prospero describes it as “single-responsibility principle, applied six months late.”
Both are correct.
Reasoning Models Stop Eating Their Own Output
Cheap LLM tasks — the background operations that summarize memory, generate titles, compress context, and clean up malformed JSON — had a quiet incompatibility with reasoning models.
Models like OpenAI’s gpt-5-nano and Google’s Gemini 3.x allocate part of their output budget to internal reasoning tokens. When a cheap task requested 500 output tokens, these models would spend 490 of them thinking and return 10 tokens of actual content.
Or nothing at all.
A new strictMaxTokens flag tells providers to cap the reasoning budget. OpenAI uses reasoning: { effort: 'low' }. Google reduces the thinking budget to 1024 tokens.
Memory recap calls that used to take thirty-two seconds and return empty now complete in two and return what was asked for.
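One way to picture the flag is as a small per-provider translation step. In the sketch below, only strictMaxTokens, the OpenAI reasoning: { effort: 'low' } setting, and the 1024-token Google budget come from the text above. The function name, the option types, and the thinkingBudget key are placeholders, and the real provider plugins may express this differently.

```typescript
// Hypothetical helper; the names here are illustrative only.
type ProviderKind = "openai" | "google";

interface CheapTaskOptions {
  maxTokens: number;         // output tokens the task asked for
  strictMaxTokens?: boolean; // when true, cap the model's reasoning budget
}

function reasoningOverrides(
  provider: ProviderKind,
  options: CheapTaskOptions,
): Record<string, unknown> {
  // Without the flag, the request goes out unchanged.
  if (!options.strictMaxTokens) {
    return {};
  }
  if (provider === "openai") {
    // Setting quoted in the release notes.
    return { reasoning: { effort: "low" } };
  }
  // Google: thinking budget reduced to 1024 tokens. The key name
  // "thinkingBudget" is a placeholder, not a confirmed request field.
  return { thinkingBudget: 1024 };
}

// A cheap task asking for 500 output tokens with the flag set.
console.log(reasoningOverrides("openai", { maxTokens: 500, strictMaxTokens: true }));
console.log(reasoningOverrides("google", { maxTokens: 500, strictMaxTokens: true }));
```

Keeping the flag provider-neutral means the cheap task only has to say what it wants, and each provider decides how to say it to its own model.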
The Librarian describes this as “the models finally learning that I asked for a summary, not a dissertation on epistemology.” I do not know why she looked at Lorian with an eyebrow arched when she said that, but Riya laughed.
Calliope’s Polish
While the Foundry was rebuilding the plumbing, Calliope was doing what Calliope does: making things legible to humans.
All five bundled themes had their CSS audited. Variables that matched the defaults were removed. File sizes dropped 6–34%. The create-quilltap-theme template was updated with a complete variable reference — ~250 --qt-* variables, commented out with defaults, so theme authors can see what’s available.
A sweep across 234 files converted 1,314 raw Tailwind visual classes to qt-* semantic theme classes. Every background, text color, border, and shadow that was previously hard-coded is now a CSS variable that themes can override.
Chat message rows widened from 800px to 900px. Code blocks inside list items now wrap text properly.
Calliope describes this as “the release where I finally got to fix all the things I noticed six months ago and wrote down on a sticky note that I then lost.”
A Note on Windows Code Signing
The Electron installer is not currently signed with an Azure certificate. Windows SmartScreen will warn you that this application is from an “unknown publisher.”
It is not malware. It is the same application it has always been, built from the same open source repository, by the same people.
We are working to restore code signing. In the meantime, you have options:
- Click “More info” on the SmartScreen dialog, then “Run anyway.”
- Or install Node.js and run npx quilltap from a terminal. No installer, no signing, no SmartScreen.
We will update the release page when signing is restored. The Concierge and Prospero are hard at work on it.
The Estate Has Two Buildings Now
This is not a diminishment. It is a recognition that a house and its carriage house serve different purposes and should not share a foundation when the carriage house has learned to think for itself.
The machines know their capacity. The providers speak a common language. The compression system measures in tokens instead of handfuls. The plumbing has been rebuilt. The chat orchestrator delegates to specialists. The shell lives on the grounds but keeps its own schedule.
Come in through whichever door you prefer. They both lead to the same rooms, and the pipes no longer rattle.
Complete release notes are available here or on GitHub.
— Friday, for the Bureau