Setting Up AI Connections
wiring up the thinking machines
Quilltap connects to AI models through a layered system: you store your provider credentials as API keys, then build connection profiles that pair those keys with specific models and settings. This separation means you can have one API key powering multiple profiles — a fast, cheap one for background tasks and a powerful one for your main conversations — without entering credentials twice.
This guide covers everything from adding your first API key to configuring the specialized profile types that power Quilltap's memory, image generation, and automation features.
Before You Start
You'll need an account with an AI provider and an API key from that provider (or a local installation of Ollama if you want to run models on your own hardware). If you don't have one yet, the Choosing a Provider section below will help you decide.
All AI connection settings live inside The Foundry, Quilltap's central configuration hub. You'll find it by clicking the Foundry icon in the left sidebar footer.
The Setup Flow
Here's the sequence for getting up and running:
- 1. Add an API key — Store your provider credentials in The Forge
- 2. Create a connection profile — Link that key to a provider and model
- 3. Start chatting — Select the profile when you open a new chat
Two optional but recommended steps:
- 4. Set up embeddings — Enable semantic memory search (The Commonplace Book)
- 5. Configure a Cheap LLM — Designate a lightweight model for background tasks (The Salon)
Step 1: Add Your API Key
API keys are stored and managed in The Forge (`/foundry/forge`), under the API Keys section.
Adding a Key
- 1. Navigate to The Foundry → The Forge.
- 2. Expand the API Keys section.
- 3. Click Add API Key.
- 4. Select your Provider from the dropdown (OpenAI, Anthropic, Google, OpenRouter, Grok, Ollama, etc.).
- 5. Enter a Label — a memorable name like "My OpenRouter Key" or "Work OpenAI Account."
- 6. Paste your API Key from the provider's website.
- 7. Click Save.
Testing a Key
After saving, click Test Key next to the entry. Quilltap sends a verification request to the provider and reports the result: valid, invalid, or error. Always test a key before building a connection profile on top of it.
Managing Multiple Keys
You can store multiple keys from the same provider. This is useful when you have separate accounts for personal and work use, when rotating keys, or when you want to isolate testing from production. Each connection profile you create later will select which specific key to use.
Importing and Exporting Keys
The Forge supports exporting all your API keys as a JSON file for backup, and importing them back later. The exported file contains sensitive credentials — treat it like a password file and store it securely.
For the complete reference on key management, see the API Keys Settings documentation.
Step 2: Create a Connection Profile
Connection profiles live in The Forge (`/foundry/forge`) alongside your API keys. A profile ties together a provider, an API key, a model, and optional tuning parameters into a named configuration you can select when starting chats.
Creating a Profile
- 1. In The Foundry → The Forge, find the Connection Profiles section.
- 2. Click Add Connection Profile.
- 3. Fill in the form:
   - Profile Name — Something descriptive like "Claude Sonnet," "GPT-4o Fast," or "Local Llama."
   - Provider — Must match the provider of the API key you'll select.
   - API Key — Choose from your stored keys.
   - Model — Select from the provider's available models (use Fetch Models if the list is empty).
   - Base URL — Only needed for self-hosted or OpenAI-compatible endpoints.
- 4. Optionally adjust advanced settings:
   - Temperature — Controls randomness. Lower values (0–0.3) produce more consistent output; higher values (0.7–1.0+) produce more creative, varied responses.
   - Max Tokens — Caps the response length.
   - Top P — An alternative to temperature for controlling output diversity.
- 5. Click Save.
- 6. Click Test Connection to verify everything works end-to-end.
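To make the advanced settings concrete, here is a minimal sketch of how a profile's tuning parameters might map onto an OpenAI-style chat completion request body. The field and function names are illustrative assumptions, not Quilltap's actual internals; the key idea is that unset parameters are simply omitted so the provider's own defaults apply.

```python
# Hypothetical sketch: merging a connection profile's settings into an
# OpenAI-style request body. Names here are illustrative, not Quilltap's code.

def build_request_body(profile: dict, messages: list) -> dict:
    """Combine the profile's model and any set tuning parameters."""
    body = {"model": profile["model"], "messages": messages}
    # Only include tuning parameters the profile actually sets, so the
    # provider's defaults apply for everything else.
    for key in ("temperature", "max_tokens", "top_p"):
        if profile.get(key) is not None:
            body[key] = profile[key]
    return body

profile = {"model": "gpt-4o-mini", "temperature": 0.2, "max_tokens": 512}
body = build_request_body(profile, [{"role": "user", "content": "Hello"}])
```

Because Top P was never set in this example profile, it never appears in the request, and the provider falls back to its default.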
Setting a Default Profile
One profile can be marked as the default, which pre-selects it whenever you create a new chat. You can always override the default on a per-chat basis.
Using Profiles in Chats
When you start a new chat, the connection profile dropdown lets you pick which AI model to use. You can also switch profiles mid-conversation — the change applies to future messages without affecting previous ones.
Profile Health
The profiles list shows connection health at a glance: ✓ Healthy, ⚠ Degraded, or ✗ Unhealthy. If a profile shows as unhealthy, test the connection and verify the underlying API key is still valid and funded.
For the complete reference, see the Connection Profiles documentation.
Choosing a Provider
Quilltap supports connections to a wide range of AI providers. Here's a guide to help you choose.
OpenRouter — The Universal Gateway
Best for: Beginners, experimentation, and access to many models through one key.
OpenRouter is a routing service that gives you access to 200+ models from OpenAI, Anthropic, Google, Meta, Mistral, and others — all through a single API key. You pay per token with model-specific pricing, and Quilltap can fetch live pricing data to help you find the cheapest option for background tasks.
Getting started:
- 1. Sign up at openrouter.ai.
- 2. Go to Keys and click Create Key.
- 3. Copy the key (it starts with `sk-or-`). Save it — you won't see it again.
- 4. Add credits under Credits ($5–10 is enough to start).
Capabilities: Streaming, embeddings (model-dependent), web search (model-dependent), image generation (model-dependent). Some OpenRouter models support features that others don't — check the model card.
OpenAI
Best for: GPT-4o, GPT-4.1, GPT-5 family, and high-quality embeddings.
Getting started:
- 1. Sign up at platform.openai.com.
- 2. Go to API Keys and click Create new secret key.
- 3. Copy the key (starts with `sk-`). Save it immediately.
- 4. Add a payment method under Billing.
Capabilities: Streaming, tool/function calling, file attachments, image generation (DALL-E models), and embeddings (`text-embedding-3-small`, `text-embedding-3-large`). OpenAI-compatible providers like Together AI, Groq, Fireworks AI, and LM Studio also work with Quilltap's OpenAI provider setting — just specify a custom base URL.
Anthropic
Best for: Claude models — excellent at roleplay, creative writing, long conversations, and complex instructions.
Getting started:
- 1. Sign up at console.anthropic.com.
- 2. Go to API Keys and click Create Key.
- 3. Copy the key (starts with `sk-ant-`). Save it immediately.
- 4. Add credits under Plans & Billing.
Capabilities: Streaming, image understanding, tool use, and JSON output control. Anthropic does not offer a standalone embedding API, so pair an Anthropic chat profile with an embedding profile from another provider (OpenAI, OpenRouter, Ollama, or Quilltap's built-in TF-IDF).
Google Gemini
Best for: Gemini 2.5 Flash/Pro with multimodal inputs, web search, and Imagen 4 image generation.
Getting started:
- 1. Go to ai.google.dev (Google AI Studio).
- 2. Click Get API Key and create one.
- 3. Copy the key and add it to Quilltap.
Capabilities: Streaming, multimodal inputs (text, images, documents), web search, and image generation through Imagen 4 (configured separately as an Image Profile in The Lantern).
Grok (xAI)
Best for: Grok 3 and 4 family models with multimodal support and native image generation.
Getting started:
- 1. Get an API key from console.x.ai.
- 2. Add the key to Quilltap and create a connection profile.
Capabilities: Streaming, multimodal attachments, web search, and native image generation. Grok uses an OpenAI-compatible endpoint format.
Ollama — Local and Offline
Best for: Privacy, zero API costs, offline use, and running as a free Cheap LLM for background tasks.
Ollama runs AI models directly on your hardware. No data leaves your machine, there are no per-message costs, and it works without an internet connection after initial model download.
Getting started:
- 1. Install Ollama from ollama.com.
- 2. Open a terminal and pull a model: `ollama pull llama3.2`
- 3. Ollama starts automatically and serves on `http://localhost:11434`.
- 4. In Quilltap, create a connection profile with the Ollama provider. No API key needed.
Requirements: 8GB+ RAM recommended. A machine with a decent GPU will stream responses much faster than CPU-only inference. Models range from 2–8GB of storage each.
Embedding support: Ollama can also serve embedding models like `nomic-embed-text`. Create a separate Embedding Profile in The Commonplace Book to use local embeddings for memory search at zero cost.
Troubleshooting: If Quilltap can't connect, verify Ollama is running: `curl http://localhost:11434/api/tags` should return your model list. Check that no firewall is blocking the local connection.
OpenAI-Compatible — Generic Connector
Best for: LM Studio, vLLM, Text Generation Web UI, Together AI, Groq, Fireworks AI, and any self-hosted API that implements the OpenAI chat completion format.
Getting started:
- 1. Start your local server or note your provider's endpoint URL.
- 2. In Quilltap, create a connection profile with the OpenAI provider.
- 3. Enter the custom Base URL (e.g., `http://localhost:1234/v1` for LM Studio).
- 4. Enter an API key if required — some local servers accept any placeholder value.
- 5. Type or select the model name your server expects.
This connector works with anything that implements the `/v1/chat/completions` endpoint. Streaming support depends on your server's implementation.
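A common stumbling block with OpenAI-compatible endpoints is how the base URL and the standard path combine. The sketch below is an illustrative assumption about how such a connector might join them (it is not Quilltap's code): the base URL includes `/v1`, and the connector appends `/chat/completions`.

```python
# Illustrative sketch (not Quilltap's implementation): combining a custom
# Base URL with the standard OpenAI-style chat completions path.

def chat_completions_url(base_url: str) -> str:
    # Strip any trailing slash so we never produce a double "//".
    return base_url.rstrip("/") + "/chat/completions"

# LM Studio's default local endpoint, with and without a trailing slash:
url = chat_completions_url("http://localhost:1234/v1")
url_trailing = chat_completions_url("http://localhost:1234/v1/")
```

Both forms resolve to the same final URL, which is why either style of base URL typically works.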
Additional Providers
Quilltap also supports API keys from DeepSeek, Groq (fast inference), Perplexity (AI search with citations), and other services. Additional providers can be added through Quilltap's plugin system. Check the Plugins section in The Forge for available provider plugins.
Step 3: Set Up Embeddings (Recommended)
Embeddings power Quilltap's semantic memory search — the ability to find memories by meaning rather than exact keyword matches. A search for "cat" will also surface memories mentioning "feline" or "kitten" because embeddings understand conceptual similarity.
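The ranking idea behind semantic search can be sketched in a few lines: each memory is stored as a vector, and retrieval ranks memories by cosine similarity to the query's vector. The vectors below are made up for illustration (real embeddings have hundreds or thousands of dimensions), and this is a toy model of the concept, not Quilltap's implementation.

```python
# Toy illustration of embedding-based memory search: rank stored memory
# vectors by cosine similarity to the query vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for two stored memories.
memories = {
    "My feline friend sleeps all day": [0.9, 0.1, 0.2],  # conceptually near "cat"
    "The stock market fell today":     [0.1, 0.8, 0.3],  # unrelated topic
}
query_vec = [0.85, 0.15, 0.25]  # pretend embedding of the query "cat"

best = max(memories, key=lambda m: cosine(query_vec, memories[m]))
```

Even though the word "cat" never appears in the stored text, the feline memory wins because its vector sits close to the query's vector — that is the whole advantage over keyword matching.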
Embedding profiles are configured in The Commonplace Book (`/foundry/commonplace-book`).
Built-in TF-IDF — Zero Configuration
Quilltap includes a built-in embedding system that works without any external service. It's set up automatically on first run. Check The Commonplace Book — if you see a profile named "Built-in TF-IDF" marked as default, you're already covered.
If no profile exists:
- 1. Click Add Profile.
- 2. Select BUILTIN as the provider.
- 3. Name it "Built-in TF-IDF."
- 4. Click Save and Set as Default.
The built-in system requires no API key, costs nothing, works offline, and is good enough for most use cases. You can always upgrade to an external provider later.
External Embedding Providers
For more sophisticated semantic search, you can use external embedding services:
OpenAI Embeddings
- 1. In The Commonplace Book, click Add Profile.
- 2. Select OpenAI as the provider.
- 3. Choose your OpenAI API key.
- 4. Select a model: `text-embedding-3-small` (affordable, good quality) or `text-embedding-3-large` (higher quality, higher cost).
- 5. Save and optionally set as default.
Ollama (Local) Embeddings
- 1. Install an embedding model: `ollama pull nomic-embed-text`
- 2. In The Commonplace Book, click Add Profile.
- 3. Select Ollama as the provider.
- 4. Set the base URL (default: `http://localhost:11434`).
- 5. Select the embedding model.
- 6. Save and set as default.
Note: Not all OpenRouter models support embeddings. Check OpenRouter's documentation for embedding-capable models before creating a profile.
How Quilltap Uses Embeddings
When embeddings are configured, memory search automatically uses semantic matching. If no embedding profile exists, Quilltap falls back to keyword-based search. You don't need embeddings to use Quilltap — they just make memory retrieval significantly smarter.
For the complete reference, see the Embedding Profiles documentation.
Step 4: Configure the Cheap LLM (Recommended)
Many of Quilltap's background features — memory extraction, context compression, chat title generation, prompt expansion for image generation, and housekeeping tasks — need an LLM but don't require your most powerful or expensive model. The Cheap LLM system handles this, and it's configured in The Salon (`/foundry/salon`).
Why It Matters
Without a Cheap LLM configured, these background tasks either use your main (potentially expensive) model or don't run at all. Setting up a cheap option means your memories get extracted, your conversations get titled, and your image prompts get expanded — all without burning through your premium model's budget.
Setting It Up
- 1. Create a second connection profile in The Forge with a less expensive model:
   - OpenAI: `gpt-4o-mini` is much cheaper than `gpt-4o`
   - Anthropic: `claude-3-5-haiku` is cheaper than `claude-3-5-sonnet`
   - OpenRouter: Sort by price to find low-cost models
   - Ollama: Any local model is effectively free
- 2. Navigate to The Foundry → The Salon.
- 3. Find the Cheap LLM section.
- 4. Toggle Enable Cheap LLM on.
- 5. Select your cheaper profile from the dropdown.
What the Cheap LLM Powers
- Memory extraction — Pulling key facts from conversations to store as character or user memories.
- Context compression — Summarizing older parts of a conversation to fit within the model's context window.
- Chat auto-rename — Generating a title for new conversations.
- Prompt expansion — Enriching image generation prompts with character and persona details before submitting to the image provider.
- Housekeeping — Regenerating tags, placeholders, and summaries during maintenance.
- Dangermouse classification — The content safety gatekeeper uses the Cheap LLM to classify messages (when enabled).
- Image description — Describing images that appear in chat.
Skip this step if:
- You're using Ollama for everything (all local, no cost difference).
- You don't mind using your main model for background tasks.
- You want the simplest possible setup and plan to optimize later.
Step 5: Image Generation (Optional)
Quilltap can generate images directly within chats. Image generation profiles are configured separately from chat connections, in The Lantern (`/foundry/lantern`).
Setting Up Image Generation
- 1. Navigate to The Foundry → The Lantern.
- 2. Click Add Image Profile (or find it under Image Profiles).
- 3. Choose a provider:
   - Google Gemini — Imagen 4 and Gemini image models
   - Grok (xAI) — Native image generation
   - OpenAI — DALL-E models
   - OpenRouter — Models with image generation support (e.g., Gemini via OpenRouter)
- 4. Select the appropriate API key and model.
- 5. Configure provider-specific controls (quality, style, safety settings, aspect ratios).
- 6. Save the profile.
Using Image Generation in Chats
Once configured, you can launch the image generator from any chat. Quilltap can optionally use the Cheap LLM to expand your prompts — injecting character and persona descriptions using placeholders like `{{Character}}` and `{{me}}` — before sending them to the image provider.
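The placeholder substitution described above can be sketched with a simple token replacement. The function and names below are illustrative assumptions, not Quilltap's actual implementation; the point is only to show the mechanics of swapping `{{Character}}` and `{{me}}` for the active character and persona.

```python
# Hypothetical sketch of placeholder expansion (not Quilltap's code):
# replace {{Name}} tokens with known values, leaving unknown tokens intact.
import re

def expand_placeholders(prompt: str, character: str, persona: str) -> str:
    replacements = {"Character": character, "me": persona}
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: replacements.get(m.group(1), m.group(0)),
        prompt,
    )

out = expand_placeholders("{{Character}} and {{me}} at sunset", "Aria", "Sam")
```

Leaving unrecognized tokens untouched is a deliberate choice in this sketch: a typo in a placeholder stays visible in the expanded prompt instead of silently disappearing.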
Settings Cascade
Many Quilltap settings follow a cascade where more specific levels override more general ones: global defaults, then per-character settings, then per-chat settings.
For example, you might set a default connection profile globally, override it for a specific character who responds better with a different model, and override it again for a particular chat session. Each level inherits from the one above unless explicitly changed. This applies to connection profile selection, agent mode settings, and other per-chat configurations.
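The resolution logic behind such a cascade can be sketched in a few lines. This is an illustrative model of the behavior described above, with made-up setting names, not Quilltap's actual code: each level stores only the settings it overrides, and lookup walks from most specific to most general.

```python
# Illustrative cascade resolution (not Quilltap's implementation):
# check the chat level first, then the character, then the global defaults.
def resolve(setting, chat, character, global_defaults):
    for level in (chat, character, global_defaults):
        if setting in level:
            return level[setting]
    return None

global_cfg = {"profile": "Claude Sonnet", "temperature": 0.7}
character_cfg = {"profile": "GPT-4o Fast"}  # this character prefers another model
chat_cfg = {}                               # this chat overrides nothing

profile = resolve("profile", chat_cfg, character_cfg, global_cfg)
temperature = resolve("temperature", chat_cfg, character_cfg, global_cfg)
```

The character-level override wins for the profile, while the temperature falls through to the global default because no lower level sets it.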
Security
Quilltap takes API key security seriously:
- All API keys are encrypted at rest using AES-256-GCM encryption.
- Each user has their own encryption key derived from their user ID plus a master pepper (the `ENCRYPTION_MASTER_PEPPER` environment variable set during deployment).
- Keys are decrypted only when the authenticated user makes a request that needs them — they're never exposed in logs, API responses, or database queries.
- API keys are never sent anywhere except to the provider's own API endpoint when making a request on your behalf.
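To illustrate the per-user key idea, here is a minimal sketch of deriving a distinct 256-bit key from a user ID plus a server-side pepper. This is an assumption-laden illustration — the function name, the use of PBKDF2, and the iteration count are all hypothetical, not Quilltap's actual scheme; the real system feeds the derived key into AES-256-GCM.

```python
# Hypothetical per-user key derivation (not Quilltap's actual implementation):
# combine a user ID with a server-side master pepper so every user's API keys
# are encrypted under a different 256-bit key.
import hashlib

def derive_user_key(user_id: str, master_pepper: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 yields a 32-byte (256-bit) key per user.
    return hashlib.pbkdf2_hmac("sha256", user_id.encode(), master_pepper, 100_000)

pepper = b"example-value-of-ENCRYPTION_MASTER_PEPPER"
key_a = derive_user_key("user-a", pepper)
key_b = derive_user_key("user-b", pepper)
```

This structure also explains the backup warning below the bullets: without the same pepper, the derivation produces different keys and nothing previously encrypted can be decrypted.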
For Quilltap administrators: Back up your `ENCRYPTION_MASTER_PEPPER` securely. If it's lost, all encrypted API keys become unrecoverable and every user will need to re-enter their credentials.
Troubleshooting
"No connection profile configured"
You need to create a connection profile and either set it as the default or assign it to the character you're trying to chat with. See Step 2.
"API key invalid" or "Authentication failed"
Double-check that you copied the entire key without extra spaces. Verify the key hasn't been revoked or expired by checking your provider's dashboard. Confirm your account has available credits. For OpenAI-compatible servers, verify the base URL is correct.
"No models available" when creating a profile
For cloud providers, your API key may be invalid or out of credits. For Ollama, make sure it's running — test with `curl http://localhost:11434/api/tags`. For OpenAI-compatible endpoints, confirm the server is running and the URL is correct.
"API key invalid" after restoring a backup
API keys are encrypted with your instance's `ENCRYPTION_MASTER_PEPPER`. If you restore a backup to a different instance with a different pepper, the keys can't be decrypted. You'll need to re-enter all API keys.
Streaming stops mid-response
Check your provider's rate limits. For Ollama, this can indicate insufficient VRAM — try a smaller model. Network interruptions can also cause stream drops; the message saves whatever was received before the interruption.
Memory search returns poor results
If you're using the built-in TF-IDF embeddings and results aren't great, consider upgrading to an external embedding provider (OpenAI or local Ollama with `nomic-embed-text`). If external embeddings are configured but results are still poor, try refreshing the vocabulary or re-indexing memories from The Commonplace Book.
Cheap LLM tasks failing
Verify that the profile designated for Cheap LLM tasks is working by testing it in a regular chat first. If using Ollama, make sure the service is running and has a model loaded.
"Rate limit exceeded"
Wait a few minutes and try again. Check your provider's rate limit documentation. Consider upgrading your plan or spreading requests across multiple API keys.
Quick Reference
| What You Need | Where to Find It |
|---|---|
| Add API Keys | The Forge → API Keys |
| Create Connection Profiles | The Forge → Connection Profiles |
| Configure Embeddings | The Commonplace Book |
| Set Up Cheap LLM | The Salon → Cheap LLM |
| Set Up Image Generation | The Lantern → Image Profiles |
| Monitor Background Tasks | Prospero → Tasks Queue |
| View AI Interaction Logs | Prospero → LLM Logs |