Built-In AI (Workers AI)
Vulcan apps can use Cloudflare Workers AI for inference on Cloudflare's edge network — no API key, no external service, no extra cost. You just describe the feature you want and the AI wires it up.
What's available
Text generation
Generate, summarize, classify, or transform text. Common uses:
- Summarize long documents or user inputs before storing them
- Auto-generate tags, categories, or labels for new records
- Draft email or message replies based on context
- Extract structured fields (name, date, amount) from free-form text
- Translate content between languages
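Under the hood, these features compile to calls against the Workers AI binding. As a rough sketch of what the generated wiring looks like, here is a minimal summarization helper, assuming the standard `env.AI.run` API and the Llama 3.1 8B model listed under Available models below (the function name and prompt are illustrative, not platform APIs):

```ts
// Minimal summarization helper (illustrative). Assumes the Workers AI
// binding `AI` that the platform configures, typed via @cloudflare/workers-types.
export interface Env {
  AI: Ai;
}

export async function summarize(env: Env, document: string): Promise<string> {
  // Text-generation models accept chat-style messages and return { response }.
  const result = (await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "Summarize the user's text in 2-3 sentences." },
      { role: "user", content: document },
    ],
    max_tokens: 256,
  })) as { response?: string };
  return result.response ?? "";
}
```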
Semantic search (embeddings)
Find records by meaning, not just exact keyword matches. Common uses:
- "Find anything about delivery delays" — even if the text uses different wording
- Surface similar items or related records
- Power Q&A over a knowledge base
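One way to picture what semantic search does: each piece of text becomes a vector (an embedding), and "similar meaning" becomes "nearby vector". A brute-force sketch, assuming the BGE embedding model listed below, the `Env` from the earlier sketch, and records whose embeddings were generated with the same model at write time; real deployments use a persistent vector index rather than scanning every record:

```ts
// Illustrative brute-force semantic search. The Item shape and function
// names are assumptions, not platform APIs.
interface Item {
  id: string;
  text: string;
  embedding: number[];
}

async function embed(env: Env, text: string): Promise<number[]> {
  const out = (await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: [text],
  })) as { data: number[][] };
  return out.data[0]; // one vector per input string
}

// Cosine similarity: 1.0 means same direction, near 0 means unrelated.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function searchByMeaning(env: Env, query: string, items: Item[]): Promise<Item[]> {
  const q = await embed(env, query);
  return items
    .map((item) => ({ item, score: cosine(q, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5)
    .map((r) => r.item); // top five matches, regardless of exact wording
}
```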
Image understanding (vision)
Extract text and structured data from images. Common uses:
- Scan receipts and extract vendor, date, total, and line items
- Read handwritten forms or printed labels
- Identify objects or describe image content
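A sketch of the underlying call for a receipt scan, assuming the LLaVA model listed below. Workers AI vision models take raw image bytes plus a prompt and return a text description; the helper name and prompt are illustrative:

```ts
// Illustrative receipt scan using the image-to-text task. Returns the
// model's free-form description; see "Tips for better results" below for
// making the extraction structured and reliable.
async function describeReceipt(env: Env, file: File): Promise<string> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  const out = (await env.AI.run("@cf/llava-hf/llava-1.5-7b-hf", {
    image: [...bytes], // the binding expects the image as an array of bytes
    prompt: "Read this receipt and list the vendor, date, total, and line items.",
    max_tokens: 256,
  })) as { description: string };
  return out.description;
}
```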
Templates that use Workers AI
| Template | What it does |
|---|---|
| support-inbox | Auto-generates reply drafts for incoming support tickets |
| knowledge-base | Semantic search + AI-powered Q&A across your articles |
| document-intelligence | Field extraction, semantic search, image-to-text |
| content-pipeline | Auto-generates summary, category, and keyword tags |
| receipt-scanner | Extracts structured data from receipt photos |
How to add AI to your app
Just describe what you want:
"When a new ticket comes in, generate a suggested reply" "Let users search the knowledge base by asking questions in plain English" "Scan uploaded photos and extract the vendor name and total automatically"
The AI adds the Workers AI component and wires it to your feature. No setup required.
Workers AI vs Claude (Anthropic)
Vulcan has two AI options for building intelligent features into your apps:
| | Workers AI | Claude (Anthropic) |
|---|---|---|
| Cost | No extra cost; usage is covered by Veho's Cloudflare account | Paid per token; usage is billed to the Anthropic account behind your API key |
| Latency | Fast for most tasks — 100–500ms typical | Slower — 1–5s for short responses, more for long ones |
| Quality | Good for focused tasks; less capable on open-ended reasoning | Significantly higher quality, especially for complex or nuanced tasks |
| Context window | Small (typically 512–4,096 tokens depending on model) | Large (up to 200K tokens) |
| Streaming | Not supported | Supported — responses stream token by token |
| Setup | Nothing — built into the platform | Requires storing an Anthropic API key as a secret |
Use Workers AI when:
- The task is well-defined and repetitive (classify this, summarize that, extract these fields)
- You don't want to deal with API keys or external billing
- Latency needs to be low and the task is straightforward
Use Claude when:
- You need a full chat interface with back-and-forth conversation
- The task requires reasoning, nuance, or following complex instructions
- You need a large context (processing long documents, analyzing multiple files)
- Output quality matters more than speed or cost
Available models
The AI selects the right model for the task automatically. You don't need to configure anything.
| Task | Model | Notes |
|---|---|---|
| Text generation (general) | Llama 3.1 8B | Good quality for summarization, extraction, classification |
| Fast text tasks | Mistral 7B | Faster; slightly lower quality |
| Embeddings / semantic search | BGE Small EN | Optimized for similarity search |
| Image understanding | LLaVA 1.5 | Extracts text and describes image content |
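For reference, these likely correspond to the standard Workers AI catalog identifiers below. You never select them yourself, and the exact versions are an assumption:

```ts
// Assumed Workers AI catalog identifiers for the models above. The platform
// picks these automatically; listed only for context.
const MODELS = {
  textGeneral: "@cf/meta/llama-3.1-8b-instruct",
  textFast: "@cf/mistral/mistral-7b-instruct-v0.1",
  embeddings: "@cf/baai/bge-small-en-v1.5",
  vision: "@cf/llava-hf/llava-1.5-7b-hf",
} as const;
```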
Performance expectations
Workers AI runs on GPUs distributed across Cloudflare's edge network rather than a dedicated, centralized cluster. Inference times vary by task and input length:
| Task | Typical latency |
|---|---|
| Short classification (< 100 tokens) | 50–150ms |
| Summarization (< 500 token input) | 200–600ms |
| Field extraction | 200–500ms |
| Embedding generation (per item) | 30–100ms |
| Image analysis | 500–1,500ms |
For tasks that run in the background (cron jobs, post-submit processing), this latency is invisible to users. For real-time tasks (on-submit inference, search-as-you-type), keep the input short and consider whether the delay is acceptable before deploying.
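If you are unsure whether a real-time step will feel responsive, timing the call on representative input settles it. A generic sketch, not a platform feature:

```ts
// Time an AI call on realistic input before wiring it into a user-facing
// path. Purely illustrative.
async function timed<T>(label: string, call: () => Promise<T>): Promise<T> {
  const start = Date.now();
  const result = await call();
  console.log(`${label}: ${Date.now() - start}ms`);
  return result;
}

// Example: measure a short classification on a realistic input.
// const category = await timed("classify", () =>
//   env.AI.run("@cf/meta/llama-3.1-8b-instruct", { prompt: shortInput })
// );
```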
Limitations
Context window is small. Workers AI models accept roughly 512–4,096 tokens of input depending on the model. For reference, 512 tokens is about 400 words. Long documents need to be chunked before being processed — the AI handles this automatically when you ask for it.
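The chunking pattern itself is simple. A minimal sketch, assuming roughly four characters per token (a common heuristic, not an exact count) and the `summarize` helper from the earlier sketch:

```ts
// Split a long document into pieces that fit the model's window, summarize
// each piece, then summarize the combined partial summaries. The platform
// generates this for you; shown only to illustrate the pattern.
function chunk(text: string, maxTokens = 512): string[] {
  const maxChars = maxTokens * 4; // ~4 chars/token heuristic
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    pieces.push(text.slice(i, i + maxChars));
  }
  return pieces;
}

async function summarizeLong(env: Env, document: string): Promise<string> {
  const partials: string[] = [];
  for (const piece of chunk(document)) {
    partials.push(await summarize(env, piece));
  }
  // Final pass over the concatenated partial summaries.
  return summarize(env, partials.join("\n"));
}
```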
Output quality is lower than Claude. Workers AI models are smaller and optimized for speed at the edge. They're well-suited for structured, well-defined tasks (extract these five fields, classify into these categories) but less reliable on open-ended generation, nuanced writing, or tasks that require multi-step reasoning. If output quality is inconsistent, consider switching to Claude for that feature.
No streaming. Workers AI responses are returned all at once — there's no token-by-token streaming. This means users see a spinner until the full result is ready, rather than watching the response arrive progressively.
Not suitable for conversational chat. The models don't maintain conversation history. Each call is stateless. For a back-and-forth chat interface, use the Streaming AI Chat component powered by Claude instead.
Accuracy is not guaranteed. All AI-generated content — summaries, extracted fields, classifications — can be wrong. For anything user-facing or consequential (e.g., "auto-approve invoices under $100 using extracted totals"), build in a review step or human confirmation.
Tips for better results
Be specific about the output format. The AI generates better prompts when you tell it what structure you want:
"Extract the vendor name, invoice date, and total amount as a JSON object with keys
vendor,date, andtotal"
is more reliable than:
"Extract the invoice details"
Keep inputs focused. Trim noise before sending to AI — if you're summarizing a support ticket, strip email headers and quoted replies first. Cleaner input → cleaner output.
Test edge cases before deploying. AI extraction and classification can fail silently — returning null, an empty string, or a plausible-looking but wrong value. Add a fallback in your app for when the AI returns something unexpected:
"If the extraction returns null for any field, show a warning and ask the user to fill it in manually"
Use structured output for extraction tasks. When you need specific fields, ask the AI to return JSON. The models are more reliable at filling in a defined schema than generating free-form text with the right fields embedded.
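Putting the last two tips together, a defensive extraction sketch: ask for JSON only, parse with a fallback, and treat any null or unparseable field as "send to manual entry". The field names follow the invoice example above; everything else is illustrative:

```ts
interface Extracted {
  vendor: string | null;
  date: string | null;
  total: number | null;
}

// Returns null when extraction fails, so the app can warn the user and
// fall back to manual entry instead of silently storing a wrong value.
async function extractInvoice(env: Env, text: string): Promise<Extracted | null> {
  const out = (await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content:
          "Extract the vendor name, invoice date, and total amount. Reply with " +
          'only a JSON object with keys "vendor", "date", and "total". ' +
          "Use null for any field you cannot find.",
      },
      { role: "user", content: text },
    ],
  })) as { response?: string };

  try {
    const parsed = JSON.parse(out.response ?? "") as Extracted;
    if (parsed.vendor == null || parsed.date == null || parsed.total == null) {
      return null; // a field is missing or explicitly null
    }
    return parsed;
  } catch {
    return null; // the model returned something that was not valid JSON
  }
}
```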
Combining AI components
AI components work well together. Some common combinations:
- Field Extraction + SQL database — extract structured data from uploaded documents and store it in a queryable table
- Persistent Vector Search + Knowledge Base — index articles once, search by meaning on every query
- Content Moderation + Support Inbox — filter toxic messages before they reach the support queue
- Image Analysis + File Storage — upload a photo, extract data, store results in D1
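As an example of the first combination, a sketch of an extract-and-store pipeline, assuming a D1 binding named `DB`, an `invoices` table, and the `extractInvoice` helper from the tips above (all names illustrative):

```ts
export interface PipelineEnv {
  AI: Ai;
  DB: D1Database; // D1 binding configured by the platform
}

// Extract structured fields from an uploaded document's text and store
// them in a queryable table. Returns false when extraction fails so the
// record can be routed to manual review instead.
async function ingestDocument(env: PipelineEnv, text: string): Promise<boolean> {
  const fields = await extractInvoice(env, text);
  if (!fields) return false;

  await env.DB
    .prepare("INSERT INTO invoices (vendor, date, total) VALUES (?, ?, ?)")
    .bind(fields.vendor, fields.date, fields.total)
    .run();
  return true;
}
```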