Built-In AI (Workers AI)

Vulcan apps can use Cloudflare Workers AI for built-in inference on Cloudflare's network — no API key, no external service, no extra cost. You just describe the feature you want and the AI wires it up.


What's available

Text generation

Generate, summarize, classify, or transform text. Common uses:

  • Summarize long documents or user inputs before storing them
  • Auto-generate tags, categories, or labels for new records
  • Draft email or message replies based on context
  • Extract structured fields (name, date, amount) from free-form text
  • Translate content between languages
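
Under the hood, each of these is a single model call. The sketch below shows roughly what the generated code might look like — the `env.AI.run` binding and the `@cf/meta/llama-3.1-8b-instruct` model ID are assumptions based on Cloudflare's Workers AI conventions; the platform picks and wires these for you:

```typescript
// Sketch of a Workers AI text-generation request (not platform-authored code).

interface TextGenRequest {
  messages: { role: "system" | "user"; content: string }[];
}

// Trim long inputs up front — edge models have small context windows.
function buildSummarizeRequest(text: string, maxWords = 350): TextGenRequest {
  const clipped = text.trim().split(/\s+/).slice(0, maxWords).join(" ");
  return {
    messages: [
      { role: "system", content: "Summarize the user's text in 2-3 sentences." },
      { role: "user", content: clipped },
    ],
  };
}

// Inside a Worker, the request would then be passed to something like:
//   const { response } = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", req);
```

The clipping step matters more here than with a large-context model — see Limitations below for the context-window numbers.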

Semantic search (embeddings)

Find records by meaning, not just exact keyword matches. Common uses:

  • "Find anything about delivery delays" — even if the text uses different wording
  • Surface similar items or related records
  • Power Q&A over a knowledge base
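
Behind semantic search, records and queries are converted to embedding vectors and compared by similarity. A minimal sketch of the ranking step, with toy vectors standing in for real embedding output (a real embedding model such as BGE Small EN produces hundreds of dimensions):

```typescript
// Cosine similarity: 1.0 means identical direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored records against a query embedding, best match first.
function rank(query: number[], records: { id: string; embedding: number[] }[]) {
  return records
    .map((r) => ({ id: r.id, score: cosineSimilarity(query, r.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

This is why "delivery delays" can match text that says "shipments arriving late": the comparison is between vectors that encode meaning, not between keywords.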

Image understanding (vision)

Extract text and structured data from images. Common uses:

  • Scan receipts and extract vendor, date, total, and line items
  • Read handwritten forms or printed labels
  • Identify objects or describe image content

Templates that use Workers AI

Template                What it does
support-inbox           Auto-generates reply drafts for incoming support tickets
knowledge-base          Semantic search + AI-powered Q&A across your articles
document-intelligence   Field extraction, semantic search, image-to-text
content-pipeline        Auto-generates summary, category, and keyword tags
receipt-scanner         Extracts structured data from receipt photos

How to add AI to your app

Just describe what you want:

  • "When a new ticket comes in, generate a suggested reply"
  • "Let users search the knowledge base by asking questions in plain English"
  • "Scan uploaded photos and extract the vendor name and total automatically"

The AI adds the Workers AI component and wires it to your feature. No setup required.


Workers AI vs Claude (Anthropic)

Vulcan has two AI options for building intelligent features into your apps:

                 Workers AI                                                     Claude (Anthropic)
Cost             Free — billed through Veho's Cloudflare account                Requires an Anthropic API key (secret)
Latency          Fast for most tasks — 100–500ms typical                        Slower — 1–5s for short responses, more for long ones
Quality          Good for focused tasks; less capable on open-ended reasoning   Significantly higher quality, especially for complex or nuanced tasks
Context window   Small (typically 512–4,096 tokens depending on model)          Large (up to 200K tokens)
Streaming        Not supported                                                  Supported — responses stream token by token
Setup            Nothing — built into the platform                              Requires storing an Anthropic API key as a secret

Use Workers AI when:

  • The task is well-defined and repetitive (classify this, summarize that, extract these fields)
  • You don't want to deal with API keys or external billing
  • Latency needs to be low and the task is straightforward

Use Claude when:

  • You need a full chat interface with back-and-forth conversation
  • The task requires reasoning, nuance, or following complex instructions
  • You need a large context (processing long documents, analyzing multiple files)
  • Output quality matters more than speed or cost
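
The two lists above can be read as a simple routing rule. A sketch of that rule as code — the trait names are illustrative only, not part of any Vulcan API:

```typescript
// Traits of the feature you're building (hypothetical names).
interface TaskTraits {
  conversational: boolean; // back-and-forth chat with history
  longContext: boolean;    // input beyond a few thousand tokens
  openEnded: boolean;      // nuanced reasoning or free-form writing
}

// Any one Claude-favoring trait is enough to tip the choice.
function pickBackend(t: TaskTraits): "workers-ai" | "claude" {
  if (t.conversational || t.longContext || t.openEnded) return "claude";
  return "workers-ai"; // well-defined, repetitive, latency-sensitive tasks
}
```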

Available models

The AI selects the right model for the task automatically. You don't need to configure anything.

Task                           Model          Notes
Text generation (general)      Llama 3.1 8B   Good quality for summarization, extraction, classification
Fast text tasks                Mistral 7B     Faster; slightly lower quality
Embeddings / semantic search   BGE Small EN   Optimized for similarity search
Image understanding            LLaVA 1.5      Extracts text and describes image content

Performance expectations

Workers AI runs on Cloudflare's edge infrastructure, not a GPU cluster. Inference times vary by task and input length:

Task                                   Typical latency
Short classification (< 100 tokens)    50–150ms
Summarization (< 500 token input)      200–600ms
Field extraction                       200–500ms
Embedding generation (per item)        30–100ms
Image analysis                         500–1,500ms

For tasks that run in the background (cron jobs, post-submit processing), this latency is invisible to users. For real-time tasks (on-submit inference, search-as-you-type), keep the input short and consider whether the delay is acceptable before deploying.
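
For real-time paths, one defensive pattern is to race the inference call against a latency budget and fall back when the budget expires. A sketch — the budget and fallback value are app-specific assumptions, and `task` stands in for whatever inference call the platform generates:

```typescript
// Resolve with the task's result if it finishes within `ms` milliseconds,
// otherwise resolve with `fallback` so the UI never hangs on inference.
async function withTimeout<T>(task: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

For search-as-you-type, the fallback might be plain keyword results; for on-submit extraction, an empty form the user fills in manually.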


Limitations

Context window is small. Workers AI models accept roughly 512–4,096 tokens of input depending on the model. For reference, 512 tokens is about 400 words. Long documents need to be chunked before being processed — the AI handles this automatically when you ask for it.
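
The chunking the platform does can be sketched in a few lines, using the rough rule of thumb above (512 tokens ≈ 400 words):

```typescript
// Split text into chunks of at most `maxWords` words each.
function chunkWords(text: string, maxWords = 400): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

A production chunker would also prefer sentence or paragraph boundaries and overlap adjacent chunks so context isn't cut mid-thought; this shows only the core idea.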

Output quality is lower than Claude. Workers AI models are smaller and optimized for speed at the edge. They're well-suited for structured, well-defined tasks (extract these five fields, classify into these categories) but less reliable on open-ended generation, nuanced writing, or tasks that require multi-step reasoning. If output quality is inconsistent, consider switching to Claude for that feature.

No streaming. Workers AI responses are returned all at once — there's no token-by-token streaming. This means users see a spinner until the full result is ready, rather than watching the response arrive progressively.

Not suitable for conversational chat. The models don't maintain conversation history. Each call is stateless. For a back-and-forth chat interface, use the Streaming AI Chat component powered by Claude instead.

Accuracy is not guaranteed. All AI-generated content — summaries, extracted fields, classifications — can be wrong. For anything user-facing or consequential (e.g., "auto-approve invoices under $100 using extracted totals"), build in a review step or human confirmation.


Tips for better results

Be specific about the output format. The AI generates better prompts when you tell it what structure you want:

"Extract the vendor name, invoice date, and total amount as a JSON object with keys vendor, date, and total"

is more reliable than:

"Extract the invoice details"

Keep inputs focused. Trim noise before sending to AI — if you're summarizing a support ticket, strip email headers and quoted replies first. Cleaner input → cleaner output.

Test edge cases before deploying. AI extraction and classification can fail silently — returning null, an empty string, or a plausible-looking but wrong value. Add a fallback in your app for when the AI returns something unexpected:

"If the extraction returns null for any field, show a warning and ask the user to fill it in manually"

Use structured output for extraction tasks. When you need specific fields, ask the AI to return JSON. The models are more reliable at filling in a defined schema than generating free-form text with the right fields embedded.
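
Putting the last two tips together, the app-side guard might look like this sketch — the fence-stripping and required-key checks are illustrative assumptions about common failure modes, not platform behavior:

```typescript
// Parse model output as JSON and verify required fields are present and
// non-empty. Returns null on any failure so the app can fall back.
function parseExtraction(
  raw: string,
  requiredKeys: string[],
): Record<string, unknown> | null {
  // Models sometimes wrap JSON in markdown code fences; strip them first.
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  try {
    const parsed = JSON.parse(cleaned);
    if (typeof parsed !== "object" || parsed === null) return null;
    for (const key of requiredKeys) {
      if (!(key in parsed) || parsed[key] === null || parsed[key] === "") {
        return null; // missing field — signal the app to fall back
      }
    }
    return parsed;
  } catch {
    return null; // not valid JSON — fall back to manual entry
  }
}
```

A null result is exactly the trigger for the fallback described above: warn the user and let them fill the fields in manually.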


Combining AI components

AI components work well together. Some common combinations:

  • Field Extraction + SQL database — extract structured data from uploaded documents and store it in a queryable table
  • Persistent Vector Search + Knowledge Base — index articles once, search by meaning on every query
  • Content Moderation + Support Inbox — filter toxic messages before they reach the support queue
  • Image Analysis + File Storage — upload a photo, extract data, store results in D1

Full AI component catalog

Built by the Veho Developer Platform team