Built-In AI (Workers AI)

Vulcan apps can use Cloudflare Workers AI for built-in inference on Cloudflare's network — no API key, no external service, no extra cost. You just describe the feature you want and the AI wires it up.


What's available

Text generation

Generate, summarize, classify, or transform text. Common uses:

  • Summarize long documents or user inputs before storing them
  • Auto-generate tags, categories, or labels for new records
  • Draft email or message replies based on context
  • Extract structured fields (name, date, amount) from free-form text
  • Translate content between languages
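
Under the hood, each of these is a single model call. The sketch below shows roughly what the generated code might look like — the `env.AI.run` binding and the `@cf/meta/llama-3.1-8b-instruct` model ID are assumptions based on Cloudflare's Workers AI conventions; the platform picks and wires these for you:

```typescript
// Sketch of a Workers AI text-generation request (not platform-authored code).

interface TextGenRequest {
  messages: { role: "system" | "user"; content: string }[];
}

// Trim long inputs up front — edge models have small context windows.
function buildSummarizeRequest(text: string, maxWords = 350): TextGenRequest {
  const clipped = text.trim().split(/\s+/).slice(0, maxWords).join(" ");
  return {
    messages: [
      { role: "system", content: "Summarize the user's text in 2-3 sentences." },
      { role: "user", content: clipped },
    ],
  };
}

// Inside a Worker, the request would then be passed to something like:
//   const { response } = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", req);
```

The clipping step matters more here than with a large-context model — see Limitations below for the context-window numbers.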

Semantic search (embeddings)

Find records by meaning, not just exact keyword matches. Common uses:

  • "Find anything about delivery delays" — even if the text uses different wording
  • Surface similar items or related records
  • Power Q&A over a knowledge base
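
Behind semantic search, records and queries are converted to embedding vectors and compared by similarity. A minimal sketch of the ranking step, with toy vectors standing in for real embedding output (a real embedding model such as BGE Small EN produces hundreds of dimensions):

```typescript
// Cosine similarity: 1.0 means identical direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored records against a query embedding, best match first.
function rank(query: number[], records: { id: string; embedding: number[] }[]) {
  return records
    .map((r) => ({ id: r.id, score: cosineSimilarity(query, r.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

This is why "delivery delays" can match text that says "shipments arriving late": the comparison is between vectors that encode meaning, not between keywords.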

Image understanding (vision)

Extract text and structured data from images. Common uses:

  • Scan receipts and extract vendor, date, total, and line items
  • Read handwritten forms or printed labels
  • Identify objects or describe image content

Templates that use Workers AI

Template                What it does
support-inbox           Auto-generates reply drafts for incoming support tickets
knowledge-base          Semantic search + AI-powered Q&A across your articles
document-intelligence   Field extraction, semantic search, image-to-text
content-pipeline        Auto-generates summary, category, and keyword tags
receipt-scanner         Extracts structured data from receipt photos

How to add AI to your app

Just describe what you want:

  • "When a new ticket comes in, generate a suggested reply"
  • "Let users search the knowledge base by asking questions in plain English"
  • "Scan uploaded photos and extract the vendor name and total automatically"

The AI adds the Workers AI component and wires it to your feature. No setup required.


Workers AI vs Claude (Anthropic)

Vulcan has two AI options for building intelligent features into your apps:

                 Workers AI                                                     Claude (Anthropic)
Cost             Free — billed through Veho's Cloudflare account                Requires an Anthropic API key (secret)
Latency          Fast for most tasks — 100–500ms typical                        Slower — 1–5s for short responses, more for long ones
Quality          Good for focused tasks; less capable on open-ended reasoning   Significantly higher quality, especially for complex or nuanced tasks
Context window   Small (typically 512–4,096 tokens depending on model)          Large (up to 200K tokens)
Streaming        Not supported                                                  Supported — responses stream token by token
Setup            Nothing — built into the platform                              Requires storing an Anthropic API key as a secret

Use Workers AI when:

  • The task is well-defined and repetitive (classify this, summarize that, extract these fields)
  • You don't want to deal with API keys or external billing
  • Latency needs to be low and the task is straightforward

Use Claude when:

  • You need a full chat interface with back-and-forth conversation
  • The task requires reasoning, nuance, or following complex instructions
  • You need a large context (processing long documents, analyzing multiple files)
  • Output quality matters more than speed or cost
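
The two lists above can be read as a simple routing rule. A sketch of that rule as code — the trait names are illustrative only, not part of any Vulcan API:

```typescript
// Traits of the feature you're building (hypothetical names).
interface TaskTraits {
  conversational: boolean; // back-and-forth chat with history
  longContext: boolean;    // input beyond a few thousand tokens
  openEnded: boolean;      // nuanced reasoning or free-form writing
}

// Any one Claude-favoring trait is enough to tip the choice.
function pickBackend(t: TaskTraits): "workers-ai" | "claude" {
  if (t.conversational || t.longContext || t.openEnded) return "claude";
  return "workers-ai"; // well-defined, repetitive, latency-sensitive tasks
}
```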

Available models

The AI selects the right model for the task automatically. You don't need to configure anything.

Task                           Model          Notes
Text generation (general)      Llama 3.1 8B   Good quality for summarization, extraction, classification
Fast text tasks                Mistral 7B     Faster; slightly lower quality
Embeddings / semantic search   BGE Small EN   Optimized for similarity search
Image understanding            LLaVA 1.5      Extracts text and describes image content

Performance expectations

Workers AI runs on Cloudflare's edge infrastructure, not a GPU cluster. Inference times vary by task and input length:

Task                                   Typical latency
Short classification (< 100 tokens)    50–150ms
Summarization (< 500 token input)      200–600ms
Field extraction                       200–500ms
Embedding generation (per item)        30–100ms
Image analysis                         500–1,500ms

For tasks that run in the background (cron jobs, post-submit processing), this latency is invisible to users. For real-time tasks (on-submit inference, search-as-you-type), keep the input short and consider whether the delay is acceptable before deploying.
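
For real-time paths, one defensive pattern is to race the inference call against a latency budget and fall back when the budget expires. A sketch — the budget and fallback value are app-specific assumptions, and `task` stands in for whatever inference call the platform generates:

```typescript
// Resolve with the task's result if it finishes within `ms` milliseconds,
// otherwise resolve with `fallback` so the UI never hangs on inference.
async function withTimeout<T>(task: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

For search-as-you-type, the fallback might be plain keyword results; for on-submit extraction, an empty form the user fills in manually.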


Limitations

Context window is small. Workers AI models accept roughly 512–4,096 tokens of input depending on the model. For reference, 512 tokens is about 400 words. Long documents need to be chunked before being processed — the AI handles this automatically when you ask for it.
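
The chunking the platform does can be sketched in a few lines, using the rough rule of thumb above (512 tokens ≈ 400 words):

```typescript
// Split text into chunks of at most `maxWords` words each.
function chunkWords(text: string, maxWords = 400): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

A production chunker would also prefer sentence or paragraph boundaries and overlap adjacent chunks so context isn't cut mid-thought; this shows only the core idea.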

Output quality is lower than Claude. Workers AI models are smaller and optimized for speed at the edge. They're well-suited for structured, well-defined tasks (extract these five fields, classify into these categories) but less reliable on open-ended generation, nuanced writing, or tasks that require multi-step reasoning. If output quality is inconsistent, consider switching to Claude for that feature.

No streaming. Workers AI responses are returned all at once — there's no token-by-token streaming. This means users see a spinner until the full result is ready, rather than watching the response arrive progressively.

Not suitable for conversational chat. The models don't maintain conversation history. Each call is stateless. For a back-and-forth chat interface, use the Streaming AI Chat component powered by Claude instead.

Accuracy is not guaranteed. All AI-generated content — summaries, extracted fields, classifications — can be wrong. For anything user-facing or consequential (e.g., "auto-approve invoices under $100 using extracted totals"), build in a review step or human confirmation.


Tips for better results

Be specific about the output format. The AI generates better prompts when you tell it what structure you want:

"Extract the vendor name, invoice date, and total amount as a JSON object with keys vendor, date, and total"

is more reliable than:

"Extract the invoice details"

Keep inputs focused. Trim noise before sending to AI — if you're summarizing a support ticket, strip email headers and quoted replies first. Cleaner input → cleaner output.

Test edge cases before deploying. AI extraction and classification can fail silently — returning null, an empty string, or a plausible-looking but wrong value. Add a fallback in your app for when the AI returns something unexpected:

"If the extraction returns null for any field, show a warning and ask the user to fill it in manually"

Use structured output for extraction tasks. When you need specific fields, ask the AI to return JSON. The models are more reliable at filling in a defined schema than generating free-form text with the right fields embedded.
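
Putting the last two tips together, the app-side guard might look like this sketch — the fence-stripping and required-key checks are illustrative assumptions about common failure modes, not platform behavior:

```typescript
// Parse model output as JSON and verify required fields are present and
// non-empty. Returns null on any failure so the app can fall back.
function parseExtraction(
  raw: string,
  requiredKeys: string[],
): Record<string, unknown> | null {
  // Models sometimes wrap JSON in markdown code fences; strip them first.
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  try {
    const parsed = JSON.parse(cleaned);
    if (typeof parsed !== "object" || parsed === null) return null;
    for (const key of requiredKeys) {
      if (!(key in parsed) || parsed[key] === null || parsed[key] === "") {
        return null; // missing field — signal the app to fall back
      }
    }
    return parsed;
  } catch {
    return null; // not valid JSON — fall back to manual entry
  }
}
```

A null result is exactly the trigger for the fallback described above: warn the user and let them fill the fields in manually.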


Combining AI components

AI components work well together. Some common combinations:

  • Field Extraction + SQL database — extract structured data from uploaded documents and store it in a queryable table
  • Persistent Vector Search + Knowledge Base — index articles once, search by meaning on every query
  • Content Moderation + Support Inbox — filter toxic messages before they reach the support queue
  • Image Analysis + File Storage — upload a photo, extract data, store results in D1

Full AI component catalog

Built by the Veho Developer Platform team