Production Monitoring

Once your app is deployed to production, the Production tab in the IDE workspace shows live metrics from Cloudflare. This page explains what each metric means, what normal looks like, and how to investigate when something goes wrong.


How to open it

Inside your project in the IDE, click the Production tab in the workspace panel. It only appears after at least one production deploy.

Use the time window selector in the top-right to switch between 1h, 6h, 24h (default), and 7d views. Narrowing the window helps isolate a spike; widening it shows trends.


Traffic metrics

Requests

Total HTTP requests received by your app's Worker in the selected window. This includes all routes — API calls, page loads, static assets, background job triggers, etc.

What's normal: Depends entirely on your app and how many people use it. A personal tool used by one person might see 50–200 requests/day. A team dashboard could see thousands.

Watch for: An unexpected spike (could be a polling loop, a broken cron job, or load from a new user) or a sudden drop to zero (your app may be down or broken in a way that prevents requests from reaching it).


Errors

Requests that returned a 5xx server error. These are failures in your app's backend — unhandled exceptions, crashes, missing environment variables or secrets, timeouts.

What's normal: Zero, or very close to it. A well-functioning app should have an error rate well under 1%.

Watch for: Any sustained non-zero error count. Even a handful of 5xx responses is worth investigating, especially if users are actively reporting problems.


Error rate

Errors as a percentage of total requests. This normalizes error counts against traffic volume — useful for distinguishing "we had 10 errors" from "10% of requests failed."

What's normal: Below 0.5% is healthy. Above 1% indicates something that needs attention.

What 100% error rate means: Your app is crashing on every request — usually a missing secret, a broken import, or an unhandled top-level exception in your Worker.
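
If you hit a 100% error rate, a top-level try/catch in the Worker's fetch handler turns an opaque crash into a logged error you can act on. Here's a minimal sketch, assuming a module-syntax TypeScript Worker; handleRequest and the MY_SECRET binding are hypothetical names:

  // Minimal sketch: catch unhandled exceptions at the top level so a single
  // bug shows up in Server Logs as a 500 with a message, not a silent crash.
  export default {
    async fetch(request: Request, env: { MY_SECRET?: string }): Promise<Response> {
      try {
        // A missing secret is a common cause of a 100% error rate after a deploy.
        if (!env.MY_SECRET) {
          throw new Error("MY_SECRET binding is missing");
        }
        return await handleRequest(request, env);
      } catch (err) {
        console.error("Unhandled exception:", err); // lands in Server Logs
        return new Response("Internal error", { status: 500 });
      }
    },
  };

  // Hypothetical app logic, standing in for your routes.
  async function handleRequest(request: Request, env: unknown): Promise<Response> {
    return new Response("ok");
  }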


Performance metrics

CPU time

How long the Worker's CPU was actively executing code per request. Reported as:

  • p50 — the median request (half of requests finished faster than this)
  • p99 — the slowest 1% of requests
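
To make those definitions concrete, here's a small sketch of how percentiles fall out of a list of per-request timings (the dashboard computes these for you):

  // Nearest-rank percentile over a sample of per-request timings (ms).
  function percentile(samples: number[], p: number): number {
    const sorted = [...samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.max(0, rank)];
  }

  const cpuTimesMs = [2, 3, 3, 4, 5, 5, 6, 8, 12, 40];
  console.log(percentile(cpuTimesMs, 50)); // 5  (the median request)
  console.log(percentile(cpuTimesMs, 99)); // 40 (the slow tail; with only 10 samples, p99 is the max)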

What's normal:

App type                               p50        p99
Simple CRUD (KV or D1)                 < 5ms      < 20ms
With API calls to external services    < 50ms     < 200ms
With AI inference (Workers AI)         < 500ms    < 2,000ms

Watch for: p99 creeping above 1,000ms for non-AI apps — that suggests a slow query, a blocking operation, or a hot path that needs optimization. Ask the AI: "The p99 CPU time is high — what might be causing slow requests?"


Duration

Wall-clock time from request received to response sent. Includes CPU time plus anything the Worker was waiting on — network round trips to external APIs, KV reads, D1 queries, AI inference.

Duration is always higher than CPU time. The gap between them is time spent waiting on I/O.

What's normal: 2–5× CPU time is typical when there's external I/O (KV, D1, API calls). A very large gap (e.g., 5ms CPU but 3,000ms duration) usually means a slow external dependency — an API timing out or a D1 query taking longer than expected.
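
To find which dependency is responsible for a large gap, you can log wall-clock timings around each external call. Here's a minimal sketch, with a hypothetical DB binding and URL; in Workers, Date.now() advances across awaited I/O, so the deltas approximate time spent waiting:

  // Wrap each awaited call so its wall-clock cost shows up in Server Logs.
  async function timed<T>(label: string, work: Promise<T>): Promise<T> {
    const start = Date.now();
    const result = await work;
    console.log(`${label} took ${Date.now() - start}ms`);
    return result;
  }

  // D1Database comes from @cloudflare/workers-types; the binding name is hypothetical.
  export async function handleData(env: { DB: D1Database }): Promise<Response> {
    const api = await timed("external API", fetch("https://api.example.com/data"));
    const rows = await timed("D1 query", env.DB.prepare("SELECT * FROM items").all());
    return Response.json({ status: api.status, count: rows.results.length });
  }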


Storage metrics

Only shown if your app uses that storage type.

KV

  • Reads — how many times your app read a value from KV storage
  • Writes — how many times your app wrote or updated a KV value
  • Deletes — how many KV values were removed
  • Lists — how many KV list operations ran (listing keys by prefix)

Watch for: An unexpected spike in writes or lists can indicate a runaway cron job or a polling loop that's writing on every tick instead of only on change.
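
One cheap fix is to make the job compare before it writes. A minimal sketch of the write-only-on-change pattern, with a hypothetical CACHE binding and helper:

  // Cron-driven sync that costs 1 read and 0 writes on ticks where
  // nothing changed, instead of 1 write per tick.
  export async function syncStatus(env: { CACHE: KVNamespace }): Promise<void> {
    const next = await computeStatus();
    const current = await env.CACHE.get("status");
    if (current === next) {
      return; // unchanged, skip the write
    }
    await env.CACHE.put("status", next);
  }

  // Hypothetical stand-in for the job's real work.
  async function computeStatus(): Promise<string> {
    return "ok";
  }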


D1 (SQL database)

  • Read queries — SELECT statements executed
  • Write queries — INSERT, UPDATE, DELETE statements executed
  • Rows read — total rows scanned (not just returned) by queries
  • Rows written — total rows inserted or updated

Watch for: High "rows read" relative to "rows returned" means your queries are doing full or near-full table scans. Ask the AI to add an index if this is happening on a frequently-called route.


R2 (file storage)

  • Gets — file download operations
  • Puts — file upload operations
  • Deletes — file removals

Watch for: High put counts you didn't expect — could mean files are being uploaded repeatedly when they should be cached, or a job is regenerating the same file on every run.
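
A minimal sketch of guarding against redundant uploads with an existence check; the REPORTS binding and key scheme are hypothetical:

  // head() returns null if the object doesn't exist, so the put only
  // runs for genuinely new files instead of on every job run.
  async function ensureReport(env: { REPORTS: R2Bucket }, day: string, body: string): Promise<void> {
    const key = `reports/${day}.csv`;
    const existing = await env.REPORTS.head(key);
    if (existing !== null) {
      return; // already stored, no put needed
    }
    await env.REPORTS.put(key, body);
  }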


Investigating a problem in production

High error rate

  1. Open the Server Logs tab to see live error output — the actual exception message and stack trace will be there
  2. Copy the error and paste it into chat: "Users are seeing errors in production. Here's what the server log shows: [paste]"
  3. If the problem started after a specific deploy, consider rolling back via Versions & Rollback while you fix the root cause

High CPU time or duration

  1. Check whether you recently added a new feature that makes external API calls or runs heavy queries
  2. Ask the AI: "Response times spiked after the last change — what in the code could cause slow requests?"
  3. For D1 performance: check whether "rows read" is far higher than the number of rows your queries actually return; that usually means missing indexes on frequently-filtered columns

Unexpected traffic spike

  1. Check if a cron job is misconfigured — a job set to * * * * * (every minute) instead of 0 * * * * (every hour) generates 60× the expected load
  2. Look at whether the requests are concentrated on one route (visible in Server Logs) or spread across many
  3. If the source is unclear, contact the platform team — they can inspect Cloudflare's request logs with more detail than the in-app metrics show

App appears down (zero requests or total 5xx)

  1. Try opening your production URL directly — if you see a Cloudflare error page rather than your app, the Worker failed to start
  2. Check Build Logs for any failed deploy that may have pushed broken code
  3. If you need to recover quickly, roll back to the last known-good version

For rollback steps, see Versions & Rollback.


Limits and what happens when you hit them

Metric                          Platform limit    What you'll see
CPU time per request            30 seconds        503 / Worker timeout
Memory per request              128 MB            Worker crash, 5xx error
Requests per day (free tier)    100,000           Throttling (unlikely for internal tools)

Most internal tools run well within these limits. If you're hitting CPU time limits, the most common cause is AI inference or a very large data processing loop — ask the AI to break the work into smaller chunks.
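
A minimal sketch of that chunking pattern, using a KV key as a cursor between scheduled runs; all names (JOBS, fetchBatch, processItem) are hypothetical:

  const BATCH_SIZE = 100;

  // Each scheduled run processes one bounded batch, then records where it
  // stopped, so no single invocation approaches the CPU-time limit.
  export async function scheduled(env: { JOBS: KVNamespace }): Promise<void> {
    const cursor = Number((await env.JOBS.get("cursor")) ?? "0");
    const items = await fetchBatch(cursor, BATCH_SIZE);
    for (const item of items) {
      await processItem(item);
    }
    await env.JOBS.put("cursor", String(cursor + items.length));
  }

  // Hypothetical stand-ins for the real data source and per-item work.
  async function fetchBatch(offset: number, limit: number): Promise<string[]> {
    return [];
  }
  async function processItem(item: string): Promise<void> {}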


Questions

Reach out in #eng_product_platform_team if you're seeing unexpected metrics or need help interpreting what you're looking at.

Built by the Veho Developer Platform team