Introduction
Sentinel is an autonomous SRE (Site Reliability Engineering) pipeline built entirely on Cloudflare’s Developer Platform. It watches your Cloudflare Workers for errors, diagnoses them, and opens a pull request with a tested fix, with no human intervention until the review step.
The Problem
When a Worker starts throwing 5xx errors at 3 AM, the typical response is:
1. An alert fires
2. A human wakes up, reads logs, reproduces the issue
3. They identify the root cause, write a fix, run tests
4. They submit a PR, wait for review, merge, deploy
Steps 2–4 can take hours. Sentinel automates them up to the pull request, which is left for a human to review and merge.
How It Works
Sentinel runs as a set of six Durable Object agents on Cloudflare Workers, connected through Queues:
| Agent | Role |
|---|---|
| LogTailer | Polls Workers Observability every 30s, fingerprints errors, deduplicates |
| TestGen | Generates a bun:test reproduction test via LLM, runs it in a Sandbox |
| CodeTriage | Reads source code, performs multi-turn LLM root cause analysis |
| FixAgent | TDD cycle: confirm bug → generate fix → verify → regression check → PR |
| Orchestrator | Dashboard API for monitoring and manual overrides |
| TracingAgent | (Phase 3) Extends pipeline to non-Cloudflare origins |
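The hand-off between agents can be pictured as a fixed sequence of stages, each forwarding an incident message to the next. The following is a minimal sketch, not Sentinel's actual code: the processing order is assumed from the order of the table rows, and `Stage`, `IncidentMessage`, and `nextStage` are hypothetical names introduced for illustration.

```typescript
// Hypothetical stage names mirroring the agent table above.
type Stage = "LogTailer" | "TestGen" | "CodeTriage" | "FixAgent";

// Assumed processing order (taken from the table's row order, not confirmed).
const PIPELINE: readonly Stage[] = ["LogTailer", "TestGen", "CodeTriage", "FixAgent"];

// Hypothetical shape of the message one agent would enqueue for the next.
interface IncidentMessage {
  fingerprint: string; // stable identifier for the error's root cause
  from: Stage; // agent that produced this message
  payload: Record<string, unknown>; // stage-specific data (test case, analysis, ...)
}

// Returns the stage that should receive the message next,
// or null when the current stage is the last one.
function nextStage(current: Stage): Stage | null {
  const i = PIPELINE.indexOf(current);
  if (i < 0) return null;
  return PIPELINE[i + 1] ?? null;
}
```

In a real deployment each stage would presumably send the message to its successor's queue rather than call it directly, so that retries and dead-lettering are handled by the queue.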
Human-in-the-Loop
Sentinel is not fully autonomous end-to-end. The pipeline stops at the PR boundary:
- Sentinel creates a GitHub Pull Request with a detailed description
- A human reviews and merges (or rejects) the PR
- If an agent lacks confidence, the incident is marked `needs_human`
This design ensures that AI-generated code is always reviewed before reaching production.
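The confidence gate described above might look like the following sketch. Only the `needs_human` status comes from this page; the `pr_opened` status, the 0 to 1 confidence scale, and the 0.8 threshold are assumptions made for illustration.

```typescript
// "needs_human" is the status named in the docs; "pr_opened" is a hypothetical counterpart.
type IncidentStatus = "pr_opened" | "needs_human";

// Hypothetical default threshold; the real value (if any) is not documented here.
const CONFIDENCE_THRESHOLD = 0.8;

interface AgentResult {
  confidence: number; // assumed to be a score between 0 and 1
}

// Escalates to a human unless the agent reports a valid, sufficiently high confidence.
function resolveStatus(
  result: AgentResult,
  threshold: number = CONFIDENCE_THRESHOLD,
): IncidentStatus {
  const confident = Number.isFinite(result.confidence) && result.confidence >= threshold;
  return confident ? "pr_opened" : "needs_human";
}
```

Treating a missing or non-finite score as low confidence keeps the failure mode conservative: anything the agent cannot vouch for goes to a person.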
What Gets Fixed
Sentinel currently targets:
- 5xx server errors — uncaught exceptions, timeout errors, resource exhaustion
- 4xx client errors — malformed route handlers, missing validation, incorrect status codes
- Unhandled exceptions — promise rejections, type errors, null reference errors
Each error is fingerprinted using SHA-256 over normalized error messages, route patterns, and stack traces. This means the same root cause generates the same fingerprint regardless of dynamic values like timestamps, UUIDs, or request IDs.
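A sketch of how such a fingerprint could be computed is shown below. The normalization patterns (UUIDs, ISO 8601 timestamps, long hexadecimal IDs) and the way the three fields are joined are illustrative assumptions, not Sentinel's actual rules; the hashing uses the standard Web Crypto `crypto.subtle.digest` call, which is available in Workers.

```typescript
// Replaces dynamic values with placeholders so that equivalent errors
// normalize to the same string. The patterns are illustrative assumptions.
function normalize(text: string): string {
  return text
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g, "<timestamp>")
    .replace(/\b[0-9a-f]{16,}\b/gi, "<id>"); // assumed shape of opaque request IDs
}

// Hashes the normalized message, route pattern, and stack trace with SHA-256
// and returns the digest as a lowercase hex string.
async function fingerprint(message: string, route: string, stack: string): Promise<string> {
  const input = [normalize(message), route, normalize(stack)].join("\n");
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(input));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}
```

With this normalization, two occurrences of the same error that differ only in a user UUID or a timestamp produce identical input to the hash, and therefore the same fingerprint.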
Platform Requirements
Sentinel runs entirely on Cloudflare:
- Workers (compute) — all agents run as Durable Objects
- D1 (database) — incident tracking, audit trail, configuration
- R2 (storage) — test cases, patches, PR descriptions
- Queues (messaging) — reliable inter-agent communication with retries and DLQ
- Sandbox (containers) — isolated execution of untrusted code
- Workers AI (inference) — test generation, root cause analysis, fix generation
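Because every agent depends on these services being bound to the Worker, a startup check for missing bindings can catch configuration mistakes early. The sketch below assumes hypothetical binding names (`DB`, `ARTIFACTS`, `INCIDENT_QUEUE`, `SANDBOX`, `AI`); the names Sentinel actually uses are not given on this page.

```typescript
// Hypothetical binding names, one per platform service listed above.
const REQUIRED_BINDINGS = ["DB", "ARTIFACTS", "INCIDENT_QUEUE", "SANDBOX", "AI"] as const;
type BindingName = (typeof REQUIRED_BINDINGS)[number];

// Returns the names of required bindings that are absent from the Worker's env object.
function missingBindings(env: Record<string, unknown>): BindingName[] {
  return REQUIRED_BINDINGS.filter((name) => env[name] === undefined);
}
```

A Worker could call `missingBindings(env)` at the start of its handler and return a clear configuration error if the result is non-empty.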