Introduction

Sentinel is an autonomous SRE (Site Reliability Engineering) pipeline built entirely on Cloudflare’s Developer Platform. It watches your Cloudflare Workers for errors, diagnoses them, and opens a pull request with a tested fix, all without human intervention up to the point of review.

The Problem

When a Worker starts throwing 5xx errors at 3 AM, the typical response is:

1. An alert fires
2. A human wakes up, reads logs, reproduces the issue
3. They identify the root cause, write a fix, run tests
4. They submit a PR, wait for review, merge, deploy

Steps 2–4 can take hours. Sentinel automates everything up to submitting the PR; review and merge stay with a human.

How It Works

Sentinel runs as a set of six Durable Object agents on Cloudflare Workers, connected through Queues:

Agent	Role
LogTailer	Polls Workers Observability every 30s, fingerprints errors, deduplicates
TestGen	Generates a `bun:test` reproduction test via LLM, runs it in a Sandbox
CodeTriage	Reads source code, performs multi-turn LLM root cause analysis
FixAgent	TDD cycle: confirm bug → generate fix → verify → regression check → PR
Orchestrator	Dashboard API for monitoring and manual overrides
TracingAgent	(Phase 3) Extends pipeline to non-Cloudflare origins
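The FixAgent row describes a TDD cycle, which can be sketched roughly as below. This is a minimal illustration, not Sentinel's implementation: the `FixSteps` interface, the `FixOutcome` values, and `runFixCycle` are hypothetical names, and the steps are synchronous here only to keep the example short (real LLM and Sandbox calls would be asynchronous).

```typescript
// Hypothetical hooks for the work FixAgent delegates to the LLM and the Sandbox.
interface FixSteps {
  // Runs the reproduction test, optionally with a patch applied; true means the test passes.
  runReproTest(patch?: string): boolean;
  // Asks the model for a candidate patch.
  generateFix(): string;
  // Runs the existing test suite with the patch applied; true means no regressions.
  runRegressionSuite(patch: string): boolean;
}

// Assumed outcome labels; only "open_pr" leads to a pull request.
type FixOutcome = "open_pr" | "not_reproduced" | "fix_failed" | "regression";

function runFixCycle(steps: FixSteps): FixOutcome {
  // 1. Confirm bug: the reproduction test must fail on the current code.
  if (steps.runReproTest()) return "not_reproduced";

  // 2. Generate fix.
  const patch = steps.generateFix();

  // 3. Verify: the same test must now pass with the patch applied.
  if (!steps.runReproTest(patch)) return "fix_failed";

  // 4. Regression check: the patch must not break existing tests.
  if (!steps.runRegressionSuite(patch)) return "regression";

  // 5. Every gate passed, so the caller can open a PR.
  return "open_pr";
}
```

The point of the ordering is that a patch only reaches the PR step after the reproduction test has been seen to fail without it and pass with it.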

Human-in-the-Loop

Sentinel is not fully autonomous end-to-end. The pipeline stops at the PR boundary:

Sentinel creates a GitHub Pull Request with a detailed description
A human reviews and merges (or rejects) the PR
If an agent lacks confidence, the incident is marked `needs_human`

This design ensures that AI-generated code is always reviewed before reaching production.
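One way to picture the confidence gate is a simple threshold check. The sketch below assumes confidence is a number between 0 and 1 and that a fixed cutoff decides the outcome; the source does not say how Sentinel measures confidence, so `MIN_CONFIDENCE`, `nextStatus`, and the `in_progress` status are illustrative names only.

```typescript
// "needs_human" comes from the text above; "in_progress" is an assumed label.
type IncidentStatus = "in_progress" | "needs_human";

// Hypothetical cutoff; the real value, if one exists, is not documented here.
const MIN_CONFIDENCE = 0.8;

// Decides whether the pipeline keeps going or hands the incident to a person.
function nextStatus(
  confidence: number,
  minConfidence: number = MIN_CONFIDENCE,
): IncidentStatus {
  // A NaN confidence fails the comparison, so it also escalates to a human.
  return confidence >= minConfidence ? "in_progress" : "needs_human";
}
```

Writing the check so that anything other than a clear pass escalates keeps the failure mode on the side of human review.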

What Gets Fixed

Sentinel currently targets:

5xx server errors — uncaught exceptions, timeout errors, resource exhaustion
4xx client errors — malformed route handlers, missing validation, incorrect status codes
Unhandled exceptions — promise rejections, type errors, null reference errors

Each error is fingerprinted with SHA-256 over its normalized error message, route pattern, and stack trace. Normalization strips dynamic values such as timestamps, UUIDs, and request IDs, so the same root cause produces the same fingerprint each time it occurs.
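A minimal sketch of this kind of fingerprinting follows. The `ObservedError` shape, the placeholder tokens, and the specific regular expressions are assumptions for illustration; Sentinel's actual normalization rules are not documented here. The example uses Node's `node:crypto` so it runs as a plain script; inside a Worker, the Web Crypto `crypto.subtle.digest` call would be the natural equivalent.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; the field names are assumptions.
interface ObservedError {
  message: string;
  route: string; // route pattern, e.g. "/users/:id"
  stack: string;
}

// Replaces dynamic values with fixed placeholders so that two occurrences of
// the same failure produce identical text.
function normalize(text: string): string {
  return text
    // UUIDs, e.g. 123e4567-e89b-12d3-a456-426614174000
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    // ISO 8601 timestamps, e.g. 2024-01-01T03:00:00Z
    .replace(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g, "<timestamp>")
    // Long digit runs, treated here as request or record IDs (an assumption);
    // shorter numbers such as stack-trace line numbers are left alone.
    .replace(/\b\d{5,}\b/g, "<num>");
}

// Hashes the normalized fields into a hex-encoded SHA-256 fingerprint.
function fingerprint(err: ObservedError): string {
  // A NUL separator keeps field boundaries unambiguous.
  const canonical = [err.message, err.route, err.stack].map(normalize).join("\u0000");
  return createHash("sha256").update(canonical).digest("hex");
}
```

With this scheme, two errors that differ only in an embedded UUID or timestamp hash to the same value, while a different message or route yields a different fingerprint, which is what makes deduplication possible.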

Platform Requirements

Sentinel runs entirely on Cloudflare:

Workers (compute) — all agents run as Durable Objects
D1 (database) — incident tracking, audit trail, configuration
R2 (storage) — test cases, patches, PR descriptions
Queues (messaging) — reliable inter-agent communication with retries and DLQ
Sandbox (containers) — isolated execution of untrusted code
Workers AI (inference) — test generation, root cause analysis, fix generation