Troubleshooting
Incidents Stuck in “detected”
Symptom: Incidents are created but never progress to `test_generating`.
Causes:
- The `sentinel-errors-detected` queue consumer is not running
- The TestGen agent is failing silently
Resolution:
- Check the Queues dashboard for a backlog in `sentinel-errors-detected`
- Check the DLQ (`sentinel-dlq`) for failed messages
- Check the Workers logs for errors in the queue consumer
- Manually trigger a retry via the Orchestrator API: `POST /api/incidents/:id/retry`
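The manual retry step can be scripted. The following is a minimal sketch, assuming the Orchestrator is reachable at a base URL you supply (the `orchestrator.example.com` host and the `inc_123` ID are placeholders, not values from this project) and that the endpoint needs no extra headers; add authentication if your deployment requires it.

```typescript
// Sketch only: the base URL and incident ID below are hypothetical placeholders.
interface RetryRequest {
  method: "POST";
  url: string;
}

// Build the retry request for POST /api/incidents/:id/retry.
function buildRetryRequest(baseUrl: string, incidentId: string): RetryRequest {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "POST",
    url: `${base}/api/incidents/${encodeURIComponent(incidentId)}/retry`,
  };
}

// Send the request and return the HTTP status code.
async function retryIncident(baseUrl: string, incidentId: string): Promise<number> {
  const { method, url } = buildRetryRequest(baseUrl, incidentId);
  const res = await fetch(url, { method });
  return res.status;
}
```

For example, `await retryIncident("https://orchestrator.example.com", "inc_123")` would issue the retry and resolve to the response status.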
Incidents Stuck in “needs_human”
Symptom: Many incidents are marked `needs_human` instead of progressing.
Common Causes:
| Agent | Reason |
|---|---|
| TestGen | Generated test passes (bug not reproduced) — review the test logic |
| CodeTriage | LLM confidence is low — check the source code context being sent |
| FixAgent | 5 fix attempts exhausted — review the LLM prompts and test output |
Resolution:
- Get the incident detail: `GET /api/incidents/:id`
- Review the R2 artifacts (test case, sample event)
- Fix the issue manually and retry: `POST /api/incidents/:id/retry`
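When triaging a batch of `needs_human` incidents, it can help to keep the table above as a lookup. This is a small sketch that encodes the three rows as data. How you learn which agent escalated an incident is not specified here, so the `agent` argument is assumed to come from your own reading of the incident detail.

```typescript
// Sketch only: encodes the "Common Causes" table as a lookup.
type Agent = "TestGen" | "CodeTriage" | "FixAgent";

const NEEDS_HUMAN_CAUSES: Record<Agent, { reason: string; review: string }> = {
  TestGen: { reason: "Generated test passes (bug not reproduced)", review: "the test logic" },
  CodeTriage: { reason: "LLM confidence is low", review: "the source code context being sent" },
  FixAgent: { reason: "5 fix attempts exhausted", review: "the LLM prompts and test output" },
};

// Return a one-line hint describing why the agent escalated and what to review.
function reviewHint(agent: Agent): string {
  const { reason, review } = NEEDS_HUMAN_CAUSES[agent];
  return `${agent}: ${reason}. Review ${review}.`;
}
```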
LLM Generating Poor Results
Symptom: Test generation or fix generation produces invalid code.
Causes:
- Insufficient context in the LLM prompt
- The source code being read is too large or irrelevant
- Model limitations for complex codebases
Resolution:
- Check the `llm_calls` table for the prompt and response
- Review the source code being sent (currently reads `src/index.ts` only)
- Consider adjusting the `maxTokens` parameter in the LLM config
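One way to inspect `llm_calls` is through the same `wrangler d1 execute` command used elsewhere in this guide. The sketch below only assembles the argument list. It assumes `llm_calls` lives in the `sentinel-db` D1 database, which this page does not confirm, and it uses `SELECT *` because the table's column names are not documented here.

```typescript
// Sketch only: assumes llm_calls is a table in the sentinel-db D1 database.
function d1ExecuteArgs(sql: string, database: string = "sentinel-db"): string[] {
  // Mirrors: bunx wrangler@latest d1 execute <database> --command "<sql>"
  return ["bunx", "wrangler@latest", "d1", "execute", database, "--command", sql];
}

// Column names are not documented, so select everything and limit the rows.
const llmCallsArgs = d1ExecuteArgs("SELECT * FROM llm_calls LIMIT 5");
```

The resulting array can be passed to a process runner, for example Node's `child_process.spawn(llmCallsArgs[0], llmCallsArgs.slice(1), { stdio: "inherit" })`.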
Queue Messages Failing
Symptom: Messages appear in the DLQ.
Resolution:
- Check `sentinel-dlq` for the full message body
- Verify the message matches the expected Zod schema
- Check the target agent’s HTTP endpoint for errors
- Messages are retried automatically (3x for errors-detected, 2x for fix-ready)
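The retry counts determine how many failed deliveries a message goes through before it reaches the DLQ. As a rough sketch, assuming "3x" and "2x" mean retries after the first delivery attempt (the usual meaning of a queue consumer's retry limit, but not confirmed by this page), the arithmetic looks like this:

```typescript
// Sketch only: assumes the counts are retries on top of the initial delivery.
const MAX_RETRIES = { "errors-detected": 3, "fix-ready": 2 } as const;
type QueueName = keyof typeof MAX_RETRIES;

// Total delivery attempts = first delivery + configured retries.
function totalAttempts(queue: QueueName): number {
  return 1 + MAX_RETRIES[queue];
}

// A message is dead-lettered once every attempt has failed.
function isDeadLettered(queue: QueueName, failedAttempts: number): boolean {
  return failedAttempts >= totalAttempts(queue);
}
```

Under that assumption, a message in the DLQ has already failed four times on errors-detected or three times on fix-ready, so a transient error is an unlikely cause.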
Sandbox Unavailable
Symptom: Agents log "no sandbox or repo" and mark incidents as `needs_human`.
Causes:
- The `SANDBOX` (containers) binding is commented out in `wrangler.jsonc`
- The sandbox container image hasn’t been built and pushed
Resolution:
- Build the sandbox image: `cd sandbox-image && docker build -t sentinel-sandbox .`
- Uncomment the containers binding in `wrangler.jsonc`
- Redeploy: `bunx wrangler@latest deploy`
Database Issues
Reset Local D1
`rm -rf .wrangler/state/v3/d1/`
`bun run db:migrate`
Query D1 Directly
`bunx wrangler@latest d1 execute sentinel-db --command "SELECT COUNT(*) FROM incidents"`
Check Agent SQLite
Agent-local SQLite data (fingerprint cache, poll cursors) is stored in the Durable Object’s storage. It can be inspected via the Durable Objects dashboard in the Cloudflare console.