Security, Guardrails & Operations
Philosophy: Fail Safe, Not Silent
Revenue Guard operates under one principle: if something goes wrong, fail loudly and stop the bleed immediately.
When budgets are at risk or traffic is suspicious, the system doesn’t guess. It blocks. It logs. It alerts. This is the opposite of “let’s scale and deal with overages later.”
Defence in Depth
Each layer catches a different class of abuse:
- Turnstile (Entry): No session without a Turnstile challenge. Stops bots before they hit any API.
- Rate limits (Per-session): 200 allocations/minute per session, enforced per IP. Stops single-user abuse.
- Guardrails (Budget): Virtual spend cap. Once hit, allocations are blocked until reset. Stops runaway scenarios.
- Validation (Payload): Strict checks on SKU/user/mode. Rejects malformed requests before processing.
[!TIP] The demo’s DEMO_COST_LIMIT=0.0 means billing is literally locked to zero. Real spend can’t happen. This is the trust mechanism: “You can throw anything at this demo, and it won’t cost money.”
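The Turnstile entry layer can be sketched as a server-side check against Cloudflare's siteverify endpoint. This is a minimal sketch, not the project's actual handler: the function name `verifyTurnstile`, the injected `FetchLike` dependency, and the `verification-unavailable` error code are assumptions made for illustration. It fails safe by treating a missing token or a failed verification call as "not verified".

```typescript
// Minimal fetch-shaped dependency so the sketch runs without a Workers runtime.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ json(): Promise<unknown> }>;

const SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify";

interface TurnstileResult {
  success: boolean;
  errorCodes: string[];
}

async function verifyTurnstile(
  token: string | null,
  secret: string,
  doFetch: FetchLike,
): Promise<TurnstileResult> {
  // Fail safe: a missing token is rejected without calling the API.
  if (!token) return { success: false, errorCodes: ["missing-input-response"] };

  const body =
    `secret=${encodeURIComponent(secret)}&response=${encodeURIComponent(token)}`;
  try {
    const res = await doFetch(SITEVERIFY_URL, {
      method: "POST",
      headers: { "content-type": "application/x-www-form-urlencoded" },
      body,
    });
    const data = (await res.json()) as { success?: boolean; "error-codes"?: string[] };
    return { success: data.success === true, errorCodes: data["error-codes"] ?? [] };
  } catch {
    // Fail safe: if verification itself errors, treat the request as unverified.
    // "verification-unavailable" is a hypothetical code, not one Turnstile returns.
    return { success: false, errorCodes: ["verification-unavailable"] };
  }
}
```

A caller would create a session only when `success` is true and otherwise return the rejection described in the failure table. Injecting the fetch function keeps the check testable without network access.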
Guardrail Flow (How Rejection Works)
graph TD
Start(["Allocation Request"]) --> CheckGuard{"Guardrail<br/>tripped?"}
CheckGuard -- Yes --> Block["403 Forbidden<br/>guardrailTriggered: true"]
CheckGuard -- No --> Mode{"mode=safe?"}
Mode -- Safe --> DOAlloc["Durable Object<br/>allocate()"]
Mode -- Eventual --> D1Alloc["D1 allocate<br/>SELECT+UPDATE"]
DOAlloc --> Validate{"Request<br/>with valid<br/>SKU/user?"}
D1Alloc --> Validate
Validate -- No --> Reject["400 Bad Request<br/>error: INVALID_SKU"]
Validate -- Yes --> Check2{"Does allocation<br/>exceed budget?"}
Check2 -- Yes --> Trip["Set guardrail<br/>flag in KV"]
Trip --> Block
Check2 -- No --> Respond["200 OK<br/>allocated: true"]
Respond --> Log["Emit to<br/>Analytics Engine"]
Reading the flow:
1. Request arrives.
2. Check whether the guardrail is already tripped; if so, block immediately (403).
3. Route to the safe or eventual path based on the `mode` parameter.
4. Validate the payload (SKU exists, user ID is a UUID, etc.).
5. Execute the allocation logic.
6. Check whether total cost/units now exceed the budget.
7. If yes, trip the guardrail (future requests will be blocked at step 2).
8. If no, respond with success and emit telemetry.
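The flow above can be sketched as a single function over in-memory state. This is an illustrative sketch under stated assumptions, not the project's implementation: `GuardState` stands in for the KV guardrail flag and the spend counter, the SKU price map is a hypothetical catalogue, and the `INVALID_MODE` and `INVALID_USER` error codes are invented here (only `INVALID_SKU` appears in the diagram). Following the diagram, an allocation that would exceed the budget trips the guardrail and is blocked rather than committed.

```typescript
interface GuardState {
  tripped: boolean;               // stands in for the guardrail flag in KV
  spent: number;                  // virtual spend so far
  budget: number;                 // virtual spend cap
  skuPrices: Map<string, number>; // hypothetical SKU catalogue: id -> cost
}

interface AllocationResponse {
  status: 200 | 400 | 403;
  body: Record<string, unknown>;
}

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function handleAllocation(
  req: { skuId: string; userId: string; mode: string },
  state: GuardState,
): AllocationResponse {
  // Guardrail already tripped: block before doing any other work.
  if (state.tripped) return { status: 403, body: { guardrailTriggered: true } };

  // Route by mode. Both paths share this in-memory state in the sketch.
  if (req.mode !== "safe" && req.mode !== "eventual") {
    return { status: 400, body: { error: "INVALID_MODE" } };
  }
  const path = req.mode === "safe" ? "durable-object" : "d1";

  // Validate the payload before touching the budget.
  const price = state.skuPrices.get(req.skuId);
  if (price === undefined) return { status: 400, body: { error: "INVALID_SKU" } };
  if (!UUID_RE.test(req.userId)) return { status: 400, body: { error: "INVALID_USER" } };

  // Budget check: trip the guardrail instead of overspending.
  if (state.spent + price > state.budget) {
    state.tripped = true;
    return { status: 403, body: { guardrailTriggered: true } };
  }

  state.spent += price;
  return { status: 200, body: { allocated: true, path } };
}
```

With a budget of 0, the first allocation of any SKU with a positive price trips the guardrail in this sketch, and every later request is rejected at the first check.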
Failure Scenarios
What goes wrong, and what happens:
| Scenario | Symptom | System Response | Ops Action |
|---|---|---|---|
| DO crashes mid-request | durable_object_error in logs | Auto-restart in 30s; in-flight requests fail with 500 | Wait for recovery; if persistent, roll back to SQL-only |
| D1 query quota exceeded | database quota exceeded error | Allocations fail; guardrail trips | Delete old allocations; VACUUM D1; raise quota if needed |
| Turnstile token invalid | unauthorised response | Request rejected at entry (403) | Confirm Turnstile is configured; use DEBUG_TOKEN in dev |
| Rate limit hit | 429 Too Many Requests | Response includes Retry-After header | Whitelist demo IP if legitimate; otherwise investigate abuse |
| KV quota exceeded | kv_namespace_full | Session mirrors can’t persist; no WS recovery | Delete entries older than 24h; increase KV quota |
| Guardrail stuck (set but shouldn’t be) | Allocations blocked mysteriously | Every allocation returns 403 with guardrailTriggered: true | Check KV: wrangler kv:key get --namespace-id=XXX guardrail_status. If stuck, manually clear: wrangler kv:key delete --namespace-id=XXX guardrail_status |
Telemetry Channels
Revenue Guard emits data through three streams:
- Analytics Engine (primary): Structured events: allocation attempts, outcomes, guardrail triggers, latency breakdowns. Queryable, aggregatable, designed for trends.

  `allocation_event: {skuId, userId, mode (safe|eventual), success (true|false), latency_ms, guardrail_triggered, cost_delta}`

- Logs (immediate): `wrangler tail` shows real-time activity. Grep for errors, allocate calls, guardrail state changes. Human-readable, not indexed, good for live debugging.

  `wrangler tail --env production --format pretty --search allocate | grep -E "ERROR|GUARDRAIL"`

- KV Counters (state): Rate limit keys, guardrail flag, session snapshots. Fast for checks, not queryable like Analytics Engine.
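As a rough illustration of the primary channel, the `allocation_event` fields could be mapped onto the `blobs`/`doubles`/`indexes` shape that Workers Analytics Engine's `writeDataPoint` accepts. The column order, the choice of `skuId` as the index, and the helper name `toDataPoint` are assumptions for this sketch, not the project's actual schema.

```typescript
interface AllocationEvent {
  skuId: string;
  userId: string;
  mode: "safe" | "eventual";
  success: boolean;
  latency_ms: number;
  guardrail_triggered: boolean;
  cost_delta: number;
}

// Shape accepted by an Analytics Engine dataset binding's writeDataPoint().
interface DataPoint {
  indexes: string[]; // Analytics Engine allows a single index per data point
  blobs: string[];   // string dimensions
  doubles: number[]; // numeric measures
}

function toDataPoint(e: AllocationEvent): DataPoint {
  return {
    // Assumption: index by SKU so per-SKU queries sample well.
    indexes: [e.skuId],
    blobs: [e.skuId, e.userId, e.mode],
    // Booleans are stored as 0/1 so they can be summed and averaged.
    doubles: [
      e.success ? 1 : 0,
      e.latency_ms,
      e.guardrail_triggered ? 1 : 0,
      e.cost_delta,
    ],
  };
}
```

In a Worker, the result would be passed to the dataset binding, for example `env.ANALYTICS.writeDataPoint(toDataPoint(event))`, where `ANALYTICS` is a hypothetical binding name.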
Rate Limiting (Per IP, Per Session)
- allocate: 200 requests/minute per IP
- reset: 1 request/minute per IP
- state: unlimited (read-only)

Why these numbers? A genuine user clicks “buy” at most 1–3 times per second (under panic), which is 60–180 requests/minute. A limit of 200/minute sits just above that peak, so it absorbs typos, retries, and accidental double-clicks. Abusive scripts typically send 1,000+ requests/minute; the limit cuts them off at 200.
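One way to enforce these limits is a fixed one-minute window counter keyed by endpoint and IP. The source does not say which algorithm Revenue Guard uses, so the fixed-window approach, the `FixedWindowLimiter` class, and the in-memory `Map` (standing in for the KV rate limit keys) are all assumptions of this sketch.

```typescript
// Requests per minute per IP; endpoints not listed (e.g. state) are unlimited.
const LIMITS = new Map<string, number>([
  ["allocate", 200],
  ["reset", 1],
]);

const WINDOW_MS = 60000;

interface Decision {
  allowed: boolean;
  retryAfterSeconds: number; // value for the Retry-After header on a 429
}

class FixedWindowLimiter {
  // In-memory stand-in for KV counters; old windows are never pruned here.
  private counts = new Map<string, number>();

  check(endpoint: string, ip: string, nowMs: number): Decision {
    const limit = LIMITS.get(endpoint);
    if (limit === undefined) return { allowed: true, retryAfterSeconds: 0 };

    // All requests in the same wall-clock minute share one counter.
    const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
    const key = `${endpoint}:${ip}:${windowStart}`;
    const used = this.counts.get(key) ?? 0;

    if (used >= limit) {
      const msLeft = windowStart + WINDOW_MS - nowMs;
      return { allowed: false, retryAfterSeconds: Math.ceil(msLeft / 1000) };
    }
    this.counts.set(key, used + 1);
    return { allowed: true, retryAfterSeconds: 0 };
  }
}
```

A fixed window is simple and cheap, but it permits short bursts across a window boundary (up to twice the limit in quick succession). A sliding window or token bucket would smooth that out at the cost of more state.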
Data Residency & Compliance
- KV + DO can be pinned to a region via `location_hint` (e.g., `eu` for GDPR compliance). In demo mode, this is optional. In production, respect data gravity.
- Only metadata is stored: sessionId, IP, timestamp, event type. No customer names, emails, or payment data.
- GDPR deletion: KV entries auto-expire after 20 minutes (session length). Longer history? Store in cold storage, not hot path.
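The metadata-only and auto-expiry rules can be sketched together: strip a session record down to the four stored fields, then write it with a 20-minute TTL. Workers KV's `put` accepts an `expirationTtl` option in seconds; the `KVLike` interface, the `session:` key prefix, and the helper names are assumptions made for this example.

```typescript
const SESSION_TTL_SECONDS = 20 * 60; // session length from the text above

interface SessionMirror {
  sessionId: string;
  ip: string;
  timestamp: string;
  eventType: string;
}

// Minimal slice of the KV namespace API used here.
interface KVLike {
  put(key: string, value: string, options: { expirationTtl: number }): Promise<void>;
}

// Copy only the allowed metadata fields; anything else (names, emails,
// payment data) is dropped rather than stored.
function pickMetadata(raw: Record<string, unknown>): SessionMirror {
  return {
    sessionId: String(raw["sessionId"]),
    ip: String(raw["ip"]),
    timestamp: String(raw["timestamp"]),
    eventType: String(raw["eventType"]),
  };
}

function mirrorSession(kv: KVLike, raw: Record<string, unknown>): Promise<void> {
  const mirror = pickMetadata(raw);
  // The "session:" key prefix is hypothetical.
  return kv.put(`session:${mirror.sessionId}`, JSON.stringify(mirror), {
    expirationTtl: SESSION_TTL_SECONDS,
  });
}
```

Because the entry carries its own TTL, no separate deletion job is needed for the hot path; anything that must outlive the session belongs in cold storage, as noted above.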
Non-Goals (What We Don’t Do)
[!NOTE] Revenue Guard prioritizes safety and speed over exhaustiveness:
- No payment processing: Allocation only; payments are downstream.
- No user auth: Turnstile proves “not a bot,” not “authenticated user.” Pair with your auth system.
- No app-layer encryption in flight: Traffic relies on Cloudflare’s edge encryption alone. Add app-layer encryption if PCI compliance requires it.
- No audit trail to cold storage: Analytics Engine events expire after 30 days. Export to Sentry/DataDog/BigQuery if you need long-term audit compliance.
References
- Tech Stack & Security: Under the Hood
- Rate limit details: See repository README
- Business impact & risk: Business Impact & ROI
- Cloudflare Turnstile: https://developers.cloudflare.com/turnstile/
- Bot Management: https://www.cloudflare.com/bm/