Security, Guardrails & Operations

Philosophy: Fail Safe, Not Silent

Revenue Guard operates under one principle: if something goes wrong, fail loudly and stop the bleed immediately.

When budgets are at risk or traffic is suspicious, the system doesn’t guess. It blocks. It logs. It alerts. This is the opposite of “let’s scale and deal with overages later.”

Defence in Depth

Each layer catches a different class of abuse:

Turnstile (Entry): No session without a Turnstile challenge. Stops bots before they hit any API.
Rate limits (Per-IP): 200 allocation requests/min per client IP. Stops single-user abuse.
Guardrails (Budget): Virtual spend cap. Once hit, allocations are blocked until reset. Stops runaway scenarios.
Validation (Payload): Strict checks on SKU/user/mode. Rejects malformed requests before processing.
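The four layers can be read as one ordered check in which the first failing layer decides the response. The sketch below is hypothetical, not Revenue Guard's code: `RequestContext`, `checkLayers`, and the error strings are illustrative names. The status codes follow the failure scenarios and guardrail flow in this section (403 for Turnstile and the guardrail, 429 for rate limits, 400 for validation).

```typescript
// Hypothetical summary of what earlier stages know about one request.
interface RequestContext {
  turnstilePassed: boolean;   // server-side Turnstile verification result
  requestsThisMinute: number; // allocations already counted for this IP
  guardrailTripped: boolean;  // guardrail flag read from KV
  skuId: string;
  userId: string;
  mode: string;
}

type Verdict =
  | { ok: true }
  | { ok: false; layer: string; status: number; error: string };

const ALLOCATE_LIMIT_PER_MIN = 200;
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Runs the layers in the documented order; the first failure wins.
function checkLayers(ctx: RequestContext): Verdict {
  if (!ctx.turnstilePassed) {
    return { ok: false, layer: "turnstile", status: 403, error: "UNAUTHORISED" };
  }
  if (ctx.requestsThisMinute >= ALLOCATE_LIMIT_PER_MIN) {
    return { ok: false, layer: "rate_limit", status: 429, error: "RATE_LIMITED" };
  }
  if (ctx.guardrailTripped) {
    return { ok: false, layer: "guardrail", status: 403, error: "GUARDRAIL_TRIGGERED" };
  }
  const modeOk = ctx.mode === "safe" || ctx.mode === "eventual";
  if (ctx.skuId === "" || !UUID_RE.test(ctx.userId) || !modeOk) {
    return { ok: false, layer: "validation", status: 400, error: "INVALID_REQUEST" };
  }
  return { ok: true };
}
```

Ordering the cheap, stateless checks first means a bot that fails Turnstile never consumes a rate-limit slot or a KV read.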

[!TIP] The demo’s `DEMO_COST_LIMIT=0.0` locks real billing to zero, so no actual spend can occur. This is the trust mechanism: “You can throw anything at this demo, and it won’t cost money.”

Guardrail Flow (How Rejection Works)

```mermaid
graph TD
  Start(["Allocation Request"]) --> CheckGuard{"Guardrail<br/>tripped?"}
  CheckGuard -- Yes --> Block["403 Forbidden<br/>guardrailTriggered: true"]
  CheckGuard -- No --> Mode{"mode=safe?"}
  Mode -- Safe --> DOAlloc["Durable Object<br/>allocate()"]
  Mode -- Eventual --> D1Alloc["D1 allocate<br/>SELECT+UPDATE"]
  DOAlloc --> Validate{"Request<br/>with valid<br/>SKU/user?"}
  D1Alloc --> Validate
  Validate -- No --> Reject["400 Bad Request<br/>error: INVALID_SKU"]
  Validate -- Yes --> Check2{"Does allocation<br/>exceed budget?"}
  Check2 -- Yes --> Trip["Set guardrail<br/>flag in KV"]
  Trip --> Block
  Check2 -- No --> Respond["200 OK<br/>allocated: true"]
  Respond --> Log["Emit to<br/>Analytics Engine"]
```

Reading the flow:

1. Request arrives.
2. Check whether the guardrail is already tripped; if so, block immediately (403).
3. Route to the safe (Durable Object) or eventual (D1) path based on the `mode` parameter.
4. Validate the payload (SKU exists, user ID is a UUID, etc.); reject malformed requests with 400.
5. Execute the allocation logic.
6. Check whether total cost/units now exceed the budget.
7. If yes, trip the guardrail and block this request (403); future requests are blocked at step 2.
8. If no, respond with success (200) and emit telemetry to Analytics Engine.
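The guardrail check, budget check, and trip described above can be sketched with a `Map` standing in for KV. This is a minimal sketch under stated assumptions: the `guardrail_status` key comes from the failure-scenario table, but `allocateWithGuardrail`, the `"tripped"` value, and the cost model are hypothetical. It checks the projected total before committing, so the request that would cross the cap is itself rejected. A real Worker would read and write KV asynchronously, and because KV is eventually consistent, a newly set flag may take a short time to be seen at every location.

```typescript
// A Map stands in for the KV namespace holding the guardrail flag.
type KvLike = Map<string, string>;

const GUARDRAIL_KEY = "guardrail_status";

interface AllocationResult {
  status: number;
  body: { allocated?: boolean; guardrailTriggered?: boolean };
}

// Hypothetical allocation step. `spend.total` is the running virtual spend,
// `costDelta` the cost of this allocation and `budget` the virtual cap.
function allocateWithGuardrail(
  kv: KvLike,
  spend: { total: number },
  costDelta: number,
  budget: number,
): AllocationResult {
  // Guardrail already tripped: fail fast before touching any state.
  if (kv.get(GUARDRAIL_KEY) === "tripped") {
    return { status: 403, body: { guardrailTriggered: true } };
  }
  // Would this allocation cross the cap? Trip the flag and block it too.
  if (spend.total + costDelta > budget) {
    kv.set(GUARDRAIL_KEY, "tripped");
    return { status: 403, body: { guardrailTriggered: true } };
  }
  // Within budget: commit the spend and report success.
  spend.total += costDelta;
  return { status: 200, body: { allocated: true } };
}
```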

Failure Scenarios

What goes wrong, and what happens:

| Scenario | Symptom | System response | Ops action |
|---|---|---|---|
| DO crashes mid-request | `durable_object_error` in logs | Auto-restart in 30s; in-flight requests fail with 500 | Wait for recovery; if persistent, roll back to SQL-only |
| D1 query quota exceeded | `database quota exceeded` error | Allocations fail; guardrail trips | Delete old allocations; VACUUM D1; raise quota if needed |
| Turnstile token invalid | `unauthorised` response | Request rejected at entry (403) | Confirm Turnstile is configured; use DEBUG_TOKEN in dev |
| Rate limit hit | `429 Too Many Requests` | Response includes `Retry-After` header | Whitelist demo IP if legitimate; otherwise investigate abuse |
| KV quota exceeded | `kv_namespace_full` | Session mirrors can’t persist; no WebSocket recovery | Delete entries older than 24h; increase KV quota |
| Guardrail stuck (set when it shouldn’t be) | Allocations blocked unexpectedly | Allocations return 403 with `guardrailTriggered: true` | Inspect: `wrangler kv:key get guardrail_status --namespace-id=XXX`; if stuck, clear: `wrangler kv:key delete guardrail_status --namespace-id=XXX` |

Telemetry Channels

Revenue Guard emits data through three streams:

Analytics Engine (primary): Structured events—allocation attempts, outcomes, guardrail triggers, latency breakdowns. Queryable, aggregatable, designed for trends.
```
allocation_event: {
  skuId, userId, mode (safe|eventual),
  success (true|false), latency_ms,
  guardrail_triggered, cost_delta
}
```
Logs (immediate): wrangler tail shows real-time activity. Grep for errors, allocate calls, guardrail state changes. Human-readable, not indexed, good for live debugging.
```
wrangler tail --env production --search allocate --format pretty | grep -E "ERROR|GUARDRAIL"
```
KV Counters (state): Rate limit keys, guardrail flag, session snapshots. Fast for checks, not queryable like Analytics Engine.
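The `allocation_event` fields above could be written with Analytics Engine's `writeDataPoint`, which takes `blobs` (strings), `doubles` (numbers), and `indexes` (a single sampling key). A minimal sketch: `AnalyticsDatasetLike` is a local stand-in for the binding type, `emitAllocationEvent` is a hypothetical helper, and the column order and the choice of `skuId` as the index are assumptions. Booleans are stored as 0/1 doubles because data points have no boolean type.

```typescript
// Local stand-in for the Workers Analytics Engine binding type
// (AnalyticsEngineDataset in @cloudflare/workers-types).
interface AnalyticsDatasetLike {
  writeDataPoint(point: {
    blobs?: string[];
    doubles?: number[];
    indexes?: string[];
  }): void;
}

interface AllocationEvent {
  skuId: string;
  userId: string;
  mode: "safe" | "eventual";
  success: boolean;
  latency_ms: number;
  guardrail_triggered: boolean;
  cost_delta: number;
}

// Column layout (queries must use the same positions):
// blob1=skuId, blob2=userId, blob3=mode,
// double1=success, double2=latency_ms, double3=guardrail_triggered, double4=cost_delta
function emitAllocationEvent(ds: AnalyticsDatasetLike, e: AllocationEvent): void {
  ds.writeDataPoint({
    indexes: [e.skuId], // sampling key; one index per data point
    blobs: [e.skuId, e.userId, e.mode],
    doubles: [
      e.success ? 1 : 0, // booleans stored as 0/1
      e.latency_ms,
      e.guardrail_triggered ? 1 : 0,
      e.cost_delta,
    ],
  });
}
```

Queries then refer to positions, e.g. `blob1` for `skuId` and `double2` for `latency_ms`, so the layout must stay fixed once data is being written.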

Rate Limiting (Per IP, Per Session)

```
allocate:  200 requests/minute per IP
reset:       1 request/minute per IP
state:     unlimited (read-only)
```

Why these numbers? A genuine user clicks “buy” at most 1–3 times per second, even in a panic, which is 60–180 requests over a full minute. 200/minute therefore leaves headroom for retries and accidental double-clicks. Abusive scripts typically fire 1000+ requests/min, so they are cut off at 200.
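The limits above can be enforced with a fixed one-minute window per route and IP. A minimal in-memory sketch: the `FixedWindowLimiter` class is hypothetical, time is passed in as `nowMs` so the logic is testable, and a Worker would persist the counters (this document keeps rate-limit keys in KV) rather than use a `Map`. `retryAfterSec` is the number of seconds until the current window ends, which is what the `Retry-After` header on a 429 would carry.

```typescript
interface LimitResult {
  allowed: boolean;
  retryAfterSec?: number; // for the Retry-After header on a 429
}

// Per-route limits from the table above; unlisted routes (state) are unlimited.
const LIMITS = new Map<string, number>([
  ["allocate", 200],
  ["reset", 1],
]);
const WINDOW_MS = 60_000;

// Hypothetical fixed-window limiter; one counter per route+IP.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  check(route: string, ip: string, nowMs: number): LimitResult {
    const limit = LIMITS.get(route);
    if (limit === undefined) return { allowed: true };

    const key = `${route}:${ip}`;
    const start = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS; // start of current minute
    let w = this.windows.get(key);
    if (w === undefined || w.start !== start) {
      w = { start, count: 0 }; // new window: reset the counter
      this.windows.set(key, w);
    }
    if (w.count >= limit) {
      const retryAfterSec = Math.ceil((start + WINDOW_MS - nowMs) / 1000);
      return { allowed: false, retryAfterSec };
    }
    w.count += 1;
    return { allowed: true };
  }
}
```

A fixed window is the simplest choice; it allows a burst of up to twice the limit across a window boundary, which a sliding window would smooth out at the cost of more state.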

Data Residency & Compliance

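Durable Objects can be restricted to the EU with a jurisdiction (`eu`) for GDPR data residency; a `locationHint` (e.g. `weur`) only suggests where an object is created and is not a residency guarantee. KV is globally replicated, so keep region-sensitive state out of it. In demo mode, this is optional. In production, respect data gravity.
Only metadata is stored: sessionId, IP, timestamp, event type. No customer names, emails, or payment data. Note that IP addresses can still count as personal data under GDPR.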
GDPR deletion: KV entries auto-expire after 20 minutes (session length). Longer history? Store in cold storage, not hot path.
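In a Worker, the 20-minute auto-expiry is set per write through KV's `expirationTtl` option, given in seconds (KV does not accept values under 60). A minimal sketch, assuming a hypothetical `putSessionMirror` helper, a `session:` key prefix, and a narrowed KV interface so the code runs outside the Workers runtime:

```typescript
// Subset of the Workers KV put() signature used here.
interface KvPutLike {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// 20-minute session length, in seconds (KV requires a TTL of at least 60 s).
const SESSION_TTL_SECONDS = 20 * 60;

interface SessionMeta {
  ip: string;
  timestamp: number;
  eventType: string;
}

// Hypothetical helper: writes metadata only and lets KV delete the entry
// automatically once the TTL has elapsed.
async function putSessionMirror(
  kv: KvPutLike,
  sessionId: string,
  meta: SessionMeta,
): Promise<void> {
  await kv.put(`session:${sessionId}`, JSON.stringify(meta), {
    expirationTtl: SESSION_TTL_SECONDS,
  });
}
```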

Non-Goals (What We Don’t Do)

[!NOTE] Revenue Guard prioritizes safety and speed over exhaustiveness:

No payment processing: Allocation only; payments are downstream.

No user auth: Turnstile proves “not a bot,” not “authenticated user.” Pair with your auth system.

No application-layer encryption: Data in transit relies on Cloudflare’s edge TLS. Add app-layer encryption if PCI compliance requires it.

No audit trail to cold storage: Analytics Engine events expire after 30 days. Export to Sentry/Datadog/BigQuery if you need long-term audit compliance.

References

Tech Stack & Security: Under the Hood
Rate limit details: See repository README
Business impact & risk: Business Impact & ROI
Cloudflare Turnstile: https://developers.cloudflare.com/turnstile/
Bot Management: https://www.cloudflare.com/bm/