Intelligence & Bots: AI at the Edge
AI at the Edge, Not the Cloud
CF Messenger integrates AI not as an external API dependency, but as a first-class citizen running directly on Cloudflare’s GPUs in 150+ cities.
[!TIP] Unlike centralised APIs (OpenAI, Anthropic) that route data to specific regions, Workers AI runs locally. A user in Tokyo gets inference in Tokyo.
Persona-Driven Characters
How Bots Talk
Bots embody deterministic personas. Each character definition includes vocabulary, punctuation quirks, and stylistic prompts hard-coded into the orchestrator to maintain the 2005-era “L33T speak” aesthetic.
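A hard-coded persona definition might look like the sketch below. The `Persona` shape, the example character, and `buildSystemPrompt` are illustrative, not the orchestrator’s actual types; the point is that the prompt is assembled deterministically from fixed fields rather than generated at runtime.

```typescript
// Hypothetical persona shape — a sketch, not the real orchestrator types.
interface Persona {
  name: string;
  vocabulary: string[];      // signature words to sprinkle into replies
  punctuationQuirk: string;  // e.g. trailing "!!1" for 2005-era flair
  stylePrompt: string;       // system-prompt fragment setting the voice
}

// Example character (invented for illustration).
const skaterPersona: Persona = {
  name: "xXx_Sk8r_xXx",
  vocabulary: ["pwned", "w00t", "brb"],
  punctuationQuirk: "!!1",
  stylePrompt: "You are a 2005-era instant-messenger teen. Use L33T speak.",
};

// Deterministic: the same persona always yields the same system prompt.
function buildSystemPrompt(p: Persona): string {
  return [
    p.stylePrompt,
    `Favourite words: ${p.vocabulary.join(", ")}.`,
    `End excited sentences with "${p.punctuationQuirk}".`,
  ].join(" ");
}
```

Because the persona is data rather than free-form prompt text, every bot reply starts from an identical, reviewable system prompt.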
Workers AI Integration
Interactive contacts are powered by Llama 3.2 1B, running directly on Cloudflare’s GPUs. We utilise the instruction-tuned model for character-accurate responses with minimal latency.
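A minimal sketch of the inference call, assuming a Workers AI binding named `AI` and an illustrative `botReply` helper (the binding name and token limit are assumptions, not confirmed by the source; `@cf/meta/llama-3.2-1b-instruct` is the instruction-tuned 1B model in the Workers AI catalogue):

```typescript
// Minimal Env interface for the assumed `AI` binding (normally provided by
// @cloudflare/workers-types and configured in wrangler.toml).
export interface Env {
  AI: { run(model: string, input: unknown): Promise<{ response?: string }> };
}

// Sketch: answer a user message in character. Inference runs on the nearest
// Cloudflare GPU city, so there is no cross-region hop.
export async function botReply(
  env: Env,
  personaPrompt: string,
  userMsg: string,
): Promise<string> {
  const result = await env.AI.run("@cf/meta/llama-3.2-1b-instruct", {
    messages: [
      { role: "system", content: personaPrompt },
      { role: "user", content: userMsg },
    ],
    max_tokens: 256, // keep replies IM-sized and cheap in Neurons (assumed cap)
  });
  return result.response ?? "brb"; // in-character fallback if the model returns nothing
}
```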
Cost Controls & Circuit Breakers
To prevent “bill shock” from automated or runaway AI interactions, the system employs several layers of protection:
- Daily Quota: 10,000 Workers AI interactions per day, tracked in KV (eventually consistent).
- Fallback Logic: When the quota is depleted, the bot emits a “Bot is sleeping” message and rejects new mentions.
- Circuit Breaker: A global “Kill Switch” managed via Cloudflare KV can instantly disable all AI features across the zone.
- Resource Budgeting: Workers AI billing is measured in Neurons (Cloudflare’s unit of inference output); 1,000 Neurons covers approximately 130 LLM responses from the 1B model.
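The layered guards above can be sketched as follows. The key names (`kill-switch`, `quota:<day>`) and the `KVLike` interface are illustrative assumptions; the decision logic is kept pure so it is easy to test, and the KV consistency caveat from the list is noted in a comment.

```typescript
// Minimal KV shape (a subset of Cloudflare's KVNamespace, for illustration).
export interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

const DAILY_QUOTA = 10_000;

// Pure decision: circuit breaker first, then quota, then normal operation.
export function guardDecision(
  killSwitch: string | null,
  used: number,
): "ok" | "sleeping" | "disabled" {
  if (killSwitch === "on") return "disabled"; // global kill switch in KV
  if (used >= DAILY_QUOTA) return "sleeping"; // quota depleted → fallback message
  return "ok";
}

// Check the guards and count one interaction (assumed key scheme).
export async function checkAndCount(
  kv: KVLike,
  day: string,
): Promise<"ok" | "sleeping" | "disabled"> {
  const [kill, usedRaw] = await Promise.all([
    kv.get("kill-switch"),
    kv.get(`quota:${day}`),
  ]);
  const used = Number(usedRaw ?? "0");
  const decision = guardDecision(kill, used);
  if (decision === "ok") {
    // KV is eventually consistent, so this read-modify-write can race under
    // load — which is exactly why the roadmap moves the ledger to a Durable Object.
    await kv.put(`quota:${day}`, String(used + 1));
  }
  return decision;
}
```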
Future Roadmap (Intelligence)
- Stateful Memory: Migrating bot conversation history into dedicated Durable Objects for multi-turn context.
- Quota Ledger: Moving the quota tracking from KV to a Durable Object to eliminate consistency races during high-traffic demos.
- Advanced Personas: Upgrading to 8B models for deeper character nuance while maintaining cost efficiency via location-hint pinning.
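The quota-ledger migration could look roughly like this. This is a roadmap sketch, not shipped code: the class name, storage keys, and `StorageLike` interface are all assumptions. The point is that a single Durable Object instance serialises every increment, eliminating the KV read-modify-write race.

```typescript
// Subset of Durable Object storage used here (normally state.storage from
// @cloudflare/workers-types); defined locally for illustration.
export interface StorageLike {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
}

// Hypothetical Durable Object quota ledger. Requests to one object are
// processed one at a time, so check-and-increment is effectively atomic.
export class QuotaLedger {
  constructor(private storage: StorageLike, private limit = 10_000) {}

  // One "may this bot answer?" check-and-increment per interaction.
  async consume(): Promise<"ok" | "sleeping"> {
    const used = (await this.storage.get("used")) ?? 0;
    if (used >= this.limit) return "sleeping"; // quota exhausted → fallback
    await this.storage.put("used", used + 1);
    return "ok";
  }
}
```

In a real Worker this class would be exported as a Durable Object binding and reset daily (e.g. via an alarm), details the roadmap leaves open.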
Sources & References
- Cloudflare Workers AI: Available Models & Inference Guide
- Llama 3.2 Model Card: Meta Llama 3.2 Repository
- Workers AI Pricing: Cloudflare Workers AI Docs
- Durable Objects State Management: Durable Objects Documentation
- Cost-Optimised LLM Inference: ML Model Optimisation Guide