Intelligence & Bots: AI at the Edge
AI at the Edge, Not the Cloud
CF Messenger integrates AI not as an external API dependency, but as a first-class citizen running directly on Cloudflare’s GPUs in 150+ cities.
[!TIP] Unlike centralised APIs (OpenAI, Anthropic) that route data to specific regions, Workers AI runs locally. A user in Tokyo gets inference in Tokyo.
Persona-Driven Characters
How Bots Talk
Bots embody deterministic personas. Each character definition includes vocabulary, punctuation quirks, and stylistic prompts hard-coded into the orchestrator to maintain the 2005-era “L33T speak” aesthetic.
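A hard-coded persona definition might look like the sketch below. The `Persona` shape, the example character, and `buildSystemPrompt` are illustrative, not the orchestrator’s actual types; the point is that the prompt is assembled deterministically from fixed fields rather than generated at runtime.

```typescript
// Hypothetical persona shape — a sketch, not the real orchestrator types.
interface Persona {
  name: string;
  vocabulary: string[];      // signature words to sprinkle into replies
  punctuationQuirk: string;  // e.g. trailing "!!1" for 2005-era flair
  stylePrompt: string;       // system-prompt fragment setting the voice
}

// Example character (invented for illustration).
const skaterPersona: Persona = {
  name: "xXx_Sk8r_xXx",
  vocabulary: ["pwned", "w00t", "brb"],
  punctuationQuirk: "!!1",
  stylePrompt: "You are a 2005-era instant-messenger teen. Use L33T speak.",
};

// Deterministic: the same persona always yields the same system prompt.
function buildSystemPrompt(p: Persona): string {
  return [
    p.stylePrompt,
    `Favourite words: ${p.vocabulary.join(", ")}.`,
    `End excited sentences with "${p.punctuationQuirk}".`,
  ].join(" ");
}
```

Because the persona is data rather than free-form prompt text, every bot reply starts from an identical, reviewable system prompt.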
Workers AI Integration
Interactive contacts are powered by Llama 3.2 1B, running directly on Cloudflare’s GPUs. We utilise the instruction-tuned model for character-accurate responses with minimal latency.
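A minimal sketch of the inference call, assuming a Workers AI binding named `AI` and an illustrative `botReply` helper (the binding name and token limit are assumptions, not confirmed by the source; `@cf/meta/llama-3.2-1b-instruct` is the instruction-tuned 1B model in the Workers AI catalogue):

```typescript
// Minimal Env interface for the assumed `AI` binding (normally provided by
// @cloudflare/workers-types and configured in wrangler.toml).
export interface Env {
  AI: { run(model: string, input: unknown): Promise<{ response?: string }> };
}

// Sketch: answer a user message in character. Inference runs on the nearest
// Cloudflare GPU city, so there is no cross-region hop.
export async function botReply(
  env: Env,
  personaPrompt: string,
  userMsg: string,
): Promise<string> {
  const result = await env.AI.run("@cf/meta/llama-3.2-1b-instruct", {
    messages: [
      { role: "system", content: personaPrompt },
      { role: "user", content: userMsg },
    ],
    max_tokens: 256, // keep replies IM-sized and cheap in Neurons (assumed cap)
  });
  return result.response ?? "brb"; // in-character fallback if the model returns nothing
}
```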
Cost Controls & Circuit Breakers
To prevent “bill shock” from automated or runaway AI interactions, the system employs several layers of protection:
- Daily Quota: 10,000 Workers AI interactions per day, tracked in KV (eventually consistent).
- Fallback Logic: When the quota is depleted, the bot emits a “Bot is sleeping” message and rejects new mentions.
- Circuit Breaker: A global “Kill Switch” managed via Cloudflare KV can instantly disable all AI features across the zone.
- Resource Budgeting: Workers AI billing is measured in Neurons (Cloudflare’s unit of inference output); 1,000 Neurons covers approximately 130 LLM responses from the 1B model.
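The layered guards above can be sketched as follows. The key names (`kill-switch`, `quota:<day>`) and the `KVLike` interface are illustrative assumptions; the decision logic is kept pure so it is easy to test, and the KV consistency caveat from the list is noted in a comment.

```typescript
// Minimal KV shape (a subset of Cloudflare's KVNamespace, for illustration).
export interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

const DAILY_QUOTA = 10_000;

// Pure decision: circuit breaker first, then quota, then normal operation.
export function guardDecision(
  killSwitch: string | null,
  used: number,
): "ok" | "sleeping" | "disabled" {
  if (killSwitch === "on") return "disabled"; // global kill switch in KV
  if (used >= DAILY_QUOTA) return "sleeping"; // quota depleted → fallback message
  return "ok";
}

// Check the guards and count one interaction (assumed key scheme).
export async function checkAndCount(
  kv: KVLike,
  day: string,
): Promise<"ok" | "sleeping" | "disabled"> {
  const [kill, usedRaw] = await Promise.all([
    kv.get("kill-switch"),
    kv.get(`quota:${day}`),
  ]);
  const used = Number(usedRaw ?? "0");
  const decision = guardDecision(kill, used);
  if (decision === "ok") {
    // KV is eventually consistent, so this read-modify-write can race under
    // load — which is exactly why the roadmap moves the ledger to a Durable Object.
    await kv.put(`quota:${day}`, String(used + 1));
  }
  return decision;
}
```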
Future Roadmap (Intelligence)
- Stateful Memory: Migrating bot conversation history into dedicated Durable Objects for multi-turn context.
- Quota Ledger: Moving the quota tracking from KV to a Durable Object to eliminate consistency races during high-traffic demos.
- Advanced Personas: Upgrading to 8B models for deeper character nuance while maintaining cost efficiency via location-hint pinning.
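The quota-ledger migration could look roughly like this. This is a roadmap sketch, not shipped code: the class name, storage keys, and `StorageLike` interface are all assumptions. The point is that a single Durable Object instance serialises every increment, eliminating the KV read-modify-write race.

```typescript
// Subset of Durable Object storage used here (normally state.storage from
// @cloudflare/workers-types); defined locally for illustration.
export interface StorageLike {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
}

// Hypothetical Durable Object quota ledger. Requests to one object are
// processed one at a time, so check-and-increment is effectively atomic.
export class QuotaLedger {
  constructor(private storage: StorageLike, private limit = 10_000) {}

  // One "may this bot answer?" check-and-increment per interaction.
  async consume(): Promise<"ok" | "sleeping"> {
    const used = (await this.storage.get("used")) ?? 0;
    if (used >= this.limit) return "sleeping"; // quota exhausted → fallback
    await this.storage.put("used", used + 1);
    return "ok";
  }
}
```

In a real Worker this class would be exported as a Durable Object binding and reset daily (e.g. via an alarm), details the roadmap leaves open.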
Sources & References
- Cloudflare Workers AI: Available Models & Inference Guide
- Llama 3.2 Model Card: Meta Llama 3.2 Repository
- Workers AI Pricing: Cloudflare Workers AI Docs
- Durable Objects State Management: Durable Objects Documentation
- Cost-Optimised LLM Inference: ML Model Optimisation Guide