Why Not Just Use a VPN?
The most common question we hear from enterprise architects evaluating this reference architecture is: “Why can’t we just spin up a VPN?”
The short answer: you can, but it will take you months and cost you hundreds of thousands of pounds. The longer answer requires understanding why VPNs were designed for a different era and why the ZTNA market has moved beyond them.
The VPN Problem
VPNs were designed in the 1990s to extend a trusted corporate network to a remote device. The model assumes:
- The device is managed — you control the OS, the firewall, the disk encryption.
- The user is an employee — you have weeks to ship hardware and configure it.
- The network is the perimeter — once connected, the user is “inside” and trusted.
None of these assumptions hold in our M&A scenario. We have 200 contractors using personal laptops who need access tomorrow. A VPN would require:
- Shipping 200 managed laptops (4–6 weeks lead time)
- Installing and configuring a VPN client on each device
- Provisioning VPN concentrator hardware to handle the load
- Merging identity directories (Active Directory forest trusts)
Total elapsed time: 3–6 months. Total cost: £300k–£500k+.
Even if you could accelerate the hardware, VPNs grant network-level access. Once a contractor connects, they can see the entire subnet. Lateral movement is trivial. This is the fundamental architectural flaw that NIST’s Zero Trust Architecture (SP 800-207) was designed to address: VPNs authenticate users to a network, not to an application.
The Traditional Vendor Playbook
Cisco (AnyConnect + ISE + Duo)
Cisco’s approach layers multiple products:
- Cisco Secure Client (AnyConnect) for VPN tunnelling — requires a client agent installed on every device
- Identity Services Engine (ISE) for network access control
- Duo (acquired 2018) for MFA and device trust
- Umbrella for DNS-layer security
For our scenario: AnyConnect requires an agent on every device — impossible for unmanaged BYOD. ISE is designed for on-premises network segmentation, not cloud-native applications. You would need to deploy and integrate at least four separate products, each with its own management console. Cisco has no native AI inference platform, so governing “Shadow AI” would require yet another vendor.
Microsoft (Entra Private Access + Conditional Access)
Microsoft’s answer to ZTNA is Entra Private Access (formerly Azure AD Application Proxy), part of the Global Secure Access platform:
- Entra Private Access replaces VPN for private app access
- Conditional Access enforces device compliance and MFA
- Microsoft Defender for Cloud Apps provides CASB and DLP
For our scenario: Entra Private Access requires the Global Secure Access client installed on end-user devices, and is tightly coupled to the Microsoft ecosystem — it works best when you already have Intune-managed devices and Entra ID as your sole identity provider. For BYOD contractors from an acquired company (who may use Google Workspace), the integration path is complex. Microsoft has no equivalent to Workers or Durable Objects — you still need separate infrastructure (Azure VMs, App Service) to host the application itself. AI governance would require Azure OpenAI Service, a separate billing and management plane.
Palo Alto Networks (Prisma Access + Prisma Cloud)
Palo Alto’s SASE offering, Prisma Access, provides:
- GlobalProtect agent for device connectivity
- Cloud-delivered firewalls for traffic inspection
- Prisma Cloud for workload protection
For our scenario: GlobalProtect is agent-based — the same BYOD problem as Cisco. Prisma Access does offer a clientless VPN option that acts as a reverse proxy, but it is designed for simple web applications, not for rendering full internal SPAs with isolation. Palo Alto has no serverless compute platform; you need separate infrastructure for application hosting. Their AI Access Security offering focuses on controlling access to third-party AI tools, not on hosting your own governed AI platform.
Zscaler (ZPA + ZIA)
Zscaler is the closest competitor in the pure ZTNA space:
- Zscaler Private Access (ZPA) provides identity-aware application access
- Zscaler Internet Access (ZIA) secures outbound traffic
- Zscaler Browser Isolation renders applications remotely
For our scenario: ZPA with agentless Browser Isolation is a credible alternative for the access layer. However, Zscaler is a security overlay — it secures access to applications but does not host them. You still need AWS/Azure/GCP infrastructure for the frontend, backend, and database. Zscaler has no AI inference platform, no serverless compute, and no managed RAG pipeline. The “single control plane” benefit disappears when you need three vendors (Zscaler + cloud provider + AI platform) instead of one.
The Cloudflare Difference
Cloudflare is unique in that it combines security, compute, and AI into a single global platform. Sources for each capability are linked in the footnotes below.
| Capability | Cisco | Microsoft | Palo Alto | Zscaler | Cloudflare |
|---|---|---|---|---|---|
| ZTNA (Identity-Aware Access) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Clientless / Agentless Access | ❌ | ⚠️ Limited | ⚠️ Limited | ✅ | ✅ |
| Remote Browser Isolation | ❌ | ❌ | ⚠️ Basic | ✅ | ✅ (NVR) |
| Serverless App Hosting | ❌ | ⚠️ Azure | ❌ | ❌ | ✅ (Workers) |
| Edge-Native State | ❌ | ❌ | ❌ | ❌ | ✅ (DO / D1 / KV) |
| Native AI Inference | ❌ | ⚠️ Azure OpenAI | ❌ | ❌ | ✅ (Workers AI) |
| AI Gateway + DLP | ❌ | ❌ | ⚠️ Partial | ❌ | ✅ |
| Managed RAG Pipeline | ❌ | ❌ | ❌ | ❌ | ✅ (AI Search) |
| Zero Origin Servers | ❌ | ❌ | ❌ | ❌ | ✅ |
| Day 1 CapEx | £300k+ | £100k+ | £250k+ | £50k+ | £0 |
⚠️ = Capability exists but requires significant additional configuration, licensing, or a separate product/platform.
Cloudflare sources: Access · Clientless RBI · Workers · Durable Objects · D1 · KV · Workers AI · AI Gateway · AI Search
The Key Insight
Traditional vendors — Cisco, Microsoft, Palo Alto — treat security and compute as separate concerns. You buy a security overlay and then bolt it onto your existing infrastructure. This means:
- More vendors = more integration complexity
- More infrastructure = more attack surface
- More licensing = higher cost
Cloudflare collapses the stack. The same platform that authenticates users also hosts the application, runs the AI model, and enforces DLP policy. There are no origin servers, no VPN hardware, and no cloud VMs to manage.
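To make the "same platform authenticates and hosts" claim concrete, here is a minimal sketch of a Worker that only serves requests which arrived through Cloudflare Access. Access injects a signed JWT in the `Cf-Access-Jwt-Assertion` header on authenticated requests; a production Worker should verify that JWT against the team's public keys, whereas this sketch (with illustrative names) only checks for its presence.

```typescript
// Sketch: a Worker fronted by Cloudflare Access. The helper name is
// illustrative; real deployments must cryptographically validate the JWT.

export function hasAccessAssertion(headers: Headers): boolean {
  // Cloudflare Access sets this header on every authenticated request.
  return headers.get("Cf-Access-Jwt-Assertion") !== null;
}

export default {
  async fetch(request: Request): Promise<Response> {
    if (!hasAccessAssertion(request.headers)) {
      // The request bypassed Access (e.g. hit the Worker route directly).
      return new Response("Forbidden", { status: 403 });
    }
    // Same platform, same request: authenticated AND served here.
    return new Response("Hello from the edge");
  },
};
```

The point of the sketch is that there is no second system: the authentication signal and the application code meet in the same function invocation, at the same edge location.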
AI Platform: Why Not AWS Bedrock or Azure OpenAI?
The Vera Chat helpdesk is not a toy — it is a production AI system that must enforce DLP, ground responses in policy documents, and log every interaction for audit. Building this on a traditional cloud provider requires stitching together multiple services. Cloudflare provides all three layers — inference, governance, and retrieval — as native platform primitives.
Workers AI (AI Chat / Inference)
Workers AI provides serverless LLM inference at the edge. In our architecture, we use @cf/meta/llama-3.3-70b-instruct-fp8-fast for the main chatbot and @cf/meta/llama-3.2-1b-instruct for the lightweight DLP “Judge.”
| Capability | AWS Bedrock | Azure OpenAI | Google Vertex AI | Cloudflare Workers AI |
|---|---|---|---|---|
| Serverless inference | ✅ | ✅ | ✅ | ✅ |
| Open-source models (Llama, Mistral) | ✅ | ❌ (OpenAI only) | ✅ | ✅ |
| Edge-local inference (low latency) | ❌ (Region) | ❌ (Region) | ❌ (Region) | ✅ (330+ cities) |
| Co-located with app compute | ⚠️ (Lambda) | ⚠️ (Functions) | ⚠️ (Cloud Run) | ✅ (Same Worker) |
| Integrated DLP gateway | ❌ | ❌ | ❌ | ✅ (AI Gateway) |
| No cold start for inference | ⚠️ | ⚠️ | ⚠️ | ✅ |
| Pay-per-token (no provisioned capacity) | ✅ | ⚠️ (PTU tiers) | ✅ | ✅ |
Sources: AWS Bedrock supported models · Azure OpenAI models · Azure OpenAI PTU pricing · Google Vertex AI models · Cloudflare Workers AI models · Cloudflare network map (330+ cities)
The key difference: On AWS or Azure, the AI model runs in a specific region. The user’s request travels from the edge to the region, hits a Lambda/Function, calls Bedrock/OpenAI, returns through the function, and back to the edge. On Cloudflare, the Worker and the AI model run on the same machine at the nearest edge location. There is no inter-service network hop. This matters for real-time chat — every round-trip saved is latency the user feels.
Additionally, running the “Judge” pattern (a cheap 1B model screening prompts before the expensive 70B model) is trivial on Workers AI because both models are invoked via the same env.AI.run() binding. On AWS, you would need two separate Bedrock InvokeModel calls with different model IDs, each billed independently and routed through separate endpoints.
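The Judge pattern above can be sketched as follows. The two model IDs come from this document; the `Env` shape, prompt wording, and function names are illustrative assumptions, not the project's actual code.

```typescript
// Hedged sketch of the two-step "Judge" pattern: a cheap 1B model screens
// the prompt before the expensive 70B model answers it. Both calls go
// through the same env.AI.run() binding.

type ChatMessage = { role: "system" | "user"; content: string };

interface Env {
  AI: {
    run(
      model: string,
      input: { messages: ChatMessage[] }
    ): Promise<{ response: string }>;
  };
}

const JUDGE_MODEL = "@cf/meta/llama-3.2-1b-instruct";
const CHAT_MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

export function isSafeVerdict(raw: string): boolean {
  // The judge is instructed to reply SAFE or UNSAFE; anything else is
  // treated as unsafe, so the screen fails closed.
  return raw.trim().toUpperCase().startsWith("SAFE");
}

export async function answerWithJudge(env: Env, prompt: string): Promise<string> {
  // Step 1: cheap screen with the 1B model.
  const verdict = await env.AI.run(JUDGE_MODEL, {
    messages: [
      { role: "system", content: "Reply with exactly SAFE or UNSAFE." },
      { role: "user", content: prompt },
    ],
  });
  if (!isSafeVerdict(verdict.response)) {
    return "This request was blocked by policy.";
  }
  // Step 2: the 70B model answers only prompts that passed the screen.
  const answer = await env.AI.run(CHAT_MODEL, {
    messages: [{ role: "user", content: prompt }],
  });
  return answer.response;
}
```

Because both models are behind one binding, swapping the judge for a different screening model is a one-constant change rather than a new endpoint integration.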
AI Gateway (Governance & Observability)
AI Gateway is the control plane for all AI traffic. It sits between the Worker and the model, providing DLP scanning, rate limiting, cost analytics, and audit logging — without any application code changes.
| Capability | AWS Bedrock Guardrails | Azure Content Safety | LangSmith | Helicone | Portkey | Cloudflare AI Gateway |
|---|---|---|---|---|---|---|
| Prompt/Response DLP | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| Rate limiting | ⚠️ (API Gateway) | ⚠️ (APIM) | ❌ | ✅ | ✅ | ✅ |
| Cost analytics (per-request) | ⚠️ (CloudWatch) | ⚠️ (Monitor) | ✅ | ✅ | ✅ | ✅ |
| Request/response logging | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Caching (semantic dedup) | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Zero-code integration | ❌ (SDK) | ❌ (SDK) | ❌ (SDK) | ❌ (Proxy) | ❌ (Proxy) | ✅ (Binding) |
| Same platform as compute | ⚠️ | ⚠️ | ❌ | ❌ | ❌ | ✅ |
Sources: AWS Bedrock Guardrails · Azure AI Content Safety · LangSmith docs · Helicone docs · Portkey docs · Cloudflare AI Gateway DLP · AI Gateway caching
The key difference: Third-party observability tools like LangSmith or Helicone are excellent for monitoring, but they are observability-only — they cannot enforce DLP or block unsafe prompts in real-time. AWS Bedrock Guardrails and Azure Content Safety can block content, but they are separate services that require SDK integration and add latency. Cloudflare AI Gateway is configured as a wrangler.jsonc binding — every env.AI.run() call is automatically routed through it with zero code changes. DLP scanning, rate limiting, and logging are all enforced at the platform level.
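As a sketch under stated assumptions, the binding the text refers to is declared in `wrangler.jsonc` roughly like this (the comment and surrounding keys are illustrative; only the `ai` block is the binding itself):

```jsonc
{
  // Exposes Workers AI to the Worker as env.AI. Calls made through this
  // binding can be routed via an AI Gateway for DLP, rate limiting,
  // logging, and caching without changing application logic.
  "ai": {
    "binding": "AI"
  }
}
```

Which named gateway a call uses can also be selected per request via an options argument to `env.AI.run()`, but the application code invoking the model does not change either way.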
AI Search (Managed RAG)
AI Search provides a fully managed Retrieval-Augmented Generation pipeline. You upload documents to R2, and AI Search handles chunking, embedding, indexing, and retrieval — including automatic query rewriting.
| Capability | AWS Bedrock Knowledge Bases | Azure AI Search + OpenAI | Pinecone + LangChain | Weaviate | Cloudflare AI Search |
|---|---|---|---|---|---|
| Managed chunking & embedding | ✅ | ⚠️ (Manual pipeline) | ❌ (DIY) | ⚠️ | ✅ |
| Automatic query rewriting | ✅ | ⚠️ (Semantic Ranker) | ❌ (DIY) | ❌ | ✅ |
| Source attribution | ✅ | ✅ | ❌ (DIY) | ✅ | ✅ |
| Integrated with app compute | ⚠️ (Lambda) | ⚠️ (Functions) | ❌ | ❌ | ✅ (Same Worker) |
| Document storage included | ✅ (S3) | ✅ (Blob) | ❌ | ❌ | ✅ (R2) |
| No separate vector DB billing | ✅ | ❌ (Cognitive Search) | ❌ (Pinecone) | ❌ (Weaviate Cloud) | ✅ |
| Edge-local retrieval | ❌ (Region) | ❌ (Region) | ❌ (Region) | ❌ (Region) | ✅ |
Sources: AWS Bedrock Knowledge Bases · Azure AI Search (formerly Cognitive Search) · Azure AI Search RAG tutorial · Pinecone docs · Weaviate docs · Cloudflare AI Search · Cloudflare R2
The key difference: Building RAG on AWS requires wiring together S3, Bedrock Knowledge Bases (or a custom Lambda pipeline), and potentially OpenSearch for hybrid search. On Azure, you need Blob Storage, Azure AI Search, and Azure OpenAI — three separate services with three billing models. With Pinecone or Weaviate, you manage the vector database yourself and write the chunking/embedding pipeline from scratch using LangChain or LlamaIndex.
Cloudflare AI Search reduces this to a single API call: env.AI.autorag().search(). Documents in R2 are automatically chunked, embedded with bge-base-en-v1.5, and indexed. Query rewriting is a boolean flag. The retrieval happens at the edge, co-located with the Worker that processes the result — no cross-region network hops.
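The single-call retrieval described above can be sketched like this. The instance name `vera-policies` is an illustrative assumption, and the result shapes are simplified from the real API; treat this as a sketch of the pattern, not the project's code.

```typescript
// Hedged sketch: retrieving policy context for the chatbot through the
// AI Search (AutoRAG) binding, with query rewriting enabled as a flag.

interface RagChunk {
  filename: string;
  content: { text: string }[];
}

interface Env {
  AI: {
    autorag(instance: string): {
      search(params: {
        query: string;
        rewrite_query?: boolean;
      }): Promise<{ data: RagChunk[] }>;
    };
  };
}

export function citeSources(chunks: { filename: string }[]): string {
  // Deduplicate source filenames for a "Sources:" footer on the answer.
  return [...new Set(chunks.map((c) => c.filename))].join(", ");
}

export async function retrievePolicyContext(env: Env, question: string) {
  const results = await env.AI.autorag("vera-policies").search({
    query: question,
    rewrite_query: true, // query rewriting is a boolean flag
  });
  return {
    context: results.data.flatMap((c) => c.content.map((p) => p.text)),
    sources: citeSources(results.data),
  };
}
```

Chunking, embedding, and indexing of the R2 documents all happen before this call ever runs; the Worker only ever sees ranked, attributed chunks.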
Summary: Why Cloudflare for This Use Case
For the specific M&A scenario — onboarding 200 untrusted BYOD users in 48 hours — Cloudflare is the only vendor that can deliver the complete solution from a single platform:
- No agents required — Clientless Access + RBI means contractors use any browser.
- No infrastructure to provision — Workers, Pages, and D1 eliminate the need for cloud VMs.
- No separate AI vendor — Workers AI + AI Gateway + AI Search provide a governed AI platform out of the box.
- No VPN — There is no network to connect to. Every request is authenticated at the edge, per-application, per-request.
- No multi-vendor AI stack — Inference, governance, and retrieval all run on the same platform, in the same Worker, at the same edge location.
The result: 48 hours from zero to full production access, at near-zero CapEx.