Skip to content

Cloudflare AI Gateway

Overview

Cloudflare AI Gateway is a proxy layer that sits between applications and AI model providers, offering visibility, caching, rate limiting, and cost controls for AI traffic. Part of Cloudflare's broader AI infrastructure (alongside Workers AI, Vectorize, AI Search, and Agents SDK), it leverages Cloudflare's global edge network spanning 330+ cities. Integration requires minimal code changes — typically a single base-URL swap. The AI product landing page (ai.cloudflare.com) now redirects to workers.cloudflare.com/solutions/ai, now branded as "Cloudflare AI Cloud" — positioning the entire platform as AI-native infrastructure for data storage, inference, and deployment.

Formidability

Score: 8/10 (up from 7 — April 2026 Agents Week)

Cloudflare is a major infrastructure player with massive distribution and an existing developer base. AI Gateway benefits from being bundled into Cloudflare's free plan, making it trivially easy for existing customers to adopt. With the April 2026 "Agents Week" launch, Cloudflare now offers a unified inference layer serving 70+ models from 14+ providers via a single API — moving from passthrough proxy to direct model aggregation competitor. Combined with Agent Memory, enterprise MCP governance, and a full-stack agent platform (voice, email, browser, networking), Cloudflare is positioning as the default infrastructure for AI agents, not just a gateway. Internal dogfooding validates scale: 20M requests through AI Gateway and 241B tokens processed via Workers AI across 3,683 internal users.

Markets

  • Primary: Developers already on Cloudflare building AI-powered applications
  • Secondary: Enterprise teams seeking centralized AI traffic control, caching, and cost visibility
  • Geographic: Global (330+ cities edge network)

Products

  • AI Gateway — proxy with analytics, caching, rate limiting, logging, retry/fallback
  • AI Platform (inference layer) — unified API (AI.run()) serving 70+ models from 14+ providers (OpenAI, Anthropic, Alibaba, Google, Replicate, etc.), automatic failover, custom metadata for cost tracking, streaming resilience (launched April 2026)
  • Workers AI — serverless inference on Cloudflare's edge (Llama 3, Gemma 3, Whisper, TTS across 190+ locations)
  • Agent Memory — managed persistent memory for AI agents; semantic recall, memory classification (facts/events/instructions/tasks), Vectorize-backed retrieval (private beta, April 2026)
  • Agents SDK / Project Think — TypeScript framework for goal-driven agents with state management, SQL database per agent, multi-modal interfaces; "Project Think" is next-gen preview (April 2026)
  • R2 Object Storage — egress-free training data storage with multi-cloud GPU access
  • AI Search — search primitive for agents with dynamic instances, hybrid retrieval, relevance boosting
  • Vectorize — globally-replicated vector database for RAG workflows
  • Remote MCP Servers — OAuth-scoped endpoints for agent tool exposure; includes MCP playground; enterprise portals with Code Mode (94% token reduction) and DLP guardrails
  • Sandbox SDK — secure code execution, file management, and service exposure via public URLs
  • Browser Run — enhanced browser automation for agents with Live View, session recordings, 4x concurrency (April 2026)
  • Voice Agents — experimental real-time voice pipeline over WebSockets (April 2026)
  • Email Service — native email sending/processing for agents (public beta, April 2026)
  • Flagship — native feature flags with sub-millisecond evaluation (April 2026)
  • Cloudflare Mesh — private networking for agents/Workers with VPC integration (April 2026)
  • Durable Objects — stateful compute with MCPAgent transport layer
  • Containers — containerized workloads alongside Workers compute
  • Workflows — multi-step orchestration for durable task execution
  • RealtimeKit — real-time media (TURN/SFU) infrastructure
  • Unified Billing (2026) — pay for third-party model usage through Cloudflare invoice
  • Logpush — stream AI Gateway logs to external S3/SIEM ($0.05/M records)
  • Unweight — lossless LLM compression achieving 22% model size reduction (April 2026)

Pricing

Tier Cost AI Gateway Logs/month
Free (Workers Free) $0 100,000
Workers Paid $5/mo + usage 1,000,000
Enterprise Custom Custom

Core gateway features (caching, rate limiting, analytics) are free on all plans. Costs scale through Workers compute ($0.30/M requests, $0.02/M CPU-ms beyond included).

URLs to Monitor

URL Label Notes
https://developers.cloudflare.com/ai-gateway/ AI Gateway Docs Product documentation
https://developers.cloudflare.com/ai-gateway/configuration/ Configuration Gateway configuration options
https://developers.cloudflare.com/ai-gateway/providers/ Providers Supported AI providers (24 providers)
https://developers.cloudflare.com/ai-gateway/evaluations/ Evaluations Evaluation features
https://developers.cloudflare.com/workers-ai/models/ Workers AI Models Workers AI model catalog
https://www.cloudflare.com/pricing Pricing Pricing page
https://blog.cloudflare.com/tag/ai AI Blog AI-specific blog posts

Strategy

  • Bundle play: AI Gateway is free, driving adoption among Cloudflare's massive customer base
  • Model aggregation pivot: Inference layer now serves 70+ models from 14+ providers via unified API — shifting from passthrough proxy to model aggregator competing directly with OpenRouter
  • Unified billing (2026): Developers can pay for third-party model usage through their Cloudflare invoice, creating vendor consolidation
  • Edge-first: Caching AI responses at the edge to reduce latency and cost
  • Full-stack agent platform: Agents Week (April 2026) positioned Cloudflare as the default infrastructure for AI agents — Agents SDK, Agent Memory, Browser Run, Voice, Email, MCP governance, Mesh networking compose a complete agent runtime
  • Enterprise MCP governance: MCP Server Portals with Code Mode (94% token reduction), DLP guardrails, shadow MCP detection — enterprise-grade MCP management
  • Inference cost optimization: Unweight (22% LLM compression) + edge caching + Agent Memory (reduces context window bloat) all reduce inference costs
  • Platform integration: Tight coupling with Workers AI, Vectorize, AI Search, and broader Cloudflare developer platform
  • Built-in OAuth 2.1: Native OAuth provider and Managed OAuth for Access (RFC 9728) for agent authentication