Portkey AI Gateway
Connect, manage, and secure AI interactions across 1,600+ LLMs with centralised control and real-time monitoring
Overview:
Portkey AI Gateway is an enterprise-grade universal proxy that connects, manages, and secures AI interactions across 1,600+ LLMs via a single API. It eliminates the need for separate integrations per provider and adds a production-ready reliability layer – intelligent routing, automatic fallbacks, semantic caching, and virtual key management – all without changes to your application logic.
The gateway is OpenAI API-compatible, meaning most applications can be routed through Portkey with a two-line code change. Once in place, every LLM call passes through the gateway where routing policies, cost controls, caching rules, and observability are applied automatically and centrally.
- Universal API access to 1,600+ LLMs across all major providers via a single endpoint.
- Intelligent routing with automatic fallback chains and load balancing across provider keys.
- Semantic caching reduces duplicate LLM calls and costs by up to 70%.
- Virtual key management – store raw API keys securely, issue scoped keys to teams and services.
- Automatic retries with exponential backoff and per-provider retry rules.
- Request batching for large-scale workloads without impacting real-time performance.
- OpenAI-compatible drop-in – deploy in minutes with no framework changes.
- 99.99% uptime SLA with multi-region deployment and active/active failover.
Universal API Access
Connect to 1,600+ LLMs across providers and modalities via a single unified API. No separate integration is required for each provider – switching models or adding a new provider is a configuration change, not an engineering project.
- OpenAI, Anthropic, Google, Mistral, Cohere, Azure, AWS Bedrock, and 1,600+ more.
- Single API key for all providers via the gateway.
- Multimodal support including text, vision, audio, and embeddings.
- OpenAI-compatible drop-in – change the base URL and API key, nothing else.
Virtual Key Management
Store raw LLM provider API keys securely in Portkey's encrypted vault. Issue scoped virtual keys to individual teams, applications, and services with defined model and budget permissions. Rotate or revoke access at any time without touching application code.
- Encrypted key vault for all provider credentials.
- Scoped virtual keys per team, service, or environment.
- Instant revocation and key rotation.
- Full usage audit log per virtual key.
Semantic Caching
Reduce redundant LLM calls by up to 70% with both exact-match and vector similarity caching. Semantically equivalent requests return cached responses instantly, cutting costs and latency without any loss in output quality.
- Simple (exact-match) and semantic (vector similarity) caching modes.
- Cache hit rate analytics in the dashboard.
- Configurable TTL per cache rule.
- Cost reduction of up to 70% on repetitive workloads.
Automatic Retries
Configure retry logic with exponential backoff across all providers. Transient 5xx errors, rate-limit responses, and timeout failures are retried automatically before surfacing to your application, eliminating most provider-side disruptions invisibly.
- Configurable retry count per provider.
- Exponential backoff with jitter.
- Per-provider retry rule configuration.
- Retry with alternative models or providers on persistent failure.
Request Batching
Scale large-volume LLM workloads using provider batch APIs or Portkey's custom batching layer without impacting real-time application performance. Batch jobs run asynchronously and results are delivered via webhook or polling.
- Provider batch API support (OpenAI Batch, Anthropic Batch).
- Custom batching logic for unsupported providers.
- Asynchronous execution with no real-time performance impact.
- Cost-efficient processing at high volume.
Intelligent Routing and Fallbacks
Dynamically route every LLM request to the optimal provider based on cost, latency, quality, or custom rules. Define multi-step fallback chains so that if a primary provider returns an error or exceeds latency thresholds, traffic automatically fails over to the next configured option without interrupting the user experience.
- Automatic fallback chains with configurable priority order.
- Latency, cost, and quality-based routing rules.
- Zero-downtime provider switching.
- Fallback alerts and routing decision logs.
Load Balancing Across Provider Keys
Distribute request volume across multiple API keys for the same provider to avoid rate-limit bottlenecks at high throughput. Load balancing is configured entirely at the gateway layer and requires no changes to application code.
Canary and A/B Deployments
Roll out new models gradually using canary routing. Send a configurable percentage of traffic to a new model or provider while the remainder continues on the existing configuration. Compare latency, cost, and eval scores side-by-side before fully cutting over.
Rate Limiting and Budget Controls
Enforce per-virtual-key and per-team rate limits (requests per minute and tokens per minute) at the gateway layer. Set soft alert thresholds and hard spending caps to prevent runaway costs across teams and applications.
High Availability and Uptime SLA
Portkey's managed cloud gateway operates across multiple regions with active/active failover and a 99.99% uptime SLA. Self-hosted deployments support active/active Kubernetes cluster configurations for equivalent availability on-premises or in a private cloud.
Portkey AI Gateway Specifications:
Table 1. AI Gateway Performance and Capacities |
||
|---|---|---|
| Cloud (Managed) | Self-Hosted (Enterprise) | |
| Request throughput | Up to 10,000 req/min | Unlimited (hardware-dependent) |
| Supported LLM providers | 1,600+ models across OpenAI, Anthropic, Google, Mistral, Cohere, Azure, AWS Bedrock, and more | |
| Uptime SLA | 99.99% multi-region | Active/active cluster support |
| Cache cost reduction | Up to 70% on repetitive workloads (semantic + exact-match caching) | |
| Fallback chain depth | Unlimited fallback steps configurable per routing config | |
| Deployment options | Managed cloud (US, EU) | Kubernetes, Docker, private VPC |
| API compatibility | OpenAI API drop-in compatible. Change base URL and API key – no other code changes required. | |
| Log retention | 30 days (extendable) | Configurable (your storage) |
| Table 2. Integration and Compatibility |
|---|
| SDKs |
| Python and JavaScript/TypeScript SDKs. Full OpenAI SDK compatibility – route through Portkey with a two-line change. |
| Authentication |
| Virtual key scoping per team and service. SSO and SAML support on Enterprise tier. |
| Observability |
| Built-in request logging with 40+ metadata fields. Native integrations with Datadog, Grafana, Langfuse, and OpenTelemetry. |
| Agent Frameworks |
| LangChain, LlamaIndex, CrewAI, AutoGen, Vercel AI SDK, and all OpenAI-compatible frameworks. |
| Compliance |
| SOC 2 Type II, GDPR compliant. Zero data retention (ZDR) option available. |
| Table 3. Routing and Caching Capabilities |
|---|
| Routing Strategies |
| Single provider, fallback chain, load balanced (round-robin and weighted), canary (percentage split), and conditional (rule-based) routing. |
| Caching Modes |
| Exact-match (simple) caching and semantic (vector similarity) caching. Configurable TTL per rule. Cache analytics in the dashboard. |
| Retry Configuration |
| Configurable retry count (1–5), exponential backoff with jitter, per-provider rules, and cross-provider retry with fallback models. |
| Rate Limiting |
| Per-virtual-key and per-team limits on requests per minute and tokens per minute. Soft alert and hard block thresholds. |
| Budget Controls |
| Per-key and per-team spend caps with real-time cost tracking. Alerts via email, webhook, or dashboard notification. |
Documentation:
View the Portkey AI Gateway Documentation (External Link).
View the Portkey Gateway Routing and Config Reference (External Link).
View the Portkey Virtual Key Management Guide (External Link).
