Portkey AI Gateway
Connect, manage, and secure AI interactions across 1,600+ LLMs with centralised control and real-time monitoring

Portkey AI Gateway

Portkey AI Products

AI Gateway

Portkey AI Gateway

#PORTKEY-AI-GATEWAY
Our Price: Request a Quote

Overview:

Portkey AI Gateway is an enterprise-grade universal proxy that connects, manages, and secures AI interactions across 1,600+ LLMs via a single API. It eliminates the need for separate integrations per provider and adds a production-ready reliability layer – intelligent routing, automatic fallbacks, semantic caching, and virtual key management – all without changes to your application logic.

The gateway is OpenAI API-compatible, meaning most applications can be routed through Portkey with a two-line code change. Once in place, every LLM call passes through the gateway where routing policies, cost controls, caching rules, and observability are applied automatically and centrally.

Universal API access to 1,600+ LLMs across all major providers via a single endpoint.
Intelligent routing with automatic fallback chains and load balancing across provider keys.
Semantic caching reduces duplicate LLM calls and costs by up to 70%.
Virtual key management – store raw API keys securely, issue scoped keys to teams and services.
Automatic retries with exponential backoff and per-provider retry rules.
Request batching for large-scale workloads without impacting real-time performance.
OpenAI-compatible drop-in – deploy in minutes with no framework changes.
99.99% uptime SLA with multi-region deployment and active/active failover.

Universal API Access

Connect to 1,600+ LLMs across providers and modalities via a single unified API. No separate integration is required for each provider – switching models or adding a new provider is a configuration change, not an engineering project.

OpenAI, Anthropic, Google, Mistral, Cohere, Azure, AWS Bedrock, and 1,600+ more.
Single API key for all providers via the gateway.
Multimodal support including text, vision, audio, and embeddings.
OpenAI-compatible drop-in – change the base URL and API key, nothing else.

Virtual Key Management

Store raw LLM provider API keys securely in Portkey's encrypted vault. Issue scoped virtual keys to individual teams, applications, and services with defined model and budget permissions. Rotate or revoke access at any time without touching application code.

Encrypted key vault for all provider credentials.
Scoped virtual keys per team, service, or environment.
Instant revocation and key rotation.
Full usage audit log per virtual key.

Semantic Caching

Reduce redundant LLM calls by up to 70% with both exact-match and vector similarity caching. Semantically equivalent requests return cached responses instantly, cutting costs and latency without any loss in output quality.

Simple (exact-match) and semantic (vector similarity) caching modes.
Cache hit rate analytics in the dashboard.
Configurable TTL per cache rule.
Cost reduction of up to 70% on repetitive workloads.

Automatic Retries

Configure retry logic with exponential backoff across all providers. Transient 5xx errors, rate-limit responses, and timeout failures are retried automatically before surfacing to your application, eliminating most provider-side disruptions invisibly.

Configurable retry count per provider.
Exponential backoff with jitter.
Per-provider retry rule configuration.
Retry with alternative models or providers on persistent failure.

Request Batching

Scale large-volume LLM workloads using provider batch APIs or Portkey's custom batching layer without impacting real-time application performance. Batch jobs run asynchronously and results are delivered via webhook or polling.

Provider batch API support (OpenAI Batch, Anthropic Batch).
Custom batching logic for unsupported providers.
Asynchronous execution with no real-time performance impact.
Cost-efficient processing at high volume.

Intelligent Routing and Fallbacks

Dynamically route every LLM request to the optimal provider based on cost, latency, quality, or custom rules. Define multi-step fallback chains so that if a primary provider returns an error or exceeds latency thresholds, traffic automatically fails over to the next configured option without interrupting the user experience.

Automatic fallback chains with configurable priority order.
Latency, cost, and quality-based routing rules.
Zero-downtime provider switching.
Fallback alerts and routing decision logs.

Load Balancing Across Provider Keys

Distribute request volume across multiple API keys for the same provider to avoid rate-limit bottlenecks at high throughput. Load balancing is configured entirely at the gateway layer and requires no changes to application code.

Canary and A/B Deployments

Roll out new models gradually using canary routing. Send a configurable percentage of traffic to a new model or provider while the remainder continues on the existing configuration. Compare latency, cost, and eval scores side-by-side before fully cutting over.

Rate Limiting and Budget Controls

Enforce per-virtual-key and per-team rate limits (requests per minute and tokens per minute) at the gateway layer. Set soft alert thresholds and hard spending caps to prevent runaway costs across teams and applications.

High Availability and Uptime SLA

Portkey's managed cloud gateway operates across multiple regions with active/active failover and a 99.99% uptime SLA. Self-hosted deployments support active/active Kubernetes cluster configurations for equivalent availability on-premises or in a private cloud.

Portkey AI Gateway Specifications:

Table 1. AI Gateway Performance and Capacities
	Cloud (Managed)	Self-Hosted (Enterprise)
Request throughput	Up to 10,000 req/min	Unlimited (hardware-dependent)
Supported LLM providers	1,600+ models across OpenAI, Anthropic, Google, Mistral, Cohere, Azure, AWS Bedrock, and more
Uptime SLA	99.99% multi-region	Active/active cluster support
Cache cost reduction	Up to 70% on repetitive workloads (semantic + exact-match caching)
Fallback chain depth	Unlimited fallback steps configurable per routing config
Deployment options	Managed cloud (US, EU)	Kubernetes, Docker, private VPC
API compatibility	OpenAI API drop-in compatible. Change base URL and API key – no other code changes required.
Log retention	30 days (extendable)	Configurable (your storage)

Table 2. Integration and Compatibility
SDKs
Python and JavaScript/TypeScript SDKs. Full OpenAI SDK compatibility – route through Portkey with a two-line change.
Authentication
Virtual key scoping per team and service. SSO and SAML support on Enterprise tier.
Observability
Built-in request logging with 40+ metadata fields. Native integrations with Datadog, Grafana, Langfuse, and OpenTelemetry.
Agent Frameworks
LangChain, LlamaIndex, CrewAI, AutoGen, Vercel AI SDK, and all OpenAI-compatible frameworks.
Compliance
SOC 2 Type II, GDPR compliant. Zero data retention (ZDR) option available.

Table 3. Routing and Caching Capabilities
Routing Strategies
Single provider, fallback chain, load balanced (round-robin and weighted), canary (percentage split), and conditional (rule-based) routing.
Caching Modes
Exact-match (simple) caching and semantic (vector similarity) caching. Configurable TTL per rule. Cache analytics in the dashboard.
Retry Configuration
Configurable retry count (1–5), exponential backoff with jitter, per-provider rules, and cross-provider retry with fallback models.
Rate Limiting
Per-virtual-key and per-team limits on requests per minute and tokens per minute. Soft alert and hard block thresholds.
Budget Controls
Per-key and per-team spend caps with real-time cost tracking. Alerts via email, webhook, or dashboard notification.