Part 2 of 7 in the “APIM for AI Workloads” series
In Part 1 of this series, I made the case for why Azure API Management for AI workloads is the right control plane for governing AI traffic across an organization. This post gets practical: how do you actually secure access to your AI backends with APIM without creating a credential-management nightmare?
Security is where many AI projects cut corners, and understandably so. When you’re moving fast to prove value with a new model, authentication feels like overhead. But AI endpoints are expensive, and an Azure OpenAI endpoint protected by nothing more than its access key is a real risk: anyone who obtains the URL and key can start consuming tokens at your cost. At scale, that’s a significant financial and compliance exposure.
APIM addresses this with a three-layer security model. Let’s walk through each layer.
Azure API Management for AI Security: A Three-Layer Model
The authentication and authorization pattern in APIM is deliberately layered. Each layer answers a different question and operates independently, so a failure at any layer stops the request before it reaches the AI backend.

The three layers are:
- Subscription keys to identify and track API consumers.
- JWT validation to enforce fine-grained access control based on claims.
- Managed Identity to authenticate APIM to Azure OpenAI without storing credentials.
Each layer has a distinct role. Confusing them is a common mistake, so it’s worth being explicit about what each one does and does not do.
Layer 1: Subscription Keys
Subscription keys are APIM’s mechanism for identifying API consumers. When you create an API product in APIM and require a subscription, callers must include their key in the Ocp-Apim-Subscription-Key header. APIM validates the key, maps it to a subscriber, and lets the request proceed.
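For illustration, a caller’s request through the gateway looks something like the sketch below; the gateway hostname, API base path, deployment name, and API version are placeholders, not values APIM mandates:

```http
POST /openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01 HTTP/1.1
Host: contoso-apim.azure-api.net
Ocp-Apim-Subscription-Key: <subscriber-key>
Content-Type: application/json

{ "messages": [ { "role": "user", "content": "Hello" } ] }
```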
This is important for AI workloads specifically because subscription keys enable per-consumer token tracking. When you combine subscription key validation with the Token Metric policy we’ll cover in Part 4, you get usage data broken down by subscriber, which is the foundation of any internal cross-charging model.
Subscription keys answer the question: Who is calling? They don’t answer what the caller is allowed to do. For that, you need JWT validation.
Layer 2: JWT Validation and Claims-Based Authorization
The validate-jwt policy is where you enforce what a caller is permitted to do. It validates the JWT in the Authorization header against your identity provider, and it can inspect any claim in the token to make authorization decisions.
For Azure OpenAI specifically, this is where you control which teams or applications can access which model deployments. A team working on an internal chatbot should not be able to call a GPT-4o deployment reserved for a production workload. JWT claims let you enforce that boundary at the gateway layer, with no changes required in the calling application.
A typical policy checks the token signature against your Azure AD tenant’s OpenID Connect configuration, then validates that a required scope or role claim is present.
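A minimal sketch of such a policy is shown below; the tenant ID placeholder, audience URI, and role value are illustrative and need to match your own Azure AD app registration and role assignments:

```xml
<validate-jwt header-name="Authorization" failed-validation-httpcode="401">
    <!-- Validate signature and issuer against the tenant's OpenID Connect metadata -->
    <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
    <!-- The token must be issued for this gateway's app registration -->
    <audiences>
        <audience>api://contoso-ai-gateway</audience>
    </audiences>
    <!-- Require a role claim that grants access to the production GPT-4o deployment -->
    <required-claims>
        <claim name="roles" match="any">
            <value>GPT4o.Production.Invoke</value>
        </claim>
    </required-claims>
</validate-jwt>
```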
The failed-validation-httpcode="401" attribute ensures unauthenticated callers get a clean rejection before they ever reach the backend. You can also use failed-validation-error-message to return a specific error message, which helps consumers debug auth failures without exposing internal details.
For multi-provider setups where you’re routing to non-Azure backends like Mistral or Cohere, the same JWT policy applies. The claims model is provider-agnostic, which is one of the advantages of centralizing auth in APIM rather than handling it per-backend.
Layer 3: Managed Identity for Backend Authentication
Managed Identity is the most important security improvement you can make when setting up Azure API Management for AI. It replaces the pattern of storing an Azure OpenAI API key in APIM’s named values with a system-assigned or user-assigned Managed Identity that APIM uses to authenticate directly to Azure OpenAI via Azure AD.

The practical difference is significant. With API key authentication, you have a long-lived secret that needs to be stored, rotated, and kept out of source control. With Managed Identity, there is no secret. APIM requests a short-lived token from Azure AD at runtime, and Azure AD issues it based on the APIM instance’s identity. Nothing is stored. Nothing can leak.
The configuration is a single policy element in the inbound section: <authentication-managed-identity resource="https://cognitiveservices.azure.com"/>. APIM handles the rest, automatically fetching and refreshing the token.
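In context, a sketch of the inbound section could look like this; stripping any incoming api-key header is an optional hardening step I’ve added here, not something the policy requires:

```xml
<inbound>
    <base />
    <!-- Optional hardening: drop any api-key header a caller might send,
         so only the gateway's own identity reaches the backend -->
    <set-header name="api-key" exists-action="delete" />
    <!-- Acquire an Azure AD access token for Azure OpenAI using the APIM
         Managed Identity and attach it as a Bearer Authorization header -->
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```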
On the Azure OpenAI side, you grant the APIM instance’s Managed Identity the Cognitive Services OpenAI User role on the Azure OpenAI resource; that’s the least-privileged built-in role for inference calls. You can scope it further to specific deployments if needed.
For organizations in regulated industries, such as healthcare, financial services, and government, Managed Identity is not optional. It satisfies Zero Trust authentication requirements and produces a full audit trail in Azure Monitor, tied to the APIM instance identity rather than a shared key.
Azure API Management for AI: Putting the Three Layers Together
In a production setup, all three layers run sequentially within the inbound policy pipeline. A request arrives with a subscription key and a JWT. APIM validates the key first (fast, no external call), then validates the JWT against Azure AD, then forwards the request to Azure OpenAI using its Managed Identity token. The AI backend never sees the caller’s JWT, and APIM never stores an API key.
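Condensed into a single policy document, the pipeline looks roughly like this (the subscription key check is configured on the product and happens before any policy runs, so it doesn’t appear in the XML; the required-claims block and other details from earlier are omitted for brevity):

```xml
<policies>
    <inbound>
        <base />
        <!-- Layer 1: the subscription key was already validated by the gateway,
             because this API is published through a product that requires one -->
        <!-- Layer 2: authorize the caller based on its Azure AD-issued JWT -->
        <validate-jwt header-name="Authorization" failed-validation-httpcode="401">
            <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
            <audiences>
                <audience>api://contoso-ai-gateway</audience>
            </audiences>
        </validate-jwt>
        <!-- Layer 3: swap the caller's JWT for a Managed Identity token,
             so the backend only ever sees the APIM identity -->
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```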
The result is a clean separation of concerns:
- The calling application manages its own JWT (issued by Azure AD based on its own identity or the user’s identity).
- APIM enforces the authorization policy without the backend needing to know anything about it.
- The AI backend trusts only APIM’s Managed Identity, not arbitrary callers.
This is the architecture you want before you go to production with any AI workload that touches sensitive data or incurs meaningful cost.
What’s Next in This Series
Part 3 covers the Token Limit policy: how to enforce tokens-per-minute limits per consumer, configure throttling behavior, and handle the differences between the azure-openai-token-limit and llm-token-limit policy variants. The rest of the series:
- Part 4: Token Metric policy — emitting usage data for observability and cross-charging.
- Part 5: Load balancing and circuit breaking across PTU and PAYG backends.
- Part 6: Semantic caching — reducing token consumption with similarity-based response reuse.
- Part 7: APIM as an MCP gateway for agentic AI workloads.