Azure API Management Token Limit Policy: Controlling AI Token Consumption Per Consumer

Part 3 of 7 in the “APIM for AI Workloads” series

The Azure API Management token limit policy is one of the most direct cost control levers you have for AI workloads. In Part 1 of this series, I argued that token consumption is invisible without the right instrumentation. The token limit policy is the enforcement side of that equation: once you know how many tokens consumers are using, you set boundaries so that no single consumer can exhaust your model capacity or run up an unexpected bill.

This post covers how the policy works, which counter-key strategy to choose for your workload, how to size your tokens-per-minute (TPM) limits, and the difference between the Azure OpenAI-specific policy and the generic LLM variant for non-Microsoft backends.

Azure API Management Token Limit Policy: How It Works

The azure-openai-token-limit policy sits in the inbound section of your APIM policy pipeline. Before any request reaches the AI backend, APIM checks a sliding window counter keyed to the value you specify. If the caller is within their TPM budget, the request passes through. If they’ve exceeded it, APIM returns a 429 Too Many Requests response with a Retry-After header, and the backend never sees the request.

This is important: the throttling happens at the gateway, not at the Azure OpenAI endpoint. That means you’re not paying for rejected requests, and your model deployment is protected from saturation by a single runaway consumer.
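As a sketch, a minimal configuration scoped per subscription might look like this (the TPM value and variable name are illustrative, not recommendations):

XML

<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="10000"
    estimate-prompt-tokens="false"
    remaining-tokens-variable-name="remainingTokens" />

Placed in the inbound section, this gives each subscription its own 10,000-TPM sliding window; requests beyond that budget receive the 429 response before the backend is called.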

Azure API Management token limit policy funnel throttling AI requests with 429 response and Retry-After header
Diagram 1: The token limit policy acts as a funnel in the APIM inbound pipeline. Requests within the TPM budget pass through to the AI backend. Requests exceeding the limit receive a 429 status code with a Retry-After header before the backend is even reached.

The policy has two variants. The azure-openai-token-limit policy is purpose-built for Azure OpenAI and Microsoft Foundry endpoints, and uses the actual token counts returned in the API response. The llm-token-limit policy is the generic variant for any LLM backend, including Mistral, Cohere, and others. Both share the same attribute model, so the configuration patterns below apply to either.

Choosing a Counter-Key for Azure API Management Token Limiting

The counter-key attribute is the most important decision in configuring the token limit policy. It determines the scope of the limit: who shares a TPM bucket, and who gets their own.

Diagram 2: Three counter-key strategies and TPM sizing guidance by workload type. The right scope depends on whether you are separating teams, protecting a public endpoint, or enforcing per-user limits in a multi-tenant application.

The three main strategies are:

Per subscription: @(context.Subscription.Id). The default choice for internal platforms. Each APIM subscription — typically one per team or application — gets its own TPM bucket, which lines up naturally with product tiers and cross-charging.

Per IP address: @(context.Request.IpAddress). Better suited to public-facing endpoints or developer portals where you don’t have a subscription model. It’s a blunt instrument — NAT and shared egress can mean multiple users share a counter — but it’s effective for abuse prevention and trial access scenarios.

Per JWT claim or custom header: @(context.Request.Headers.GetValueOrDefault("x-user-id","")). The most flexible option. If your application passes a user identifier in a header or JWT claim, you can scope limits to the individual user. This is the right approach for multi-tenant applications where each end user should have their own token budget, independent of which subscription they’re calling through.

Sizing Your TPM Limits

TPM limits are context-dependent, but a few principles apply across most workloads.

Start by profiling your actual token usage in a staging environment before setting production limits. The remaining-tokens-variable-name attribute exposes the remaining token budget as a policy variable, which you can log via the Token Metric policy to build a usage baseline before enforcing hard limits.

For the estimate-prompt-tokens attribute: set it to false in production. When set to true, APIM estimates prompt tokens before the response is returned, enabling earlier throttling but reducing accuracy. In practice, counting actual tokens from the response is more reliable and avoids throttling requests that would have been within budget.

A common mistake is setting a single global TPM limit too low, which throttles all consumers the moment a batch job runs on any team. The better pattern is tiered limits by API product: a Developer product with a low TPM ceiling, a Standard product for normal workloads, and an Unlimited product for production pipelines that need burst capacity.
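A hedged sketch of the tiered pattern, with illustrative TPM values: each API product carries its own policy document, so the limits differ while the calling applications stay unchanged.

XML

<!-- Developer product: low ceiling for experimentation -->
<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
    tokens-per-minute="5000" estimate-prompt-tokens="false" />

<!-- Standard product: everyday workloads -->
<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
    tokens-per-minute="50000" estimate-prompt-tokens="false" />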

Handling 429 Responses in Calling Applications

Any application calling an APIM-fronted AI endpoint needs to handle 429 responses gracefully. APIM returns a Retry-After header indicating how many seconds until the token window resets. Well-behaved clients respect this header and back off rather than retrying immediately.

For agentic workloads with multiple pipeline steps, a 429 response midway through can leave the agent in an inconsistent state. The recommended pattern is to expose the remaining-tokens-variable-name value in a response header so the calling application can monitor its own budget and slow down proactively, rather than waiting for a hard rejection.
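One way to do this, as a sketch, is to copy the policy variable into a response header in the outbound section. This assumes the token limit policy earlier in the pipeline set remaining-tokens-variable-name="remainingTokens"; the header name x-remaining-tokens is arbitrary.

XML

<outbound>
    <base />
    <!-- Surface the remaining token budget to the caller -->
    <set-header name="x-remaining-tokens" exists-action="override">
        <value>@(context.Variables.ContainsKey("remainingTokens")
            ? context.Variables["remainingTokens"].ToString() : "")</value>
    </set-header>
</outbound>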

The Azure OpenAI token limit policy documentation covers the full attribute reference, including tokens-per-minute, counter-key, estimate-prompt-tokens, and remaining-tokens-variable-name. The llm-token-limit variant has the same interface for non-Azure backends.

What’s Next in This Azure API Management for AI Series

Part 4 covers the Token Metric policy: how to emit token usage data to Application Insights broken down by consumer dimensions, and how to use that data for internal cross-charging and spend dashboards.

Azure API Management for AI: Securing Your AI APIs with Authentication and Authorization

Part 2 of 7 in the “APIM for AI Workloads” series

In Part 1 of this series, I made the case for why Azure API Management for AI workloads is the right control plane for governing AI traffic across an organization. This post gets practical: how do you actually secure access to your AI backends with APIM without creating a credential-management nightmare?

Security is where many AI projects cut corners, and understandably so. When you’re moving fast to prove value with a new model, authentication feels like overhead. But AI endpoints are expensive, and an unsecured Azure OpenAI endpoint is a real risk: anyone with the URL and key can start consuming tokens at your cost. At scale, that’s a significant financial and compliance exposure.

APIM addresses this with a three-layer security model. Let’s walk through each layer.

Azure API Management for AI Security: A Three-Layer Model

The authentication and authorization pattern in APIM is deliberately layered. Each layer answers a different question and operates independently, so a failure at any layer stops the request before it reaches the AI backend.

Diagram 1: Three-layer auth in APIM for AI workloads. Layer 1 identifies the caller via subscription key. JWT validation in Layer 2 then determines what they’re permitted to do. Finally, Layer 3 authenticates APIM itself to the AI backend via Managed Identity.

The three layers are:

  • Subscription keys to identify and track API consumers.
  • JWT validation to enforce fine-grained access control based on claims.
  • Managed Identity to authenticate APIM to Azure OpenAI without storing credentials.

Each layer has a distinct role. Confusing them is a common mistake, so it’s worth being explicit about what each one does and does not do.

Layer 1: Subscription Keys

Subscription keys are APIM’s mechanism for identifying API consumers. When you create an API product in APIM and require a subscription, callers must include their key in the Ocp-Apim-Subscription-Key header. APIM validates the key, maps it to a subscriber, and lets the request proceed.

This is important for AI workloads specifically because subscription keys enable per-consumer token tracking. When you combine subscription key validation with the Token Metric policy we’ll cover in Part 4, you get usage data broken down by subscriber, which is the foundation of any internal cross-charging model.

Subscription keys answer the question: Who is calling? They don’t answer what the caller is allowed to do. For that, you need JWT validation.

Layer 2: JWT Validation and Claims-Based Authorization

The validate-jwt policy is where you enforce what a caller is permitted to do. It validates the JWT token in the Authorization header against your identity provider, and can inspect any claim in the token to make authorization decisions.

For Azure OpenAI specifically, this is where you control which teams or applications can access which model deployments. A team working on an internal chatbot should not be able to call a GPT-4o deployment reserved for a production workload. JWT claims let you enforce that boundary at the gateway layer, with no changes required in the calling application.

A typical policy checks the token signature against your Azure AD tenant’s OpenID Connect configuration, then validates that a required scope or role claim is present:
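A sketch of such a policy — the tenant placeholder, audience, and claim values are illustrative, not prescriptive:

XML

<validate-jwt header-name="Authorization"
              failed-validation-httpcode="401"
              failed-validation-error-message="Access token missing or invalid.">
    <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
    <audiences>
        <audience>api://ai-gateway</audience>
    </audiences>
    <required-claims>
        <claim name="roles" match="any">
            <value>AI.Access</value>
        </claim>
    </required-claims>
</validate-jwt>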

The failed-validation-httpcode="401" attribute ensures unauthenticated callers get a clean rejection before they ever reach the backend. You can also use failed-validation-error-message to return a specific error message, which helps consumers debug auth failures without exposing internal details.

For multi-provider setups where you’re routing to non-Azure backends like Mistral or Cohere, the same JWT policy applies. The claims model is provider-agnostic, which is one of the advantages of centralizing auth in APIM rather than handling it per-backend.

Layer 3: Managed Identity for Backend Authentication

Managed Identity is the most important security improvement you can make when setting up Azure API Management for AI. It replaces the pattern of storing an Azure OpenAI API key in APIM’s named values with a system-assigned or user-assigned Managed Identity that APIM uses to authenticate directly to Azure OpenAI via Azure AD.

Diagram 2: API key authentication (left) vs. Managed Identity (right). The key difference is that Managed Identity requires no stored credentials anywhere in your configuration.

The practical difference is significant. With API key authentication, you have a long-lived secret that needs to be stored, rotated, and kept out of source control. With Managed Identity, there is no secret. APIM requests a short-lived token from Azure AD at runtime, and Azure AD issues it based on the APIM instance’s identity. Nothing is stored. Nothing can leak.

The configuration is a single policy element in the inbound section: <authentication-managed-identity resource="https://cognitiveservices.azure.com" />. APIM handles the rest, automatically fetching and refreshing the token.

On the Azure OpenAI side, you grant the APIM instance’s Managed Identity the Cognitive Services User role on the Azure OpenAI resource. That’s the minimum required permission. You can scope it further to specific deployments if needed.

For organizations in regulated industries, such as healthcare, financial services, and government, Managed Identity is not optional. It satisfies Zero Trust authentication requirements and produces a full audit trail in Azure Monitor, tied to the APIM instance identity rather than a shared key.

Azure API Management for AI: Putting the Three Layers Together

In a production setup, all three layers run sequentially within the inbound policy pipeline. A request arrives with a subscription key and a JWT. APIM validates the key first (fast, no external call), then validates the JWT against Azure AD, then forwards the request to Azure OpenAI using its Managed Identity token. The AI backend never sees the caller’s JWT, and APIM never stores an API key.

The result is a clean separation of concerns:

  • The calling application manages its own JWT (issued by Azure AD based on its own identity or the user’s identity).
  • APIM enforces the authorization policy without the backend needing to know anything about it.
  • The AI backend trusts only APIM’s Managed Identity, not arbitrary callers.
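As a sketch, the inbound section that wires the layers together can look like this. Subscription key checking (Layer 1) is handled by the product configuration rather than an explicit policy, and the tenant placeholder is illustrative:

XML

<inbound>
    <base />
    <!-- Layer 2: validate the caller's JWT against Azure AD -->
    <validate-jwt header-name="Authorization" failed-validation-httpcode="401">
        <openid-config url="https://login.microsoftonline.com/{tenant-id}/v2.0/.well-known/openid-configuration" />
    </validate-jwt>
    <!-- Layer 3: authenticate APIM to the backend with its Managed Identity.
         This replaces the Authorization header with the acquired token,
         so the caller's JWT never reaches the backend. -->
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>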

This is the architecture you want before you go to production with any AI workload that touches sensitive data or incurs meaningful cost.

What’s Next in This Series

Part 3 covers the Token Limit policy: how to enforce tokens-per-minute limits per consumer, configure throttling behavior, and handle the differences between the azure-openai-token-limit and llm-token-limit policy variants.

Azure API Management for AI: Why Your APIs Need a Gateway

Part 1 of 7 in the “APIM for AI Workloads” series

Over the past year, I’ve been doing a lot of work with integration services such as Azure API Management and, more recently, with AI adoption: evaluating models, designing agentic architectures, and figuring out how to govern AI consumption responsibly across the organization. One thing that keeps coming up in those conversations is a question that sounds almost too basic to ask: Who is keeping track of what we’re spending on tokens?

The answer, more often than not, is nobody.

That’s the problem this series is about. AI APIs are fundamentally different from the REST APIs we’ve been managing for the past decade, and the differences matter operationally. Before we dive into the mechanics of Azure API Management policies, load balancing, and semantic caching in subsequent posts, I want to make the case for a gateway layer in front of your AI services.

Tokens Are Not Requests

Traditional API management was built around a relatively simple model: count the requests, enforce rate limits, log the traffic, and call it done. One call in, one response out. The cost model was predictable.

AI APIs broke that model completely.

When you call an Azure OpenAI endpoint, you’re not paying per request. You’re paying per token. And a token count is invisible at the API gateway layer unless you specifically instrument for it. A single call from a conversational agent might consume 500 tokens. A call from a poorly-optimized batch process might consume 50,000. Both look the same at the HTTP level: one POST, one 200 OK.

This creates a blind spot that grows dangerously as AI adoption scales across an organization. Teams start building intelligent apps: conversational agents, personalized content generators, voice assistants, copilots. Each one independently calls AI backend services, and nobody has a view across the whole estate of what’s being consumed, by whom, and at what cost.

The diagram below shows what this looks like in practice: multiple application types hitting multiple AI providers, with token-based pricing models sitting underneath.

Diagram 1: Intelligent applications on the left, AI service providers on the right, with both PTU and PAYG billing models underneath. Without a control plane in the middle, you’re flying blind.

The Three Problems Azure API Management for AI Solves

Azure API Management acts as the centralized control plane between your intelligent applications and your AI backends. It addresses three distinct categories of problems.

Performance optimization: AI model endpoints have throughput limits. Azure OpenAI Provisioned Throughput Units (PTU) give you reserved capacity at a fixed price, but cap out at a hard ceiling. Pay-as-you-go (PAYG) endpoints scale elastically but incur higher per-token costs at high volumes. Without a gateway layer, individual applications can’t know whether PTU capacity is available or saturated. A gateway can make that routing decision automatically, serving from PTU when it has headroom, falling back to PAYG when it doesn’t. That’s a meaningful cost optimization with no changes required to the calling applications.

Cost control: If you centralize AI spend, which most organizations do at least initially, tokens consumed by one team are costs borne by a shared budget. Without per-consumer visibility into token usage, internal cross-charging and showback are impossible. APIM’s token metric policies make this tractable by emitting token consumption data broken down by dimensions such as User ID, Subscription ID, or API product, all of which feed into Application Insights for dashboarding and alerting.

Data security: Routing AI traffic through a managed gateway gives you a single enforcement point for authentication, authorization, and policy. You can validate JWT claims, require subscription keys from API consumers, use Managed Identity to authenticate to Azure OpenAI without exposing credentials, and ensure traffic never leaves your controlled perimeter. Without a gateway, every team builds its own auth story, or more commonly, skips it.

PTU vs. PAYG: Why the Billing Model Shapes Your Architecture

Before we go further, it’s worth spending a moment on the two Azure OpenAI billing models, because they have direct architectural implications.

Provisioned Throughput Units (PTU) give you reserved capacity on a model. You pay a fixed hourly rate regardless of how many tokens you actually consume. The benefits are predictable costs and guaranteed throughput. The risk is waste if your utilization is low, and hard throttling if you exceed the provisioned limit.

Pay-as-you-go (PAYG) charges per token consumed. No upfront commitment, no capacity ceiling, but costs scale linearly with usage and can surprise you if consumption spikes.

Most production AI deployments end up using both: PTU for baseline load, where utilization is predictable, and PAYG as an overflow layer. This makes a load balancer with circuit breaking essential, which we’ll cover in Part 5 of this series.

The same logic applies beyond Azure OpenAI. APIM now supports generic LLM backends via the llm-* policy family, which means you can manage traffic to Mistral, Cohere, LLaMA, and other providers through the same control plane. The diagram below shows this architecture: APIM in the center, with load balancing across PTU and PAYG instances, token metrics flowing to Application Insights, and the full provider landscape behind it.

Azure API Management as the centralized AI control plane, with performance, cost, and security governance across multiple providers and billing models.

What This Looks Like in Practice

Let me make this concrete with a scenario I’ve seen play out multiple times.

An organization deploys its first Azure OpenAI service for a conversational agent. A few months later, a second team wants to use AI for content generation. Then a third team builds an internal copilot. Each team provisions its own Azure OpenAI resource, authenticates directly, and manages its own rate limiting. There’s no visibility into combined spend. No shared capacity optimization. No centralized audit trail.

This is the point where someone in finance asks a question that nobody can answer: “How much are we spending on AI, and which team is spending what?”

Centralizing AI traffic through APIM is how you get out of that situation before it becomes a problem. The policy-based approach means you can add governance without changing anything in the calling applications. They call the APIM endpoint, APIM handles the rest.

Azure API Management for AI Workloads: What’s Coming in This Series

The next six posts will go deep on the specific capabilities that make APIM a serious AI control plane:

  • Part 2 covers authentication and authorization: JWT validation, Managed Identity, and subscription keys.
  • Part 3 covers the Token Limit policy: enforcing tokens-per-minute limits per consumer.
  • Part 4 covers the Token Metric policy: emitting usage data for observability and cross-charging.
  • Part 5 covers load balancing and circuit breaking across PTU and PAYG backends.
  • Part 6 covers semantic caching: reducing token consumption by serving cached responses for similar prompts.
  • Part 7 covers APIM’s emerging role as an MCP gateway for agentic AI workloads.

Each post will include the relevant policy XML, real-world sizing guidance, and the architectural decisions behind the patterns.

If you’re building AI-powered applications at scale and you’re not yet routing that traffic through a gateway, the rest of this series is for you.

Build an AI Tech News Aggregator: Azure Functions & Claude

There’s a lot of noise on the internet. Between Reddit, Hacker News, and tech blogs, keeping up with what actually matters in enterprise software is a full-time job. So I built a fully automated system that does it for me: it runs in the cloud, is powered by AI, and was deployed end-to-end in less than two hours using Claude Code.

Here’s how.

What We Built (Mostly What Claude Did)

A C# Azure Function that runs every hour and:

  1. Fetches posts from configurable Reddit subreddits and Hacker News
  2. Filters for recency: only posts from the last 7 days
  3. Deduplicates across runs: never evaluates the same URL twice
  4. Applies an AI editorial filter: Claude decides what’s genuinely newsworthy
  5. Writes curated results to Azure Blob Storage as timestamped JSON

The output is clean, structured JSON ready to feed into a newsletter, dashboard, or notification system.

The Architecture

The system has three layers: data collection, AI filtering, and persistence.

Reddit RSS feeds ──┐
                   ├─► Aggregator Function ─► Claude AI Filter ─► Blob Storage
HN Firebase API ───┘         │
                             └─► State Store (seen URLs)

Tech Stack

Concern        Choice
Runtime        Azure Functions v4, .NET 8 isolated worker
Reddit data    Public Atom/RSS feed (r/{sub}/top.rss)
HN data        Firebase REST API
AI filtering   Anthropic Claude (claude-opus-4-6) via raw HttpClient
Storage        Azure Blob Storage
Schedule       NCRONTAB timer trigger

Interesting Engineering Decisions

Reddit: RSS over JSON API

The Reddit JSON API (/top.json) started returning 403s without authentication. Rather than deal with OAuth, we switched to Reddit’s public Atom/RSS feed (no credentials required) and parsed it with System.Xml.Linq in a handful of lines. Simple wins.

Claude as an Editorial Filter

Instead of writing brittle keyword heuristics to judge whether a post is “real tech news,” we hand that job to Claude with a carefully crafted system prompt based on Editorial Guidelines:

A post qualifies if it is relevant to enterprise software development AND meets at least one of the following: Change, Innovation, or Emergent Ideas, and is not a minor patch release, pure marketing, or clickbait.

Claude receives posts in batches of 25, returns a JSON array of qualifying indices, and we map those back to posts. If the API is unreachable, the batch passes through unfiltered as a deliberate fail-safe so the pipeline never breaks.

We used structured JSON output (output_config.format.type = "json_schema") to guarantee a parseable response every time, no regex needed.

Deduplication Without a Database

To prevent re-evaluating the same URLs across hourly runs (and paying for unnecessary AI API calls), we persist a rolling state file — state/seen-urls.json — in Blob Storage. On each run:

  • Load seen URLs into a HashSet<string> for O(1) lookup
  • Filter new posts against it
  • After filtering, mark all new posts as seen (not just the ones that passed the AI filter — rejected posts shouldn’t be retried)
  • Prune entries older than 7 days to keep the file small

No database, no Redis, no infrastructure overhead. A blob file is enough.

The AI Filter in Practice

A typical hourly run might look like this:

Fetched 312 posts from the last 7 days.

Deduplication: 47 new / 265 already seen (skipped).

Running news quality filter on 47 new posts…

News filter: 11/25 posts passed.

News filter: 9/22 posts passed.

Filter complete: 20/47 posts kept.

20 posts saved to 2026/03/24/09-00-01.json

Out of 312 raw posts, 20 make it through. That’s the kind of signal-to-noise ratio that makes a curated feed actually worth reading.

Deployment

The whole thing deploys with two commands:

# Push app settings (API keys, schedule, etc.)
az functionapp config appsettings set \
  --name FuncNewsAggregation \
  --resource-group rg-news-aggregators \
  --settings @appsettings.json

# Publish the function
func azure functionapp publish FuncNewsAggregation --dotnet-isolated

Done. The function is live, running on Azure’s infrastructure, costing pennies per day.

What’s Next

A few natural extensions:

  • Email or Slack digest — trigger a Logic App when a new blob is written
  • Web frontend — serve the JSON blobs as a read-only news feed
  • Scoring — weight HN scores more heavily now that RSS drops Reddit scores
  • More sources — dev.to, lobste.rs, or custom RSS feeds are easy to add

Takeaways

The most interesting lesson here isn’t the code; it’s the division of labor. Deterministic logic handles the mechanical work: fetching, deduplicating, and scheduling. The judgment call, “Is this actually news?”, goes to the model.

That separation keeps the system simple, cheap to run, and easy to adjust. Change the system prompt, and you change the editorial policy. No retraining, no feature engineering.

Two hours from idea to deployed function. That’s the pace at which you can build now.


All source code is C# targeting .NET 8. The function runs on an Azure Consumption plan and incurs roughly $0 in hourly costs, well within the free tier.

Agentic Orchestration: The Evolution of SOA

For decades, integration professionals have shaped the digital backbone of enterprises from EAI to SOA to microservices. Today, agentic orchestration marks the next step in that evolution: transforming how we compose, coordinate, and reason across enterprise services. This isn’t a replacement for what we know; it’s an intelligent upgrade to it.

We built the bridges, the highways, and the intricate railway networks of the digital world. Yet, let’s be honest—for all our sophistication, our orchestrations often felt like a meticulous, rigid dance.

Enter Agentic Orchestration. This isn’t just another buzzword. It’s a profound shift, an evolution that takes the core principles of SOA and infuses them with intelligence, dynamism, and a remarkable degree of autonomy. For the seasoned integration architect and engineer, this isn’t about replacing what we know—it’s about enhancing it, elevating it to a new plane of capability.

How SOA Composites Differ from Agentic Orchestration

Cast your mind back to the golden age of SOA. For those of us in the Microsoft ecosystem, this meant nearly two and a half decades with BizTalk Server as our workhorse, our battleground, our canvas. We diligently crafted composite services using orchestration designers, adapters, and pipelines. Others wielded BPEL and ESBs, but the principle was the same. Our logic was clear, explicit, and, crucially, deterministic.

If a business process required validating a customer, then checking inventory, and finally processing an order, we laid out that sequence with unwavering precision—whether in BizTalk’s visual orchestration designer or in BPEL code:

XML

<bpel:sequence name="OrderFulfillmentProcess">
  <bpel:invoke operation="validateCustomer" partnerLink="CustomerService"/>
  <bpel:invoke operation="checkInventory" partnerLink="InventoryService"/>
  <bpel:invoke operation="processPayment" partnerLink="PaymentService"/>
</bpel:sequence>

Those of us who spent years with BizTalk know this dance intimately: the Receive shapes, the Decision shapes, the carefully constructed correlation sets, the Scope shapes wrapped around every potentially fragile operation. We debugged orchestrations at 2 AM, optimized dehydration points, and became masters of the Box-Line-Polygon visual language.

This approach delivered immense value. It brought order to chaos, reused services, and provided a clear, auditable trail. However, its strength was also its weakness: rigidity. Any deviation or unforeseen circumstance required a developer to step in, modify the orchestration, and redeploy. The system couldn’t “think” its way around a problem; it merely executed a predefined script: a well-choreographed ballet, beautiful but utterly inflexible to improvisation.

Agentic Orchestration: From Fixed Scripts to Intelligent Collaboration

Now, imagine an orchestration that doesn’t just execute a script, but reasons. An orchestration where the “participants” are not passive services waiting for an instruction, but intelligent agents equipped with goals, memory, and a suite of “tools”—which, for us, are often our existing services and APIs.

This is the essence of agentic orchestration. It shifts from a predefined, top-down command structure to a more collaborative, goal-driven paradigm. Instead of meticulously charting every step, we define the desired outcome and empower intelligent agents to find the best path to it.

Think of it as moving from a detailed project plan (SOA) to giving a highly skilled project manager (the Orchestrator Agent) a clear objective and a team of specialists (worker agents, each with specific skills/tools).

Key Differences that Matter

From Fixed Sequence to Dynamic Planning:

Traditional SOA executes a predetermined sequence: Step A, then Step B, then Step C. Agentic orchestration takes a different approach — agents dynamically construct their plan based on current context and available resources, asking: “What tools do I have, and which best serve this step?”

From Explicit Error Handling to Self-Correction:

In SOA, elaborate try-catch blocks covered every potential failure. BizTalk veterans will remember wrapping Scope shapes inside Scope shapes, each carrying its own exception handler. With agentic systems, a failing tool triggers reasoning rather than a halt — the agent may retry with a different tool, consult another agent, or revise its plan entirely.

From API Contracts to Intent-Based Communication:

Traditional SOA services communicate via strict, often verbose XML or JSON contracts — schema design and message transformation consumed countless engineering hours. Agentic systems shift to intent-based communication instead. An “Order Fulfillment Agent” can instruct a “Shipping Agent” with a clear goal: “Ship this package to customer X by date Y.” The Shipping Agent then determines which underlying tools (the FedEx API, the DHL API) best achieve that outcome, abstracting away the complexity of individual service calls.

From Static Connectors to Smart Tools:

Connectors and adapters in SOA are fixed pathways, each requiring explicit configuration per integration point. BizTalk veterans know this well from hours spent configuring adapters for every specific endpoint. In agentic architectures, existing APIs, databases, message queues, and even legacy systems are reframed as tools that agents can discover and wield intelligently. A Logic App connector to SAP is no longer just a connector; it becomes a capable SAP tool that an agent can invoke when the situation calls for it. The Model Context Protocol (MCP) is making this kind of dynamic tool discovery increasingly seamless.

A Concrete Example

Consider an order that fails the inventory check in our traditional BPEL or BizTalk orchestration. In SOA, the result is a hard stop: send an error notification, await human intervention, and possibly redesign the process.

In an agentic system, the orchestrator agent might dynamically query alternate suppliers, adjust delivery timelines based on customer priority, suggest product substitutions, or even negotiate partial fulfillment—all without hardcoded logic for each scenario. The agent reasons about the business goal (fulfill the customer order) and uses available tools to achieve it, adapting to circumstances we never explicitly programmed for.

Azure Logic Apps: The Bridge to the Agentic Future

Azure Logic Apps demonstrates this evolution in practice, and it’s particularly compelling for integration professionals. For those of us coming from the BizTalk world, Logic Apps already felt familiar—the visual designer, the connectors, the enterprise reliability. Now, we’re not throwing away our decades of experience with these patterns. Instead, we’re adding an “intelligence layer” on top.

The Agent Loop within Logic Apps, with its “Think-Act-Reflect” cycle, transforms our familiar integration canvas into a dynamic decision-making engine. We can build multi-agent patterns—agent “handoffs” in which one agent completes a task and passes it to another, or “evaluator-optimizer” setups in which one agent generates a solution and another critiques and refines it.
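A conceptual sketch of such a Think-Act-Reflect cycle follows, in plain Python rather than the Logic Apps designer; the `CheckInventory` tool and its interface are purely illustrative:

```python
class CheckInventory:
    """Hypothetical tool: fails on the first attempt, succeeds on the second,
    as if an alternate supplier was found on retry."""
    name = "check_inventory"

    def __init__(self):
        self.calls = 0

    def applies(self, state):
        return not state["done"]

    def run(self, state):
        self.calls += 1
        return {"fulfilled": self.calls >= 2}

def agent_loop(goal, tools, max_iterations=5):
    state = {"goal": goal, "done": False, "history": []}
    for _ in range(max_iterations):
        tool = next(t for t in tools if t.applies(state))   # Think: choose a tool
        result = tool.run(state)                            # Act: invoke it
        state["history"].append((tool.name, result))
        state["done"] = result.get("fulfilled", False)      # Reflect: done yet?
        if state["done"]:
            break
    return state

final = agent_loop("fulfil customer order", [CheckInventory()])
# Two iterations: the first check fails, reflection triggers a retry that succeeds.
```

The loop itself is trivial; the value in a real agent comes from the Think and Reflect steps being driven by a model that can reinterpret failures instead of following a fixed branch.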

All this, while leveraging the robust, enterprise-ready connectors we already depend on. Our existing investments in integration infrastructure don’t become obsolete; they become more powerful. The knowledge we gained from debugging BizTalk orchestrations, understanding message flows, and designing for reliability? All of that remains valuable. Microsoft is simply upgrading our toolkit.

Adopting Agentic Orchestration: The Path Forward for Integration Architects

For integration engineers and architects, this is not a threat but an immense opportunity. We are uniquely positioned to lead this charge. We understand the nuances of enterprise systems, the criticality of data integrity, and the challenges of connecting disparate technologies. Those of us who survived the BizTalk years are battle-tested; we know what real-world integration demands.

Agentic orchestration frees us from the burden of explicit, step-by-step programming for every conceivable scenario. It allows us to design systems that are more resilient, more adaptive, and ultimately, more intelligent. It enables us to build solutions that not only execute business processes but also actively contribute to achieving business outcomes.

Start small: Identify one rigid orchestration in your current architecture that would benefit from adaptive decision-making. Perhaps it’s an order-fulfillment process with too many exception handlers, or a customer-onboarding workflow that breaks when regional requirements change. That’s your first candidate for agentic enhancement.

Let’s cast aside the notion of purely deterministic choreography. Let us instead embrace the era of intelligent collaboration, where our meticulously crafted services become the powerful tools in the hands of autonomous, reasoning agents.

The evolution is here. It’s time to orchestrate a smarter future.

Europe’s Sovereignty Challenge: A Framework for Cloud Control

Europe’s sovereignty challenge has moved from political debate to concrete policy. With the EU’s new Cloud Sovereignty Framework now in place, the continent is redefining how it procures and governs cloud infrastructure, shifting from dependency on foreign providers to measurable, auditable control over its digital destiny.

Today, Europe and the Netherlands find themselves at a crucial junction, navigating the complex landscape of digital autonomy. The recent introduction of the EU’s new Cloud Sovereignty Framework is the clearest signal yet that the continent is ready to take back control of its digital destiny.

This isn’t just about setting principles; it’s about introducing a standardized, measurable scorecard that will fundamentally redefine cloud procurement.

Europe’s Sovereignty Challenge: Why Digital Independence Is Non-Negotiable

The digital revolution has brought immense benefits, yet it has also positioned Europe in a state of significant dependency. Approximately 80% of our digital infrastructure relies on foreign companies, primarily American cloud providers. This dependence is not merely a matter of convenience; it’s a profound strategic vulnerability.

The core threat stems from U.S. legislation such as the CLOUD Act, which grants American law enforcement the power to request data from U.S. cloud service providers, even if that data is stored abroad. This directly clashes with Europe’s stringent privacy regulations (GDPR) and exposes critical European data to external legal and geopolitical risk.

As we’ve seen with incidents like the Microsoft-ICC blockade, foreign political pressures can impact essential digital services. The possibility of geopolitical shifts, such as a “Trump II” presidency, only amplifies this collective awareness: we cannot afford to depend on foreign legislation for our critical infrastructure. The risk is present, and we must build resilience against it.

The Sovereignty Scorecard: From Principles to SEAL Rankings

The new Cloud Sovereignty Framework is the EU’s proactive response. It shifts the discussion from abstract aspirations to concrete, auditable metrics by evaluating cloud services against eight Sovereignty Objectives (SOVs) that cover legal, strategic, supply chain, and technological aspects.

The result is a rigorous “scorecard.” A provider’s weighted score determines its SEAL ranking (from SEAL-0 to SEAL-4, with SEAL-4 indicating full digital sovereignty). Crucially, this ranking is intended to serve as the definitive minimum assurance factor in government and public sector cloud procurement tenders. The Commission wants to create a level playing field where providers must tangibly demonstrate their sovereignty strengths.

Hyperscalers vs. European Providers: The Cloud Sovereignty Challenge

The framework has accelerated a critical duality in the market: massive, centralized investments by US hyperscalers versus strategic, federated growth by European alternatives.

Hyperscalers Adapt: Deepening European Ties

Global providers are making sovereignty a mandatory architectural and legal prerequisite by localizing their operations and governance.

  • AWS explicitly responded by announcing its EU Sovereign Cloud unit. This service is structured to ensure data residency and operational autonomy within Europe, explicitly targeting the SOV-3 (Data & AI Sovereignty: The degree of control customers have over their data and AI models, including where data is processed) criteria through physically and logically separated infrastructure and governance.
  • Google Cloud has also made significant moves, approaching digital sovereignty across three distinct pillars:
    • Data Sovereignty (focusing on control over data storage, processing, and access with features like the Data Boundary and External Key Management, EKM, where keys can be held outside Google Cloud’s infrastructure);
    • Operational Sovereignty (ensuring local partner oversight, such as the partnership with T-Systems in Germany); and
    • Software Sovereignty (providing tools to reduce lock-in and enable workload portability). To help organizations navigate these complex choices, Google introduced the Digital Sovereignty Explorer, an interactive online tool that clarifies terms, explains trade-offs, and guides European organizations in developing a tailored cloud strategy across these three domains. Furthermore, Google has developed highly specialized options, including Air-Gapped solutions for the defense and intelligence sectors, demonstrating a commitment to the highest levels of security and residency.
  • Microsoft has demonstrated a profound deepening of its commitment, outlining five comprehensive digital commitments designed to address sovereignty concerns:
    • Massive Infrastructure Investment: Pledging a 40% increase in European datacenter capacity, doubling its footprint by 2027.
    • Governance and Resilience: Instituting a “European cloud for Europe” overseen by a dedicated European board of directors (composed exclusively of European nationals) and backed by a “Digital Resilience Commitment” to legally contest any government order to suspend European operations.
    • Data Control: Completing the EU Data Boundary project to ensure European customers can store and process core cloud service data within the EU/EFTA.

European Contenders Scale Up

Strategic, open-source European initiatives powerfully mirror this regulatory push:

  • Virt8ra Expands: The Virt8ra sovereign cloud, which positions itself as a significant European alternative, recently announced a substantial expansion of its federated infrastructure. The platform, coordinated by OpenNebula Systems, added six new cloud service providers, including OVHcloud and Scaleway, significantly broadening its reach and capacity across the continent.
  • IPCEI Funding: This initiative, leveraging the open-source OpenNebula technology, is part of the Important Project of Common European Interest (IPCEI) on Next Generation Cloud Infrastructure and Services, backed by over €3 billion in public and private funding. This is a clear indicator that the vision for a robust, distributed European cloud ecosystem is gaining significant traction.

Redefining European Cloud Sovereignty: Resilience Over Isolation

Industry experts emphasize that the framework embodies a more mature understanding of digital sovereignty. It’s not about isolation (autarky), but about resilience and governance.

Sovereignty is about how an organization is “resilient against specific scenarios.” True sovereignty, in this view, lies in the proven, auditable ability to govern your own digital estate. For developers, this means separating cloud-specific infrastructure code from core business logic to maximize portability, allowing the use of necessary hyper-scale features while preserving architectural flexibility.

The Challenge: Balancing Features with Control

Despite the massive investments and public commitments from all major players, the framework faces two key hurdles:

  • The Feature Gap: European providers often lack the “huge software suite” and “deep feature integration” of US hyperscalers, which can slow down rapid development. Advanced analytics platforms, serverless computing, and tightly integrated security services often lack direct equivalents at smaller providers. This creates a complex chicken-and-egg problem: large enterprises won’t migrate to European providers because they lack features, but local providers struggle to develop those capabilities without enterprise revenue.
  • Skepticism and Compliance Complexity: Some analysts fear the framework’s complexity will inadvertently favor the global giants with larger compliance teams. Furthermore, deep-seated apprehension in the community remains, with some expressing the fundamental desire for purely European technological solutions: “I don’t want a Microsoft cloud or AI solutions in Europe. I want European ones.” Some experts suggest that European providers should focus on building something different by innovating with European privacy and control values baked in, rather than trying to catch up with US providers’ feature sets.

My perspective on this situation is that achieving true digital sovereignty for Europe is a complex and multifaceted endeavor. While the commitments from global hyperscalers are significant, the underlying desire for independent, European-led solutions remains strong. It’s about strategic autonomy, ensuring that we, as Europeans, maintain ultimate control over our digital destiny and critical data, irrespective of where the technology originates.

The race is now on. The challenge for the cloud industry is to translate the high-level, technical criteria of the SOVs into auditable, real-world reality to achieve that elusive top SEAL-4 ranking. The battle for the future of Europe’s cloud is officially underway.

Walking Skeleton & Pipes and Filters for Enterprise Integration

In enterprise integration, a solid architectural foundation is what separates projects that scale from those that collapse under their own complexity. Two patterns I keep returning to are the Walking Skeleton and Pipes and Filters, though I didn't realize at first that I was applying them. In this post, I’ll show how they work together when building or rebuilding an integration platform.

My experience in retail taught me this firsthand, when I was involved in rebuilding an integration platform. In the world of integration, where you’re constantly juggling disparate systems, multiple data formats, and unpredictable volumes, a solid architecture is paramount. I’ve always tried to build the best solution based on experience rather than on what’s written in the literature.

The funny thing is that only later did I realize that, in building that platform, I had been applying patterns such as the Walking Skeleton for architectural validation and Pipes and Filters for resilient, flexible integration flows.

The Walking Skeleton first caught my attention when a fellow architect at my current workplace mentioned it, and I realized it was exactly what I had done with my team at the retailer. A good reminder that I should read the literature from time to time!

Why the Walking Skeleton Is Your Integration Architecture First Step

Before you write a line of business logic, you need to prove your stack works from end to end. The Walking Skeleton is precisely that: a minimal, fully functional implementation of your system’s architecture.

It’s not an MVP (Minimum Viable Product), which is a business concept focused on features; the Skeleton is a technical proof of concept focused on connectivity.

Why Build the Skeleton First?

  • Risk Mitigation: You validate that your major components (UI, API Gateway, Backend Services, Database, Message Broker) can communicate and operate correctly before you invest heavily in complex features.
  • CI/CD Foundation: By its nature, the Skeleton must run end-to-end. This forces you to set up your CI/CD pipelines early, giving you a working deployment mechanism from day one.
  • Team Alignment: A running system is the best documentation. Everyone on the team gets a shared, tangible understanding of how data flows through the architecture.

If you’re building an integration platform in the cloud (say, on Azure), the Walking Skeleton confirms that your service choices, such as Azure Functions and Logic Apps, integrate with your storage, networking, and security layers. Guess what I hope to be doing again in the near future.

Applying Pipes and Filters Within the Walking Skeleton

Now, let’s look at what that “minimal, end-to-end functionality” should look like, especially for data and process flow. The Pipes and Filters pattern is ideally suited for building the first functional slice of your integration Skeleton.

The pattern works by breaking down a complex process into a sequence of independent, reusable processing units (Filters) connected by communication channels (Pipes).

How They Map to Integration:

  1. Filters = Single Responsibility: Each Filter performs one specific, discrete action on the data stream, such as:
    • Schema Validation
    • Data Mapping (XML to JSON)
    • Business Rule Enrichment
    • Auditing/Logging
  2. Pipes = Decoupled Flow: The Pipes ensure data flows reliably between Filters, typically via a message broker or an orchestration layer.

In a serverless environment (e.g., using Azure Functions for the Filters and Azure Service Bus/Event Grid for the Pipes), this pattern delivers immense value:

  • Composability: Need to change a validation rule? You only update one small, isolated Filter. Need a new output format? You add a new mapping Filter at the end of the pipe.
  • Resilience: If one Filter fails, the data is typically held in the Pipe (queue/topic), preventing the loss of the entire transaction and allowing for easy retries.
  • Observability: Each Filter is a dedicated unit of execution. This makes monitoring, logging, and troubleshooting precise: no more “black box” failures.
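The pattern fits in a few lines of plain Python, standing in for Azure Functions as the Filters and Service Bus queues as the Pipes; the filter names and message shape are illustrative:

```python
import json

# Each Filter has a single responsibility and a uniform signature: message in, message out.
def validate_schema(msg):
    if "sku" not in msg or "qty" not in msg:
        raise ValueError("schema validation failed")
    return msg

def map_to_target_format(msg):
    # An XML-to-JSON mapping would live here; we simply rename fields.
    return {"articleId": msg["sku"], "quantity": msg["qty"]}

def audit(msg):
    print(f"AUDIT: {json.dumps(msg)}")  # in Azure this would go to Application Insights
    return msg

def run_pipeline(filters, message):
    """The Pipe: passes the output of each Filter to the next.
    In production this would be a Service Bus queue or Event Grid between functions."""
    for f in filters:
        message = f(message)
    return message

result = run_pipeline(
    [validate_schema, map_to_target_format, audit],
    {"sku": "A-100", "qty": 3},
)
```

Swapping the in-process calls for queues is what buys the resilience described above: a failed Filter leaves the message sitting in the Pipe, ready for a retry.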

Walking Skeleton and Pipes and Filters: The Synergy

The real power comes from using the pattern within the process of building and expanding your Walking Skeleton:

  1. Initial Validation (The Skeleton): Select the absolute simplest, non-critical domain (e.g., an Article Data Distribution pipeline, as I have done with my team for retailers). Implement this single, end-to-end flow using the Pipes and Filters pattern. This proves that your architectural blueprint and your chosen integration pattern work together.
  2. Iterative Expansion: Once the Article Pipe is proven, validating the architectural choice, deployment, monitoring, and scaling, you have a template.
    • At the retailer, we subsequently built the integration for the Pricing domain by creating a new Pipe that reuses common Filters (e.g., the logging and basic validation Filters).
    • Next, we picked another domain by cloning the proven pipeline architecture and swapping in the domain-specific Filters.

You don’t start from scratch; you reapply a proven, validated template across domains. This approach dramatically reduces time-to-market and ensures that every new domain is built on a resilient, transparent, and scalable foundation.

My advice, based on experience, is this: don’t skip the Skeleton, and don’t build a monolith inside it. When rebuilding an integration platform in Azure, start with the Walking Skeleton and Pipes and Filters for a future-proof, durable enterprise integration architecture.

What architectural pattern do you find most useful when kicking off a new integration project? Drop a comment!

AWS Free Tier Goes Credit-Based: How It Compares to Azure and GCP

AWS is officially moving away from its long-standing 12-month free tier for new accounts. The new standard, called the Free Account Plan, is a credit-based model designed to eliminate the risk of unexpected bills for new users.

Free Account Plan

With this new plan, you get:

  • A risk-free environment for experimenting and building proofs of concept for up to six months.
  • A starting credit of $100, with the potential to earn another $100 by completing specific exploration activities, such as launching an EC2 instance. This means you can get up to $200 in credits to use across eligible services.
  • The plan ends after six months or once your credits are entirely spent, whichever comes first. After that, you have a 90-day window to upgrade to a paid plan and restore access to your account and data.

This shift, as Principal Developer Advocate Channy Yun explains, allows new users to get hands-on experience without cost commitments. However, it’s worth noting that some services typically used by large enterprises won’t be available on this free plan.

While some may see this as a step back, I tend to agree with Corey Quinn’s perspective. He writes that this is “a return to product-led growth rather than focusing on enterprise revenue to the exclusion of all else.” Let’s face it: big companies aren’t concerned with the free tier. But for students and hobbyists, who can be seen as the next generation of cloud builders, a credit-based, risk-free sandbox is a much more attractive proposition. The new notifications for credit usage and expiration dates are a smart addition that provides peace of mind.

How the New Plan Compares to Other Hyperscalers

This is a helpful plan for those who want to experiment on AWS. That said, it is not unique: Microsoft Azure and Google Cloud Platform (GCP) have long operated on credit-based models.

  • Azure offers a different model: $200 in credits for the first 30 days, supplemented by over 25 “always free” services and a selection of services available for free for 12 months.
  • GCP provides a 90-day, $300 Free Trial for new customers, which can be applied to most products, along with an “Always Free” tier that gives ongoing access to core services like Compute Engine and Cloud Storage up to specific monthly limits.

This alignment among the major cloud providers highlights a consensus on the best way to attract and onboard new developers.

Speaking of credit models, my own experience with Azure is a good example of how beneficial they can be. I receive Azure credits through my MVP benefits, and an MSDN subscription comes with a monthly $150 in credits. Microsoft also offers $100 in Azure credits through Azure for Students. Note that MSDN credits are typically a monthly allowance tied to a specific Visual Studio subscription, while the student credits are a lump sum for a particular period (e.g., 12 months); these different models can be confusing. Either way, all three big hyperscalers offer ways to get hands-on experience in combination with their documentation and what you can find in public repos.

In general, if you’d like to learn more about Azure, AWS, or GCP, the following table shows the most straightforward options:

| Cloud Hyperscaler | Free Credits | Documentation | Repo (samples) |
| --- | --- | --- | --- |
| Azure | Azure Free Account | Microsoft Learn | Azure Samples · GitHub |
| AWS | AWS Free Tier | AWS Documentation | AWS Samples · GitHub |
| GCP | GCP Free Trial | Google Cloud Documentation | Google Cloud Platform · GitHub |

Digital Destiny: Navigating Europe’s Sovereignty Challenge

During my extensive career in IT, I’ve often seen how technology can both empower and entangle us. Today, Europe and the Netherlands find themselves at a crucial junction, navigating the complex landscape of digital sovereignty. Recent geopolitical shifts and the looming possibility of a “Trump II” presidency have only amplified our collective awareness: we cannot afford to be dependent on foreign legislation when it comes to our critical infrastructure.

In this post, I will delve into the threats and strategic risks that underpin this challenge. We’ll explore the initiatives being undertaken at both the European and Dutch levels, and crucially, what the major U.S. Hyperscalers are now bringing to the table in response.

The Digital Predicament: Threats to Our Autonomy

The digital revolution has certainly brought unprecedented benefits, not least through innovative Cloud Services that are transforming our economy and society. However, this advancement has also positioned Europe in a state of significant dependency. Approximately 80% of our digital infrastructure relies on foreign companies, primarily American cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. This reliance isn’t just a matter of convenience; it’s a strategic vulnerability.

The Legal Undercurrent: U.S. Legislation

One of the most persistent threats to European digital sovereignty stems from American legislation. The CLOUD Act (2018), an addition to the Freedom Act (2015) that replaced the Patriot Act (2001), grants American law enforcement and security services the power to request data from American cloud service providers, even if that data is stored abroad.

Think about it: if U.S. intelligence agencies can request data from powerhouses like AWS, Microsoft, or Google without your knowledge, what does this mean for European organizations that have placed their crown jewels there? This directly clashes with Europe’s stringent privacy regulations, the General Data Protection Regulation (GDPR), which sets strict requirements for the protection of personal data of individuals in the EU.

While the Dutch National Cyber Security Centre (NCSC) has stated that, in practice, the chance of the U.S. government requesting European data via the CLOUD Act has historically been minimal, they also acknowledge that this could change with recent geopolitical developments. The risk is present, even though it has rarely materialized thus far.

Geopolitics: The Digital Chessboard

Beyond legal frameworks, geopolitical developments pose a very real threat to our digital autonomy. Foreign governments may impose trade barriers and sanctions on Cloud Services. Imagine scenarios where tensions between major powers lead to access restrictions for essential Cloud Services. The European Union or even my country cannot afford to be a digital pawn in such a high-stakes game.

We’ve already seen these dynamics play out. In negotiations for a minerals deal with Ukraine, the White House reportedly made a phone call to stop the delivery of satellite images from Maxar Technologies, an American space company. These images were crucial for monitoring Russian troop movements and documenting war crimes.

Another stark example is the Microsoft-ICC incident, where Microsoft blocked access to email and Office 365 services for the chief prosecutor of the International Criminal Court in The Hague due to American sanctions. These incidents serve as powerful reminders of how critical external political pressures can be in impacting digital services.

Europe’s Response: A Collaborative Push for Sovereignty

Recognizing these challenges, both Europe and the Netherlands are actively pursuing initiatives to bolster digital autonomy. It’s also worth noting how major cloud providers are responding to these evolving demands.

European Ambitions:

The European Union has been a driving force behind initiatives to reinforce its digital independence:

  • Gaia-X: This ambitious European project aims to create a trustworthy and secure data infrastructure, fostering a federated system that connects existing European cloud providers and ensures compliance with European regulations, such as the General Data Protection Regulation (GDPR). It’s about creating a transparent and controlled framework.
  • Digital Markets Act (DMA) & Digital Services Act (DSA): These legislative acts aim to regulate the digital economy, fostering fairer competition and greater accountability from large online platforms.
  • Cloud and AI Development Act (proposed): This upcoming legislation seeks to ensure that strategic EU use cases can rely on sovereign cloud solutions, with the public sector acting as a crucial “anchor client.”
  • EuroStack: This broader initiative envisions Europe as a leader in digital sovereignty, building a comprehensive digital ecosystem from semiconductors to AI systems.

Crucially, we’re seeing tangible progress here. Virt8ra, a significant European initiative positioning itself as a major alternative to US-based cloud vendors, recently announced a substantial expansion of its federated infrastructure. The platform, which initially included Arsys, BIT, Gdańsk University of Technology, Infobip, IONOS, Kontron, MONDRAGON Corporation, and Oktawave, all coordinated by OpenNebula Systems, has now been joined by six new cloud service providers: ADI Data Center Euskadi, Clever Cloud, CloudFerro, OVHcloud, Scaleway, and Stackscale. This expansion is a clear indicator that the vision for a robust, distributed European cloud ecosystem is gaining significant traction.

Dutch Determination:

The Netherlands is equally committed to this journey:

  • Strategic Digital Autonomy and Government-Wide Cloud Policy: A coalition of Dutch organizations has developed a roadmap, proposing a three-layer model for government cloud policy that advocates for local storage of state secret data and autonomy requirements for sensitive government data.
  • Cloud Kootwijk: This initiative brings together local providers to develop viable alternatives to hyperscaler clouds, fostering homegrown digital infrastructure.
  • “Reprogram the Government” Initiative: This initiative advocates for a more robust and self-reliant digital government, pushing for IT procurement reforms and in-house expertise.
  • GPT-NL: A project to develop a Dutch language model, strengthening national strategic autonomy in AI and ensuring alignment with Dutch values.

Hyperscalers and the Sovereignty Landscape:

The growing demand for digital sovereignty has prompted significant responses from major cloud providers, demonstrating a recognition of European concerns:

  • AWS European Sovereign Cloud: AWS has announced key components of its independent European governance for the AWS European Sovereign Cloud.
  • Microsoft’s Five Digital Commitments: Microsoft recently outlined five significant digital commitments to deepen its investment and support for Europe’s technological landscape.

These efforts from hyperscalers highlight a critical balance. As industry analyst David Linthicum noted, while Europe’s drive for homegrown solutions is vital for data control, it also prompts questions about access to cutting-edge innovations. He stresses the importance of “striking the right balance” to ensure sovereignty efforts don’t inadvertently limit access to crucial capabilities that drive innovation.

However, despite these significant investments, skepticism persists. There is an ongoing debate within Europe regarding digital sovereignty and reliance on technology providers headquartered outside the European Union. Some in the community express doubts about how such companies can truly operate independently and prioritize European interests, with comments like, “Microsoft is going to do exactly what the US government tells them to do. Their proclamations are meaningless.” Others echo the sentiment that “European money should not flow to American pockets in such a way. Europe needs to become independent from American tech giants as a way forward.” This collective feedback highlights Europe’s ongoing effort to develop its own technological capabilities and reduce its reliance on non-European entities for critical digital infrastructure.

My perspective on this situation is that achieving true digital sovereignty for Europe is a complex and multifaceted endeavor, marked by both opportunities and challenges. While the commitments from global hyperscalers are significant and demonstrate a clear response to European demands, the underlying desire for independent, European-led solutions remains strong. It’s not about outright rejection of external providers, but about strategic autonomy – ensuring that we, as Europeans, maintain ultimate control over our digital destiny and critical data, irrespective of where the technology originates.

Azure Cosmos DB’s Latest Performance Features

As an early adopter of Azure Cosmos DB, I have always followed the service’s development and have built up hands-on experience leveraging it for monitoring purposes (a recent example is my session at Azure Cosmos DB Conf 2023 – Leveraging Azure Cosmos DB for End-to-End Monitoring of Retail Processes).

Azure Cosmos DB

For those unfamiliar with it, Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service, offering low-latency, scalable storage and querying of diverse data types. It allows developers to build applications with fast data access and high availability across regions. Its best-known counterpart is Amazon DynamoDB.

In this blog post, I’d like to point out some recent performance optimizations in the service. I have also recently written an InfoQ news item on this topic.

Priority-based execution

One of the more recent features introduced in the service is priority-based execution, currently in public preview. It allows users to define the priority of requests sent to Azure Cosmos DB. When the number of requests surpasses the configured Request Units per second (RU/s) limit, lower-priority requests are throttled first so that high-priority requests can still be processed.

As mentioned in a blog post by Microsoft, this feature empowers users to prioritize critical tasks over less crucial ones when a container surpasses its configured RU/s capacity. Throttled lower-priority requests are automatically retried by clients using an SDK with the configured retry policy until they can be processed successfully.

With priority-based execution, you have the flexibility to allocate varying priorities to workloads operating within the same container in your application. This proves beneficial in numerous scenarios, including prioritizing read, write, or query operations, as well as giving precedence to user actions over background tasks like bulk execution, stored procedures, and data ingestion/migration.

A nomination form is available to request access to the feature; once accepted, you can use priority-based execution through the .NET SDK.
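To make the throttling behavior concrete, here is a small simulation of the idea (illustrative only: the request names, RU costs, and the admission logic are my own assumptions for this sketch, not the service’s actual implementation or SDK API):

```python
# Illustrative sketch of priority-based execution: when demand exceeds the
# RU/s budget for a one-second window, low-priority requests are throttled
# (HTTP 429) before high-priority ones. Not the actual Cosmos DB internals.
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    cost_ru: int
    priority: str  # "High" or "Low"

def execute_window(requests, budget_ru):
    """Admit requests for one window; consider high priority first."""
    served, throttled = [], []
    # Stable sort: high-priority requests are considered before low-priority.
    for req in sorted(requests, key=lambda r: r.priority != "High"):
        if req.cost_ru <= budget_ru:
            budget_ru -= req.cost_ru
            served.append(req.name)
        else:
            throttled.append(req.name)  # a client SDK would retry with backoff

    return served, throttled

requests = [
    Request("background-bulk-load", 600, "Low"),
    Request("user-checkout-read", 50, "High"),
    Request("dashboard-query", 400, "Low"),
    Request("user-cart-write", 100, "High"),
]
served, throttled = execute_window(requests, budget_ru=700)
print(served)     # ['user-checkout-read', 'user-cart-write', 'dashboard-query']
print(throttled)  # ['background-bulk-load']
```

Note that the real service does not reorder your requests like this; the sketch only captures the outcome users care about, namely that user-facing operations keep flowing while bulk work absorbs the throttling.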

Hierarchical Partition Keys

In addition to priority-based execution, the Cosmos DB product group also introduced hierarchical partition keys to optimize performance.

Hierarchical partition keys enhance Cosmos DB’s elasticity, particularly in scenarios where synthetic or logical partition keys would otherwise run into the 20 GB logical-partition limit. By specifying up to three keys, users can sub-partition their data, achieving better data distribution and greater scalability. Azure Cosmos DB automatically distributes the data among physical partitions, allowing a logical partition key prefix to exceed the 20 GB storage limit.

According to the documentation, the simplest way to create a container and specify hierarchical partition keys is using the Azure portal. 

For example, you can use hierarchical partition keys to partition data by tenant ID and then by item ID. A single tenant’s data can then span multiple physical partitions, while queries scoped to that tenant are routed only to the partitions that actually hold the tenant’s data. This can improve query performance by reducing the number of physical partitions that need to be queried.

A more detailed explanation and use case for hierarchical keys in Azure Cosmos DB can be found in the blog post by Leonard Lobel. 
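The tenant-then-item example above can be sketched roughly as follows (the hashing scheme, partition count, and helper names are illustrative assumptions of mine, not Cosmos DB internals):

```python
# Illustrative sketch of hierarchical partition key routing. The full
# hierarchy (tenantId, id) identifies a logical partition; hashing the full
# path lets a single tenant prefix span multiple physical partitions.
import hashlib

KEY_PATHS = ["tenantId", "id"]  # up to three levels are supported
PHYSICAL_PARTITIONS = 4         # assumed count, for illustration only

def logical_partition(item):
    """Full hierarchical key: the value of every level, in order."""
    return tuple(item[p] for p in KEY_PATHS)

def physical_partition(item):
    """Hash the full hierarchy so one tenant can exceed a single partition."""
    key = "/".join(logical_partition(item))
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITIONS

items = [
    {"tenantId": "contoso", "id": "1"},
    {"tenantId": "contoso", "id": "2"},
    {"tenantId": "fabrikam", "id": "3"},
]
for it in items:
    print(logical_partition(it), "-> physical partition", physical_partition(it))
```

The point of the sketch: a query filtered on `tenantId` only needs to visit the physical partitions that the hash assigned to that tenant’s items, rather than fanning out across all of them.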

Burst Capacity Feature

Lastly, the team also made the burst capacity feature for Azure Cosmos DB generally available (GA) to allow you to take advantage of your database or container’s idle throughput capacity to handle traffic spikes.   

Burst capacity allows each physical partition to accumulate up to 5 minutes of idle capacity, which can be utilized at a rate of up to 3000 RU/s. This feature is applicable to databases and containers utilizing manual or autoscale throughput, provided they have less than 3000 RU/s provisioned per physical partition.
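The accrual rules above lend themselves to a quick back-of-the-envelope model (a simplified sketch under the stated numbers; the service’s actual per-partition accounting may differ):

```python
# Simplified model of burst capacity for one physical partition: idle RU/s
# accumulate into a "bank" capped at 5 minutes' worth of provisioned
# throughput, and the bank is spent at a total rate of at most 3000 RU/s.
MAX_BANK_SECONDS = 300   # up to 5 minutes of idle capacity
BURST_RATE_LIMIT = 3000  # total consumption rate cap while bursting

def simulate(provisioned_rus, demand_per_second):
    """Return (remaining banked RU, total throttled RU) after the workload."""
    bank = 0.0
    max_bank = provisioned_rus * MAX_BANK_SECONDS
    throttled = 0.0
    for demand in demand_per_second:
        if demand <= provisioned_rus:
            # Idle headroom accrues into the bank, up to the cap.
            bank = min(max_bank, bank + (provisioned_rus - demand))
        else:
            # Spend the bank, but never faster than the burst rate cap allows.
            burst = min(demand - provisioned_rus, bank,
                        BURST_RATE_LIMIT - provisioned_rus)
            bank -= burst
            throttled += demand - provisioned_rus - burst
    return bank, throttled

# 60 idle seconds, a 10-second spike at 2500 RU/s, then one second at 4000.
demand = [0] * 60 + [2500] * 10 + [4000]
bank, throttled = simulate(provisioned_rus=1000, demand_per_second=demand)
print(bank, throttled)  # 43000.0 1000.0
```

In this toy run the 2500 RU/s spike is fully absorbed by banked capacity, while the 4000 RU/s second exceeds the 3000 RU/s total cap and sees 1000 RU throttled despite a healthy bank.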

To begin utilizing burst capacity, access the Features page within your Azure Cosmos DB account and enable the Burst Capacity feature. Please note that the feature may take approximately 15-20 minutes to become active once enabled.  

Enabling the burst capacity feature (Source: Microsoft Learn, Burst Capacity)

According to the documentation, to use the feature, you need to consider the following: 

  • If your Azure Cosmos DB account is configured as serverless, burst capacity is not applicable. Burst capacity applies only to accounts with provisioned throughput (manual or autoscale).
  • Additionally, burst capacity is compatible with Azure Cosmos DB accounts utilizing the API for NoSQL, Cassandra, Gremlin, MongoDB, or Table. 

Lastly, in case you are wondering what the difference between burst capacity and priority-based execution is, Jay Gordon, a senior program manager on the Cosmos DB team, explained it in the comments on the blog post covering these performance features:

The difference between burst capacity and execution based on priority lies in their impact on performance and resource allocation:

Burst capacity affects the overall throughput capacity of your Azure Cosmos DB container or database. It allows you to temporarily exceed the provisioned throughput to handle sudden spikes in workload. Burst capacity helps maintain low latency and prevent throttling during peak usage periods.

Execution based on priority determines the order in which requests are processed when multiple concurrent requests exist. Higher priority requests are prioritized and typically get faster access to resources for execution. This ensures that essential or time-sensitive operations are processed promptly, while lower-priority requests may experience slight delays.

“In terms of results, burst capacity and execution based on priority are independent. Utilizing burst capacity allows you to handle temporary workload spikes, whereas execution based on importance ensures that higher-priority requests are processed more promptly. These mechanisms work together to optimize performance and resource allocation in Azure Cosmos DB, but they serve different purposes.”

Conclusion

In conclusion, Azure Cosmos DB continues to evolve with new features designed to enhance performance and scalability. Priority-based execution, currently in public preview, enables users to prioritize critical tasks over less important ones when the request unit capacity is exceeded. This flexibility is complemented by hierarchical partition keys, which allow better data distribution and greater scale in scenarios with substantial data. Additionally, the burst capacity feature, now generally available, provides an efficient way to handle traffic spikes by utilizing idle throughput capacity. Users can easily enable burst capacity through the Azure Cosmos DB account’s Features page, making it a valuable tool for accounts with provisioned throughput.

Returning to Amazon: DynamoDB, the Cosmos DB counterpart on AWS, offers comparable performance-optimizing capabilities built on similar concepts.