Azure API Management Semantic Caching: Cut AI Token Costs with Similarity-Based Responses

Posted on June 3, 2026 by steefjan1970

Part 6 of 7 in the “APIM for AI Workloads” series

Azure API Management semantic caching is the most operationally transparent cost optimization in this series. Every technique covered so far (auth, token limits, token metrics, and load balancing) requires deliberate design decisions in how you configure APIM. Semantic caching, by contrast, works silently. Calling applications send prompts as normal. APIM checks whether a semantically similar prompt has already been answered. If a cached prompt is close enough to satisfy a configurable similarity threshold, APIM returns the cached response without touching the AI backend. No completion tokens are consumed, and no model latency is added.

For workloads with repetitive prompt patterns, such as internal FAQ bots, document classifiers, and support agents that see the same questions repeatedly, the cache hit rate can be surprisingly high. Even a 20% hit rate on a high-volume workload translates directly into lower cost and lower average latency.

How Azure API Management Semantic Caching Works

The azure-openai-semantic-cache-lookup policy sits in the inbound section of your APIM pipeline, before the request reaches the AI backend. When a prompt arrives, APIM sends it to a configured embedding model, typically Azure OpenAI text-embedding-ada-002 or equivalent, to generate a vector representation of the prompt. APIM then compares that vector against cached embeddings stored in Azure Managed Redis using cosine similarity.

APIM's score works like a distance rather than a raw similarity: smaller values mean the prompts are semantically closer. If the score between the incoming prompt and a cached prompt falls below the configured score-threshold, APIM treats it as a cache hit and returns the stored response. If no cached prompt is close enough, APIM forwards the request to the AI backend as normal and stores the response in Redis for future lookups.
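To make the lookup-then-store flow concrete, here is a minimal Python sketch of the logic. It is not APIM's implementation. It assumes the score is a cosine distance (1 minus cosine similarity), so lower means closer, which matches how score-threshold behaves; the toy two-dimensional vectors stand in for real embedding output, and a Python list stands in for Azure Managed Redis.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity: 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


@dataclass
class SemanticCache:
    # Lower-is-closer, like APIM's score-threshold (the distance formula is an assumption).
    score_threshold: float
    entries: List[Tuple[List[float], str]] = field(default_factory=list)

    def lookup(self, embedding: List[float]) -> Optional[str]:
        """Return the closest cached response if its score is below the threshold."""
        best_response, best_score = None, math.inf
        for cached_embedding, response in self.entries:
            score = cosine_distance(embedding, cached_embedding)
            if score < best_score:
                best_response, best_score = response, score
        return best_response if best_score < self.score_threshold else None

    def store(self, embedding: List[float], response: str) -> None:
        self.entries.append((embedding, response))


def handle_prompt(cache: SemanticCache, embedding: List[float],
                  call_backend: Callable[[], str]) -> Tuple[str, str]:
    """Inbound lookup, backend call on a miss, outbound store."""
    cached = cache.lookup(embedding)
    if cached is not None:
        return cached, "hit"      # backend never called
    response = call_backend()     # miss: forward to the AI backend
    cache.store(embedding, response)
    return response, "miss"
```

Only a miss writes to the cache. A hit returns without calling the backend at all, which is where the token savings come from.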

Diagram 1: Semantic cache request flow. On a cache hit, APIM returns a stored response directly, consuming no completion tokens. On a miss, APIM forwards to the AI backend and stores the response in Azure Managed Redis for future hits.

The generic variant, llm-semantic-cache-lookup, works the same way for non-Azure backends. Both require the same supporting infrastructure: an embeddings model backend and an Azure Managed Redis instance configured in APIM. The matching store policy (azure-openai-semantic-cache-store or llm-semantic-cache-store) writes responses back to the cache in the outbound section.

Tuning the Score Threshold for Azure API Management Semantic Caching

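The score-threshold attribute is the most consequential configuration decision in the semantic caching policy. It controls how similar an incoming prompt must be to a cached prompt for APIM to treat it as a hit. The value runs from 0.0 to 1.0, and smaller values require greater semantic similarity: lowering the threshold makes the cache stricter, and raising it makes the cache more permissive. The practical range is much narrower than the full scale.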

Diagram 2: Score threshold tuning guide and vary-by scope strategies. Lower thresholds demand closer matches; higher thresholds cache more aggressively. A starting value of 0.05 suits most production workloads. A global cache (no vary-by) maximizes hit rate but risks serving the wrong user’s response.

In practice, three zones matter:

Below 0.05 (conservative). At this range, only prompts that are very close in meaning to a cached prompt produce hits. Creative workloads, code generation, and document drafting tend to have high prompt variance, so a stricter threshold avoids serving cached responses to genuinely different requests.

0.05 to 0.20 (aggressive). At this range, prompts that are paraphrases of each other, such as “What is my account balance?” and “Can you show me my current balance?”, are far more likely to produce cache hits. This is the right range for FAQ bots, support agents, and any workload where users ask the same questions in slightly different words. A value of 0.05 sits at the strict end of this range and suits most production deployments.

Above 0.30 (too loose). At this threshold, prompts that are only loosely related start to match, and the cache returns responses that do not answer the question that was actually asked. Avoid this range.

Start at 0.05 and monitor cache hit rates in Application Insights. If the hit rate is low for a workload you expect to be repetitive, raise the threshold incrementally. If you start seeing complaints about incorrect or mismatched responses, lower it.
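To see why raising the threshold raises the hit rate, this Python sketch replays a list of nearest-match scores (one per request, lower meaning closer) against several candidate thresholds. The scores are made-up sample data, and APIM does not necessarily expose a per-request score; treat this as an offline what-if tool under that assumption.

```python
from typing import Dict, Iterable, List


def hit_rate(nearest_scores: List[float], threshold: float) -> float:
    """Fraction of requests whose closest cached prompt scores below the threshold."""
    hits = sum(1 for score in nearest_scores if score < threshold)
    return hits / len(nearest_scores)


def sweep(nearest_scores: List[float], thresholds: Iterable[float]) -> Dict[float, float]:
    """What-if table: hit rate for each candidate threshold."""
    return {t: hit_rate(nearest_scores, t) for t in thresholds}


# Made-up sample: one nearest-match score per request (lower = closer).
SAMPLE_SCORES = [0.01, 0.03, 0.04, 0.08, 0.12, 0.25, 0.40, 0.60]
```

Each extra hit gained at a looser threshold comes from a prompt that is less similar to the cached one, so a sweep like this is only half the picture; pair it with spot checks of the responses that would have been served.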

vary-by Scope: Preventing Cache Pollution

The vary-by element scopes the cache namespace. Without it, all consumers share a single global cache. That maximizes the hit rate but introduces a significant risk: APIM could serve one user’s cached response to a different user. For most enterprise AI workloads, that is unacceptable.

The safest default is to vary by Subscription ID, which gives each API subscriber their own cache namespace. This prevents cross-team cache pollution while still achieving high hit rates within each subscriber’s own prompt patterns. For multi-tenant applications where individual users have distinct contexts, vary by a user identifier extracted from the JWT or a custom header instead.
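The effect of vary-by can be sketched with a cache whose keys include a namespace. This is a hypothetical Python model, not APIM's storage layout: ScopedCache uses exact prompt matching instead of embeddings to keep the focus on scoping, and cache_namespace is an illustrative helper.

```python
from typing import Dict, Optional, Tuple


def cache_namespace(scope: str, subscription_id: str, user_id: Optional[str] = None) -> str:
    """Map a vary-by strategy to the namespace a cached entry lives in."""
    if scope == "global":
        return "global"                    # every consumer shares one cache
    if scope == "subscription":
        return f"sub:{subscription_id}"     # one namespace per API subscriber
    if scope == "user":
        if not user_id:
            # Fail closed instead of pooling every caller without an ID into one bucket.
            raise ValueError("user scope requires a user identifier")
        return f"user:{user_id}"
    raise ValueError(f"unknown scope: {scope!r}")


class ScopedCache:
    """Exact-match stand-in for the semantic cache, keyed by (namespace, prompt)."""

    def __init__(self, scope: str):
        self.scope = scope
        self._entries: Dict[Tuple[str, str], str] = {}

    def get(self, prompt: str, subscription_id: str,
            user_id: Optional[str] = None) -> Optional[str]:
        ns = cache_namespace(self.scope, subscription_id, user_id)
        return self._entries.get((ns, prompt))

    def put(self, prompt: str, response: str, subscription_id: str,
            user_id: Optional[str] = None) -> None:
        ns = cache_namespace(self.scope, subscription_id, user_id)
        self._entries[(ns, prompt)] = response
```

With subscription scope, team B never sees team A's cached answer; with global scope, it does, which is exactly the cross-consumer risk described above.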

A global cache with no vary-by is appropriate only for fully public, stateless APIs where responses are identical regardless of who requests them. Internal enterprise AI workloads rarely meet that bar.

Infrastructure Requirements for Azure API Management Semantic Caching

Semantic caching requires two supporting Azure resources beyond APIM itself. First, an Azure Managed Redis instance configured as an external cache in APIM. Redis stores the prompt embeddings and cached responses. The cache TTL is configurable through the store policy's duration attribute (in seconds), so you control how long responses remain valid before APIM re-queries the backend.

Second, an embeddings model backend registered in APIM. For Azure OpenAI, this is typically a separate deployment of text-embedding-ada-002 or text-embedding-3-small. The embeddings backend is referenced by the embeddings-backend-id attribute. It is separate from your completions backend, so you can apply independent token limits and load balancing to the embeddings traffic.

One practical consideration: the embeddings call itself consumes tokens and adds a small amount of latency on every request, whether or not the cache hits. For workloads with very low prompt repetition, the overhead of generating embeddings for every request may outweigh the savings from occasional cache hits. Measure the hit rate before committing the infrastructure cost.
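A back-of-the-envelope model helps decide whether the embeddings overhead pays off. This Python sketch assumes you know a blended cost per completion request, an embeddings cost per request (paid on every request, hit or miss), and a fixed monthly cost for the Redis instance. All figures in the usage note are illustrative, not real Azure prices.

```python
def net_monthly_saving(requests: int, hit_rate: float, completion_cost: float,
                       embedding_cost: float, fixed_monthly_cost: float) -> float:
    """Savings from hits, minus embeddings on every request, minus the fixed Redis cost."""
    saved_on_hits = requests * hit_rate * completion_cost
    embedding_spend = requests * embedding_cost   # paid on hits and misses alike
    return saved_on_hits - embedding_spend - fixed_monthly_cost


def break_even_hit_rate(requests: int, completion_cost: float,
                        embedding_cost: float, fixed_monthly_cost: float) -> float:
    """Hit rate at which net_monthly_saving is exactly zero."""
    return (requests * embedding_cost + fixed_monthly_cost) / (requests * completion_cost)
```

With illustrative figures of 1,000,000 requests a month, 0.002 per completion, 0.00002 per embedding, and 500 in fixed cost, break_even_hit_rate comes out at about 0.26: a 20% hit rate would cost more than it saves, while 50% would pay off comfortably.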

What’s Next in This Azure API Management for AI Series

Part 7 closes the series by covering APIM’s emerging role as an MCP gateway for agentic AI workloads: how to expose REST APIs as MCP servers, pass through existing MCP servers, and manage agent-to-agent traffic through the same control plane we’ve built across this series.

Part 7: APIM as an MCP gateway for agentic AI workloads.

AI Gateway Commercial vs Open Source: How to Choose the Right Control Plane

Posted on May 15, 2026 by steefjan1970

The AI gateway commercial vs. open-source decision is one that most organizations reach not by planning but by accident. One team has already integrated directly with Azure OpenAI. Another is using LiteLLM to wrap a few models. A third wants to use the enterprise API management platform you already have. Suddenly, you need to make a choice, and the conversation gets complicated fast.

This companion post to my APIM for AI Workloads series takes a step back from the Azure API Management specifics and addresses the question that comes before all of it: which gateway should you be using in the first place? The series covers APIM in depth because it’s the right answer for the Microsoft ecosystem. But it’s not the only answer, and for some organizations it’s not the right one.

Here is how to think through the decision properly.

Why the AI Gateway Commercial vs Open Source Choice Matters More Than You Think

Most API gateway decisions are relatively low-stakes. If you pick the wrong one, you migrate. But the AI gateway decision carries more weight for two reasons.

First, the gateway sits in the critical path of every AI interaction in your organization. Its policy language, authentication model, and observability hooks become embedded in the way your teams build AI-powered applications. Switching later is not impossible, but it is disruptive.

Second, the governance patterns you establish now (how you handle token limits, cross-charging, PII, and compliance logging) are much harder to retrofit than to design in from the start. The Team Rockstars IT AI Gateway whitepaper, published this month, makes this point well: organizations that set up audit logging via an AI gateway from day one build a direct compliance advantage under the EU AI Act. Those who add it later risk complex and costly rework.

So the choice deserves deliberate thought, not a default.

The Commercial Options for AI Gateway

Commercial AI gateways offer a faster path to production and offload operational complexity to the vendor. The main options in the market today are:

Azure API Management is the right choice if you are already in the Microsoft ecosystem. Its AI-specific policy extensions for token limits, token metrics, semantic caching, and load balancing across PTU and PAYG backends are mature and tightly integrated with Azure Monitor and Application Insights. The series covers this in depth from Part 1 onwards.

Kong Konnect is a strong option for organizations that already use Kong for API management and want to extend it into AI. Its plugin ecosystem covers rate limiting, authentication, and observability, with AI-specific plugins growing quickly.

Portkey is purpose-built as an AI gateway with a lightweight footprint and fast time-to-value. It supports a broad range of model providers, has built-in semantic caching and observability, and is a practical option for teams that want AI governance without the overhead of a full enterprise API management platform.

Apigee (Google Cloud) is the natural choice for GCP-centric organizations. Like APIM in the Microsoft world, its AI gateway capabilities are deepening with each release as Google embeds Gemini and Vertex AI integrations.

The common advantages across all commercial options are faster deployment, built-in compliance features, vendor support contracts, and operational burden offloaded to the vendor. The common risks are licensing costs, proprietary policy languages that create switching friction, and dependency on the vendor’s roadmap.

The Open Source Options for AI Gateway

Open-source gateways offer maximum control and no licensing costs, but they require your organization to own what the vendor would otherwise handle.

LiteLLM is the most widely adopted open source AI gateway today. It provides a unified API across more than 100 model providers, with built-in rate limiting, spend tracking, and a proxy server that is straightforward to self-host. The community is active, and the feature velocity is high. The supply chain risk is real, though: a 2025 attack targeting LiteLLM and Trivy demonstrated that even widely used security-adjacent tools can become attack vectors. If you run LiteLLM in production, you own the patching cadence.

Agentgateway, an open source project originally created by Solo.io and now hosted by the Linux Foundation, is purpose-built for MCP and agentic traffic. If your primary use case is governing tool calls from AI agents rather than managing completion API traffic, it is worth evaluating alongside the broader options.

One API provides a unified, OpenAI-compatible interface across multiple providers and is widely used by organizations seeking provider-agnostic routing without vendor lock-in.

HelixML focuses on self-hosted deployments with strong data-sovereignty properties, making it relevant for organizations where data-residency requirements rule out SaaS-based gateway options.

AI Gateway Commercial vs Open Source: Five Decision Factors

Diagram 1: Commercial vs open source AI gateway decision factors. Neither option wins across the board. The right choice depends on your compliance posture, internal capability, and how much operational complexity you want to own.

Five factors consistently determine which direction is right for a given organization:

Time to value. In my experience, commercial gateways can be production-ready in days to weeks. Open source deployments typically take weeks to months to reach production quality, depending on how much custom policy logic you need to build. If you have an urgent compliance or cost control problem to solve, commercial is the pragmatic choice.

Compliance and data residency. For Dutch and European organizations operating under AVG, NIS2, and the EU AI Act, commercial gateways offer contractual guarantees: data processing agreements, certified regions, and SLAs with defined incident response times. Open source can meet the same requirements, but you are responsible for demonstrating compliance yourself rather than relying on a vendor certification.

Internal platform capability. Open source is not free. The licensing cost is zero, but the operational cost is not, and the CNCF’s platform engineering maturity model is a useful yardstick for judging whether your organization can carry it. Organizations without a dedicated platform engineering team that can credibly own the gateway long-term should not choose open source. The operational gap will become visible at the worst possible moment.

Flexibility and lock-in risk. Open source wins on long-term flexibility. Proprietary policy languages in commercial gateways create switching friction that grows over time as you invest in custom policies. If multi-cloud strategy and provider-agnosticism are strategic priorities, design your gateway layer with that in mind from the start. Even if you begin with a commercial option, you can apply the strangler fig pattern to abstract away proprietary dependencies over time.

Supply chain risk. This factor is underweighted in most evaluations. The 2025 supply chain attack targeting LiteLLM and Trivy demonstrated that open source security tooling itself can become an attack vector. Commercial vendors have contractual obligations around vulnerability disclosure and patching. With open source, that obligation falls to your team.

A Decision Framework for AI Gateway Commercial vs Open Source

Diagram 2: Decision flowchart for choosing between commercial and open source AI gateways. Compliance requirements, internal capability, and cloud ecosystem fit are the three most decisive factors.

The flowchart above works through the most decisive questions in order. A few practical observations from applying it:

Regulated industries almost always land in commercial. Healthcare, financial services, and insurance organizations operating under Dutch or European regulation have compliance requirements that are significantly easier to satisfy with contractual vendor guarantees than with self-operated open source tooling. At my company, the AVG and healthcare-specific data processing requirements made APIM the clear choice.

The hybrid pattern is underused. Many organizations run a commercial gateway in production for governed workloads, while developer teams use LiteLLM or a lightweight open source option in lower environments for experimentation. This gives you the compliance and operational properties you need in production while keeping the innovation surface open. It is more work to maintain two gateway patterns, but the tradeoff is often worth it.

Design for replaceability regardless of what you choose. The Team Rockstars whitepaper frames this well: choose your first gateway deliberately, but design for replacement. Use open standards, abstract your policy logic where possible, and avoid deep coupling to proprietary features without open-source equivalents. The gateway landscape is evolving fast enough that what is the right choice today may not be in two years.
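One way to design for replaceability is to put a thin interface between application code and whichever gateway sits in front of the models. This Python sketch uses typing.Protocol. ApimGateway and LiteLlmGateway are hypothetical adapters whose complete methods are stubs (a real adapter would make an HTTP call to the gateway's endpoint), and none of the names come from a vendor SDK.

```python
from typing import Callable, Dict, Protocol


class ChatGateway(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class ApimGateway:
    """Hypothetical adapter; a real one would POST to the APIM endpoint."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def complete(self, prompt: str) -> str:
        return f"[apim {self.base_url}] {prompt}"


class LiteLlmGateway:
    """Hypothetical adapter for a self-hosted LiteLLM proxy."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def complete(self, prompt: str) -> str:
        return f"[litellm {self.base_url}] {prompt}"


# Swapping gateways becomes a configuration change, not a code change.
ADAPTERS: Dict[str, Callable[[str], ChatGateway]] = {
    "apim": ApimGateway,
    "litellm": LiteLlmGateway,
}


def build_gateway(kind: str, base_url: str) -> ChatGateway:
    return ADAPTERS[kind](base_url)


def summarize(gateway: ChatGateway, text: str) -> str:
    # Application logic knows nothing about which gateway sits behind it.
    return gateway.complete(f"Summarize: {text}")
```

The gateway-specific policy logic still lives in the gateway; the point of the seam is that replacing the gateway touches one adapter and one configuration value instead of every calling application.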

Where This Fits in the APIM for AI Workloads Series

The rest of the series goes deep on Azure API Management specifically: the token metric policy, load balancing and circuit breaking, semantic caching, and MCP gateway for agentic workloads. If you have landed on APIM as your gateway of choice or if you are in a Microsoft-centric organization where it is the natural fit, the series covers the production patterns you need.

Part 1: Why your AI APIs need a gateway.
Part 2: Authentication and authorization.
Part 3: Token limit policy.
Part 4: Token metric policy and cross-charging.
Part 5: Load balancing and circuit breaking.
Part 6: Semantic caching.
Part 7: APIM as MCP gateway for agentic AI workloads.

Azure API Management Token Limit Policy: Controlling AI Token Consumption Per Consumer

Posted on May 11, 2026 by steefjan1970

Part 3 of 7 in the “APIM for AI Workloads” series

The Azure API Management token limit policy is one of the most direct cost control levers you have for AI workloads. In Part 1 of this series, I argued that token consumption is invisible without the right instrumentation. The token limit policy is the enforcement side of that equation: once you know how many tokens consumers are using, you set boundaries so that no single consumer can exhaust your model capacity or run up an unexpected bill.

This post covers how the policy works, which counter-key strategy to choose for your workload, how to size your tokens-per-minute (TPM) limits, and the difference between the Azure OpenAI-specific policy and the generic LLM variant for non-Microsoft backends.

Azure API Management Token Limit Policy: How It Works

The azure-openai-token-limit policy sits in the inbound section of your APIM policy pipeline. Before any request reaches the AI backend, APIM checks a sliding window counter keyed to the value you specify. If the caller is within their TPM budget, the request passes through. If they’ve exceeded it, APIM returns a 429 Too Many Requests response with a Retry-After header, and the backend never sees the request.

This is important: the throttling happens at the gateway, not at the Azure OpenAI endpoint. That means you’re not paying for rejected requests, and your model deployment is protected from saturation by a single runaway consumer.
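The enforcement side can be modelled as a per-key sliding window. This Python sketch is an approximation for reasoning about limits, not APIM's algorithm. SlidingWindowTokenLimiter is a hypothetical class: check runs on the inbound path and returns a Retry-After value in seconds, and record charges the actual tokens reported after the response, which is why a single large request can push a key past its limit before the next check rejects it.

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SlidingWindowTokenLimiter:
    """Per-counter-key tokens-per-minute limit over a sliding 60-second window."""

    def __init__(self, tokens_per_minute: int, window_seconds: float = 60.0):
        self.limit = tokens_per_minute
        self.window = window_seconds
        # counter_key -> (timestamp, tokens) pairs still inside the window
        self._usage: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)

    def _evict(self, key: str, now: float) -> None:
        q = self._usage[key]
        while q and q[0][0] <= now - self.window:
            q.popleft()

    def check(self, key: str, now: float) -> Tuple[bool, int]:
        """Inbound check: return (allowed, retry_after_seconds)."""
        self._evict(key, now)
        q = self._usage[key]
        used = sum(tokens for _, tokens in q)
        if used < self.limit:
            return True, 0
        # Rejected (429): find when enough old usage slides out to get back under the limit.
        freed = 0
        for ts, tokens in q:
            freed += tokens
            if used - freed < self.limit:
                return False, math.ceil(ts + self.window - now)
        return False, math.ceil(self.window)  # defensive; not reached when limit > 0

    def record(self, key: str, tokens: int, now: float) -> None:
        """Outbound: charge the actual tokens reported in the backend response."""
        self._usage[key].append((now, tokens))
```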

Diagram 1: The token limit policy acts as a funnel in the APIM inbound pipeline. Requests within the TPM budget pass through to the AI backend. Requests exceeding the limit receive a 429 status code with a Retry-After header before the backend is even reached.

The policy has two variants. The azure-openai-token-limit policy is purpose-built for Azure OpenAI and Microsoft Foundry endpoints, and uses the actual token counts returned in the API response. The llm-token-limit policy is the generic variant for any LLM backend, including Mistral, Cohere, and others. Both share the same attribute model, so the configuration patterns below apply to either.

Choosing a counter-key for Azure API Management Token Limiting

The counter-key attribute is the most important decision in configuring the token limit policy. It determines the scope of the limit: who shares a TPM bucket, and who gets their own.

Diagram 2: Three counter-key strategies and TPM sizing guidance by workload type. The right scope depends on whether you are separating teams, protecting a public endpoint, or enforcing per-user limits in a multi-tenant application.

The three main strategies are:

Per subscription: @(context.Subscription.Id). This is the most common pattern for internal enterprise use. Each API product subscription gets its own TPM counter, which maps cleanly to a team, a product, or a cost center. Combined with the Token Metric policy covered in Part 4, this provides per-subscriber cost visibility and enforcement in a single configuration.

Per IP address: @(context.Request.IpAddress). Better suited to public-facing endpoints or developer portals where you don’t have a subscription model. It’s a blunt instrument — NAT and shared egress can mean multiple users share a counter — but it’s effective for abuse prevention and trial access scenarios.

Per JWT claim or custom header: @(context.Request.Headers.GetValueOrDefault("x-user-id","")). This is the most flexible option. If your application passes a user identifier in a header or JWT claim, you can scope limits to the individual user. This is the right approach for multi-tenant applications where each end user should have their own token budget, independent of which subscription they’re calling through.
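The three strategies differ only in which request attribute becomes the bucket name. The hypothetical counter_key function below mirrors the three policy expressions in Python. It assumes header names have already been lowercased, since HTTP header names are case-insensitive.

```python
from typing import Mapping


def counter_key(strategy: str, subscription_id: str, ip_address: str,
                headers: Mapping[str, str]) -> str:
    """Pick the TPM bucket a request is charged to."""
    if strategy == "subscription":
        return subscription_id                 # @(context.Subscription.Id)
    if strategy == "ip":
        return ip_address                      # @(context.Request.IpAddress)
    if strategy == "user-header":
        # Mirrors GetValueOrDefault("x-user-id", ""): a missing header yields "",
        # so every caller without the header lands in the same shared bucket.
        return headers.get("x-user-id", "")
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The empty-string default is worth noticing: callers that omit the header all share one bucket, so either reject requests without the header or make sure every client sets it.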

Sizing Your TPM Limits

TPM limits are context-dependent, but a few principles apply across most workloads.

Start by profiling your actual token usage in a staging environment before setting production limits. The remaining-tokens-variable-name attribute exposes the remaining token budget as a policy variable, which you can log via the Token Metric policy to build a usage baseline before enforcing hard limits.

Set the estimate-prompt-tokens attribute to false in production. When set to true, APIM estimates prompt tokens before the response is returned, enabling earlier throttling but reducing accuracy. In practice, counting actual tokens from the response is more reliable and avoids throttling requests that would have been within budget.

A common mistake is setting a single global TPM limit too low, which throttles all consumers the moment a batch job runs on any team. The better pattern is tiered limits by API product: a Developer product with a low TPM ceiling, a Standard product for normal workloads, and an Unlimited product for production pipelines that need burst capacity.

Handling 429 Responses in Calling Applications

Any application calling an APIM-fronted AI endpoint needs to handle 429 responses gracefully. APIM returns a Retry-After header indicating how many seconds until the token window resets. Well-behaved clients respect this header and back off rather than retrying immediately.

For agentic workloads with multiple pipeline steps, a 429 response midway through can leave the agent in an inconsistent state. The recommended pattern is to expose the remaining-tokens-variable-name value in a response header so the calling application can monitor its own budget and slow down proactively, rather than waiting for a hard rejection.
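Both behaviours, backing off on 429 and slowing down when the budget runs low, can live in one small client wrapper. In this Python sketch, send is a stand-in for your HTTP call, Retry-After is assumed to be a number of seconds, and x-remaining-tokens is a hypothetical header name that your APIM policy would have to set from the remaining-tokens variable. The low_budget and pause_seconds values are illustrative.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# (status, headers, body) as returned by whatever HTTP layer you use.
Response = Tuple[int, Mapping[str, str], str]

REMAINING_HEADER = "x-remaining-tokens"  # hypothetical header name set by your APIM policy


class BudgetAwareClient:
    def __init__(self, send: Callable[[dict], Response],
                 sleep: Callable[[float], None] = time.sleep,
                 low_budget: int = 500, pause_seconds: float = 5.0, max_attempts: int = 3):
        self.send = send
        self.sleep = sleep
        self.low_budget = low_budget
        self.pause_seconds = pause_seconds
        self.max_attempts = max_attempts
        self.remaining: Optional[int] = None  # last budget reported by the gateway

    def call(self, payload: dict) -> Tuple[int, str]:
        # Proactive: slow down before the gateway has to reject us.
        if self.remaining is not None and self.remaining < self.low_budget:
            self.sleep(self.pause_seconds)
        for _ in range(self.max_attempts):
            status, headers, body = self.send(payload)
            if status == 429:
                # Reactive: honour Retry-After instead of retrying immediately.
                self.sleep(float(headers.get("Retry-After", "1")))
                continue
            if REMAINING_HEADER in headers:
                self.remaining = int(headers[REMAINING_HEADER])
            return status, body
        raise RuntimeError("still throttled after max_attempts; surface this to the caller")
```

For an agent, raising after max_attempts gives the orchestrator a clear point to checkpoint or pause the workflow instead of leaving a step half-finished.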

The Azure OpenAI token limit policy documentation covers the full attribute reference, including tokens-per-minute, counter-key, estimate-prompt-tokens, and remaining-tokens-variable-name. The llm-token-limit variant has the same interface for non-Azure backends.

What’s Next in This Azure API Management for AI Series

Part 4 covers the Token Metric policy: how to emit token usage data to Application Insights broken down by consumer dimensions, and how to use that data for internal cross-charging and spend dashboards.

Part 4: Token Metric policy — emitting usage data for observability and cross-charging.
Part 5: Load balancing and circuit breaking across PTU and PAYG backends.
Part 6: Semantic caching — reducing token consumption with similarity-based response reuse.
Part 7: APIM as an MCP gateway for agentic AI workloads.

Azure API Management for AI: Securing Your AI APIs with Authentication and Authorization

Posted on May 5, 2026 by steefjan1970

Part 2 of 7 in the “APIM for AI Workloads” series

In Part 1 of this series, I made the case for why Azure API Management for AI workloads is the right control plane for governing AI traffic across an organization. This post gets practical: how do you actually secure access to your AI backends with APIM without creating a credential-management nightmare?

Security is where many AI projects cut corners, and understandably so. When you’re moving fast to prove value with a new model, authentication feels like overhead. But AI endpoints are expensive, and an unsecured Azure OpenAI endpoint is a real risk: anyone with the URL and key can start consuming tokens at your cost. At scale, that’s a significant financial and compliance exposure.

APIM addresses this with a three-layer security model. Let’s walk through each layer.

Azure API Management for AI Security: A Three-Layer Model

The authentication and authorization pattern in APIM is deliberately layered. Each layer answers a different question and operates independently, so a failure at any layer stops the request before it reaches the AI backend.

Diagram 1: Three-layer auth in APIM for AI workloads. Layer 1 identifies the caller via subscription key. JWT validation in Layer 2 then determines what they’re permitted to do. Finally, Layer 3 authenticates APIM itself to the AI backend via Managed Identity.

The three layers are:

Subscription keys to identify and track API consumers.
JWT validation to enforce fine-grained access control based on claims.
Managed Identity to authenticate APIM to Azure OpenAI without storing credentials.

Each layer has a distinct role. Confusing them is a common mistake, so it’s worth being explicit about what each one does and does not do.

Layer 1: Subscription Keys

Subscription keys are APIM’s mechanism for identifying API consumers. When you create an API product in APIM and require a subscription, callers must include their key in the Ocp-Apim-Subscription-Key header. APIM validates the key, maps it to a subscriber, and lets the request proceed.

This is important for AI workloads specifically because subscription keys enable per-consumer token tracking. When you combine subscription key validation with the Token Metric policy we’ll cover in Part 4, you get usage data broken down by subscriber, which is the foundation of any internal cross-charging model.

Subscription keys answer the question: Who is calling? They don’t answer what the caller is allowed to do. For that, you need JWT validation.

Layer 2: JWT Validation and Claims-Based Authorization

The validate-jwt policy is where you enforce what a caller is permitted to do. It validates the JWT token in the Authorization header against your identity provider, and can inspect any claim in the token to make authorization decisions.

For Azure OpenAI specifically, this is where you control which teams or applications can access which model deployments. A team working on an internal chatbot should not be able to call a GPT-4o deployment reserved for a production workload. JWT claims let you enforce that boundary at the gateway layer, with no changes required in the calling application.

A typical policy checks the token signature against your Azure AD tenant’s OpenID Connect configuration, then validates that a required scope or role claim is present.

The failed-validation-httpcode="401" attribute ensures unauthenticated callers get a clean rejection before they ever reach the backend. You can also use failed-validation-error-message to return a specific error message, which helps consumers debug auth failures without exposing internal details.
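The signature, issuer, and audience checks are best left to validate-jwt itself, but the claims step is easy to reason about in isolation. This Python sketch assumes a token that has already been validated and decoded into a dictionary. The deployment names, role values, and DEPLOYMENT_REQUIREMENTS map are hypothetical, and the any/all switch is modelled on the match attribute of validate-jwt's claim element.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple, Union

ClaimValue = Union[str, Iterable[str]]

# Hypothetical policy: which roles may call which model deployment.
DEPLOYMENT_REQUIREMENTS: Dict[str, Tuple[str, Set[str]]] = {
    "chatbot-gpt-4o-mini": ("any", {"AI.Chatbot", "AI.Production"}),
    "prod-gpt-4o": ("any", {"AI.Production"}),
}


def claim_values(claims: Mapping[str, ClaimValue], name: str) -> Set[str]:
    """Normalise a claim to a set: 'roles' is an array, 'scp' is space-delimited."""
    value = claims.get(name)
    if value is None:
        return set()
    if isinstance(value, str):
        return set(value.split())
    return set(value)


def authorize(claims: Mapping[str, ClaimValue], deployment: str,
              claim_name: str = "roles", failed_status: int = 401) -> int:
    """Return 200 to forward the request, or failed_status to reject it at the gateway."""
    if deployment not in DEPLOYMENT_REQUIREMENTS:
        return failed_status                   # unknown deployment: deny by default
    match, accepted = DEPLOYMENT_REQUIREMENTS[deployment]
    present = claim_values(claims, claim_name)
    ok = bool(present & accepted) if match == "any" else accepted <= present
    return 200 if ok else failed_status
```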

For multi-provider setups where you’re routing to non-Azure backends like Mistral or Cohere, the same JWT policy applies. The claims model is provider-agnostic, which is one of the advantages of centralizing auth in APIM rather than handling it per-backend.

Layer 3: Managed Identity for Backend Authentication

Managed Identity is the most important security improvement you can make when setting up Azure API Management for AI. It replaces the pattern of storing an Azure OpenAI API key in APIM’s named values with a system-assigned or user-assigned Managed Identity that APIM uses to authenticate directly to Azure OpenAI via Azure AD.

Diagram 2: API key authentication (left) vs. Managed Identity (right). The key difference is that Managed Identity requires no stored credentials anywhere in your configuration.

The practical difference is significant. With API key authentication, you have a long-lived secret that needs to be stored, rotated, and kept out of source control. With Managed Identity, there is no stored secret. APIM requests a short-lived token from Azure AD at runtime, and Azure AD issues it based on the APIM instance’s identity. There is no long-lived credential to store, so there is none to leak.

The configuration is a single policy element in the inbound section: <authentication-managed-identity resource="https://cognitiveservices.azure.com" />. APIM handles the rest, automatically fetching and refreshing the token.
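APIM does the fetching and refreshing for you, but it helps to see what that involves. This Python sketch caches a short-lived token in memory and refreshes it a few minutes before expiry. The fetch callable is a stand-in for the call to the identity endpoint (in a Python client, azure-identity's ManagedIdentityCredential.get_token plays that role), and the 300-second refresh skew is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AccessToken:
    token: str
    expires_on: float  # Unix timestamp, seconds


class ManagedIdentityTokenCache:
    """Fetch a short-lived token on demand and refresh it shortly before it expires."""

    def __init__(self, fetch: Callable[[str], AccessToken], resource: str,
                 refresh_skew_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch              # stand-in for the identity endpoint call
        self.resource = resource
        self.refresh_skew = refresh_skew_seconds
        self.clock = clock
        self._token: Optional[AccessToken] = None

    def get(self) -> str:
        now = self.clock()
        if self._token is None or self._token.expires_on - now <= self.refresh_skew:
            self._token = self.fetch(self.resource)   # held in memory only, never persisted
        return self._token.token

    def authorization_header(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.get()}"}
```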

On the Azure OpenAI side, you grant the APIM instance’s Managed Identity the Cognitive Services OpenAI User role on the Azure OpenAI resource. That is the least-privilege built-in role for calling model deployments. You can scope it further to specific deployments if needed.

For organizations in regulated industries, such as healthcare, financial services, and government, Managed Identity is not optional. It satisfies Zero Trust authentication requirements and produces a full audit trail in Azure Monitor, tied to the APIM instance identity rather than a shared key.

Azure API Management for AI: Putting the Three Layers Together

In a production setup, all three layers run sequentially within the inbound policy pipeline. A request arrives with a subscription key and a JWT. APIM validates the key first (fast, no external call), then validates the JWT against Azure AD, then forwards the request to Azure OpenAI using its Managed Identity token. The AI backend never sees the caller’s JWT, and APIM never stores an API key.

The result is a clean separation of concerns:

The calling application manages its own JWT (issued by Azure AD based on its own identity or the user’s identity).
APIM enforces the authorization policy without the backend needing to know anything about it.
The AI backend trusts only APIM’s Managed Identity, not arbitrary callers.

This is the architecture you want before you go to production with any AI workload that touches sensitive data or incurs meaningful cost.
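The ordering and the credential swap can be summarized in a short Python sketch. It models the policy pipeline and is not APIM code: the subscription table, validate_jwt, get_backend_token, and forward are stand-ins you would replace with the real checks and HTTP call.

```python
from typing import Callable, Dict, Mapping, Tuple

Result = Tuple[int, str]


def run_inbound_pipeline(headers: Mapping[str, str],
                         subscriptions: Mapping[str, str],
                         validate_jwt: Callable[[str], bool],
                         get_backend_token: Callable[[], str],
                         forward: Callable[[Dict[str, str]], Result]) -> Result:
    # Layer 1: subscription key. A local lookup with no external call, so it runs first.
    key = headers.get("Ocp-Apim-Subscription-Key")
    if key not in subscriptions:
        return 401, "missing or invalid subscription key"

    # Layer 2: JWT validation (signature and claims checks are stubbed by validate_jwt).
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer ") or not validate_jwt(auth[len("Bearer "):]):
        return 401, "invalid or missing bearer token"

    # Layer 3: build fresh backend headers. The caller's key and JWT are dropped;
    # the backend only ever sees APIM's own Managed Identity token.
    backend_headers = {"Authorization": f"Bearer {get_backend_token()}"}
    return forward(backend_headers)
```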

What’s Next in This Series

Part 3 covers the Token Limit policy: how to enforce tokens-per-minute limits per consumer, configure throttling behavior, and handle the differences between the azure-openai-token-limit and llm-token-limit policy variants.

Part 3: Token Limit policy — enforcing tokens-per-minute limits per consumer.
Part 4: Token Metric policy — emitting usage data for observability and cross-charging.
Part 5: Load balancing and circuit breaking across PTU and PAYG backends.
Part 6: Semantic caching — reducing token consumption with similarity-based response reuse.
Part 7: APIM as an MCP gateway for agentic AI workloads.

Build an AI Tech News Aggregator: Azure Functions & Claude

Posted on March 25, 2026 by steefjan1970

There’s a lot of noise on the internet. Between Reddit, Hacker News, and tech blogs, keeping up with what actually matters in enterprise software is a full-time job. So I built a fully automated system that does it for me: it runs in the cloud, is powered by AI, and was deployed end-to-end in less than two hours using Claude Code.

Here’s how.

What We Built (Mostly What Claude Did)

A C# Azure Function that runs every hour and:

Fetches posts from configurable Reddit subreddits and Hacker News
Filters for recency: only posts from the last 7 days
Deduplicates across runs: never evaluates the same URL twice
Applies an AI editorial filter: Claude decides what’s genuinely newsworthy
Writes curated results to Azure Blob Storage as timestamped JSON

The output is clean, structured JSON ready to feed into a newsletter, dashboard, or notification system.
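The timestamped naming takes only a few lines. This Python sketch is an illustration, not the project's C# code: the `year/month/day/HH-MM-SS.json` layout is inferred from the sample log line further down (`2026/03/24/09-00-01.json`), UTC is an assumption, and the post fields are hypothetical.

```python
import json
from datetime import datetime, timezone


def blob_name_for_run(run_time: datetime) -> str:
    # Layout inferred from the sample output path: year/month/day folders,
    # then hour-minute-second as the file name.
    return run_time.strftime("%Y/%m/%d/%H-%M-%S.json")


def serialize_posts(posts: list) -> str:
    # Curated posts as a JSON array; ensure_ascii=False keeps non-ASCII titles readable.
    return json.dumps(posts, indent=2, ensure_ascii=False)


run = datetime(2026, 3, 24, 9, 0, 1, tzinfo=timezone.utc)  # assumed UTC run time
print(blob_name_for_run(run))  # 2026/03/24/09-00-01.json
print(serialize_posts([{"title": "Example post", "url": "https://example.com/a"}]))
```

Date-based folders keep each day's output easy to list, and downstream consumers can pick up new runs by prefix.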

The Architecture

The system has three layers: data collection, AI filtering, and persistence.

Reddit RSS feeds ──┐
                   ├─► Aggregator Function ─► Claude AI Filter ─► Blob Storage
HN Firebase API ───┘         │
                             └─► State Store (seen URLs)

Tech Stack

Concern	Choice
Runtime	Azure Functions v4, .NET 8 isolated worker
Reddit data	Public Atom/RSS feed (r/{sub}/top.rss)
HN data	Firebase REST API
AI filtering	Anthropic Claude (claude-opus-4-6) via raw HttpClient
Storage	Azure Blob Storage
Schedule	NCRONTAB timer trigger
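The “every hour” schedule is expressed as an NCRONTAB expression. Azure Functions uses a six-field variant that starts with seconds, which trips up anyone used to five-field cron. The matcher below is a minimal sketch that illustrates the field order only; it supports just `*` and single numbers, and `0 0 * * * *` is an assumed hourly schedule, since the post does not show the one actually configured.

```python
from datetime import datetime


def ncrontab_matches(expr: str, t: datetime) -> bool:
    """Return True if timestamp t matches a six-field NCRONTAB expression.

    Field order: {second} {minute} {hour} {day} {month} {day-of-week}.
    This sketch supports only '*' and single integers; real NCRONTAB also
    accepts ranges, lists, steps such as '*/5', and day or month names.
    """
    fields = expr.split()
    if len(fields) != 6:
        raise ValueError("expected six fields, starting with seconds")
    # NCRONTAB counts Sunday as 0; Python's weekday() counts Monday as 0.
    day_of_week = (t.weekday() + 1) % 7
    values = [t.second, t.minute, t.hour, t.day, t.month, day_of_week]
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))


HOURLY = "0 0 * * * *"  # assumed: second 0, minute 0, every hour of every day
print(ncrontab_matches(HOURLY, datetime(2026, 3, 24, 9, 0, 0)))   # True
print(ncrontab_matches(HOURLY, datetime(2026, 3, 24, 9, 30, 0)))  # False
```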

Interesting Engineering Decisions

Reddit: RSS over JSON API

The Reddit JSON API (/top.json) started returning 403s without authentication. Rather than deal with OAuth, we switched to Reddit’s public Atom/RSS feed (no credentials required) and parsed it with System.Xml.Linq in a handful of lines. Simple wins.
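The original parser uses System.Xml.Linq in C#. As a rough Python equivalent, the sketch below parses an Atom feed with the standard library's `xml.etree.ElementTree` and applies the 7-day recency filter. It assumes each entry carries an `<updated>` timestamp and a `<link href="...">`, as Atom entries normally do; fetching `r/{sub}/top.rss` over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM_NS = {"a": "http://www.w3.org/2005/Atom"}


def parse_atom_feed(xml_text: str) -> list:
    """Extract title, URL, and last-updated time from each <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("a:entry", ATOM_NS):
        updated = entry.findtext("a:updated", default="", namespaces=ATOM_NS)
        link = entry.find("a:link", ATOM_NS)
        if not updated or link is None:
            continue  # skip entries we cannot date or link to
        posts.append({
            "title": entry.findtext("a:title", default="", namespaces=ATOM_NS),
            "url": link.get("href", ""),
            # fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
            "updated": datetime.fromisoformat(updated.replace("Z", "+00:00")),
        })
    return posts


def recent_only(posts: list, now: datetime, days: int = 7) -> list:
    """Keep posts updated within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [p for p in posts if p["updated"] >= cutoff]
```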

Claude as an Editorial Filter

Instead of writing brittle keyword heuristics to judge whether a post is “real tech news,” we hand that job to Claude with a carefully crafted system prompt based on our editorial guidelines:

A post qualifies if it is relevant to enterprise software development AND meets at least one of the following: Change, Innovation, or Emergent Ideas, and is not a minor patch release, pure marketing, or clickbait.

Claude receives posts in batches of 25, returns a JSON array of qualifying indices, and we map those back to posts. If the API is unreachable, the batch passes through unfiltered as a deliberate fail-safe so the pipeline never breaks.

We used structured JSON output (output_config.format.type = "json_schema") to guarantee a parseable response every time; no regex needed.
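The batching and fail-safe logic is easy to get subtly wrong, so here is a minimal Python sketch of the idea. `classify` is a hypothetical stand-in for the Claude call: it receives one batch and returns batch-local indices of qualifying posts (what a `json_schema` response shaped like `{"indices": [...]}` might be mapped to; that shape is an assumption). If the call raises, the batch passes through unfiltered.

```python
from typing import Callable, List, Sequence

BATCH_SIZE = 25  # batch size used in the post


def batched(items: Sequence, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def filter_news(posts: Sequence[dict],
                classify: Callable[[Sequence[dict]], List[int]]) -> List[dict]:
    """Keep posts whose batch-local index the classifier returns.

    classify is a placeholder for the Claude call. If it raises, the whole
    batch passes through unfiltered so an API outage never stalls the run.
    """
    kept = []
    for batch in batched(posts, BATCH_SIZE):
        try:
            indices = classify(batch)
        except Exception:
            kept.extend(batch)  # fail open: the deliberate fail-safe
            continue
        # Map indices back to posts, ignoring duplicates and out-of-range values.
        kept.extend(batch[i] for i in sorted(set(indices)) if 0 <= i < len(batch))
    return kept
```

With 47 new posts and a batch size of 25, this yields one batch of 25 and one of 22, which lines up with the two “News filter” lines in the sample run later in the post.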

Deduplication Without a Database

To prevent re-evaluating the same URLs across hourly runs (and paying for unnecessary AI API calls), we persist a rolling state file — state/seen-urls.json — in Blob Storage. On each run:

Load seen URLs into a HashSet<string> for O(1) lookup
Filter new posts against it
After filtering, mark all new posts as seen (not just the ones that passed the AI filter — rejected posts shouldn’t be retried)
Prune entries older than 7 days to keep the file small
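The state-file steps translate almost one-to-one into code. This Python sketch assumes a hypothetical layout for `state/seen-urls.json` (a JSON object mapping each URL to the ISO timestamp when it was first seen); reading and writing the blob itself is omitted.

```python
import json
from datetime import datetime, timedelta, timezone


def load_seen(state_json: str) -> dict:
    # Hypothetical layout: {"https://...": "2026-03-24T09:00:00+00:00", ...}
    return json.loads(state_json) if state_json else {}


def save_seen(seen: dict) -> str:
    return json.dumps(seen, indent=2, sort_keys=True)


def unseen(posts: list, seen: dict) -> list:
    """Return posts whose URL is not in the state file, deduplicated within the run too."""
    seen_urls = set(seen)  # O(1) membership checks, like HashSet<string>
    fresh = []
    for post in posts:
        if post["url"] not in seen_urls:
            seen_urls.add(post["url"])  # the same URL from two sources counts once
            fresh.append(post)
    return fresh


def mark_seen_and_prune(seen: dict, evaluated: list, now: datetime, keep_days: int = 7) -> dict:
    """Mark every evaluated post as seen (kept or rejected) and drop stale entries."""
    updated = dict(seen)
    for post in evaluated:
        updated.setdefault(post["url"], now.isoformat())
    cutoff = now - timedelta(days=keep_days)
    return {url: ts for url, ts in updated.items()
            if datetime.fromisoformat(ts) >= cutoff}
```

Pruning on the same 7-day horizon as the recency filter is what makes this safe, assuming recency is judged by the post's own timestamp: a URL is first seen no earlier than it was posted, so by the time its entry is pruned, the post is at least 7 days old and the recency filter drops it anyway.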

No database, no Redis, no infrastructure overhead. A blob file is enough.

The AI Filter in Practice

A typical hourly run might look like this:

Fetched 312 posts from the last 7 days.
Deduplication: 47 new / 265 already seen (skipped).
Running news quality filter on 47 new posts…
News filter: 11/25 posts passed.
News filter: 9/22 posts passed.
Filter complete: 20/47 posts kept.
20 posts saved to 2026/03/24/09-00-01.json

Out of 312 raw posts, 20 make it through. That’s the kind of signal-to-noise ratio that makes a curated feed actually worth reading.

Deployment

The whole thing deploys with two commands:

# Push app settings (API keys, schedule, etc.)
az functionapp config appsettings set \
  --name FuncNewsAggregation \
  --resource-group rg-news-aggregators \
  --settings @appsettings.json

# Publish the function
func azure functionapp publish FuncNewsAggregation --dotnet-isolated

Done. The function is live, running on Azure’s infrastructure, costing pennies per day.

What’s Next

A few natural extensions:

Email or Slack digest — trigger a Logic App when a new blob is written
Web frontend — serve the JSON blobs as a read-only news feed
Scoring — weight HN scores more heavily now that RSS drops Reddit scores
More sources — dev.to, lobste.rs, or custom RSS feeds are easy to add

Takeaways

The most interesting lesson here isn’t the code; it’s the division of labor. Deterministic logic handles the mechanical work: fetching, deduplicating, and scheduling. The judgment call “Is this actually news?” goes to the model.

That separation keeps the system simple, cheap to run, and easy to adjust. Change the system prompt, and you change the editorial policy. No retraining, no feature engineering.

Two hours from idea to deployed function. That’s the pace at which you can build now.

All source code is C# targeting .NET 8. The function runs on an Azure Consumption plan, and its hourly runs cost roughly $0, well within the free tier.

AI Is Reshaping Software Development — At What Cost?

Posted on February 28, 2026 by steefjan1970

February has been a busy month for me at InfoQ. I wrote three articles that, on the surface, cover different topics: skill formation, open-source sustainability, and Agile methodology. But when I stepped back and looked at them together, a pattern jumped out at me. Each one tells a piece of the same story: AI is transforming how we build software at a pace that exceeds our ability to think about the consequences.

I want to use this post to connect the dots.

AI Software Development Is Eroding Developer Skills

The first piece I wrote covered an Anthropic study on how AI coding assistance affects skill development. The research was a randomized controlled trial with 52 junior engineers learning a Python library called Trio, which none of them had used before. The findings were stark. Developers who used AI assistance scored 17 percent lower on comprehension tests compared to those who coded by hand. That gap is roughly equivalent to two letter grades.

What struck me most wasn’t the headline number, though. It was the nuance underneath. Participants who used AI as a thinking partner (asking conceptual questions, requesting explanations, and working through problems alongside the tool) retained far more knowledge than those who asked the AI to generate code for them. The dividing line sat around a 65 percent score threshold. Above it were the curious developers; below it were the ones who had delegated the thinking.

I’ve been working in IT for a long time. I’ve seen junior engineers grow into senior architects, and the path always involved struggle. Debugging code you don’t understand at 11 PM on a Tuesday. Reading documentation that makes your eyes glaze over. Writing something that breaks, then figuring out why. That struggle is where the learning happens. What concerns me is not that AI exists (I use it daily and find it genuinely helpful) but that we might be removing the friction that develops competence in the first place.

The full article is here: Anthropic Study: AI Coding Assistance Reduces Developer Skill Mastery by 17%

AI Coding Tools Are Overwhelming Open Source Maintainers

My second article examined a problem I’ve been watching develop for months. Daniel Stenberg shut down cURL’s bug bounty after AI-generated submissions reached 20 percent of the total. Mitchell Hashimoto banned AI-generated code from Ghostty entirely. Steve Ruiz took it even further with tldraw, auto-closing all external pull requests. These aren’t fringe projects. cURL runs on billions of devices. These are maintainers reaching a breaking point.

RedMonk analyst Kate Holterhoff coined the term “AI Slopageddon” to capture what’s happening, and the name fits. The flood of AI-generated contributions looks plausible at first glance but falls apart on inspection. The problem isn’t just quality; it’s volume. Maintainers are human beings with limited time, and they’re now spending that time sifting through submissions that an AI produced in seconds without any real understanding of the project.

A research paper from the Central European University and the Kiel Institute for the World Economy modeled the bigger structural risk here. Open-source projects depend on user engagement (documentation views, bug reports, and community recognition) as a return on the maintainer’s investment. When AI agents assemble packages without developers ever reading the docs or filing bugs, that feedback loop breaks. The researchers tried to model a “Spotify-style” revenue redistribution, but the numbers didn’t work: vibe-coded users would need to generate 84 percent of the engagement that direct users currently provide. That’s not realistic.

I keep thinking about this one. My entire career has been built on open source, from the tools I integrate at work to the libraries I rely on for InfoQ articles. If the ecosystem that produces and maintains these tools becomes unsustainable because AI-generated noise overwhelms the people doing the actual work, we all lose. Not eventually. Soon.

More details here: AI “Vibe Coding” Threatens Open Source as Maintainers Face Crisis.

AI Software Development Puts Agile Under Pressure

The third article I wrote covered a debate sparked by Steve Jones, an executive VP at Capgemini, who declared that AI has killed the Agile Manifesto. His argument: when agentic SDLC systems can build applications in hours, the Manifesto’s human-centric principles no longer apply. If the tooling matters as much as or more than the people using it, then the Manifesto’s preference for “individuals and interactions over processes and tools” breaks down.

It’s a provocative claim that generated a lot of discussion. Casey West proposed an “Agentic Manifesto” that shifts the focus from verification to validation. AWS’s 2026 prescriptive guidance suggests “Intent Design” should replace sprint planning. Kent Beck, one of the original Manifesto signatories, has been talking about “augmented coding” as a new paradigm.

But here’s the counterpoint that keeps sticking with me. Forrester’s 2025 State of Agile Development report found that 95 percent of professionals still consider Agile critically relevant to their work. That’s not a methodology on its deathbed. And as one commenter noted in the discussion thread, bureaucracy killed Agile long before AI agents came along.

I think the question isn’t whether the Agile Manifesto is obsolete. It’s whether we’ve ever fully lived by its principles in the first place. The Manifesto says “responding to change over following a plan.” If there’s ever been a moment that demands responsiveness and adaptation, it’s right now. The irony of declaring Agile dead precisely when we need its core philosophy the most isn’t lost on me.

Full article: Does AI Make the Agile Manifesto Obsolete?

What AI’s Impact on Software Development Really Tells Us

When I look at these three stories together, I see a common tension. AI is accelerating what we can measure (lines of code produced, pull requests submitted, applications prototyped) while eroding what is harder to quantify: deep understanding of a codebase, thoughtful engagement with an open-source community, and the human judgment that sits at the heart of iterative development.

The Anthropic study shows that speed and learning pull in opposite directions, at least for developers acquiring new skills. The open-source crisis tells us that volume and quality are diverging at an alarming rate. The Agile debate tells us that our existing frameworks for organizing human work are straining under the weight of AI-driven change.

None of this means we should reject AI tools. I certainly won’t. But I think we need to be far more intentional about how we deploy them. That means designing AI assistants that support learning rather than replace it. It means building platforms that protect maintainers from low-quality noise. It means evolving our methodologies rather than abandoning them.

Exploring new technologies is one of the things I enjoy most about working in this field, and I have spent years doing it. I remain optimistic about where AI can take us. But optimism without caution is just naivety. The choices we make in the next year or two about how AI integrates into our development practices will shape the industry for a decade.

We should probably pay attention.

AWS European Sovereign Cloud Launches—But Does It Solve the Real Problem?

Posted on February 6, 2026 by steefjan1970

Recently, AWS officially launched its European Sovereign Cloud, backed by a €7.8 billion investment in Brandenburg, Germany. The infrastructure is physically and logically separated from AWS global regions, managed by a new German parent company (AWS European Sovereign Cloud GmbH), and staffed exclusively by EU residents. On paper, it checks every compliance box for data residency and operational sovereignty. AWS CEO Matt Garman called it “a big bet” for the company, and it is. The question is whether it’s the right bet for Europe.

European Sovereign Cloud: Real Isolation, Real Trade-offs

The technical separation is genuine. An AWS engineer who deployed services to the European Sovereign Cloud confirmed on Hacker News that proper boundaries exist: U.S.-based engineers can’t see anything happening in the sovereign cloud. To fix issues there, they play “telephone” with EU-based engineers. The infrastructure uses the partition name aws-eusc and the region name eusc-de-east-1, which are completely separate from AWS’s global regions. All components, including IAM, billing systems, and Route 53 name servers using European top-level domains, remain within EU borders.
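The partition name matters in practice because it is baked into every resource identifier. AWS ARNs follow the layout `arn:partition:service:region:account-id:resource`, and other partitions already exist (`aws` for commercial regions, `aws-cn` for China, `aws-us-gov` for GovCloud). The minimal sketch below builds and parses an ARN in the `aws-eusc` partition quoted above; the account and instance IDs are hypothetical.

```python
def build_arn(partition: str, service: str, region: str, account_id: str, resource: str) -> str:
    # ARN layout: arn:partition:service:region:account-id:resource
    return f"arn:{partition}:{service}:{region}:{account_id}:{resource}"


def arn_partition(arn: str) -> str:
    parts = arn.split(":", 5)  # the resource part may itself contain ':'
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[1]


# Hypothetical account and instance IDs; partition and region names as quoted in the post.
arn = build_arn("aws-eusc", "ec2", "eusc-de-east-1", "123456789012", "instance/i-0123456789abcdef0")
print(arn)
print(arn_partition(arn))          # aws-eusc
print(arn.startswith("arn:aws:"))  # False
```

The last line is the practical consequence: IAM policies, scripts, and tooling that hard-code `arn:aws:` will not match resources in the sovereign partition without changes.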

But this isolation comes with costs. As that same engineer warned, “it really slows down debugging issues. Problems that would be fixed in a day or two can take a month.” This is the sovereignty trade-off in practice: more control, less velocity. The service launches with approximately 90 AWS services, not the full catalog. Plans exist to expand into sovereign Local Zones in Belgium, the Netherlands, and Portugal, but this remains a subset of AWS’s offerings globally.

For some workloads, this trade-off makes sense. For others, it’s a deal-breaker.

Why the European Sovereign Cloud Can’t Escape U.S. Jurisdiction

Here’s the uncomfortable truth that AWS’s marketing carefully sidesteps: technical isolation doesn’t create legal isolation. AWS, headquartered in America, remains subject to U.S. jurisdiction. The CLOUD Act allows U.S. authorities to compel U.S.-based technology companies to provide data, regardless of where it is stored globally. Courts can require parent companies to produce data held by subsidiaries.

This isn’t theoretical hand-waving. Microsoft had to admit in a French court that it cannot guarantee data sovereignty for EU customers. When Airbus executive Catherine Jestin discussed AWS’s sovereignty claims with lawyers late last year, she said: “I still don’t understand how it is possible” for AWS to be immune to extraterritorial laws.

Cristina Caffarra, a competition economist and founder of the Eurostack Foundation, puts it bluntly:

A company subject to the extraterritorial laws of the United States cannot be considered sovereign for Europe. That simply doesn’t work.

The AWS response focuses on technical controls: encryption, the Nitro System preventing employee access, and hardware security modules. These are important safeguards, but they don’t address the core legal issue. If a U.S. court orders Amazon.com Inc. to produce data, those technical barriers become obstacles the parent company must work around, not protections the customer can rely on.

Europe’s European Sovereign Cloud Strategy: The Cloud and AI Development Act

AWS’s launch comes as Europe finalizes its own legislative response. The EU Cloud and AI Development Act, expected in Q1 2026, aims to strengthen Europe’s autonomy over cloud infrastructure and data. As Christoph Strnadl, CTO of Gaia-X, explains:

For critical data, you will never, ever use a US company. Sovereignty means having strategic options — not doing everything yourself.

The Act is part of the EU’s Competitiveness Compass and addresses a fundamental problem: Europe’s 90% dependency on non-EU cloud infrastructure, predominantly American companies. This dependency isn’t just about data residency—it’s about strategic autonomy. When essential services depend on infrastructure governed by foreign law, questions arise about jurisdiction, resilience, and what happens during geopolitical disruption.

Current estimates indicate that AWS, Microsoft Azure, and Google Cloud collectively control over 60% of the European cloud market. European providers account for only a small share of revenues. The Cloud and AI Development Act aims to establish minimum criteria for cloud services in Europe, mobilize public and private initiatives for AI infrastructure, and create a single EU-wide cloud policy for public administrations and procurement.

Importantly, Brussels isn’t seeking to ban non-EU providers. As Strnadl notes:

Sovereignty does not mean you have to do everything yourself. Sovereignty means that for critical things, you have strategic options.

Gaia-X and the European Sovereign Cloud: A Lesson in Sovereignty Washing

Europe has been down this path before. Gaia-X, launched in 2019, intended to create a trustworthy European data infrastructure. Then American companies lobbied to be included. Once Microsoft, Google, and AWS were inside, critics argue, Gaia-X lost its purpose. The fear now is that AWS’s European Sovereign Cloud represents sophisticated “sovereignty washing”—placing datacenters on European soil without resolving the fundamental legal issue.

Recent European actions suggest growing awareness of this problem. Austria, Germany, France, and the International Criminal Court in The Hague are taking concrete steps toward genuine digital independence. These aren’t just policy statements—they’re actual migrations away from U.S. hyperscalers toward European alternatives.

European Sovereign Cloud Adoption: No Full Migration in 2026

Forrester predicts that no European enterprise will fully shift away from U.S. hyperscalers in 2026, citing geopolitical tensions, volatility, and new legislation, such as the EU AI Act, as barriers. The scale of dependency is too deep, the feature gap too wide, and the migration costs too high for rapid change.

Gartner forecasts European IT spending will grow 11% in 2026 to $1.4 trillion, with 61% of European CIOs and tech leaders wanting to increase their use of local cloud providers. Around half (53%) said geopolitical factors would limit their use of global providers in the future. The direction is clear, even if the pace remains uncertain.

This creates a transitional period where organizations must make pragmatic choices. For non-critical workloads, AWS’s European Sovereign Cloud may be sufficient. For truly sensitive data—government communications, defense systems, critical infrastructure—organizations need genuinely European alternatives: Hetzner, Scaleway, OVHCloud, StackIT by Schwarz Digits.

What AWS’s European Sovereign Cloud Actually Delivers

Let’s be precise about what AWS European Sovereign Cloud achieves. It provides:

Data residency within the EU
Operational control by EU residents
Governance through EU-based legal entities
Technical isolation from the global AWS infrastructure
An advisory board of EU citizens with independent oversight

What it doesn’t provide is independence from U.S. legal jurisdiction. For compliance requirements focused purely on data residency and operational transparency, this may be sufficient. For organizations requiring protection from U.S. government data requests, it fundamentally isn’t.

As Eric Swanson from CarMax noted in a LinkedIn post:

Sovereign cloud offerings do not override the Patriot Act. They mainly reduce overlap across other contexts: data location, operational control, employee access, and customer jurisdiction.

European Sovereign Cloud and Strategic Autonomy: Not Autarky

Europe’s path forward isn’t about digital isolationism. As Strnadl emphasizes, technology adoption that involves a paradigm shift doesn’t happen in two years. The challenge is adoption, not frameworks. “Cooperation needs trust,” he says, “and trust needs a trust framework.”

The Cloud and AI Development Act, expected this quarter, will provide that framework. It will set minimum criteria, promote interoperability, and establish procurement rules that favor sovereignty for critical workloads. The question for organizations is: what constitutes critical?

For email, public administration, political communication, and defense systems, the answer should be obvious. These require European alternatives. For other workloads, AWS’s European Sovereign Cloud may strike an acceptable balance between capability and control.

The Bottom Line

AWS’s €7.8 billion investment is real. The technical isolation is real. The economic contribution to Germany’s GDP (€17.2 billion over 20 years) is real. What’s also real is that Amazon.com Inc., a U.S. company, ultimately controls this infrastructure and remains subject to U.S. law.

For organizations seeking compliance checkboxes and data residency guarantees, AWS European Sovereign Cloud delivers. For organizations requiring genuine independence from U.S. legal jurisdiction, it remains fundamentally insufficient. That’s not a criticism of AWS’s engineering—it’s a statement of legal reality.

The sovereignty question Europe faces isn’t technical. It’s strategic: do we accept managed dependency or build genuine autonomy? AWS offers the former. Only European alternatives can provide the latter.

The market will decide which answer matters more.

Agentic Orchestration: The Evolution of SOA

Posted on January 25, 2026 by steefjan1970

For decades, integration professionals have shaped the digital backbone of enterprises from EAI to SOA to microservices. Today, agentic orchestration marks the next step in that evolution: transforming how we compose, coordinate, and reason across enterprise services. This isn’t a replacement for what we know; it’s an intelligent upgrade to it.

We built the bridges, the highways, and the intricate railway networks of the digital world. Yet, let’s be honest—for all our sophistication, our orchestrations often felt like a meticulous, rigid dance.

Enter Agentic Orchestration. This isn’t just another buzzword. It’s a profound shift, an evolution that takes the core principles of SOA and infuses them with intelligence, dynamism, and a remarkable degree of autonomy. For the seasoned integration architect and engineer, this isn’t about replacing what we know—it’s about enhancing it, elevating it to a new plane of capability.

How SOA Composites Differ from Agentic Orchestration

Cast your mind back to the golden age of SOA. For those of us in the Microsoft ecosystem, this meant nearly two and a half decades with BizTalk Server as our workhorse, our battleground, our canvas. We diligently crafted composite services using orchestration designers, adapters, and pipelines. Others wielded BPEL and ESBs, but the principle was the same. Our logic was clear, explicit, and, crucially, deterministic.

If a business process required validating a customer, then checking inventory, and finally processing an order, we laid out that sequence with unwavering precision—whether in BizTalk’s visual orchestration designer or in BPEL code:

XML

<bpel:sequence name="OrderFulfillmentProcess">
  <bpel:invoke operation="validateCustomer" partnerLink="CustomerService"/>
  <bpel:invoke operation="checkInventory" partnerLink="InventoryService"/>
  <bpel:invoke operation="processPayment" partnerLink="PaymentService"/>
</bpel:sequence>

Those of us who spent years with BizTalk know this dance intimately: the Receive shapes, the Decision shapes, the carefully constructed correlation sets, the Scope shapes wrapped around every potentially fragile operation. We debugged orchestrations at 2 AM, optimized dehydration points, and became masters of the Box-Line-Polygon visual language.

This approach delivered immense value. It brought order to chaos, reused services, and provided a clear, auditable trail. However, its strength was also its weakness: rigidity. Any deviation or unforeseen circumstance required a developer to step in, modify the orchestration, and redeploy. The system couldn’t “think” its way around a problem; it merely executed a predefined script. It was a well-choreographed ballet: beautiful, but utterly incapable of improvisation.

Agentic Orchestration: From Fixed Scripts to Intelligent Collaboration

Now, imagine an orchestration that doesn’t just execute a script, but reasons. An orchestration where the “participants” are not passive services waiting for an instruction, but intelligent agents equipped with goals, memory, and a suite of “tools”—which, for us, are often our existing services and APIs.

This is the essence of agentic orchestration. It shifts from a predefined, top-down command structure to a more collaborative, goal-driven paradigm. Instead of meticulously charting every step, we define the desired outcome and empower intelligent agents to find the best path to it.

Think of it as moving from a detailed project plan (SOA) to giving a highly skilled project manager (the Orchestrator Agent) a clear objective and a team of specialists (worker agents, each with specific skills/tools).

Key Differences that Matter

From Fixed Sequence to Dynamic Planning:

Traditional SOA executes a predetermined sequence: Step A, then Step B, then Step C. Agentic orchestration takes a different approach — agents dynamically construct their plan based on current context and available resources, asking: “What tools do I have, and which best serve this step?”

From Explicit Error Handling to Self-Correction:

In SOA, elaborate try-catch blocks covered every potential failure. BizTalk veterans will remember wrapping Scope shapes inside Scope shapes, each carrying its own exception handler. With agentic systems, a failing tool triggers reasoning rather than a halt — the agent may retry with a different tool, consult another agent, or revise its plan entirely.
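The difference between a halt and a fallback can be shown without any AI at all. In this minimal Python sketch, a real agent would let the model decide what to try next based on the error it saw; here that reasoning is replaced by a fixed preference order, and `ToolError` and the tool names are hypothetical.

```python
from typing import Callable, List, Tuple


class ToolError(Exception):
    """Raised by a tool that cannot complete the request."""


def run_with_fallback(goal: str,
                      tools: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, List[str]]:
    """Try candidate tools in turn, recording each failure instead of halting.

    A real agent would let the model choose the next step from the error it
    saw; this sketch replaces that reasoning with a fixed preference order.
    """
    attempts = []
    for name, tool in tools:
        try:
            result = tool(goal)
        except ToolError as err:
            attempts.append(f"{name}: {err}")
            continue
        attempts.append(f"{name}: ok")
        return result, attempts
    raise ToolError(f"no tool could achieve {goal!r}; tried {attempts}")
```

The attempt log is the agentic counterpart of a nested Scope shape: instead of one handler per failure point, every failure is recorded and the goal is pursued through the next available tool.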

From API Contracts to Intent-Based Communication:

Traditional SOA services communicate via strict, often verbose XML or JSON contracts; schema design and message transformation consumed countless engineering hours. Agentic systems shift to intent-based communication instead. An “Order Fulfillment Agent” can instruct a “Shipping Agent” with a clear goal: “Ship this package to customer X by date Y.” The Shipping Agent then determines which underlying tools (the FedEx API, the DHL API) best achieve that outcome, abstracting away the complexity of individual service calls.
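Intent-based communication means the caller states the outcome and the receiving agent picks the tool. The sketch below reduces that choice to a simple rule (the cheapest carrier whose quoted transit time meets the deadline) so the shape is visible. The carriers, transit times, and prices are made-up values, and a real Shipping Agent would reason over live quotes rather than a static list.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ShipIntent:
    package_id: str
    customer: str
    deliver_by: date


@dataclass(frozen=True)
class CarrierTool:
    name: str           # hypothetical carrier identifier
    transit_days: int   # made-up quoted transit time
    cost_eur: float     # made-up quoted price


def choose_carrier(intent: ShipIntent, carriers: List[CarrierTool],
                   today: date) -> Optional[CarrierTool]:
    """Pick the cheapest carrier whose transit time still meets the deadline."""
    feasible = [c for c in carriers
                if today + timedelta(days=c.transit_days) <= intent.deliver_by]
    if not feasible:
        return None  # an agent would now renegotiate the date or escalate
    return min(feasible, key=lambda c: c.cost_eur)


carriers = [
    CarrierTool("carrier-a", transit_days=2, cost_eur=12.0),
    CarrierTool("carrier-b", transit_days=5, cost_eur=6.0),
    CarrierTool("carrier-c", transit_days=1, cost_eur=20.0),
]
intent = ShipIntent("PKG-1", "customer-x", deliver_by=date(2026, 1, 8))
print(choose_carrier(intent, carriers, today=date(2026, 1, 5)).name)  # carrier-a
```

The caller only ever sends a `ShipIntent`; which carrier API gets invoked is decided on the receiving side, which is the decoupling the paragraph above describes.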

From Static Connectors to Smart Tools:

Connectors and adapters in SOA are fixed pathways, each requiring explicit configuration per integration point. BizTalk veterans know this well from hours spent configuring adapters for every specific endpoint. In agentic architectures, existing APIs, databases, message queues, and even legacy systems are reframed as tools that agents can discover and wield intelligently. A Logic App connector to SAP is no longer just a connector; it becomes a capable SAP tool that an agent can invoke when the situation calls for it. The Model Context Protocol (MCP) is making this kind of dynamic tool discovery increasingly seamless.

A Concrete Example

Consider an order that fails the inventory check in our traditional BPEL or BizTalk orchestration. In SOA: hard stop, send error notification, await human intervention, and process redesign.

In an agentic system, the orchestrator agent might dynamically query alternate suppliers, adjust delivery timelines based on customer priority, suggest product substitutions, or even negotiate partial fulfillment—all without hardcoded logic for each scenario. The agent reasons about the business goal (fulfill the customer order) and uses available tools to achieve it, adapting to circumstances we never explicitly programmed for.
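One of the tools such an orchestrator might call is an allocator that reports how much of an order the own warehouse and alternate suppliers can cover. The Python sketch below is hypothetical (source names and stock levels are made up), and it deliberately only reports a shortfall: deciding whether to offer partial fulfillment, a substitution, or a later date is left to the agent, which is exactly the part that is no longer hardcoded.

```python
from typing import Dict, Tuple


def allocate_order(quantity: int, stock_by_source: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Allocate an order across sources in preference order.

    stock_by_source is ordered: own warehouse first, then alternate suppliers.
    Returns (allocation per source, remaining shortfall). The shortfall is
    reported, not resolved; what to do with it is the agent's decision.
    """
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    remaining = quantity
    allocation = {}
    for source, available in stock_by_source.items():
        if remaining == 0:
            break
        take = min(max(available, 0), remaining)
        if take:
            allocation[source] = take
            remaining -= take
    return allocation, remaining


# Hypothetical stock levels for one SKU.
stock = {"own-warehouse": 4, "supplier-b": 3, "supplier-c": 0}
print(allocate_order(10, stock))  # ({'own-warehouse': 4, 'supplier-b': 3}, 3)
```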

Azure Logic Apps: The Bridge to the Agentic Future

Azure Logic Apps demonstrates this evolution in practice, and it’s particularly compelling for integration professionals. For those of us coming from the BizTalk world, Logic Apps already felt familiar—the visual designer, the connectors, the enterprise reliability. Now, we’re not throwing away our decades of experience with these patterns. Instead, we’re adding an “intelligence layer” on top.

The Agent Loop within Logic Apps, with its “Think-Act-Reflect” cycle, transforms our familiar integration canvas into a dynamic decision-making engine. We can build multi-agent patterns—agent “handoffs” in which one agent completes a task and passes it to another, or “evaluator-optimizer” setups in which one agent generates a solution and another critiques and refines it.
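The evaluator-optimizer pattern is essentially a bounded loop. In this sketch, `generate` and `evaluate` are placeholders for two model-backed agents (they are not Logic Apps APIs), and the round limit is an assumption added so the loop always terminates.

```python
from typing import Callable, Optional, Tuple


def evaluator_optimizer(task: str,
                        generate: Callable[[str, Optional[str]], str],
                        evaluate: Callable[[str], Optional[str]],
                        max_rounds: int = 3) -> Tuple[str, int]:
    """Generate a draft, critique it, and revise until the critic accepts it.

    evaluate returns None when the draft is acceptable, otherwise feedback
    that is passed into the next generate call.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = None
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        feedback = evaluate(draft)
        if feedback is None:
            return draft, round_no
    return draft, max_rounds  # round budget spent: return the last draft


def toy_generate(task, feedback):
    return task if feedback is None else f"{task} (revised: {feedback})"


def toy_evaluate(draft):
    return None if "revised" in draft else "add delivery date"


print(evaluator_optimizer("Reply to customer", toy_generate, toy_evaluate))
# ('Reply to customer (revised: add delivery date)', 2)
```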

All this, while leveraging the robust, enterprise-ready connectors we already depend on. Our existing investments in integration infrastructure don’t become obsolete; they become more powerful. The knowledge we gained from debugging BizTalk orchestrations, understanding message flows, and designing for reliability? All of that remains valuable. Microsoft is simply upgrading our toolkit.

Adopting Agentic Orchestration: The Path Forward for Integration Architects

For integration engineers and architects, this is not a threat but an immense opportunity. We are uniquely positioned to lead this charge. We understand the nuances of enterprise systems, the criticality of data integrity, and the challenges of connecting disparate technologies. Those of us who survived the BizTalk years are battle-tested; we know what real-world integration demands.

Agentic orchestration frees us from the burden of explicit, step-by-step programming for every conceivable scenario. It allows us to design systems that are more resilient, more adaptive, and ultimately, more intelligent. It enables us to build solutions that not only execute business processes but also actively contribute to achieving business outcomes.

Start small: Identify one rigid orchestration in your current architecture that would benefit from adaptive decision-making. Perhaps it’s an order-fulfillment process with too many exception handlers, or a customer-onboarding workflow that breaks when regional requirements change. That’s your first candidate for agentic enhancement.

Let’s cast aside the notion of purely deterministic choreography. Let us instead embrace the era of intelligent collaboration, where our meticulously crafted services become the powerful tools in the hands of autonomous, reasoning agents.

The evolution is here. It’s time to orchestrate a smarter future.

Europe’s Sovereignty Challenge: A Framework for Cloud Control

Posted on November 9, 2025 by steefjan1970

Europe’s sovereignty challenge has moved from political debate to concrete policy. With the EU’s new Cloud Sovereignty Framework now in place, the continent is redefining how it procures and governs cloud infrastructure, shifting from dependency on foreign providers to measurable, auditable control over its digital destiny.

Today, Europe and the Netherlands find themselves at a crucial junction, navigating the complex landscape of digital autonomy. The recent introduction of the EU’s new Cloud Sovereignty Framework is the clearest signal yet that the continent is ready to take back control of its digital destiny.

This isn’t just about setting principles; it’s about introducing a standardized, measurable scorecard that will fundamentally redefine cloud procurement.

Europe’s Sovereignty Challenge: Why Digital Independence Is Non-Negotiable

The digital revolution has brought immense benefits, yet it has also positioned Europe in a state of significant dependency. Approximately 80% of our digital infrastructure relies on foreign companies, primarily American cloud providers. This dependence is not merely a matter of convenience; it’s a profound strategic vulnerability.

The core threat stems from U.S. legislation such as the CLOUD Act, which grants American law enforcement the power to request data from U.S. cloud service providers, even if that data is stored abroad. This directly clashes with Europe’s stringent privacy regulations (GDPR) and exposes critical European data to external legal and geopolitical risk.

As we’ve seen with incidents like the Microsoft-ICC blockade, foreign political pressures can impact essential digital services. The possibility of geopolitical shifts, such as a “Trump II” presidency, only amplifies this collective awareness: we cannot afford to depend on foreign legislation for our critical infrastructure. The risk is present, and we must build resilience against it.

The Sovereignty Scorecard: From Principles to SEAL Rankings

The new Cloud Sovereignty Framework is the EU’s proactive response. It shifts the discussion from abstract aspirations to concrete, auditable metrics by evaluating cloud services against eight Sovereignty Objectives (SOVs) that cover legal, strategic, supply chain, and technological aspects.

The result is a rigorous “scorecard.” A provider’s weighted score determines its SEAL ranking (from SEAL-0 to SEAL-4, with SEAL-4 indicating full digital sovereignty). Crucially, this ranking is intended to serve as the definitive minimum assurance factor in government and public sector cloud procurement tenders. The Commission wants to create a level playing field where providers must tangibly demonstrate their sovereignty strengths.

Hyperscalers vs. European Providers: The Cloud Sovereignty Challenge

The framework has accelerated a critical duality in the market: massive, centralized investments by US hyperscalers versus strategic, federated growth by European alternatives.

Hyperscalers Adapt: Deepening European Ties

Global providers are making sovereignty a mandatory architectural and legal prerequisite by localizing their operations and governance.

AWS responded by announcing its EU Sovereign Cloud unit. The service is structured to ensure data residency and operational autonomy within Europe, explicitly targeting the SOV-3 criteria (Data & AI Sovereignty: the degree of control customers have over their data and AI models, including where data is processed) through physically and logically separated infrastructure and governance.
Google Cloud has also made significant moves, approaching digital sovereignty across three distinct pillars:
- Data Sovereignty (control over data storage, processing, and access, with features such as the Data Boundary and External Key Management (EKM), where keys can be held outside Google Cloud’s infrastructure);
- Operational Sovereignty (local partner oversight, such as the partnership with T-Systems in Germany); and
- Software Sovereignty (tools that reduce lock-in and enable workload portability).
To help organizations navigate these complex choices, Google introduced the Digital Sovereignty Explorer, an interactive online tool that clarifies terms, explains trade-offs, and guides European organizations in developing a tailored cloud strategy across these three domains. Google has also developed highly specialized options, including air-gapped solutions for the defense and intelligence sectors, demonstrating a commitment to the highest levels of security and data residency.
Microsoft has deepened its commitment considerably, outlining five European digital commitments designed to address sovereignty concerns, including:
- Massive Infrastructure Investment: Pledging a 40% increase in European datacenter capacity over two years, which more than doubles its European footprint between 2023 and 2027.
- Governance and Resilience: Instituting a “European cloud for Europe” overseen by a dedicated European board of directors (composed exclusively of European nationals), backed by a “Digital Resilience Commitment” to legally contest any government order to suspend European operations.
- Data Control: Completing the EU Data Boundary project to ensure European customers can store and process core cloud service data within the EU/EFTA.

European Contenders Scale Up

Strategic, open-source European initiatives mirror this regulatory push:

Virt8ra Expands: The Virt8ra sovereign cloud, which positions itself as a significant European alternative, recently announced a substantial expansion of its federated infrastructure. The platform, coordinated by OpenNebula Systems, added six new cloud service providers, including OVHcloud and Scaleway, significantly broadening its reach and capacity across the continent.
IPCEI Funding: This initiative, leveraging the open-source OpenNebula technology, is part of the Important Project of Common European Interest (IPCEI) on Next Generation Cloud Infrastructure and Services, backed by over €3 billion in public and private funding. This is a clear indicator that the vision for a robust, distributed European cloud ecosystem is gaining significant traction.

Redefining European Cloud Sovereignty: Resilience Over Isolation

Industry experts emphasize that the framework embodies a more mature understanding of digital sovereignty. It’s not about isolation (autarky), but about resilience and governance.

Sovereignty is about how an organization is “resilient against specific scenarios.” True sovereignty, in this view, lies in the proven, auditable ability to govern your own digital estate. For developers, this means separating cloud-specific infrastructure code from core business logic to maximize portability, allowing the use of necessary hyperscale features while preserving architectural flexibility.

The Challenge: Balancing Features with Control

Despite the massive investments and public commitments from all major players, the framework faces two key hurdles:

The Feature Gap: European providers often lack the “huge software suite” and “deep feature integration” of US hyperscalers, which can slow down rapid development. Advanced analytics platforms, serverless computing, and tightly integrated security services often lack direct equivalents at smaller providers. This creates a complex chicken-and-egg problem: large enterprises won’t migrate to European providers because they lack features, but local providers struggle to develop those capabilities without enterprise revenue.
Skepticism and Compliance Complexity: Some analysts fear the framework’s complexity will inadvertently favor the global giants with larger compliance teams. Furthermore, deep-seated apprehension in the community remains, with some expressing the fundamental desire for purely European technological solutions: “I don’t want a Microsoft cloud or AI solutions in Europe. I want European ones.” Some experts suggest that European providers should focus on building something different by innovating with European privacy and control values baked in, rather than trying to catch up with US providers’ feature sets.

In my view, achieving true digital sovereignty for Europe is a complex and multifaceted endeavor. While the commitments from global hyperscalers are significant, the underlying desire for independent, European-led solutions remains strong. It’s about strategic autonomy, ensuring that we, as Europeans, maintain ultimate control over our digital destiny and critical data, irrespective of where the technology originates.

The race is now on. The challenge for the cloud industry is to translate the high-level, technical criteria of the SOVs into auditable, real-world reality to achieve that elusive top SEAL-4 ranking. The battle for the future of Europe’s cloud is officially underway.

Figma AWS Costs Explained: Beyond the Hype and Panic

Posted on July 16, 2025 by steefjan1970

Figma’s recent IPO filing revealed that its AWS costs amount to roughly $300,000 per day, approximately $109 million annually, or 12% of its reported revenue of $821 million. The company has also committed to a minimum spend of $545 million with AWS over the next five years. Cue the online meltdown. “Figma is doomed!” “Fire the CTO!” the internet, in its infinite wisdom, declared. I wrote a news item about it for InfoQ and thought it was time to put things into perspective.

(Source: Figma.com)

But let’s inject a dose of reality, shall we? As Corey Quinn from The Duckbill Group, who probably sees more AWS invoices than you’ve seen Marvel movies, rightly points out, this kind of spending for a company like Figma is boringly normal.

As Quinn extensively details in his blog post, Figma isn’t running a simple blog. It’s a compute-intensive, real-time collaborative platform serving 13 million monthly active users and 450,000 paying customers. It renders complex designs with sub-100ms latency. This isn’t just about spinning up a few virtual machines; it’s about providing a seamless, high-performance experience on a global scale.

The Numbers Game: What the Armchair Experts Missed About Figma AWS Costs

The initial panic conveniently ignored a few crucial realities, according to Quinn:

Ramping Spend: Most large AWS contracts increase year-over-year. A $109 million annual average over five years likely starts lower (e.g., $80 million) and gradually increases to a higher figure (e.g., $150 million in year five) as the company expands.
Post-Discount Figures: These spend targets are post-discount. At Figma’s scale, they’re likely getting a significant discount (think 30% effective discount) on their cloud spend. So, their “retail” spend would be closer to $785 million over five years, not $545 million.

When you factor these in, Figma AWS costs fall squarely within industry benchmarks for its type of business:

Compute-lite SaaS: around 5% of revenue
Compute-heavy platforms (like Figma): 10–15% of revenue
AI/ML-intensive companies: often exceeding 15%

At 12% of revenue, Figma’s AWS costs are exactly where you’d expect them to be for a platform delivering real-time collaborative experiences at a global scale.
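The figures above are easy to sanity-check. The short Python sketch below annualizes the daily run rate, divides it by the quoted revenue, undoes a flat 30% discount on the five-year commitment, and tests the resulting share against the 10–15% compute-heavy band. The constants are the numbers quoted in this post; the flat discount is Quinn’s illustrative assumption rather than a disclosed figure, and the helper names are my own.

```python
# Back-of-the-envelope check of the Figma AWS figures quoted in this post.
# The constants are the reported numbers; DISCOUNT is an illustrative assumption.

DAILY_SPEND = 300_000      # reported AWS spend, USD per day
REVENUE = 821_000_000      # reported revenue, USD
COMMITMENT = 545_000_000   # five-year minimum AWS commitment, USD (post-discount)
YEARS = 5
DISCOUNT = 0.30            # assumed flat effective discount (illustrative)


def annualize(daily: float, days: int = 365) -> float:
    """Turn a daily run rate into an annual figure."""
    return daily * days


def retail_equivalent(committed: float, discount: float) -> float:
    """Undo a flat discount: committed = retail * (1 - discount)."""
    return committed / (1 - discount)


def in_compute_heavy_band(share: float) -> bool:
    """True if an infrastructure-to-revenue ratio falls in the 10-15% band."""
    return 0.10 <= share <= 0.15


if __name__ == "__main__":
    annual = annualize(DAILY_SPEND)
    share = annual / REVENUE
    print(f"Annualized AWS spend: ${annual / 1e6:.1f}M")
    print(f"Share of revenue: {share:.1%}")
    print(f"Average commitment per year: ${COMMITMENT / YEARS / 1e6:.1f}M")
    retail = retail_equivalent(COMMITMENT, DISCOUNT)
    print(f"Retail equivalent of commitment: ${retail / 1e6:.0f}M")
    print(f"Within compute-heavy band: {in_compute_heavy_band(share)}")
```

Running it prints an annualized spend of $109.5M, a revenue share of 13.3%, an average commitment of $109.0M per year, a retail equivalent of about $779M, and True for the band check. The straight division lands slightly above the 12% quoted, presumably because of rounding or a different revenue period, and the undiscounted figure comes out a little below $785M; neither gap moves Figma out of the compute-heavy range.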

Furthermore, the increasing adoption of AI and Machine Learning in application development is introducing a new dimension to cloud costs. AI workloads, particularly for training and continuous inference, are incredibly resource-intensive, pushing the boundaries of compute, storage, and specialized hardware (like GPUs), which naturally translates to higher cloud bills. This makes effective FinOps and cost optimization strategies even more crucial for companies that leverage AI at scale.

So, while the internet was busy getting its math wrong and forecasting doom, Figma was operating within a completely reasonable range for its business model and scale.

The “Risky Dependency” Non-Story

Another popular narrative was the “risky dependency” on AWS. Figma’s S-1 filing includes standard boilerplate language about vendor dependencies, a common feature found in virtually every cloud-dependent company’s SEC filings. It’s the legal equivalent of saying, “If the sky falls, our business might be affected.”

Breaking news: a SaaS company that uses a cloud provider might be affected by outages. In related news, restaurants depend on food suppliers. This isn’t groundbreaking insight; it’s just common business risk disclosure. Figma’s “deep entanglement” with AWS, as described by Hacker News commenter nevon, illustrates the complexity of modern cloud architectures. Every aspect, from permissions to disaster recovery, is seamlessly integrated. That makes a quick migration akin to open-heart surgery. Not something you do on a whim.

Cloud Repatriation: A Valid Strategy, But Not a Universal Panacea

Figma’s costs reignited the cloud repatriation debate. Its most vocal advocate is 37signals CTO David Heinemeier Hansson, who famously moved his company off the cloud to save millions. For some companies he’s right: repatriating workloads delivers significant savings. But it’s not a one-size-fits-all solution.

Every company’s needs are different. Scrimba, for example, runs on dedicated servers and spends less than 1% of revenue on infrastructure. For them, repatriation is a perfect fit. Figma is a different story. Its real-time collaborative demands and massive user base require agility, scalability, and managed services at a global scale. A hyperscale provider like AWS isn’t optional; it’s central to the business model.

This brings us to a broader conversation, especially relevant in Europe: digital sovereignty. As I’ve discussed in my blog post, “Digital Destiny: Navigating Europe’s Sovereignty Challenge,” deep integration with a single hyperscaler isn’t just a cost question. It also affects the control an organization retains over its data and operations. Vendor lock-in carries real strategic implications. Data governance, regulatory compliance, and negotiating power can all be compromised. The extraterritorial reach of foreign laws adds another layer of concern. Many organizations are responding by exploring multi-cloud strategies or hybrid models. The goal: mitigate risk and assert greater control over their digital destiny.

My Cloud Anecdote: Costs vs. Value

This whole debate reminds me of a scenario I encountered back in 2017. I was working on a proof of concept for a customer, building a future-proof knowledge base using Cosmos DB, the Graph Model, and Search. The operating cost, driven primarily by Cosmos DB, was approximately 1,000 euros per month. As I recall, some developers immediately flagged it as “too expensive,” and some even thought I was selling Cosmos DB. One attendee later wrote on their blog:

The most uninteresting talk of the day came from Steef-Jan Wiggers, who, in my opinion, delivered an hour-long marketing pitch for CosmosDB. I think it’s expensive for what it currently offers, and many developers could architect something with just as much performance without needing CosmosDB.

However, the proposed solution was for a knowledge base that customers could leverage via a subscription model. The crucial point was that the costs were negligible compared to the potential revenue the subscription model would net for the customer. It was an investment in a revenue-generating asset, not just a pure expense.

The Bottom Line: Putting Figma AWS Costs in Perspective

Thanks to Quinn, I understand that Figma is actively optimizing its infrastructure, transitioning from Ruby to C++ pipelines, migrating workloads, and implementing dynamic cluster scaling. He concluded:

They’re doing the work. More importantly, they’re growing at 46% year-over-year with a 91% gross margin. If you’re losing sleep over their AWS bill while they’re printing money like this, you might need to reconsider your priorities.

The “innovation <-> optimization continuum” is always at play. Companies often prioritize rapid innovation and speed to market, leveraging the cloud for its agility and flexibility. As they scale, they can then focus on optimizing those costs, and Figma AWS costs are no exception to that pattern.

This increasing complexity underscores the growing importance of FinOps (Cloud Financial Operations), a cultural practice that brings financial accountability to the variable-spend model of cloud computing, empowering teams to make data-driven decisions about cloud usage and optimize costs without sacrificing innovation.

Figma’s transparency in disclosing its cloud costs is actually a good thing. It forces a much-needed conversation about the true cost of running enterprise-scale infrastructure in 2025. The hyperbolic reactions, however, expose a fundamental misunderstanding of these realities, one I also encountered with my Cosmos DB project in 2017.

So, the next time someone tells you that a company spending 12% of its revenue on infrastructure that literally runs its entire business is “doomed,” perhaps ask them how much they think it should cost to serve real-time collaborative experiences to 13 million users across the globe. When you understand what drives Figma AWS costs, the answer might surprise you.

Lastly, as the cloud landscape continues to evolve, with new services, AI integration, and shifting geopolitical considerations, the core lesson remains: smart cloud investment isn’t about avoiding the bill, but understanding its true value in driving business outcomes and strategic advantage. The dialogue about cloud costs is far from over, but it’s time we grounded it in reality.