Digital Destiny: Navigating Europe’s Sovereignty Challenge – A Framework for Control

With the geopolitical changes since Trump took office, I’ve been following developments in digital sovereignty and covering the industry’s response to Europe’s strategic demands in various InfoQ news items.

Today, Europe and the Netherlands find themselves at a crucial juncture, navigating the complex landscape of digital autonomy. The recent introduction of the EU’s new Cloud Sovereignty Framework is the clearest signal yet that the continent is ready to take back control of its digital destiny.

This isn’t just about setting principles; it’s about introducing a standardized, measurable scorecard that will fundamentally redefine cloud procurement.

The Digital Predicament: Why Sovereignty is Non-Negotiable

The digital revolution has brought immense benefits, yet it has also positioned Europe in a state of significant dependency. Approximately 80% of our digital infrastructure relies on foreign companies, primarily American cloud providers. This dependence is not merely a matter of convenience; it’s a profound strategic vulnerability.

The core threat stems from U.S. legislation such as the CLOUD Act, which grants American law enforcement the power to request data from U.S. cloud service providers, even if that data is stored abroad. This directly clashes with Europe’s stringent privacy regulations (GDPR) and exposes critical European data to external legal and geopolitical risk.

As we’ve seen with incidents like the Microsoft-ICC blockade, foreign political pressures can impact essential digital services. The possibility of geopolitical shifts, such as a “Trump II” presidency, only amplifies this collective awareness: we cannot afford to depend on foreign legislation for our critical infrastructure. The risk is present, and we must build resilience against it.

The Sovereignty Scorecard: From Principles to SEAL Rankings

The new Cloud Sovereignty Framework is the EU’s proactive response. It shifts the discussion from abstract aspirations to concrete, auditable metrics by evaluating cloud services against eight Sovereignty Objectives (SOVs) that cover legal, strategic, supply chain, and technological aspects.

The result is a rigorous “scorecard.” A provider’s weighted score determines its SEAL ranking (from SEAL-0 to SEAL-4, with SEAL-4 indicating full digital sovereignty). Crucially, this ranking is intended to serve as the definitive minimum assurance factor in government and public sector cloud procurement tenders. The Commission wants to create a level playing field where providers must tangibly demonstrate their sovereignty strengths.

The Duel for Dominance: Hyperscalers vs. European Federation

The framework has accelerated a critical duality in the market: massive, centralized investments by US hyperscalers versus strategic, federated growth by European alternatives.

Hyperscalers Adapt: Deepening European Ties

Global providers are making sovereignty a mandatory architectural and legal prerequisite by localizing their operations and governance.

  • AWS responded by announcing its AWS European Sovereign Cloud unit. This service is structured to ensure data residency and operational autonomy within Europe, directly targeting the SOV-3 (Data & AI Sovereignty: the degree of control customers have over their data and AI models, including where data is processed) criteria through physically and logically separated infrastructure and governance.
  • Google Cloud has also made significant moves, approaching digital sovereignty across three distinct pillars:
    • Data Sovereignty (focusing on control over data storage, processing, and access with features like the Data Boundary and External Key Management, EKM, where keys can be held outside Google Cloud’s infrastructure);
    • Operational Sovereignty (ensuring local partner oversight, such as the partnership with T-Systems in Germany); and
    • Software Sovereignty (providing tools to reduce lock-in and enable workload portability).
  To help organizations navigate these complex choices, Google introduced the Digital Sovereignty Explorer, an interactive online tool that clarifies terms, explains trade-offs, and guides European organizations in developing a tailored cloud strategy across these three domains. Furthermore, Google has developed highly specialized options, including Air-Gapped solutions for the defense and intelligence sectors, demonstrating a commitment to the highest levels of security and residency.
  • Microsoft has demonstrated a profound deepening of its commitment, outlining five comprehensive digital commitments designed to address sovereignty concerns:
    • Massive Infrastructure Investment: Pledging a 40% increase in European datacenter capacity, doubling its footprint by 2027.
    • Governance and Resilience: Instituting a “European cloud for Europe” overseen by a dedicated European board of directors (composed exclusively of European nationals) and backed by a “Digital Resilience Commitment” to legally contest any government order to suspend European operations.
    • Data Control: Completing the EU Data Boundary project to ensure European customers can store and process core cloud service data within the EU/EFTA.

European Contenders Scale Up

Strategic, open-source European initiatives powerfully mirror this regulatory push:

  • Virt8ra Expands: The Virt8ra sovereign cloud, which positions itself as a significant European alternative, recently announced a substantial expansion of its federated infrastructure. The platform, coordinated by OpenNebula Systems, added six new cloud service providers, including OVHcloud and Scaleway, significantly broadening its reach and capacity across the continent.
  • IPCEI Funding: This initiative, leveraging the open-source OpenNebula technology, is part of the Important Project of Common European Interest (IPCEI) on Next Generation Cloud Infrastructure and Services, backed by over €3 billion in public and private funding. This is a clear indicator that the vision for a robust, distributed European cloud ecosystem is gaining significant traction.

Sovereignty Redefined: Resilience and Governance

Industry experts emphasize that the framework embodies a more mature understanding of digital sovereignty. It’s not about isolation (autarky), but about resilience and governance.

Sovereignty is about how an organization is “resilient against specific scenarios.” True sovereignty, in this view, lies in the proven, auditable ability to govern your own digital estate. For developers, this means separating cloud-specific infrastructure code from core business logic to maximize portability, allowing the use of necessary hyper-scale features while preserving architectural flexibility.
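To make that separation concrete, here is a minimal sketch in C# (all names are hypothetical and not tied to any provider’s SDK) of keeping business logic behind a small abstraction so that only a thin adapter is cloud-specific:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Core business logic depends only on this abstraction, never on a cloud SDK.
public interface IDocumentStore
{
    Task SaveAsync(string id, string json, CancellationToken ct = default);
    Task<string?> GetAsync(string id, CancellationToken ct = default);
}

// Portable business logic: it can move between providers unchanged.
public sealed class InvoiceService
{
    private readonly IDocumentStore _store;
    public InvoiceService(IDocumentStore store) => _store = store;

    public Task ArchiveAsync(string invoiceId, string payload, CancellationToken ct = default)
        => _store.SaveAsync($"invoice-{invoiceId}", payload, ct);
}

// Cloud-specific adapter: the only place a provider SDK (blob/object storage,
// a managed database, etc.) is referenced. Swapping providers means replacing
// this class, not the business logic.
public sealed class CloudBlobDocumentStore : IDocumentStore
{
    public Task SaveAsync(string id, string json, CancellationToken ct = default)
        => throw new NotImplementedException("Call the chosen provider's storage SDK here.");

    public Task<string?> GetAsync(string id, CancellationToken ct = default)
        => throw new NotImplementedException("Call the chosen provider's storage SDK here.");
}
```

The hyperscaler-specific features stay available behind the adapter, while the core of the application remains portable and auditable.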

The Challenge: Balancing Features with Control

Despite the massive investments and public commitments from all major players, the framework faces two key hurdles:

  • The Feature Gap: European providers often lack the “huge software suite” and “deep feature integration” of US hyperscalers, which can slow down rapid development. Advanced analytics platforms, serverless computing, and tightly integrated security services often lack direct equivalents at smaller providers. This creates a complex chicken-and-egg problem: large enterprises won’t migrate to European providers because they lack features, but local providers struggle to develop those capabilities without enterprise revenue.
  • Skepticism and Compliance Complexity: Some analysts fear the framework’s complexity will inadvertently favor the global giants with larger compliance teams. Furthermore, deep-seated apprehension in the community remains, with some expressing the fundamental desire for purely European technological solutions: “I don’t want a Microsoft cloud or AI solutions in Europe. I want European ones.” Some experts suggest that European providers should focus on building something different by innovating with European privacy and control values baked in, rather than trying to catch up with US providers’ feature sets.

My perspective on this situation is that achieving true digital sovereignty for Europe is a complex and multifaceted endeavor. While the commitments from global hyperscalers are significant, the underlying desire for independent, European-led solutions remains strong. It’s about strategic autonomy, ensuring that we, as Europeans, maintain ultimate control over our digital destiny and critical data, irrespective of where the technology originates.

The race is now on. The challenge for the cloud industry is to translate the high-level, technical criteria of the SOVs into auditable, real-world reality to achieve that elusive top SEAL-4 ranking. The battle for the future of Europe’s cloud is officially underway.

AWS Shifts to a Credit-Based Free Plan, Aligning with Azure and GCP

AWS is officially moving away from its long-standing 12-month free tier for new accounts. The new standard, called the Free Account Plan, is a credit-based model designed to eliminate the risk of unexpected bills for new users.

With this new plan, you get:

  • A risk-free environment for experimenting and building proofs of concept for up to six months.
  • A starting credit of $100, with the potential to earn another $100 by completing specific exploration activities, such as launching an EC2 instance. This means you can get up to $200 in credits to use across eligible services.
  • The plan ends after six months or once your credits are entirely spent, whichever comes first. After that, you have a 90-day window to upgrade to a paid plan and restore access to your account and data.

This shift, as Principal Developer Advocate Channy Yun explains, allows new users to get hands-on experience without cost commitments. However, it’s worth noting that some services typically used by large enterprises won’t be available on this free plan.

While some may see this as a step back, I tend to agree with Corey Quinn’s perspective. He writes that this is “a return to product-led growth rather than focusing on enterprise revenue to the exclusion of all else.” Let’s face it: big companies aren’t concerned with the free tier. But for students and hobbyists, who can be seen as the next generation of cloud builders, a credit-based, risk-free sandbox is a much more attractive proposition. The new notifications for credit usage and expiration dates are a smart addition that provides peace of mind.

How the New Plan Compares to Other Hyperscalers

A helpful plan for those who like to experiment on AWS, I think. Other hyperscalers offer similar plans: Microsoft Azure and Google Cloud Platform (GCP) have long operated on credit-based models.

  • Azure offers a different model: $200 in credits for the first 30 days, supplemented by over 25 “always free” services and a selection of services available for free for 12 months.
  • GCP provides a 90-day, $300 Free Trial for new customers, which can be applied to most products, along with an “Always Free” tier that gives ongoing access to core services like Compute Engine and Cloud Storage up to specific monthly limits.

This alignment among the major cloud providers highlights a consensus on the best way to attract and onboard new developers.

Microsoft also offers $100 in Azure credits through Azure for Students. Note that MSDN credits are typically a monthly allowance tied to a specific Visual Studio subscription, while the student credits are a lump sum for a set period (e.g., 12 months); these different models can be confusing.

Speaking of other cloud providers, my own experience with Azure is a good example of how these credit models can be beneficial. I receive Azure credits through my MVP benefits, and an MSDN (Visual Studio) subscription comes with $150 in monthly credits. These are different options from the general one I mentioned earlier. In any case, there are ways to access services from all three big hyperscalers and get hands-on experience, in combination with their documentation and what you can find in public repos.

In general, if you’d like to learn more about Azure, AWS, or GCP, the following table shows the most straightforward options:

| Cloud Hyperscaler | Free Credits | Documentation | Repo (samples) |
|---|---|---|---|
| Azure | Azure Free Account | Microsoft Learn | Azure Samples · GitHub |
| AWS | AWS Free Tier | AWS Documentation | AWS Samples · GitHub |
| GCP | GCP Free Trial | Google Cloud Documentation | Google Cloud Platform · GitHub |

Decoding Figma’s AWS Spend: Beyond the Hype and Panic

Figma’s recent IPO filing revealed a daily AWS expenditure of roughly $300,000, translating to approximately $109 million annually, or 12% of its reported revenue of $821 million. The company also committed to a minimum spend of $545 million over the next five years with AWS. Cue the online meltdown. “Figma is doomed!” “Fire the CTO!” The internet, in its infinite wisdom, declared. I wrote a news item on it for InfoQ and thought, ‘Let’s put things into perspective and add my own experience.’

(Source: Figma.com)

But let’s inject a dose of reality, shall we? As Corey Quinn from The Duckbill Group, who probably sees more AWS invoices than you’ve seen Marvel movies, rightly points out, this kind of spending for a company like Figma is boringly normal.

As Quinn extensively details in his blog post, Figma isn’t running a simple blog. It’s a compute-intensive, real-time collaborative platform serving 13 million monthly active users and 450,000 paying customers. It renders complex designs with sub-100ms latency. This isn’t just about spinning up a few virtual machines; it’s about providing a seamless, high-performance experience on a global scale.

The Numbers Game: What the Armchair Experts Missed

The initial panic conveniently ignored a few crucial realities, according to Quinn:

  • Ramping Spend: Most large AWS contracts increase year-over-year. A $109 million annual average over five years likely starts lower (e.g., $80 million) and gradually increases to a higher figure (e.g., $150 million in year five) as the company expands.
  • Post-Discount Figures: These spend targets are post-discount. At Figma’s scale, they’re likely getting a significant discount (think 30% effective discount) on their cloud spend. So, their “retail” spend would be closer to $785 million over five years, not $545 million.
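To make those two adjustments concrete, here is a purely illustrative back-of-the-envelope check; the yearly figures and the flat 30% discount are assumptions, only the five-year totals come from the filing and Quinn’s estimate:

$$\$80\text{M} + \$95\text{M} + \$105\text{M} + \$115\text{M} + \$150\text{M} = \$545\text{M} \quad (\text{average} \approx \$109\text{M/year})$$

$$\frac{\$545\text{M}}{1 - 0.30} \approx \$780\text{M} \ \text{retail over five years}$$

which lines up with the roughly $785 million retail figure mentioned above.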

When you factor these in, Figma’s 12% of revenue on cloud infrastructure for a company of its type falls squarely within industry benchmarks:

  • Compute-lite SaaS: Around 5% of revenue.
  • Compute-heavy platforms (like Figma): 10-15% of revenue.
  • AI/ML-intensive companies: Often exceeding 15%.

Furthermore, the increasing adoption of AI and Machine Learning in application development is introducing a new dimension to cloud costs. AI workloads, particularly for training and continuous inference, are incredibly resource-intensive, pushing the boundaries of compute, storage, and specialized hardware (like GPUs), which naturally translates to higher cloud bills. This makes effective FinOps and cost optimization strategies even more crucial for companies that leverage AI at scale.

So, while the internet was busy getting its math wrong and forecasting doom, Figma was operating within a completely reasonable range for its business model and scale.

The “Risky Dependency” Non-Story

Another popular narrative was the “risky dependency” on AWS. Figma’s S-1 filing includes standard boilerplate language about vendor dependencies, a common feature found in virtually every cloud-dependent company’s SEC filings. It’s the legal equivalent of saying, “If the sky falls, our business might be affected.”

Breaking news: a SaaS company that uses a cloud provider might be affected by outages. In related news, restaurants depend on food suppliers. This isn’t groundbreaking insight; it’s just common business risk disclosure. Figma’s “deep entanglement” with AWS, as described by Hacker News commenter nevon, underscores the complexity of modern cloud architectures, where every aspect, from permissions to disaster recovery, is seamlessly integrated. This makes a quick migration akin to performing open-heart surgery without anesthetic – highly complex and not something you do on a whim.

Cloud Repatriation: A Valid Strategy, But Not a Universal Panacea

The discussion around Figma’s costs also brought up the topic of cloud repatriation, with examples like 37signals, whose CTO, David Heinemeier Hansson, has been a vocal advocate for exiting the cloud to save millions. While repatriating certain workloads can indeed lead to significant savings for some companies, it’s not a one-size-fits-all solution.

Every company’s needs are different. For a company like Scrimba, which runs on dedicated servers and spends less than 1% of its revenue on infrastructure, this might be a perfect fit. For Figma, with its real-time collaborative demands and massive user base, the agility, scalability, and managed services offered by a hyperscale cloud provider like AWS are critical to their business model and growth.

This brings us to a broader conversation, especially relevant in the European context: digital sovereignty. As I’ve discussed in my blog post, “Digital Destiny: Navigating Europe’s Sovereignty Challenge,” the deep integration with a single hyperscaler, such as AWS, isn’t just about cost or technical complexity; it also affects the control and autonomy an organization retains over its data and operations. While the convenience of cloud services is undeniable, the potential for vendor lock-in can have strategic implications, particularly concerning data governance, regulatory compliance, and the ability to dictate terms. The ongoing debate around data residency and the extraterritorial reach of foreign laws further amplifies these concerns, pushing some organizations to consider multi-cloud strategies or even hybrid models to mitigate risks and assert greater control over their digital destiny.

My Cloud Anecdote: Costs vs. Value

This whole debate reminds me of a scenario I encountered back in 2017. I was working on a proof of concept for a customer, building a future-proof knowledge base using Cosmos DB, the Graph Model, and Search. The operating cost, primarily driven by Cosmos DB, was approximately 1,000 euros per month. As I recall, some developers immediately flagged it as “too expensive,” or even thought I was selling Cosmos DB. The reception, in short, wasn’t universally positive. In fact, one attendee later wrote in their blog:

The most uninteresting talk of the day came from Steef-Jan Wiggers, who, in my opinion, delivered an hour-long marketing pitch for CosmosDB. I think it’s expensive for what it currently offers, and many developers could architect something with just as much performance without needing CosmosDB.

However, the proposed solution was for a knowledge base that customers could leverage via a subscription model. The crucial point was that the costs were negligible compared to the potential revenue the subscription model would net for the customer. It was an investment in a revenue-generating asset, not just a pure expense.

The Bottom Line: Innovation vs. Optimization

Thanks to Quinn, I understand that Figma is actively optimizing its infrastructure, transitioning from Ruby to C++ pipelines, migrating workloads, and implementing dynamic cluster scaling. He concluded:

They’re doing the work. More importantly, they’re growing at 46% year-over-year with a 91% gross margin. If you’re losing sleep over their AWS bill while they’re printing money like this, you might need to reconsider your priorities.

The “innovation <-> optimization continuum” is always at play. Companies often prioritize rapid innovation and speed to market, leveraging the cloud for its agility and flexibility. As they scale, they can then focus on optimizing those costs.

This increasing complexity underscores the growing importance of FinOps (Cloud Financial Operations), a cultural practice that brings financial accountability to the variable spend model of cloud, empowering teams to make data-driven decisions on cloud usage and optimize costs without sacrificing innovation.

Figma’s transparency in disclosing its cloud costs is actually a good thing. It forces a much-needed conversation about the true cost of running enterprise-scale infrastructure in 2025. The hyperbolic reactions, however, expose a fundamental misunderstanding of these realities – something I also encountered with my Cosmos DB project in 2017.

So, the next time someone tells you that a company spending 12% of its revenue on infrastructure that literally runs its entire business is “doomed,” perhaps ask them how much they think it should cost to serve real-time collaborative experiences to 13 million users across the globe. The answer, if based on reality, might surprise them.

Lastly, as the cloud landscape continues to evolve, with new services, AI integration, and shifting geopolitical considerations, the core lesson remains: smart cloud investment isn’t about avoiding the bill, but understanding its true value in driving business outcomes and strategic advantage. The dialogue about cloud costs is far from over, but it’s time we grounded it in reality.

Digital Destiny: Navigating Europe’s Sovereignty Challenge

During my extensive career in IT, I’ve often seen how technology can both empower and entangle us. Today, Europe and the Netherlands find themselves at a crucial juncture, navigating the complex landscape of digital sovereignty. Recent geopolitical shifts and the looming possibility of a “Trump II” presidency have only amplified our collective awareness: we cannot afford to be dependent on foreign legislation when it comes to our critical infrastructure.

In this post, I will delve into the threats and strategic risks that underpin this challenge. We’ll explore the initiatives being undertaken at both the European and Dutch levels, and crucially, what the major U.S. Hyperscalers are now bringing to the table in response.

The Digital Predicament: Threats to Our Autonomy

The digital revolution has certainly brought unprecedented benefits, not least through innovative Cloud Services that are transforming our economy and society. However, this advancement has also positioned Europe in a state of significant dependency. Approximately 80% of our digital infrastructure relies on foreign companies, primarily American cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. This reliance isn’t just a matter of convenience; it’s a strategic vulnerability.

The Legal Undercurrent: U.S. Legislation

One of the most persistent threats to European digital sovereignty stems from American legislation. The CLOUD Act (2018), which follows in the line of the Freedom Act (2015) and the Patriot Act (2001) before it, grants American law enforcement and security services the power to request data from American cloud service providers, even if that data is stored abroad.

Think about it: if U.S. intelligence agencies can request data from powerhouses like AWS, Microsoft, or Google without your knowledge, what does this mean for European organizations that have placed their crown jewels there? This directly clashes with Europe’s stringent privacy regulations, the General Data Protection Regulation (GDPR), which sets strict requirements for the protection of personal data of individuals in the EU.

While the Dutch National Cyber Security Centre (NCSC) has stated that, in practice, the chance of the U.S. government requesting European data via the CLOUD Act has historically been minimal, they also acknowledge that this could change with recent geopolitical developments. The risk is present, even though it has rarely materialized thus far.

Geopolitics: The Digital Chessboard

Beyond legal frameworks, geopolitical developments pose a very real threat to our digital autonomy. Foreign governments may impose trade barriers and sanctions on Cloud Services. Imagine scenarios where tensions between major powers lead to access restrictions for essential Cloud Services. The European Union or even my country cannot afford to be a digital pawn in such a high-stakes game.

We’ve already seen these dynamics play out. In negotiations for a minerals deal with Ukraine, the White House reportedly made a phone call to stop the delivery of satellite images from Maxar Technologies, an American space company. These images were crucial for monitoring Russian troop movements and documenting war crimes.

Another stark example is the Microsoft-ICC incident, where Microsoft blocked access to email and Office 365 services for the chief prosecutor of the International Criminal Court in The Hague due to American sanctions. These incidents serve as powerful reminders of how critical external political pressures can be in impacting digital services.

Europe’s Response: A Collaborative Push for Sovereignty

Recognizing these challenges, both Europe and the Netherlands are actively pursuing initiatives to bolster digital autonomy. It’s also worth noting how major cloud providers are responding to these evolving demands.

European Ambitions:

The European Union has been a driving force behind initiatives to reinforce its digital independence:

  • Gaia-X: This ambitious European project aims to create a trustworthy and secure data infrastructure, fostering a federated system that connects existing European cloud providers and ensures compliance with European regulations, such as the General Data Protection Regulation (GDPR). It’s about creating a transparent and controlled framework.
  • Digital Markets Act (DMA) & Digital Services Act (DSA): These legislative acts aim to regulate the digital economy, fostering fairer competition and greater accountability from large online platforms.
  • Cloud and AI Development Act (proposed): This upcoming legislation seeks to ensure that strategic EU use cases can rely on sovereign cloud solutions, with the public sector acting as a crucial “anchor client.”
  • EuroStack: This broader initiative envisions Europe as a leader in digital sovereignty, building a comprehensive digital ecosystem from semiconductors to AI systems.

Crucially, we’re seeing tangible progress here. Virt8ra, a significant European initiative positioning itself as a major alternative to US-based cloud vendors, recently announced a substantial expansion of its federated infrastructure. The platform, which initially included Arsys, BIT, Gdańsk University of Technology, Infobip, IONOS, Kontron, MONDRAGON Corporation, and Oktawave, all coordinated by OpenNebula Systems, has now been joined by six new cloud service providers: ADI Data Center Euskadi, Clever Cloud, CloudFerro, OVHcloud, Scaleway, and Stackscale. This expansion is a clear indicator that the vision for a robust, distributed European cloud ecosystem is gaining significant traction.

Dutch Determination:

The Netherlands is equally committed to this journey:

  • Strategic Digital Autonomy and Government-Wide Cloud Policy: A coalition of Dutch organizations has developed a roadmap, proposing a three-layer model for government cloud policy that advocates for local storage of state secret data and autonomy requirements for sensitive government data.
  • Cloud Kootwijk: This initiative brings together local providers to develop viable alternatives to hyperscaler clouds, fostering homegrown digital infrastructure.
  • “Reprogram the Government” Initiative: This initiative advocates for a more robust and self-reliant digital government, pushing for IT procurement reforms and in-house expertise.
  • GPT-NL: A project to develop a Dutch language model, strengthening national strategic autonomy in AI and ensuring alignment with Dutch values.

Hyperscalers and the Sovereignty Landscape:

The growing demand for digital sovereignty has prompted significant responses from major cloud providers, demonstrating a recognition of European concerns:

  • AWS European Sovereign Cloud: AWS has announced key components of its independent European governance for the AWS European Sovereign Cloud.
  • Microsoft’s Five Digital Commitments: Microsoft recently outlined five significant digital commitments to deepen its investment and support for Europe’s technological landscape.

These efforts from hyperscalers highlight a critical balance. As industry analyst David Linthicum noted, while Europe’s drive for homegrown solutions is vital for data control, it also prompts questions about access to cutting-edge innovations. He stresses the importance of “striking the right balance” to ensure sovereignty efforts don’t inadvertently limit access to crucial capabilities that drive innovation.

However, despite these significant investments, skepticism persists. There is an ongoing debate within Europe regarding digital sovereignty and reliance on technology providers headquartered outside the European Union. Some in the community express doubts about how such companies can truly operate independently and prioritize European interests, with comments like, “Microsoft is going to do exactly what the US government tells them to do. Their proclamations are meaningless.” Others echo the sentiment that “European money should not flow to American pockets in such a way. Europe needs to become independent from American tech giants as a way forward.” This collective feedback highlights Europe’s ongoing effort to develop its own technological capabilities and reduce its reliance on non-European entities for critical digital infrastructure.

My perspective on this situation is that achieving true digital sovereignty for Europe is a complex and multifaceted endeavor, marked by both opportunities and challenges. While the commitments from global hyperscalers are significant and demonstrate a clear response to European demands, the underlying desire for independent, European-led solutions remains strong. It’s not about outright rejection of external providers, but about strategic autonomy – ensuring that we, as Europeans, maintain ultimate control over our digital destiny and critical data, irrespective of where the technology originates.

Azure Cosmos DB’s Latest Performance Features

As an early adopter of Azure Cosmos DB, I have always followed the development of this service and have built up hands-on experience leveraging it for monitoring purposes (a recent example was presented at Azure Cosmos DB Conf 2023 – Leveraging Azure Cosmos DB for End-to-End Monitoring of Retail Processes).

Azure Cosmos DB

For those unfamiliar with Azure Cosmos DB, Microsoft’s globally distributed, multi-model database service offers low-latency, scalable storage and querying of diverse data types. It allows developers to build applications with data access and high availability across regions. Its well-known counterpart is Amazon DynamoDB.

In this blog post, I’d like to point out some recent performance optimizations in the service. I also recently wrote an InfoQ news item on this topic.

Priority-based execution

One of the more recent features introduced in the service is priority-based execution, which is currently in public preview. It allows users to define the priority of requests sent to Azure Cosmos DB. When the number of requests surpasses the configured Request Units per second (RU/s) limit, lower-priority requests are slowed down first so that high-priority requests can continue to be processed.

As mentioned in a blog post by Microsoft, this feature empowers users to prioritize critical tasks over less crucial ones in situations where a container surpasses its configured request units per second (RU/s) capacity. Less important tasks are automatically retried by clients using an SDK with the specified retry policy until they can be successfully processed.

With priority-based execution, you have the flexibility to allocate varying priorities to workloads operating within the same container in your application. This proves beneficial in numerous scenarios, including prioritizing read, write, or query operations, as well as giving precedence to user actions over background tasks like bulk execution, stored procedures, and data ingestion/migration.

A nomination form is available to request access to the feature; once accepted, you can use it through the preview .NET SDK.
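As a rough sketch of what this looks like in code – assuming a preview version of the Microsoft.Azure.Cosmos .NET SDK that exposes the PriorityLevel option, with illustrative database, container, and property names – a background job could mark its writes as low priority while user-facing reads stay high priority:

```csharp
using Microsoft.Azure.Cosmos;

// Requires a preview Microsoft.Azure.Cosmos package and the priority-based
// execution feature enabled on the account; names below are illustrative.
CosmosClient client = new CosmosClient("<your-connection-string>");
Container container = client.GetContainer("appdb", "orders");

var order = new { id = "12345", tenantId = "tenant-42", status = "pending" };

// Background/bulk work: low priority, so it is throttled first when the
// container exceeds its configured RU/s.
var lowPriority = new ItemRequestOptions { PriorityLevel = PriorityLevel.Low };
await container.CreateItemAsync(order, new PartitionKey("tenant-42"), lowPriority);

// User-facing read: high priority, processed ahead of low-priority requests under pressure.
var highPriority = new ItemRequestOptions { PriorityLevel = PriorityLevel.High };
var response = await container.ReadItemAsync<dynamic>("12345", new PartitionKey("tenant-42"), highPriority);
```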

Hierarchical Partition Keys

In addition to Priority-based execution, the product group for Cosmos DB also introduced Hierarchical Partition Keys to optimize performance.

Hierarchical partition keys enhance Cosmos DB’s elasticity, particularly in scenarios where users employ synthetic or logical partition keys that surpass 20 GB of data. By using up to three keys with hierarchical partitioning, users can effectively sub-partition their data, achieving superior data distribution and enabling greater scalability. Azure Cosmos DB automatically distributes the data among physical partitions, allowing logical partition prefixes to exceed the 20 GB storage limit.

According to the documentation, the simplest way to create a container and specify hierarchical partition keys is using the Azure portal. 

For example, you can use hierarchical partition keys to partition data by tenant ID and then by item ID. This way, a tenant’s items are grouped together while still being allowed to span multiple physical partitions, and queries scoped to a tenant are routed only to the physical partitions holding that tenant’s data, which can improve query performance.
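If you prefer to create such a container from code instead of the portal, a minimal sketch with the .NET SDK could look like this (assuming an SDK version that supports hierarchical partition keys, an existing NoSQL API database, and illustrative names):

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("<your-connection-string>");
Database database = client.GetDatabase("appdb"); // assumed existing database

// Up to three levels are allowed; here tenant ID first, then item ID.
ContainerProperties properties = new ContainerProperties(
    "knowledge-items",
    new List<string> { "/tenantId", "/itemId" });

ContainerResponse created = await database.CreateContainerIfNotExistsAsync(properties, throughput: 400);
Container container = created.Container;
```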

A more detailed explanation and use case for hierarchical keys in Azure Cosmos DB can be found in the blog post by Leonard Lobel. 

Burst Capacity Feature

Lastly, the team also made the burst capacity feature for Azure Cosmos DB generally available (GA) to allow you to take advantage of your database or container’s idle throughput capacity to handle traffic spikes.   

Burst capacity allows each physical partition to accumulate up to 5 minutes of idle capacity, which can be utilized at a rate of up to 3000 RU/s. This feature is applicable to databases and containers utilizing manual or autoscale throughput, provided they have less than 3000 RU/s provisioned per physical partition.
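As a purely illustrative example (the 400 RU/s figure is an assumption, not from the documentation): a physical partition provisioned at 400 RU/s that has been idle for the full 5 minutes can accumulate

$$5 \text{ min} \times 60 \text{ s/min} \times 400 \text{ RU/s} = 120{,}000 \text{ RU}$$

of burst capacity, which it can then spend during a traffic spike at a rate of up to 3,000 RU/s.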

To begin utilizing burst capacity, access the Features page within your Azure Cosmos DB account and enable the Burst Capacity feature. Please note that the feature may take approximately 15-20 minutes to become active once enabled.  

Enabling the burst capacity feature (Source: Microsoft Learn Burst Capacity)

According to the documentation, to use the feature, you need to consider the following: 

  • Burst capacity applies only to provisioned throughput (manual or autoscale); if your Azure Cosmos DB account is configured as serverless, burst capacity is not applicable.
  • Additionally, burst capacity is compatible with Azure Cosmos DB accounts utilizing the API for NoSQL, Cassandra, Gremlin, MongoDB, or Table. 

Lastly, in case you are wondering what the difference between burst capacity and priority-based execution is, Jay Gordon, a Senior Cosmos DB Program Manager, explained it in the discussion section of the blog post on these performance features:

The difference between burst capacity and execution based on priority lies in their impact on performance and resource allocation:

Burst capacity affects the overall throughput capacity of your Azure Cosmos DB container or database. It allows you to temporarily exceed the provisioned throughput to handle sudden spikes in workload. Burst capacity helps maintain low latency and prevent throttling during peak usage periods.

Execution based on priority determines the order in which requests are processed when multiple concurrent requests exist. Higher priority requests are prioritized and typically get faster access to resources for execution. This ensures that essential or time-sensitive operations are processed promptly, while lower-priority requests may experience slight delays.

“In terms of results, burst capacity and execution based on priority are independent. Utilizing burst capacity allows you to handle temporary workload spikes, whereas execution based on importance ensures that higher-priority requests are processed more promptly. These mechanisms work together to optimize performance and resource allocation in Azure Cosmos DB, but they serve different purposes.”

Conclusion

In conclusion, Azure Cosmos DB continues to evolve with new features designed to enhance performance and scalability. Priority-based execution, currently in public preview, enables users to prioritize critical tasks over less important ones when the request unit capacity is exceeded. This flexibility is further enhanced by hierarchical partition keys, allowing optimal data distribution and larger scale in scenarios with substantial data. Additionally, the burst capacity feature, now generally available, provides an efficient way to handle traffic spikes by utilizing idle throughput capacity. Users can easily enable burst capacity through the Azure Cosmos DB account’s Features page, making it a valuable tool for accounts with provisioned throughput.

Returning to AWS: DynamoDB, Cosmos DB’s counterpart, offers similar performance-optimizing capabilities, and the concepts are comparable.

Event-Driven Services in the Cloud: Azure Event Grid, AWS EventBridge, and Google Eventarc

I mentioned Azure Event Grid in a scenario with D365FO Business Events in a previous blog post. It is a Platform as a Service (PaaS) capability in Azure – an eventing platform or event bus (I see various terms describing the service) – that allows you to centrally manage events. In addition, it supports direct event filtering based on event type, prefix, or suffix, so your application only receives events that are relevant to it.

Whether you want to handle built-in Azure events, such as a file being added to storage, or create your own custom events and event handlers, Event Grid supports both options via the same underlying model. Thus, regardless of the service or use case, intelligent routing and filtering capabilities apply to every event scenario and ensure that your apps focus on core business logic rather than worrying about event routing.

In this blog post, I’d like to dive into Azure Event Grid and the competing offerings from the two other big cloud providers, AWS and Google.

Azure Event Grid

In 2017, Microsoft introduced Azure Event Grid as a fully managed event routing service and the first of its kind (meaning Microsoft claimed to be the first public cloud offering such a service). Dan Rosanova, previously Principal Program Manager Lead at Microsoft and now Director of Program Management at Confluent, said in an InfoQ news item on the introduction:

Azure Event Grid fills a gap in the current cloud messaging space, not just in Azure but also across all cloud providers. We have services for messaging, queuing, and telemetry, but nothing for comprehensive eventing, particularly for cross-service or cross-cloud scenarios.

Azure services supporting Event Grid generate events that are routed to various event handlers, ranging from Azure Functions to webhooks, with support for event filtering and reliable delivery. Furthermore, under the hood, the service relies on Service Fabric and can thus scale automatically to handle millions of events per second.

Event Grid model of sources and handlers

Source: https://docs.microsoft.com/en-us/azure/event-grid/overview

The Event Grid concept revolves around events emitted from a source (publisher), an Azure service, or a third-party source that adheres to the event schema (proprietary schema or the CNCF Cloud Events schema). For example, IoT Hub, Storage, and others are all event publishers in Azure. Following that, the events are sent to a topic in Event Grid, and each topic can have one or more subscribers (event handlers). A topic can be set up with the event publisher, or it can be a custom topic for custom events. Finally, event handlers respond to and process the events. Functions, WebHooks, and Event Hubs are examples of event handlers in Azure.
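For custom events, publishing to a custom topic takes only a few lines with the Azure.Messaging.EventGrid SDK. A minimal sketch (the topic endpoint, access key, and event names are placeholders):

```csharp
using System;
using Azure;
using Azure.Messaging.EventGrid;

// Placeholders: use your custom topic's endpoint and access key.
var client = new EventGridPublisherClient(
    new Uri("https://<your-topic>.<region>.eventgrid.azure.net/api/events"),
    new AzureKeyCredential("<topic-access-key>"));

// Event Grid's own schema: subject, event type, data version, and a data payload.
var orderEvent = new EventGridEvent(
    "orders/12345",                              // subject
    "Contoso.Orders.OrderCreated",               // event type
    "1.0",                                       // data version
    new { OrderId = "12345", Amount = 42.5 });   // data

await client.SendEventAsync(orderEvent);
```

Any handler subscribed to the topic – an Azure Function, a webhook, an Event Hub – then receives the event through the subscription’s filters.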

Azure Event Grid became generally available (GA) in February 2018, and Clemens Vasters, Principal Architect Messaging Services at Microsoft, said:

Event Grid is catching everyone’s attention because it unlocks new architectural possibilities for cloud platforms and applications: it’s the glue that enables information flow between services, and Event Grid allows expanding the capabilities of existing services by extension.

And that’s what also caught the attention of AWS, which released EventBridge in July 2019, labeled as a serverless event bus that allows AWS services, Software-as-a-Service (SaaS) solutions, and custom applications to communicate with each other using events.

Since GA, Azure Event Grid has received several updates, including advanced filtering, retry policies, and support for CloudEvents. More details and samples are available in the Microsoft documentation and on GitHub. Note that there is also an introductory paper on Azure Event Grid by Clemens, available on GitHub.

AWS EventBridge

You can use EventBridge to build and manage event-driven solutions by centrally controlling event ingestion, delivery, security, authorization, and error handling. Furthermore, you do not have to manage any infrastructure or scaling, and you only pay for the events your applications consume – similar to Azure Event Grid. Moreover, the concepts are the same too.

How Amazon EventBridge connects applications using events

Source: https://aws.amazon.com/eventbridge/

However, Amazon EventBridge surpasses Azure Event Grid in features (as you can see from the diagram above). It has a schema registry that allows you to discover, create, and manage OpenAPI schemas for events on EventBridge. According to the documentation, you can find schemas for existing AWS services, create and upload custom schemas, or generate a schema based on events located on an event bus. Furthermore, EventBridge enables you to generate and download code bindings for all event schemas to help quickly build applications that use those events.

Next to the schema registry, the service integrates easily with third-party services like Zendesk, PagerDuty, and SignalFx. Amazon has set up an extensive partner program for these integrations. Event Grid supports partner events as well (still in preview), yet currently has only one partner, Auth0.

Another capability Amazon EventBridge offers is event replay and archive – allowing you to archive events so that you can easily replay them later by starting an event replay. Again, this capability is not available in Azure Event Grid, although you can find something similar in Azure Event Hubs. You can configure the archive capability from the actions menu in the EventBridge console and set the events’ retention period (ranging from zero days to infinite). Subsequently, you can optionally set a pattern-matching filter for which events to archive. Later, when events have run through the event bus, you can replay them by selecting the appropriate archive.

Sample Implementation AWS EventBridge

Since the inception of Event Grid, I have followed its evolution and have written and presented on it. Moreover, I have followed its competing solution on AWS and, next to writing about it on InfoQ, built a simple demo around it using .NET in combination with AWS EventBridge. Below you will find a diagram of the demo I created.

Amazon EventBridge Demo

From .NET code, I send an event to a custom event bus that contains a rule to route the event to a destination, an Amazon Simple Queue Service (SQS) queue. Subsequently, an AWS Lambda function can poll the queue and receive the message – the diagram below shows the steps up to the SQS queue.

EventBridge Demo Steps

You can find a live demo on YouTube demonstrating the above (around minute 19). Furthermore, you can look at other samples in the AWS documentation or on GitHub.
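For reference, the core of that .NET step – putting a custom event on a custom bus – looks roughly like this with the AWS SDK for .NET (bus name, source, and detail-type are placeholders; this is a sketch, not the exact demo code):

```csharp
using System;
using System.Collections.Generic;
using Amazon.EventBridge;
using Amazon.EventBridge.Model;

var client = new AmazonEventBridgeClient(); // uses the default credentials/region chain

var request = new PutEventsRequest
{
    Entries = new List<PutEventsRequestEntry>
    {
        new PutEventsRequestEntry
        {
            EventBusName = "demo-event-bus",                     // placeholder custom bus
            Source = "com.example.orders",                       // placeholder source
            DetailType = "OrderCreated",                         // placeholder detail type
            Detail = "{\"orderId\":\"12345\",\"amount\":42.5}"   // JSON payload
        }
    }
};

// A rule on the bus matching this source/detail-type forwards the event to the SQS queue.
PutEventsResponse response = await client.PutEventsAsync(request);
Console.WriteLine($"Failed entries: {response.FailedEntryCount}");
```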

Google Eventarc

With Azure and AWS offering services to centrally manage events, Google followed in October 2020 with Eventarc, providing customers with a service to connect Cloud Run services to events from various sources while adhering to the CloudEvents standard. It became generally available in January 2021.

Eventarc’s underlying delivery mechanism is Pub/Sub, which includes topics and subscriptions similar to the previously discussed Event Grid and EventBridge. Event sources create events and publish them in any format to a Pub/Sub topic. The events are then delivered to the Cloud Run sinks. For applications running on Cloud Run, you can use Eventarc to let a Cloud Storage event (via Cloud Audit Logs) trigger a data processing pipeline, or let an event from custom sources (publishing to Cloud Pub/Sub) signal between microservices.

Eventarc Overview

Source: https://codelabs.developers.google.com/codelabs/cloud-run-events#1

The diagram above shows what Google hopes to achieve with Eventarc. Currently, you can use a Cloud Run service as a destination, and recently Cloud Run for Anthos has been added. Additionally, you can leverage a UI in the Google Cloud console that allows you to view, edit, and delete Eventarc triggers. Lastly, you can find more details and samples on GitHub.

CloudEvents Schema

Before I end the blog post with some conclusions, I’d like to discuss the CloudEvents schema. CloudEvents is an open-source specification for consistently describing event data, intended to make event declaration and delivery easier across services, platforms, and beyond. The Cloud Native Computing Foundation (CNCF) is the driving force behind the specification, which reached the version 1.0 milestone in October 2019.

Clemens Vasters, Principal Architect Messaging Services at Microsoft, stated in an InfoQ news item on CloudEvents:

The goal was to provide an industry definition and open framework for what an “event” is, what its minimal semantic elements are, and how events are encoded for transfer and how they are transferred and do so using the major encodings and application protocols in use today rather than inventing new ones.

Earlier, I mentioned that Azure Event Grid has its own proprietary schema and also supports the CloudEvents schema. The differences are shown below:

CloudEvent vs Event Grid Schema

Note that Azure Event Grid and Google Eventarc support the CloudEvents schema; however, AWS EventBridge does not, which means custom mapping is required if you want to standardize on CloudEvents.
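To make the schema difference tangible, here is a small sketch constructing the same event in both shapes with the Azure SDK (the CloudEvent type lives in the Azure.Messaging namespace of Azure.Core; the names are placeholders, and a custom topic must be configured for whichever schema you send to it):

```csharp
using Azure.Messaging;            // CloudEvent: the CNCF CloudEvents 1.0 envelope
using Azure.Messaging.EventGrid;  // EventGridEvent: Event Grid's proprietary schema

var payload = new { OrderId = "12345", Amount = 42.5 };

// Event Grid schema: subject + event type + data version + data.
var gridEvent = new EventGridEvent(
    "orders/12345",                   // subject
    "Contoso.Orders.OrderCreated",    // event type
    "1.0",                            // data version
    payload);

// CloudEvents 1.0 schema: source + type + data; id, time, and specversion are filled in for you.
var cloudEvent = new CloudEvent(
    "/contoso/orders",                // source
    "Contoso.Orders.OrderCreated",    // type
    payload);

// The same EventGridPublisherClient can send either, depending on the topic's input schema.
```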

Conclusion

From this blog post, you can probably conclude that AWS, with EventBridge, delivers the most complete event bus or eventing platform in the cloud compared to Event Grid and Eventarc. If I rank them, AWS EventBridge comes first, Azure Event Grid second, and Google Eventarc third, based on features and maturity. The services overlap in concepts, yet implementation, support, and features differ dramatically. Interestingly, they all support changes in their respective storage services: Azure Event Grid brings support for events such as when blobs are created, EventBridge supports S3 notifications, and Eventarc offers triggers for Cloud Storage. You can think of various scenarios combining storage and events, for instance, the pipes and filters pattern implementation discussed in my first blog post.

A High-Level View of Cloud Governance

Something that intrigues me in the cloud is governance. As a technical integration architect – the role I have in my current day-to-day job – I usually do not think about it while designing solutions, nor do I talk to customers set on moving to the cloud; that is a cloud migration process, which I am generally not involved in. Still, it should have my attention, and now it does.

You might ask, if the term sounds unfamiliar: what is governance? First, you could look up the term on Wikipedia, where the opening lines describe it as a process of interactions, through laws, norms, power, or language, of an organized society over a social system such as a tribe, family, or a formal or informal organization. Yet how does this relate to the cloud? Quite simply: it is still a process of interactions, but defined by what a cloud provider deems necessary to keep costs, access to data, consistency, and deployments under control.

Cloud providers like Microsoft, AWS, and Google can provide you with guidance regarding governance to manage costs, secure resources and access to data, and ensure consistency in the deployment of resources – each provides a framework for that:

The Google Cloud Adoption Framework whitepaper addresses governance around data, cost control, security, and cloud resource management; the AWS CAF has governance as one of its six perspectives; and Microsoft has a Govern section in its Cloud Adoption Framework, along with a dedicated landing page.

Microsoft Cloud Adoption Framework

Source: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/overview

I’d now like to zoom in further on governance in Microsoft Azure, since I predominantly work as a (solution) architect (integration) on that cloud platform. Furthermore, I will not look at the process extensively described in the CAF, but rather at some of the services and capabilities available in Azure, adding some of my views and relevant resources I found.

Azure Resources

Microsoft provides Azure Policy to help you keep resources compliant. When a policy is assigned and triggered, it evaluates whether a resource adheres to its definition. You can use these policies to implement governance for resource consistency, regulatory compliance, security, cost, and management. For more details on Azure Policy, see Azure Policy on GitHub.

Next to policies, tagging is another aspect of governance in Azure or any cloud platform. With tags, you can assign helpful information to any resource within your cloud infrastructure – usually information not included in the name or available in the overview of the resource. Tagging is critical for cost management, operations, and management of resources. More details on how to apply them are available in the decision guide.
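As a rough illustration of why consistent tags matter for governance, the sketch below uses the Azure.ResourceManager SDK to list resources that are missing a (hypothetical) costcenter tag. Treat it as a starting point: the exact method names may differ slightly between SDK versions, so verify against the version you use.

```csharp
using System;
using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.Resources;

// Assumes the Azure.Identity and Azure.ResourceManager packages and a signed-in credential.
ArmClient armClient = new ArmClient(new DefaultAzureCredential());
SubscriptionResource subscription = await armClient.GetDefaultSubscriptionAsync();

// Walk all resources in the subscription and flag the ones without a "costcenter" tag
// (the tag name is purely illustrative).
await foreach (GenericResource resource in subscription.GetGenericResourcesAsync())
{
    var tags = resource.Data.Tags;
    if (tags == null || !tags.ContainsKey("costcenter"))
    {
        Console.WriteLine($"Missing costcenter tag: {resource.Data.Name} ({resource.Data.ResourceType})");
    }
}
```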

If you work at a company with many subscriptions, or the customer you work for does, you can leverage management groups – a level of scope above subscriptions. They provide a way to organize subscriptions into containers and thus create a logical structure. Moreover, you can apply specific governance conditions with management groups, as each subscription in a group inherits them.

Diagram of a sample management group hierarchy.

Source: https://docs.microsoft.com/en-us/azure/governance/management-groups/overview

More details on management groups are available on the GitHub page.

Another intriguing service is the Azure Resource Graph, a capability in Azure to query, explore, and analyze your cloud resources. It includes an Explorer in the Azure portal and can also be used programmatically through the Azure CLI, Azure PowerShell, and the Azure SDK for .NET.

You can use Graph Explorer to explore resources based on your governance requirements and assess the impact of applying policies in your environments. The query language is based on the Kusto query language used by Azure Data Explorer. More details are available on the GitHub page.

And lastly, Azure Blueprints can enable you to define a repeatable set of Azure resources that implements and adheres to an organization’s standards, patterns, and requirements. As a result, you can orchestrate the deployment of various resource templates and other artifacts such as the earlier mentioned policies, role assignments, ARM templates, and resource groups in a declarative way. With blueprints, you can consistently deploy predefined environments. Other public cloud providers offer blueprints as well: AWS Blueprints and GCP Blueprints. You can find more details on Azure blueprints on GitHub.

Cost Management

The Cost Management + Billing service and its features are available in any subscription in the Azure portal. It allows you to perform administrative tasks around billing, set spending thresholds, and proactively analyze Azure cost generation. A key aspect of cost control is setting up budgets at the beginning, once a subscription is created and before workloads land or resources are provisioned for the development of cloud solutions. Furthermore, once consumption of Azure resources starts, you can look at recommendations for cost optimizations. Moreover, Azure Advisor can help identify underutilized or unused resources to be optimized or shut down.

Example of the Subscription Overview tab showing Offer and Offer ID

Source: https://docs.microsoft.com/en-us/azure/cost-management-billing/costs/understand-cost-mgt-data

Security

An essential aspect of governance is security – for example, who gets access to what resource in Azure. A consistent way to set that up is by applying the earlier-mentioned blueprints. Azure AD plays a role as well when you add accounts, service principals (an identity created for use with applications, hosted services, and automated tools to access Azure resources – similar to a service account on Windows), and app registrations (application objects).

Azure AD is an identity and access solution with several features, such as conditional access, Multi-Factor Authentication (MFA), and Single Sign-On (SSO) support. In addition, it is an essential service with regard to governance, as it provides applications (services) and people with access to Azure resources – and you want that to be consistent and accurate when it comes to who is responsible for what. Lastly, Microsoft provides best practices and guidance on this service that you can look into.

Data Governance

Microsoft launched Purview, its data governance service, into public preview in December 2020, and it became generally available in October 2021. With Azure Purview, the company delivers an Azure service that can help you understand what data your company has, manage that data’s compliance with privacy regulations, and derive valuable insights.