The Walking Skeleton and Pipes & Filters: Building Resilient Integration Architectures

I’ve spent quite some time in IT doing enterprise integration, and if there’s one truth that consistently holds up, it’s that a solid foundation prevents future disappointment or failure. We’ve all been there: a rush to deliver features on a shaky, unvalidated architecture, leading to months of painful, expensive refactoring down the line.

My experience in retail, where I was involved in rebuilding an integration platform, showed me exactly that. In the world of integration, where you’re constantly juggling disparate systems, multiple data formats, and unpredictable volumes, a solid architecture is paramount. As a result, I have always tried to build the best solution based on experience rather than on what’s written in the literature.

What is funny to me is that, looking back at how I built the integration platform, I realize I was applying patterns like the Walking Skeleton for architectural validation and the Pipes and Filters pattern for resilient, flexible integration flows.

The Walking Skeleton came to my attention when a fellow architect at my current workplace mentioned it, and I realized that this is what I had actually done with my team at the retailer. So I should read some literature from time to time!

The Walking Skeleton: Your Architectural First Step

Before you write a line of business logic, you need to prove your stack works from end to end. The Walking Skeleton is precisely that: a minimal, fully functional implementation of your system’s architecture.

It’s not an MVP (Minimum Viable Product), which is a business concept focused on features; the Skeleton is a technical proof-of-concept focused on connectivity.

Why Build the Skeleton First?

  • Risk Mitigation: You validate that your major components (UI, API Gateway, Backend Services, Database, Message Broker) can communicate and operate correctly before you invest heavily in complex features.
  • CI/CD Foundation: By its nature, the Skeleton must run end-to-end. This forces you to set up your CI/CD pipelines early, giving you a working deployment mechanism from day one.
  • Team Alignment: A running system is the best documentation. Everyone on the team gets a shared, tangible understanding of how data flows through the architecture.

Suppose you’re building an integration platform in the cloud (for example, on Azure). In that case, the Walking Skeleton confirms that your service choices, such as Azure Functions and Logic Apps, integrate with your storage, networking, and security layers. Guess what I hope to be doing again in the near future.

Leveraging Pipes and Filters Within the Skeleton

Now, let’s look at what that “minimal, end-to-end functionality” should look like, especially for data and process flow. The Pipes and Filters pattern is ideally suited for building the first functional slice of your integration Skeleton.

The pattern works by breaking down a complex process into a sequence of independent, reusable processing units (Filters) connected by communication channels (Pipes).

How They Map to Integration:

  1. Filters = Single Responsibility: Each Filter performs one specific, discrete action on the data stream, such as:
    • Schema Validation
    • Data Mapping (XML to JSON)
    • Business Rule Enrichment
    • Auditing/Logging
  2. Pipes = Decoupled Flow: The Pipes ensure data flows reliably between Filters, typically via a message broker or an orchestration layer.

In a serverless environment (e.g., using Azure Functions for the Filters and Azure Service Bus/Event Grid for the Pipes), this pattern delivers immense value:

  • Composability: Need to change a validation rule? You only update one small, isolated Filter. Need a new output format? You add a new mapping Filter at the end of the pipe.
  • Resilience: If one Filter fails, the data is typically held in the Pipe (queue/topic), preventing the loss of the entire transaction and allowing for easy retries.
  • Observability: Each Filter is a dedicated unit of execution. This makes monitoring, logging, and troubleshooting exact: no more “black box” failures.
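
To make the pattern concrete, here is a minimal in-memory sketch in Python. It is an illustration under assumptions, not the platform I built: the Filter names (validate_schema, map_xml_to_json, audit), the message shape, and the dead-letter queue are hypothetical, and queue.Queue merely stands in for a real broker such as Azure Service Bus.

```python
import json
import queue
import xml.etree.ElementTree as ET
from typing import Callable

Message = dict
Filter = Callable[[Message], Message]


def validate_schema(msg: Message) -> Message:
    """Filter 1: reject anything that is not an <article> with a <sku>."""
    root = ET.fromstring(msg["body"])
    if root.tag != "article" or root.find("sku") is None:
        raise ValueError("not a valid article message")
    return msg


def map_xml_to_json(msg: Message) -> Message:
    """Filter 2: map the flat XML body to a JSON string."""
    root = ET.fromstring(msg["body"])
    return {**msg, "body": json.dumps({child.tag: child.text for child in root})}


def audit(msg: Message) -> Message:
    """Filter 3: append an audit marker without touching the body."""
    return {**msg, "audit": msg.get("audit", []) + ["processed"]}


def drain(pipe: queue.Queue) -> list:
    """Empty a Pipe into a list."""
    items = []
    while not pipe.empty():
        items.append(pipe.get())
    return items


def run_pipeline(filters: list, messages: list) -> tuple:
    """Push messages through the Filters, with one Pipe between each pair."""
    pipes = [queue.Queue() for _ in range(len(filters) + 1)]
    dead_letter = queue.Queue()
    for msg in messages:
        pipes[0].put(msg)
    for stage, filter_fn in enumerate(filters):
        for msg in drain(pipes[stage]):
            try:
                pipes[stage + 1].put(filter_fn(msg))
            except Exception as exc:
                # A failing message is parked, not lost; the others continue.
                dead_letter.put({**msg, "error": f"{filter_fn.__name__}: {exc}"})
    return drain(pipes[-1]), drain(dead_letter)


good = {"id": 1, "body": "<article><sku>A1</sku><name>Milk</name></article>"}
bad = {"id": 2, "body": "<price><sku>A1</sku></price>"}
done, failed = run_pipeline([validate_schema, map_xml_to_json, audit], [good, bad])
```

In this run, message 1 reaches the end of the pipeline as JSON with an audit marker, while message 2 fails validation and is parked together with the name of the failing Filter, ready for a retry.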

The Synergy: Building and Expanding

The real power comes from using the pattern within the process of building and expanding your Walking Skeleton:

  1. Initial Validation (The Skeleton): Select the absolute simplest, non-critical domain (e.g., an Article Data Distribution pipeline, as I have done with my team for retailers). Implement this single, end-to-end flow using the Pipes and Filters pattern. This proves that your architectural blueprint and your chosen integration pattern work together.
  2. Iterative Expansion: Once the Article Pipe is proven, validating the architectural choice, deployment, monitoring, and scaling, you have a template.
    • At the retailer, we subsequently built the integration for the Pricing domain by creating a new Pipe that reuses common Filters (e.g., the logging or basic validation Filters).
    • Next, we picked another domain by cloning the proven pipeline architecture and swapping in the domain-specific Filters.

You don’t start from scratch; you reapply a proven, validated template across domains. This approach dramatically reduces time-to-market and ensures that every new domain is built on a resilient, transparent, and scalable foundation.
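
The template idea can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names (COMMON_FILTERS, build_pipeline, enrich_article, round_price); for brevity the Pipe is reduced to plain function composition, whereas a real platform would put a queue or topic between the Filters.

```python
from typing import Callable, Iterable

Message = dict
Filter = Callable[[Message], Message]


def basic_validation(msg: Message) -> Message:
    """Common Filter: every domain message must carry a payload."""
    if "payload" not in msg:
        raise ValueError("missing payload")
    return msg


def log_filter(msg: Message) -> Message:
    """Common Filter: record that the message passed through."""
    return {**msg, "trail": msg.get("trail", []) + ["logged"]}


# Filters that every domain pipeline reuses as-is.
COMMON_FILTERS = [basic_validation, log_filter]


def build_pipeline(domain_filters: Iterable[Filter]) -> Filter:
    """Clone the proven template and swap in the domain-specific Filters."""
    filters = [*COMMON_FILTERS, *domain_filters]

    def pipeline(msg: Message) -> Message:
        for filter_fn in filters:
            msg = filter_fn(msg)
        return msg

    return pipeline


def enrich_article(msg: Message) -> Message:
    """Domain Filter for the (hypothetical) Article pipeline."""
    return {**msg, "payload": {**msg["payload"], "domain": "article"}}


def round_price(msg: Message) -> Message:
    """Domain Filter for the (hypothetical) Pricing pipeline."""
    price = round(msg["payload"]["price"], 2)
    return {**msg, "payload": {**msg["payload"], "price": price}}


article_pipeline = build_pipeline([enrich_article])
pricing_pipeline = build_pipeline([round_price])

article_result = article_pipeline({"payload": {"sku": "A1"}})
pricing_result = pricing_pipeline({"payload": {"price": 1.999}})
```

Adding a third domain is then a matter of writing its own Filters and calling build_pipeline once more; the common Filters and the surrounding plumbing stay untouched.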

My advice, based on what I know now and my experience, is not to skip the Skeleton. And don’t build a monolith inside it. When rebuilding an integration platform in Azure, start with the Walking Skeleton and Pipes and Filters to get a future-proof, durable architecture for enterprise integration.

What architectural pattern do you find most useful when kicking off a new integration project? Drop a comment!

AWS Shifts to a Credit-Based Free Plan, Aligning with Azure and GCP

AWS is officially moving away from its long-standing 12-month free tier for new accounts. The new standard, called the Free Account Plan, is a credit-based model designed to eliminate the risk of unexpected bills for new users.

With this new plan, you get:

  • A risk-free environment for experimenting and building proofs of concept for up to six months.
  • A starting credit of $100, with the potential to earn another $100 by completing specific exploration activities, such as launching an EC2 instance. This means you can get up to $200 in credits to use across eligible services.
  • The plan ends after six months or once your credits are entirely spent, whichever comes first. After that, you have a 90-day window to upgrade to a paid plan and restore access to your account and data.

This shift, as Principal Developer Advocate Channy Yun explains, allows new users to get hands-on experience without cost commitments. However, it’s worth noting that some services typically used by large enterprises won’t be available on this free plan.

While some may see this as a step back, I tend to agree with Corey Quinn’s perspective. He writes that this is “a return to product-led growth rather than focusing on enterprise revenue to the exclusion of all else.” Let’s face it: big companies aren’t concerned with the free tier. But for students and hobbyists, who can be seen as the next generation of cloud builders, a credit-based, risk-free sandbox is a much more attractive proposition. The new notifications for credit usage and expiration dates are a smart addition that provides peace of mind.

How the New Plan Compares to Other Hyperscalers

I think it is a helpful plan for those who like to experiment on AWS. It is not unique, though: Microsoft Azure and Google Cloud Platform (GCP) have long operated on credit-based models.

  • Azure offers a different model: $200 in credits for the first 30 days, supplemented by over 25 “always free” services and a selection of services available for free for 12 months.
  • GCP provides a 90-day, $300 Free Trial for new customers, which can be applied to most products, along with an “Always Free” tier that gives ongoing access to core services like Compute Engine and Cloud Storage up to specific monthly limits.

This alignment among the major cloud providers highlights a consensus on the best way to attract and onboard new developers.

Microsoft also offers $100 in Azure credits through Azure for Students. Because these different models can be confusing, note the distinction: the credits that come with an MSDN (Visual Studio) subscription are typically a monthly allowance tied to that specific subscription, while the student credits are a lump sum for a particular period (e.g., 12 months).

Speaking of other cloud providers, my own experience with Azure is an excellent example of how these credit models can be beneficial. I receive Azure credits through my MVP benefits, and MSDN subscriptions come with a monthly $150 in credits. These options are separate from the general free account I mentioned earlier. Either way, all three big hyperscalers offer ways to get hands-on experience with their services, which you can combine with their documentation and the samples in public repos.

In general, if you would like to learn more about Azure, AWS, or GCP, the following table shows the most straightforward options:

Cloud Hyperscaler | Free Credits       | Documentation              | Repo (samples)
Azure             | Azure Free Account | Microsoft Learn            | Azure Samples · GitHub
AWS               | AWS Free Tier      | AWS Documentation          | AWS Samples · GitHub
GCP               | GCP Free Trial     | Google Cloud Documentation | Google Cloud Platform · GitHub

Decoding Figma’s AWS Spend: Beyond the Hype and Panic

Figma’s recent IPO filing revealed a daily AWS expenditure of roughly $300,000, translating to approximately $109 million annually, or 12% of its reported revenue of $821 million. The company also committed to a minimum spend of $545 million over the next five years with AWS. Cue the online meltdown. “Figma is doomed!” “Fire the CTO!” the internet, in its infinite wisdom, declared. I wrote a news item on it for InfoQ and thought, ‘Let’s put things into perspective and add my own experience.’

(Source: Figma.com)

But let’s inject a dose of reality, shall we? As Corey Quinn from The Duckbill Group, who probably sees more AWS invoices than you’ve seen Marvel movies, rightly points out, this kind of spending for a company like Figma is boringly normal.

As Quinn extensively details in his blog post, Figma isn’t running a simple blog. It’s a compute-intensive, real-time collaborative platform serving 13 million monthly active users and 450,000 paying customers. It renders complex designs with sub-100ms latency. This isn’t just about spinning up a few virtual machines; it’s about providing a seamless, high-performance experience on a global scale.

The Numbers Game: What the Armchair Experts Missed

The initial panic conveniently ignored a few crucial realities, according to Quinn:

  • Ramping Spend: Most large AWS contracts increase year-over-year. A $109 million annual average over five years likely starts lower (e.g., $80 million) and gradually increases to a higher figure (e.g., $150 million in year five) as the company expands.
  • Post-Discount Figures: These spend targets are post-discount. At Figma’s scale, they’re likely getting a significant discount (think 30% effective discount) on their cloud spend. So, their “retail” spend would be closer to $785 million over five years, not $545 million.
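
As a quick back-of-the-envelope check of that discount reasoning, here is a small Python sketch. The 30% effective discount is the illustrative figure from the text, not a disclosed number, and the function name retail_equivalent is my own.

```python
def retail_equivalent(post_discount_spend: float, effective_discount: float) -> float:
    """Undo a discount: what the same usage would cost at list price."""
    if not 0 <= effective_discount < 1:
        raise ValueError("effective_discount must be in [0, 1)")
    return post_discount_spend / (1 - effective_discount)


COMMITTED_TOTAL = 545_000_000   # minimum commitment over five years (from the filing)
YEARS = 5
ASSUMED_DISCOUNT = 0.30         # assumption: illustrative effective discount

average_per_year = COMMITTED_TOTAL / YEARS
retail_total = retail_equivalent(COMMITTED_TOTAL, ASSUMED_DISCOUNT)

print(f"Average commitment per year: ${average_per_year / 1e6:.0f}M")
print(f"List-price equivalent over {YEARS} years: ${retail_total / 1e6:.0f}M")
```

With exactly 30%, the list-price equivalent comes out just under $780 million, in the same ballpark as the roughly $785 million mentioned above; the small gap simply reflects that the effective discount is an estimate.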

When you factor these in, Figma’s 12% of revenue on cloud infrastructure for a company of its type falls squarely within industry benchmarks:

  • Compute-lite SaaS: Around 5% of revenue.
  • Compute-heavy platforms (like Figma): 10-15% of revenue.
  • AI/ML-intensive companies: Often exceeding 15%.

Furthermore, the increasing adoption of AI and Machine Learning in application development is introducing a new dimension to cloud costs. AI workloads, particularly for training and continuous inference, are incredibly resource-intensive, pushing the boundaries of compute, storage, and specialized hardware (like GPUs), which naturally translates to higher cloud bills. This makes effective FinOps and cost optimization strategies even more crucial for companies that leverage AI at scale.

So, while the internet was busy getting its math wrong and forecasting doom, Figma was operating within a completely reasonable range for its business model and scale.

The “Risky Dependency” Non-Story

Another popular narrative was the “risky dependency” on AWS. Figma’s S-1 filing includes standard boilerplate language about vendor dependencies, a common feature found in virtually every cloud-dependent company’s SEC filings. It’s the legal equivalent of saying, “If the sky falls, our business might be affected.”

Breaking news: a SaaS company that uses a cloud provider might be affected by outages. In related news, restaurants depend on food suppliers. This isn’t groundbreaking insight; it’s just common business risk disclosure. Figma’s “deep entanglement” with AWS, as described by Hacker News commenter nevon, underscores the complexity of modern cloud architectures, where every aspect, from permissions to disaster recovery, is seamlessly integrated. This makes a quick migration akin to performing open-heart surgery without anesthetic – highly complex and not something you do on a whim.

Cloud Repatriation: A Valid Strategy, But Not a Universal Panacea

The discussion around Figma’s costs also brought up the topic of cloud repatriation, with examples like 37signals, whose CTO, David Heinemeier Hansson, has been a vocal advocate for exiting the cloud to save millions. While repatriating certain workloads can indeed lead to significant savings for some companies, it’s not a one-size-fits-all solution.

Every company’s needs are different. For a company like Scrimba, which runs on dedicated servers and spends less than 1% of its revenue on infrastructure, this might be a perfect fit. For Figma, with its real-time collaborative demands and massive user base, the agility, scalability, and managed services offered by a hyperscale cloud provider like AWS are critical to their business model and growth.

This brings us to a broader conversation, especially relevant in the European context: digital sovereignty. As I’ve discussed in my blog post, “Digital Destiny: Navigating Europe’s Sovereignty Challenge,” the deep integration with a single hyperscaler, such as AWS, isn’t just about cost or technical complexity; it also affects the control and autonomy an organization retains over its data and operations. While the convenience of cloud services is undeniable, the potential for vendor lock-in can have strategic implications, particularly concerning data governance, regulatory compliance, and the ability to dictate terms. The ongoing debate around data residency and the extraterritorial reach of foreign laws further amplifies these concerns, pushing some organizations to consider multi-cloud strategies or even hybrid models to mitigate risks and assert greater control over their digital destiny.

My Cloud Anecdote: Costs vs. Value

This whole debate reminds me of a scenario I encountered back in 2017. I was working on a proof of concept for a customer, building a future-proof knowledge base using Cosmos DB, the Graph Model, and Search. The operating cost, primarily driven by Cosmos DB, was approximately 1,000 euros per month. As I recall, some developers immediately flagged it as “too expensive,” or even thought I was selling Cosmos DB. When I presented the solution in a talk, the reception wasn’t universally positive either. In fact, one attendee later wrote in their blog:

The most uninteresting talk of the day came from Steef-Jan Wiggers, who, in my opinion, delivered an hour-long marketing pitch for CosmosDB. I think it’s expensive for what it currently offers, and many developers could architect something with just as much performance without needing CosmosDB.

However, the proposed solution was for a knowledge base that customers could leverage via a subscription model. The crucial point was that the costs were negligible compared to the potential revenue the subscription model would net for the customer. It was an investment in a revenue-generating asset, not just a pure expense.

The Bottom Line: Innovation vs. Optimization

Thanks to Quinn, I understand that Figma is actively optimizing its infrastructure, transitioning from Ruby to C++ pipelines, migrating workloads, and implementing dynamic cluster scaling. He concluded:

They’re doing the work. More importantly, they’re growing at 46% year-over-year with a 91% gross margin. If you’re losing sleep over their AWS bill while they’re printing money like this, you might need to reconsider your priorities.

The “innovation <-> optimization continuum” is always at play. Companies often prioritize rapid innovation and speed to market, leveraging the cloud for its agility and flexibility. As they scale, they can then focus on optimizing those costs.

This increasing complexity underscores the growing importance of FinOps (Cloud Financial Operations), a cultural practice that brings financial accountability to the variable spend model of cloud, empowering teams to make data-driven decisions on cloud usage and optimize costs without sacrificing innovation.

Figma’s transparency in disclosing its cloud costs is actually a good thing. It forces a much-needed conversation about the true cost of running enterprise-scale infrastructure in 2025. The hyperbolic reactions, however, expose a fundamental misunderstanding of these realities, one I also encountered with my Cosmos DB project in 2017.

So, the next time someone tells you that a company spending 12% of its revenue on infrastructure that literally runs its entire business is “doomed,” perhaps ask them how much they think it should cost to serve real-time collaborative experiences to 13 million users across the globe. The answer, if based on reality, might surprise them.

Lastly, as the cloud landscape continues to evolve, with new services, AI integration, and shifting geopolitical considerations, the core lesson remains: smart cloud investment isn’t about avoiding the bill, but understanding its true value in driving business outcomes and strategic advantage. The dialogue about cloud costs is far from over, but it’s time we grounded it in reality.

Digital Destiny: Navigating Europe’s Sovereignty Challenge

During my extensive career in IT, I’ve often seen how technology can both empower and entangle us. Today, Europe and the Netherlands find themselves at a crucial junction, navigating the complex landscape of digital sovereignty. Recent geopolitical shifts and the looming possibility of a “Trump II” presidency have only amplified our collective awareness: we cannot afford to be dependent on foreign legislation when it comes to our critical infrastructure.

In this post, I will delve into the threats and strategic risks that underpin this challenge. We’ll explore the initiatives being undertaken at both the European and Dutch levels, and crucially, what the major U.S. Hyperscalers are now bringing to the table in response.

The Digital Predicament: Threats to Our Autonomy

The digital revolution has certainly brought unprecedented benefits, not least through innovative Cloud Services that are transforming our economy and society. However, this advancement has also positioned Europe in a state of significant dependency. Approximately 80% of our digital infrastructure relies on foreign companies, primarily American cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud. This reliance isn’t just a matter of convenience; it’s a strategic vulnerability.

The Legal Undercurrent: U.S. Legislation

One of the most persistent threats to European digital sovereignty stems from American legislation. The CLOUD Act (2018), an addition to the Freedom Act (2015) that replaced the Patriot Act (2001), grants American law enforcement and security services the power to request data from American cloud service providers, even if that data is stored abroad.

Think about it: if U.S. intelligence agencies can request data from powerhouses like AWS, Microsoft, or Google without your knowledge, what does this mean for European organizations that have placed their crown jewels there? This directly clashes with Europe’s stringent privacy regulations, the General Data Protection Regulation (GDPR), which sets strict requirements for the protection of personal data of individuals in the EU.

While the Dutch National Cyber Security Centre (NCSC) has stated that, in practice, the chance of the U.S. government requesting European data via the CLOUD Act has historically been minimal, they also acknowledge that this could change with recent geopolitical developments. The risk is present, even though it has rarely materialized thus far.

Geopolitics: The Digital Chessboard

Beyond legal frameworks, geopolitical developments pose a very real threat to our digital autonomy. Foreign governments may impose trade barriers and sanctions on Cloud Services. Imagine scenarios where tensions between major powers lead to access restrictions for essential Cloud Services. The European Union or even my country cannot afford to be a digital pawn in such a high-stakes game.

We’ve already seen these dynamics play out. In negotiations for a minerals deal with Ukraine, the White House reportedly made a phone call to stop the delivery of satellite images from Maxar Technologies, an American space company. These images were crucial for monitoring Russian troop movements and documenting war crimes.

Another stark example is the Microsoft-ICC incident, where Microsoft blocked access to email and Office 365 services for the chief prosecutor of the International Criminal Court in The Hague due to American sanctions. These incidents serve as powerful reminders of how strongly external political pressure can impact critical digital services.

Europe’s Response: A Collaborative Push for Sovereignty

Recognizing these challenges, both Europe and the Netherlands are actively pursuing initiatives to bolster digital autonomy. It’s also worth noting how major cloud providers are responding to these evolving demands.

European Ambitions:

The European Union has been a driving force behind initiatives to reinforce its digital independence:

  • Gaia-X: This ambitious European project aims to create a trustworthy and secure data infrastructure, fostering a federated system that connects existing European cloud providers and ensures compliance with European regulations, such as the General Data Protection Regulation (GDPR). It’s about creating a transparent and controlled framework.
  • Digital Markets Act (DMA) & Digital Services Act (DSA): These legislative acts aim to regulate the digital economy, fostering fairer competition and greater accountability from large online platforms.
  • Cloud and AI Development Act (proposed): This upcoming legislation seeks to ensure that strategic EU use cases can rely on sovereign cloud solutions, with the public sector acting as a crucial “anchor client.”
  • EuroStack: This broader initiative envisions Europe as a leader in digital sovereignty, building a comprehensive digital ecosystem from semiconductors to AI systems.

Crucially, we’re seeing tangible progress here. Virt8ra, a significant European initiative positioning itself as a major alternative to US-based cloud vendors, recently announced a substantial expansion of its federated infrastructure. The platform, which initially included Arsys, BIT, Gdańsk University of Technology, Infobip, IONOS, Kontron, MONDRAGON Corporation, and Oktawave, all coordinated by OpenNebula Systems, has now been joined by six new cloud service providers: ADI Data Center Euskadi, Clever Cloud, CloudFerro, OVHcloud, Scaleway, and Stackscale. This expansion is a clear indicator that the vision for a robust, distributed European cloud ecosystem is gaining significant traction.

Dutch Determination:

The Netherlands is equally committed to this journey:

  • Strategic Digital Autonomy and Government-Wide Cloud Policy: A coalition of Dutch organizations has developed a roadmap, proposing a three-layer model for government cloud policy that advocates for local storage of state secret data and autonomy requirements for sensitive government data.
  • Cloud Kootwijk: This initiative brings together local providers to develop viable alternatives to hyperscaler clouds, fostering homegrown digital infrastructure.
  • “Reprogram the Government” Initiative: This initiative advocates for a more robust and self-reliant digital government, pushing for IT procurement reforms and in-house expertise.
  • GPT-NL: A project to develop a Dutch language model, strengthening national strategic autonomy in AI and ensuring alignment with Dutch values.

Hyperscalers and the Sovereignty Landscape:

The growing demand for digital sovereignty has prompted significant responses from major cloud providers, demonstrating a recognition of European concerns:

  • AWS European Sovereign Cloud: AWS has announced key components of its independent European governance for the AWS European Sovereign Cloud.
  • Microsoft’s Five Digital Commitments: Microsoft recently outlined five significant digital commitments to deepen its investment and support for Europe’s technological landscape.

These efforts from hyperscalers highlight a critical balance. As industry analyst David Linthicum noted, while Europe’s drive for homegrown solutions is vital for data control, it also prompts questions about access to cutting-edge innovations. He stresses the importance of “striking the right balance” to ensure sovereignty efforts don’t inadvertently limit access to crucial capabilities that drive innovation.

However, despite these significant investments, skepticism persists. There is an ongoing debate within Europe regarding digital sovereignty and reliance on technology providers headquartered outside the European Union. Some in the community express doubts about how such companies can truly operate independently and prioritize European interests, with comments like, “Microsoft is going to do exactly what the US government tells them to do. Their proclamations are meaningless.” Others echo the sentiment that “European money should not flow to American pockets in such a way. Europe needs to become independent from American tech giants as a way forward.” This collective feedback highlights Europe’s ongoing effort to develop its own technological capabilities and reduce its reliance on non-European entities for critical digital infrastructure.

My perspective on this situation is that achieving true digital sovereignty for Europe is a complex and multifaceted endeavor, marked by both opportunities and challenges. While the commitments from global hyperscalers are significant and demonstrate a clear response to European demands, the underlying desire for independent, European-led solutions remains strong. It’s not about outright rejection of external providers, but about strategic autonomy – ensuring that we, as Europeans, maintain ultimate control over our digital destiny and critical data, irrespective of where the technology originates.

Azure Cosmos DB’s Latest Performance Features

As an early adopter of Azure Cosmos DB, I have always followed the developments of this service and have built up hands-on experience leveraging it for monitoring purposes (a recent example was presented at Azure Cosmos DB Conf 2023 – Leveraging Azure Cosmos DB for End-to-End Monitoring of Retail Processes).

Azure Cosmos DB

For those unfamiliar with it, Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service, offering low-latency, scalable storage and querying of diverse data types. It allows developers to build applications with low-latency data access and high availability across regions. Its well-known counterpart is Amazon DynamoDB.

In this blog post, I would like to point out some recent performance optimizations in the service. I have also recently written an InfoQ news item on this topic.

Priority-based execution

One of the more recent features introduced in the service is priority-based execution, which is currently in public preview. It allows users to define the priority of requests sent to Azure Cosmos DB. When the number of requests surpasses the configured Request Units per second (RU/s) limit, lower-priority requests are throttled so that high-priority requests are processed first, according to the priority the user assigned.

As mentioned in a blog post by Microsoft, this feature empowers users to prioritize critical tasks over less crucial ones in situations where a container surpasses its configured request units per second (RU/s) capacity. Less important tasks are automatically retried by clients using an SDK with the specified retry policy until they can be successfully processed.

With priority-based execution, you have the flexibility to allocate varying priorities to workloads operating within the same container in your application. This proves beneficial in numerous scenarios, including prioritizing read, write, or query operations, as well as giving precedence to user actions over background tasks like bulk execution, stored procedures, and data ingestion/migration.

Access to the feature and the .NET SDK is granted through a nomination form; once your nomination is accepted, you can start using it.
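
The behavior described above can be illustrated with a toy simulation in Python. To be clear, this is not the Azure Cosmos DB SDK or its real scheduling algorithm: Request, admit, and simulate are hypothetical names, and the model only assumes a fixed RU budget per second, high-priority requests being served first, and throttled requests being retried in the next second.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    cost_ru: int
    priority: str  # "High" or "Low"


def admit(requests: list, budget_ru: int) -> tuple:
    """Serve one second of traffic: high priority first, within the RU budget."""
    admitted, throttled = [], []
    remaining = budget_ru
    # sorted() is stable, so arrival order is kept within each priority level.
    for req in sorted(requests, key=lambda r: r.priority != "High"):
        if req.cost_ru <= remaining:
            admitted.append(req)
            remaining -= req.cost_ru
        else:
            throttled.append(req)  # the client would see this as a throttled call
    return admitted, throttled


def simulate(requests: list, budget_ru: int) -> list:
    """Retry throttled requests every second until everything is processed."""
    if any(req.cost_ru > budget_ru for req in requests):
        raise ValueError("a single request exceeds the per-second budget")
    timeline, pending = [], list(requests)
    while pending:
        admitted, pending = admit(pending, budget_ru)
        timeline.append([req.name for req in admitted])
    return timeline


workload = [
    Request("bulk-import-1", 60, "Low"),
    Request("checkout", 50, "High"),
    Request("bulk-import-2", 60, "Low"),
    Request("search", 40, "High"),
]
timeline = simulate(workload, budget_ru=100)
```

With a budget of 100 RU/s, the two user-facing requests are served in the first second even though a bulk request arrived first, and the bulk requests follow in later seconds via retries.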

Hierarchical Partition Keys

In addition to Priority-based execution, the product group for Cosmos DB also introduced Hierarchical Partition Keys to optimize performance.

Hierarchical partition keys enhance Cosmos DB’s elasticity, particularly in scenarios where users rely on synthetic or logical partition keys that surpass 20 GB of data. By employing up to three keys with hierarchical partitioning, users can effectively sub-partition their data, achieving superior data distribution and enabling greater scalability. Azure Cosmos DB automatically distributes the data among physical partitions, allowing logical partition prefixes to exceed the 20 GB storage limit.

According to the documentation, the simplest way to create a container and specify hierarchical partition keys is using the Azure portal. 

For example, you can use hierarchical partition keys to partition data by tenant ID and then by item ID. This way, a tenant’s data can be spread over more than one physical partition, while a query that specifies the tenant ID is still routed only to the partitions that hold that tenant’s data. This can improve query performance by reducing the number of physical partitions that need to be queried.

A more detailed explanation and use case for hierarchical keys in Azure Cosmos DB can be found in the blog post by Leonard Lobel. 
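
As a rough illustration of the tenant ID and item ID example, the following Python sketch builds a partition key definition and checks how much of the key hierarchy a query filter covers. The helper names are hypothetical, and the dictionary shape (kind "MultiHash", version 2) reflects my understanding of how the definition appears in resource templates, so please verify it against the current documentation before relying on it.

```python
MAX_LEVELS = 3  # hierarchical partitioning supports up to three keys


def hierarchical_partition_key(*paths: str) -> dict:
    """Build a partition key definition for one to three key paths."""
    if not 1 <= len(paths) <= MAX_LEVELS:
        raise ValueError(f"expected 1 to {MAX_LEVELS} paths, got {len(paths)}")
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"partition key path must start with '/': {path}")
    # Assumed shape of the definition; check the docs for your API version.
    return {"paths": list(paths), "kind": "MultiHash", "version": 2}


def key_values(item: dict, definition: dict) -> list:
    """Extract the full hierarchical key of an item, level by level."""
    return [item[path.lstrip("/")] for path in definition["paths"]]


def prefix_depth(filter_fields: set, definition: dict) -> int:
    """Count how many leading key levels a query filter pins down.

    0 means the query cannot use the hierarchy and has to fan out;
    1 or more means it can be routed by that prefix of the key.
    """
    depth = 0
    for path in definition["paths"]:
        if path.lstrip("/") not in filter_fields:
            break
        depth += 1
    return depth


definition = hierarchical_partition_key("/tenantId", "/itemId")
item = {"tenantId": "contoso", "itemId": "42", "name": "Milk"}

full_key = key_values(item, definition)
by_tenant = prefix_depth({"tenantId"}, definition)
by_item_only = prefix_depth({"itemId"}, definition)
```

A filter on the tenant ID alone covers the first level of the hierarchy, whereas a filter on the item ID alone covers none of it, which is why the order of the keys matters when you design the hierarchy.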

Burst Capacity Feature

Lastly, the team also made the burst capacity feature for Azure Cosmos DB generally available (GA) to allow you to take advantage of your database or container’s idle throughput capacity to handle traffic spikes.   

Burst capacity allows each physical partition to accumulate up to 5 minutes of idle capacity, which can be utilized at a rate of up to 3000 RU/s. This feature is applicable to databases and containers utilizing manual or autoscale throughput, provided they have less than 3000 RU/s provisioned per physical partition.
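
The accumulate-and-spend mechanics can be modeled with a small Python sketch. It is a simplified, token-bucket-style model for one physical partition, not the service’s actual implementation: the class name PartitionBurstModel and the one-second tick are assumptions, while the 5-minute accumulation window and the 3000 RU/s ceiling come from the description above.

```python
from dataclasses import dataclass

MAX_BURST_RATE = 3000            # RU/s ceiling while bursting
MAX_ACCUMULATION_SECONDS = 300   # up to 5 minutes of idle capacity


@dataclass
class PartitionBurstModel:
    provisioned_rus: int   # RU/s provisioned for this physical partition
    banked: float = 0.0    # idle capacity saved up so far, in RUs

    @property
    def capacity_limit(self) -> float:
        """Most idle capacity the partition can bank."""
        return self.provisioned_rus * MAX_ACCUMULATION_SECONDS

    def tick(self, demand: float) -> float:
        """Process one second of demand and return the RUs actually served."""
        if demand <= self.provisioned_rus:
            # Unused throughput is banked, up to the 5-minute limit.
            unused = self.provisioned_rus - demand
            self.banked = min(self.capacity_limit, self.banked + unused)
            return demand
        # Demand exceeds provisioned throughput: spend banked capacity,
        # but never serve more than the burst ceiling in one second.
        headroom = max(0, MAX_BURST_RATE - self.provisioned_rus)
        extra = min(demand - self.provisioned_rus, headroom, self.banked)
        self.banked -= extra
        return self.provisioned_rus + extra


partition = PartitionBurstModel(provisioned_rus=1000)
for _ in range(10):          # ten idle seconds bank 10 x 1000 RUs
    partition.tick(0)
served_during_spike = partition.tick(5000)
banked_after_spike = partition.banked
```

In this model, a partition provisioned at 1000 RU/s serves a one-second spike at 3000 RU/s by spending 2000 banked RUs; once the bank is empty, it falls back to its provisioned rate.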

To begin utilizing burst capacity, access the Features page within your Azure Cosmos DB account and enable the Burst Capacity feature. Please note that the feature may take approximately 15-20 minutes to become active once enabled.  

Enabling the burst capacity feature (Source: Microsoft Learn Burst Capacity)

According to the documentation, to use the feature, you need to consider the following: 

  • Burst capacity applies to Azure Cosmos DB accounts configured with provisioned throughput (manual or autoscale); it is not applicable to serverless accounts.
  • Additionally, burst capacity is compatible with Azure Cosmos DB accounts utilizing the API for NoSQL, Cassandra, Gremlin, MongoDB, or Table. 

Lastly, in case you are wondering what the difference between burst capacity and priority-based execution is, Jay Gordon, a Senior Cosmos DB program manager, explained it in the discussion under the blog post about these performance features:

The difference between burst capacity and execution based on priority lies in their impact on performance and resource allocation:

Burst capacity affects the overall throughput capacity of your Azure Cosmos DB container or database. It allows you to temporarily exceed the provisioned throughput to handle sudden spikes in workload. Burst capacity helps maintain low latency and prevent throttling during peak usage periods.

Execution based on priority determines the order in which requests are processed when multiple concurrent requests exist. Higher priority requests are prioritized and typically get faster access to resources for execution. This ensures that essential or time-sensitive operations are processed promptly, while lower-priority requests may experience slight delays.

“In terms of results, burst capacity and execution based on priority are independent. Utilizing burst capacity allows you to handle temporary workload spikes, whereas execution based on importance ensures that higher-priority requests are processed more promptly. These mechanisms work together to optimize performance and resource allocation in Azure Cosmos DB, but they serve different purposes.”

Conclusion

In conclusion, Azure Cosmos DB continues to evolve with new features designed to enhance performance and scalability. The priority-based execution, currently in public preview, enables users to prioritize critical tasks over less important ones when the request unit capacity is exceeded. This flexibility is further enhanced by the introduction of hierarchical partition keys, allowing optimal data distribution and larger scale in scenarios with substantial data. Additionally, the burst capacity feature, now generally available, provides an efficient way to handle traffic spikes by utilizing idle throughput capacity. Users can easily enable burst capacity through the Azure Cosmos DB account’s Features page, making it a valuable tool for accounts with provisioned throughput.

On the AWS side, DynamoDB, the Cosmos DB counterpart, also offers performance-optimizing capabilities, and the concepts are similar.

New Pricing Plan and Enhanced Networking for Azure Container Apps in Preview

Microsoft recently announced a new pricing plan and enhanced networking for Azure Container Apps in public preview.

Azure Container Apps is a fully managed environment that enables developers to run microservices and containerized applications on a serverless platform. It is flexible and can execute application code packaged in any container without runtime or programming model restrictions.

Previously, Azure Container Apps offered only a Consumption plan featuring a serverless architecture that allows applications to scale in and out on demand. Applications can scale to zero, and users only pay for running apps.

In addition to the consumption plan, Azure Container Apps now supports a dedicated plan, which guarantees single tenancy and specialized compute options, including memory-optimized choices. It runs in the same Azure Container Apps environment as the serverless Consumption plan and is referred to as the Consumption + Dedicated plan structure. This structure is in preview.

Mike Morton, a Senior Program Manager at Microsoft, explains in a Tech Community blog post the benefit of the new plan:

It allows apps or microservice components that may have different resource requirements depending on component purpose or development stack to run in the same Azure Container Apps environment. An Azure Container Apps environment provides an execution, isolation, and observability boundary that allows apps within it to easily call other apps in the environment, as well as provide a single place to view logs from all apps.

At the Azure Container Apps environment scope, compute options are workload profiles. The default workload profile for each environment is a serverless, general-purpose profile available as part of the Consumption plan. For the dedicated workload profile, users can select type and size, deploy multiple apps into the profile, use autoscaling to add and remove nodes and limit the scaling of the profile.

Source: https://techcommunity.microsoft.com/t5/apps-on-azure-blog/azure-container-apps-announces-new-pricing-plan-and-enhanced/ba-p/3790723

With Container Apps, architects have another compute option in Azure besides App Service and Virtual Machines. Edwin Michiels, a Tech Customer Success Manager at Microsoft, explained in a LinkedIn post the difference between Azure Container Apps and Azure App Service, which offer similar capabilities:

In terms of cost, Azure App Service has a pricing model based on the number of instances and resources used, while Azure Container Instances and Azure Kubernetes Service are billed based on the number of containers and nodes used, respectively. For small to medium-sized APIs, Azure App Service may be a more cost-effective option, while for larger or more complex APIs, Azure Container Instances or Azure Kubernetes Service may offer more flexibility and cost savings.

The Consumption + Dedicated plan structure also includes optimized network architecture and security features that offer reduced subnet size requirements with a new /27 minimum, support for Azure Container Apps environments on subnets with locked-down network security groups and user-defined routes (UDR), and support on subnets configured with Azure Firewall or third-party network appliances.
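To put the /27 minimum in perspective, the following minimal sketch uses Python's standard `ipaddress` module to count the addresses in a subnet. The address range is made up, and the /23 used for comparison is an assumption about the earlier minimum that is not stated in the text above.

```python
import ipaddress


def subnet_capacity(cidr):
    """Return the total number of IPv4 addresses in the given subnet."""
    return ipaddress.ip_network(cidr).num_addresses


# 10.0.0.0 is an arbitrary example range; /23 is an assumed earlier minimum.
for cidr in ("10.0.0.0/27", "10.0.0.0/23"):
    print(cidr, subnet_capacity(cidr))
```

Keep in mind that Azure reserves some addresses in every subnet, so the number usable by the environment is lower than the total.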

The new pricing plan and enhanced networking for Azure Container Apps are available in the North Central US, North Europe, West Europe, and East US regions. Billing for Consumption and Dedicated plans is detailed on the Azure Container Apps pricing page.

Lastly, the new pricing plan and network enhancements are discussed and demoed in the latest Azure Container Apps Community Standup.

My Azure Security Journey so far

I like to travel, explore and admire new environments. Similarly, in my day-to-day job, I want to explore new technologies, look at architectural challenges with the solutions I design, and help engineers.

Exploring is my second nature; it’s my curiosity and desire to learn – experience new things. With Cloud Computing, many developments happen daily, including new services, updates, and learnings. I like that, and with my role at InfoQ, I can cover these developments through news stories. Moreover, in my day job, I deal with cloud computing daily, specifically Microsoft Azure and integrating systems through Integration Services.

Exams

An area that got my attention this year was governance and security. I wrote two blogs this year – a blog on secret management in the cloud and one titled a high-level view of governance. In addition, I started exploring resources from Microsoft on Governance and Security on their learning platform. And recently, I planned to prepare for the following certifications:

  • Exam SC-900: Microsoft Security, Compliance, and Identity Fundamentals
  • Exam AZ-500: Microsoft Azure Security Technologies
  • Exam SC-100: Microsoft Cybersecurity Architect

I passed the first, and the other two are scheduled for Q1 in 2023.

The goal of preparing for the exams is to learn more about security, as it’s an important aspect when designing integration solutions in Azure.

Screenshot showing security design areas.

Source: https://learn.microsoft.com/en-us/azure/architecture/framework/security/overview

Another good source is the well-architected framework: Security Pillar.

News Items

The dominant three public cloud providers, Microsoft, AWS, and Google, provide services and guidance on security on their platforms. As a cloud editor at InfoQ, I sometimes cover stories on their products, open-source initiatives, and architecture. Here’s a list of security and governance-related news items I wrote in 2022:

Source: https://github.com/ine-labs/AzureGoat#module-1

Books

Next to writing news items, my day-to-day job, traveling, and sometimes running, I read books. The security-related books I read and am reading are:

Another one I might get is a recent book published by APress titled: Azure Security For Critical Workloads: Implementing Modern Security Controls for Authentication, Authorization, and Auditing by Sagar Lad.

Microsoft Most Valuable Professional (MVP) Security

Another thing I recently learned is that there is a new award category within the MVP program: Azure Security. The focus for this area lies on contributions in:

  • Cloud Security in general on Azure, think about Microsoft Azure services like Key Vault, Firewall, Policy, and concepts like Zero Trust Model and Defense in Depth.
  • Identity & Access, including management, hence Azure Active Directory (AAD) or, in general, Microsoft Entra.
  • Security Information and Event Management (SIEM) & Extended Detection and Response (XDR) – think about Microsoft’s product Sentinel.

Lastly, I am looking forward to 2023, which will bring me new challenges, destinations to travel to, and hopefully, success in passing the exams I have lined up for myself.

The value of having a Third-party Monitoring solution for Azure Integration Services

My day-to-day job focuses on enterprise integration between systems in the cloud and/or on-premises. Currently, it involves integration with D365 Finance and Operations (or Finance and Supply Chain Management). One aspect of the integrations is monitoring. When a business has one or more Azure Integration Services running in production, the operations aspect comes into play, especially for integrations that support crucial business processes. The operations team requires the correct procedures, tools, and notifications (alerts) to run these processes. Procedures and receiving notifications are essential; however, team members also need help identifying issues and troubleshooting. Azure provides tools, and so do third-party solutions. This blog post will discuss the value of having third-party monitoring in place, such as Serverless360.

Serverless360

Many of you who read blogs on Serverless360 know what the tool is. It is hosted as Software as a Service (SaaS). Therefore, operations teams can get access once a subscription is acquired or through a trial. Subsequently, they can leverage the primary features within the service: Business Applications, Business Activity Monitoring, and the Azure Documenter. We will briefly discuss each feature and its benefits and value in the upcoming paragraphs.

Business Applications

With the Business Applications feature, a team can configure and group integration components into a so-called “Business Application” to monitor. It does not matter where the resources reside – within one or more subscriptions/resource groups.

Business Application

The overview shown above is the grouping of several resources belonging to an integration solution. In the blink of an eye, a member of the operations team can see the components’ state and potential issues that need to be addressed. Can the same be done in Azure with available features such as Azure Monitor, including components like Application Insights? Yes, it can be done. However, it takes time to build a dashboard. Furthermore, when operations are divided into multiple tiers, first-tier support professionals might not be familiar with the Azure Portal. In a nutshell, an overview like the one provided by Business Applications is not present in Azure out-of-the-box.

As Lex Hegt, Lead Product Consultant at BizTalk360, points out:

Integration solutions can span multiple technologies, resource groups, tags, and even Azure subscriptions. With the Azure portal having the involved components in all those different places, it is hard to keep track of the well-being of those components. Serverless360 helps you utilize the concept of Business Applications. A Business Application is a container to which you can add all the components that belong to the same integration. Once you have added your components to a Business Application, you can set up monitoring for those components, provide access permissions, and administer them.

The Business Application brings another feature that provides an overview of the integration components and dependencies. You might be familiar with the service map feature in Application Insights on a more fine-grained level. The service map in Serverless360 is intended to show the state of each component and dependency on a higher level.

Within a business application, the configuration of monitoring components is straightforward. By selecting the component and choosing the monitoring section, you can set thresholds of performance counters and set the state.

Performance Counters

The value of Business Applications is a quick view of the integrations’ state and the ability to dive into any issue quickly, which saves time because far less of it is spent identifying the problem (see, for instance, Application Insights health check with Serverless360, and Integrating Log Analytics in Serverless360). With more time on their hands, operations teams can focus on various other matters during a workday or shift. Furthermore, the ease of use of Business Applications means that people in a first-tier support team do not need a clear understanding of, or experience with, the Azure portal.

Having a clear overview is one thing. However, it also helps operations teams to get notifications or fine-tune metrics based on thresholds and only receive information when it matters. In addition, it’s essential to keep integrations operational when they support critical business processes, as any outage costs a significant amount of money.

Business Activity Monitoring

The second feature of Serverless360 is the end-to-end tracking capability called Business Activity Monitoring (BAM). With the BAM feature, organizations can instrument the Azure resources that support integrations between systems. Through a custom connector and SDK, you can add tracking to the Logic Apps and Azure Functions that are part of your integration. A unique transaction instance ID generated in the first component is carried forward to the subsequent stages in other Functions and Logic Apps.
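The idea of carrying one transaction ID through all stages can be sketched in a few lines of plain Python. This is not the Serverless360 connector or SDK; the `track` helper, the stage names, and the in-memory `tracking_log` are hypothetical stand-ins to show the principle only.

```python
import uuid
from datetime import datetime, timezone

tracking_log = []  # hypothetical stand-in for the tracking store


def track(transaction_id, stage, status="Success"):
    """Record that a stage handled the transaction."""
    tracking_log.append({
        "transactionId": transaction_id,
        "stage": stage,
        "status": status,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })


def receive_order(order):
    # The first component generates the unique transaction instance ID.
    transaction_id = str(uuid.uuid4())
    track(transaction_id, "ReceiveOrder")
    return {"transactionId": transaction_id, "body": order}


def enrich_order(message):
    # Later stages reuse the ID they received instead of creating a new one.
    track(message["transactionId"], "EnrichOrder")
    return {**message, "body": {**message["body"], "enriched": True}}


def deliver_order(message):
    track(message["transactionId"], "DeliverOrder")
    return message


def stages_for(transaction_id):
    """Return the stages recorded for one transaction, in order."""
    return [r["stage"] for r in tracking_log if r["transactionId"] == transaction_id]


result = deliver_order(enrich_order(receive_order({"orderNumber": "PO1-002342"})))
print(stages_for(result["transactionId"]))
```

Because every record shares the same ID, the records can be stitched back together into one end-to-end view of the process, including the stage where a failure occurred.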

The operations team must do some work to leverage the BAM functionality. They need to set up the hosting of the BAM infrastructure, define the business process, instrument the business process, and add monitoring (see, for instance, Azure Service Bus Logging with BAM – Walk-through). Once that is done, a clear view of the process and its stages is available.

Business Activity Monitoring (BAM)

The benefit of the BAM feature is a concise overview of the configured business processes. Moreover, you get an overview of the complete process and potentially see where things go wrong.

Azure Documenter

The final feature Serverless360 offers, the Azure Documenter, is intended to generate documentation. With the documenter, operations teams can generate documentation for the subscription that contains the integrations. It is good to have a dedicated subscription for integration solutions to better govern and manage Azure resources.

When operations teams want to generate documentation, they can choose between different templates, where to store the document, and the billing range.

Azure Documenter

The benefit of having documentation of the integrations in a subscription is having a clear overview of the components, details, and costs (consumption). While the Azure portal offers a similar capability, you will have to visit several places, such as Cost Management and Billing for consumption and cost, and Azure Advisor. Furthermore, there is no feature to generate documentation to help report the Azure resources’ state.

Report Azure Documenter

The value of the Azure Documenter is the flexibility to generate documentation at different levels of granularity. Furthermore, by frequently running the documenter, you can spot differences like an unexpected increase in cost, and provide executive reports and information for your integrations knowledge base.

Conclusion

Features and benefits of Serverless360 have been outlined in this blog post. Of course, there are many more features. Yet, we focused on the most significant ones that provide operations teams the most value. That is a clear overview of the state of integrations in a single pane of glass and the ability to quickly drill down into integration components and spot issues at a fine-grained level. Furthermore, Business Activity Monitoring and Azure Documenter provide end-to-end tracking and generated documentation.

Serverless 360 Monitoring

Serverless360 offers an off-the-shelf product for monitoring not directly available in the Azure Portal. As an organization, you can decide whether to buy a product or build a custom solution, or both to fulfill monitoring requirements for integration solutions. Serverless360 can be the solution for organizations looking for a product to meet their needs. It has unique features which are not directly available in Azure or require a substantial investment to achieve.

For more details and Serverless360 in action, see the webinar of Michael Stephenson: Support Strategy for Event-Driven Integration Architectures and the latest features blog.

Should developers care about Azure Cost?

The days of prepurchasing a large amount of infrastructure are gone. Instead, in the Cloud, we deal with buying small units of resources at a low cost. As a result, developers have the freedom to provision resources and deploy their apps. They can spend company money at a click of a button or line of code. There is no longer a need to go through any procurement process.

Therefore you could ask the question: Should developers be aware of the running costs of their apps and the infrastructure belonging to them? And also worry about SKUs, dimensioning, and unattended resources? I would say yes, they should be aware. Depending on requirements, environments (dev, test, acceptance, and production), availability, security, test strategy, and so on, costs will accumulate. Keeping an eye on the cost from the start will prevent discussions when the bill is too high at the end of the month or when the chosen deployment of Azure resources lacks justification.

Fortunately, there are services and tools available to help you in the estimation of costs, monitoring, and analysis for cost optimization. Furthermore, you can help identify costs by applying tags to your Azure resources – important when costs of Azure resources in a subscription are shared over departments.

Azure Calculator

Microsoft provides a Cloud Platform called Azure containing over 100 services for its customers. Customers are charged for most of the services when consuming them. These charges (costs) can be estimated using the so-called ‘Pricing calculator.’

You can search for a product (service) with the pricing calculator and subsequently select it.

Azure Price Calculator

Next, a pop-up window will appear on the right-hand side, where you click View. Finally, a window will appear with the options for, in this case, Logic Apps. You can select the region where you would like to provision your product (service) and, depending on hosting and other criteria, specify what you expect to consume. In addition, you can select what type of support you want and the licensing model – and there is also a switch allowing you to see what the dev/test pricing is for the product.

Furthermore, if you want to estimate a solution consisting of multiple products, you can select all of them before specifying the consumption characteristics. The calculator will, in the end, show the accumulated costs for all products.
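The accumulation the calculator performs can be mimicked with a small sketch, which is also handy when you want to repeat the estimate for several environments. All product names, unit prices, and quantities below are made-up placeholders, not actual Azure prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    product: str
    unit_price: float  # price per unit per month; hypothetical figure
    quantity: float    # expected units consumed per month


def monthly_estimate(items):
    """Sum the estimated monthly cost of all line items."""
    return round(sum(item.unit_price * item.quantity for item in items), 2)


def estimate_per_environment(items_by_env):
    """Return the estimate per environment plus an overall total."""
    totals = {env: monthly_estimate(items) for env, items in items_by_env.items()}
    totals["total"] = round(sum(totals.values()), 2)
    return totals


estimate = estimate_per_environment({
    "dev": [LineItem("Logic Apps", 0.5, 100), LineItem("Service Bus", 10.0, 1)],
    "prod": [LineItem("Logic Apps", 0.5, 1000), LineItem("Service Bus", 10.0, 2)],
})
print(estimate)
```

Like the calculator itself, this only gives an estimate; real consumption depends on the workload.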

Other tabs in the calculator show sample scenarios, potential savings when you are already running resources in Azure, and FAQs. And lastly, at the bottom, you can click purchasing options for the product(s).

More details of Azure pricing are available on the pricing landing page.

Considerations Cost Calculator

The Azure pricing calculator is a tool for estimating costs; it does not show the actual costs a client generates when using the products. The actual cost depends on the workload, the number of environments, sizing, and support costs (not just from Microsoft itself, but also the cost of those managing the product on the client side). Using the tool can be a good starting point to give the client a feeling for the costs of potential workloads that run on the platform. Furthermore, you can also use the tool to perform an overall calculation by including multiple environments, sizing, and support, leveraging Excel. In addition, there is also a TCO calculator available through the Azure pricing landing page.

Cost Management

The Cost Management + Billing service and its features are available in any subscription in the Azure portal. They allow you to do administrative tasks around billing, set spending thresholds, and proactively analyze Azure cost generation. For example, in the Azure Portal, under Cost Management and Billing, you can find Budgets to create a budget for the costs in your subscription. When creating a budget, you can define thresholds on actual and forecasted costs, manage an action group, and specify emails (recipients for alerts) and language.
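How thresholds on actual and forecasted costs behave can be illustrated with a minimal sketch. The function name and the percentage-based threshold shape are assumptions for illustration; this is not the Azure Budgets API.

```python
def triggered_alerts(budget_amount, actual_cost, forecast_cost, thresholds):
    """Return the thresholds that are reached.

    `thresholds` is a list of (kind, percent) tuples, where kind is
    "actual" or "forecast" and percent is relative to the budget amount.
    """
    alerts = []
    for kind, percent in thresholds:
        if kind not in ("actual", "forecast"):
            raise ValueError(f"unknown threshold kind: {kind}")
        cost = actual_cost if kind == "actual" else forecast_cost
        if cost >= budget_amount * percent / 100:
            alerts.append((kind, percent))
    return alerts


# Budget of 1000 with 850 spent so far and 1100 forecasted for the period.
print(triggered_alerts(1000, 850, 1100,
                       [("actual", 80), ("actual", 100), ("forecast", 100)]))
```

A forecast threshold is useful because it warns you before the money is spent, while an actual threshold confirms that it has been.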

Azure Cost Management Budgets

Considerations Cost Management

A key aspect of cost control is to set up budgets (mentioned earlier) at the beginning, once a subscription is created and before workloads land or resources are provisioned to develop cloud solutions. Furthermore, once consumption of Azure resources starts, you can look at recommendations for cost optimization and at Cost Analysis. For instance, the cost analysis (preview) can show the cost per resource group and service.

Azure Cost Analysis

It is recommended to separate workloads per subscription as per the subscription decision guide. And one of the benefits is splitting out costs and keeping them under control with budgets. And lastly, Azure Advisor can help identify underutilized or unused resources to be optimized or shut down.

Tagging

Tagging Azure resources is a good practice. A tag is a key-value pair and is helpful to identify your resource. You can organize your resources with, for instance, a key environment with value dev (development) and a key department with value marketing. Moreover, you can add multiple tags (key/value pairs), up to 50. Each tag name (key) is limited to 512 characters and each value to 256 characters. More information on limitations is available on the Microsoft docs.
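The limits mentioned above are easy to check before a deployment. The following minimal sketch validates a set of tags against those three numbers only; the function name is hypothetical, and other restrictions (such as disallowed characters) are not covered.

```python
MAX_TAGS = 50            # maximum number of tags (from the text)
MAX_NAME_LENGTH = 512    # maximum tag name length (from the text)
MAX_VALUE_LENGTH = 256   # maximum tag value length (from the text)


def validate_tags(tags):
    """Return a list of problems found; an empty list means the tags fit the limits."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for name, value in tags.items():
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"tag name too long: {name[:20]}...")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value too long for tag: {name}")
    return problems


print(validate_tags({"environment": "dev", "department": "marketing"}))
```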

Tagging Considerations

With tags, you can assign helpful information to any resource within your cloud infrastructure – usually information not included in the name or available in the overview of the resource. Tagging is critical for cost management, operations, and management of resources. More details on how to apply them are available in the decision guide. Furthermore, you can enforce tagging through Azure policies – see the Microsoft documentation on policy definitions for tag compliance.

Reporting

Stakeholders in Azure projects will be interested in cost accumulation for workloads in subscriptions. Therefore, reports of resource consumption in, for example, euros are required. These reports can be viewed in the Azure Portal under Cost Management and Billing. However, you will need filters in the cost analysis or use the preview functionality to be more specific. Or you can export the data to a storage account and hook it up to Power BI, or use third-party tooling like CloudCtrl.

Cloud Control

And finally, as a developer, you can also leverage the available APIs to get costs and usage data. For example, the Azure Consumption APIs give you programmatic access to cost and usage data for your Azure resources. With the data, you can build reports.
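Once you have cost data, building a report is mostly a matter of grouping. The sketch below groups cost records by a tag, for instance to split a shared subscription over departments. The record shape (`resource`, `cost`, `tags`) is an assumption for illustration and not the schema returned by the Azure Consumption APIs.

```python
from collections import defaultdict


def cost_by_tag(records, tag_name, untagged_label="(untagged)"):
    """Sum the cost per value of the given tag."""
    totals = defaultdict(float)
    for record in records:
        key = record.get("tags", {}).get(tag_name, untagged_label)
        totals[key] += record["cost"]
    return {key: round(total, 2) for key, total in sorted(totals.items())}


# Hypothetical records; in practice these would come from an export or an API.
records = [
    {"resource": "logicapp-orders", "cost": 10.0, "tags": {"department": "marketing"}},
    {"resource": "sb-orders", "cost": 5.5, "tags": {"department": "marketing"}},
    {"resource": "stor-temp", "cost": 2.0, "tags": {}},
]
print(cost_by_tag(records, "department"))
```

Resources without the tag end up in their own bucket, which makes gaps in tagging visible in the report.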

Reports considerations

With cost reports, it is essential to realize who the target audience is, what information they are looking for, and how to present it. In addition, each active resource consumes Azure infrastructure inside a data center, leading to cost. And cost should represent value in the end. Hence, reporting is critical for stakeholders in your cloud projects. The analysis of costs is in good hands with the cost analysis capabilities; however, the presentation requirements might differ and sometimes require a custom report built with, for instance, Power BI or a third-party tool.

Wrap up

In this blog post, we discussed Azure cost and hopefully made it clear that developers should care about cost, and they have tools and services available to make life easier. For example, they can set up cost management infrastructure themselves in their dev/test subscriptions if not already enforced or done by IT. Furthermore, they can make IT and the architect(s) aware of it if it is not in place. In the end, I believe it is a shared responsibility of developers and IT responsible for managing the Azure environments/subscriptions.

Secret Management in the Cloud

I have been using Azure Key Vault for secret management in my projects for the last two or three years, and I advise my peers, clients, and colleagues to do so. Azure Key Vault is a service for storing and managing secrets with policies, with the ability to access them using .NET code. Moreover, it is not just .NET code; a service principal can also access it to get a secret for establishing a connection or running a pipeline. The secrets can be API keys, connection strings, credentials, certificates, etc. In this blog post, I would like to discuss a secret management use case and dive into its details.

Use case Key Vault and D365 FO Business Events

In a recent project regarding unlocking data from a Dynamics 365 Finance and Operations (FO) instance, I leveraged the concept of Business Events, where a Logic App subscribes to a specific event published on a custom Event Grid Topic. Let me further explain the scenario and where Key Vault comes into play. Below you see a diagram of the integration between D365 FO and a third-party system. The latter receives data from D365 based upon a specific business event.

D365 FO Business Events

Within D365 FO, you can define a destination for a business event. As shown in the diagram, the destination is an Event Grid Topic. When following the Microsoft documentation of Business Events and Event Grid, you will notice that a Key Vault is required to keep the access key of the Event Grid Topic as a secret. Furthermore, you will need to create a so-called App registration in Azure Active Directory. Azure App registrations are a simple and effective way to configure authentication and authorization workflows for many client types. In this case, the registration is a client identifying D365, allowing it access to the Key Vault instance to extract the access key for the custom Event Grid Topic.

Once the app registration is in place, the next step is to add it to the access policies in the Key Vault instance. The registration represents D365, and it needs access to the Key Vault to extract the access key for the Azure Event Grid topic. The app registration only requires the Get and List secret permissions to retrieve the Key Vault secrets.

The endpoint configuration is the next step when the app registration and policy are in place, the custom Event Grid topic is available, and its access key is a secret in Key Vault. The screenshot below shows the configuration of an actual endpoint (destination) for the events – the custom Event Grid topic.

Business Event Endpoint Configuration

To configure the endpoint (destination), you need to provide a name. The endpoint type is filled in by default, followed by the endpoint URL (destination endpoint – Event Grid topic URL) and then the details for the Key Vault. These details are the client id of the app registration, its secret, the DNS name of the Key Vault instance, and the Key Vault secret name – which holds the secret, i.e., the access key to the custom Event Grid topic. And finally, you can press OK to create the endpoint. Once the endpoint is created, you can attach it to the necessary business event and activate it.

Once the endpoint is active and a specific business event is attached to the endpoint, the event will end up with the subscriber – Logic App. An example of a business event is shown below:

{
  "BusinessEventId": "PurchaseOrderConfirmedBusinessEvent",
  "ControlNumber": 5637365024,
  "EventId": "9D42A382-12E8-48F6-9BB2-29A1G4E39773",
  "EventTime": "/Date(1642759229000)/",
  "LegalEntity": "fnl1",
  "MajorVersion": 0,
  "MinorVersion": 0,
  "PurchaseJournal": "PO1-002342-11",
  "PurchaseOrderDate": "/Date(1642723200000)/",
  "PurchaseOrderNumber": "PO1-002342",
  "PurchaseType": "Purch",
  "TransactionCurrencyAmount": 1553.46,
  "TransactionCurrencyCode": "EUR",
  "VendorAccount": "IFF1095"
}

The Logic App can use the details to retrieve more information (through OData calls) about, in this case, the purchase order. And, as shown in the diagram, it sends the enriched JSON to a Service Bus queue to hand it over to another Logic App, which transforms it into XML to be sent to Basware (a provider of software for financial processes, purchase to pay, and financial management).
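The date fields in the event shown above use the /Date(milliseconds)/ notation, which a subscriber typically has to convert before it can work with the values. The following minimal sketch shows one way to do that in Python, assuming the payload arrives as plain JSON with straight quotes and that the number is milliseconds since the Unix epoch in UTC. The function names and the shortened sample are hypothetical.

```python
import json
import re
from datetime import datetime, timezone

_DATE_PATTERN = re.compile(r"^/Date\((-?\d+)\)/$")


def parse_event_date(value):
    """Convert '/Date(1642759229000)/' into a timezone-aware UTC datetime."""
    match = _DATE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a /Date(...)/ value: {value!r}")
    return datetime.fromtimestamp(int(match.group(1)) / 1000, tz=timezone.utc)


def parse_business_event(raw_json):
    """Parse a business event and convert all /Date(...)/ strings to datetimes."""
    event = json.loads(raw_json)
    return {
        key: parse_event_date(value)
        if isinstance(value, str) and _DATE_PATTERN.match(value)
        else value
        for key, value in event.items()
    }


# Shortened version of the event shown above.
SAMPLE = """{
  "BusinessEventId": "PurchaseOrderConfirmedBusinessEvent",
  "EventTime": "/Date(1642759229000)/",
  "PurchaseOrderDate": "/Date(1642723200000)/",
  "PurchaseOrderNumber": "PO1-002342"
}"""

event = parse_business_event(SAMPLE)
print(event["PurchaseOrderNumber"], event["EventTime"].isoformat())
```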

Managing Key Vault

To properly set up the process around Key Vault and secrets, the administrator (Azure Ops) is responsible for creating the app registration. The administrator will make the app registration and manage the Key Vault. Moreover, in my view, the administrator is also the one who does the endpoint configuration. Therefore, the integration developer will only need to connect the Logic App to the Event Grid topic. Similarly, an SFTP connection requiring credentials or certificates can also leverage the Key Vault and requires the same administrator.

The diagram below shows what the administrator can do regarding the app registration and managing the Key Vault instance. Also, the authentication process is shown from the application side – in our case, creating the endpoint from D365. Finally, D365 will use the app registration to authenticate against Azure AD to retrieve a token necessary to access the key vault secret.

Key Vault Management

I would like to point out that, in this scenario, business events might need to be set up again when a database refresh is done. Note that when the endpoint configuration fails, you can see an error like:

Unable to get secret from Key Vault DNS: <dns of the key vault instance> Secret name: <name secret>

In that case, either the app registration client id or secret is wrong, or worse, the client secret of the app registration has expired (the error message will not tell you that!). A client secret expires (the max is two years). Hence, be aware that once it has expired, the events will not reach the Event Grid topic, and errors will occur on the D365 side. Therefore, I recommend monitoring the expiration of the app registration’s secret; the secrets in Key Vault can also have an expiry date, so keep an eye on that too!
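Monitoring expiry does not have to be complicated. The minimal sketch below flags credentials that have expired or will expire within a warning window. The credential names and dates are made up, and in practice the expiry dates would come from Azure AD and Key Vault rather than a hard-coded dictionary.

```python
from datetime import date, timedelta


def expiring_soon(credentials, today, warn_within_days=30):
    """Return (name, days_left) for credentials expired or expiring within the window.

    `credentials` maps a name to its expiry date; negative days_left means
    the credential has already expired. The result is sorted by urgency.
    """
    limit = today + timedelta(days=warn_within_days)
    due = [
        (name, (expires - today).days)
        for name, expires in credentials.items()
        if expires <= limit
    ]
    return sorted(due, key=lambda item: item[1])


# Hypothetical names and dates for illustration.
credentials = {
    "d365-app-registration-secret": date(2022, 2, 1),
    "eventgrid-access-key": date(2023, 1, 1),
    "old-sftp-password": date(2022, 1, 1),
}
print(expiring_soon(credentials, today=date(2022, 1, 21)))
```

A check like this could run on a schedule and raise an alert well before D365 starts failing to deliver events.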

Other Public Cloud Providers

Interestingly, Azure is not the only public cloud platform with a secret, certificate, and key management service. For example, AWS actually has three services – AWS Secrets Manager, AWS Certificate Manager, and AWS CloudHSM. With AWS Secrets Manager, users can manage access to secrets using a fine-grained set of policies, control the lifecycle of secrets, and secure and audit secrets centrally. Furthermore, this is a managed service with a pay-as-you-go model available in most AWS regions. Sound familiar? Azure Key Vault is similar, right? Almost, yet Key Vault has most of the capabilities found in the three earlier mentioned AWS Services.

What about the Google Cloud Platform? Well, on GCP, you will find Secret Manager, which also enables users to store and manage secrets, including policies and rotation. Furthermore, the service offers management of certificates. And lastly, GCP has a separate service for key management: Key Management Service (KMS).