A High-Level View of Cloud Governance

Something that intrigues me in the cloud is governance. As a technical integration architect, that’s the role/function I have in my current day-to-day job. Yet, during designing solutions, I usually do not think about it or talk to a customer set on moving to the cloud – that’s a cloud migration process, which I am generally not involved with. Still, it should have my attention, and it has now.

You might ask if it sounds unfamiliar to you, what is governance? First, you could look up the term in Wikipedia. And you’ll find the explanation or definition in the first lines mentioning a process of interactions through laws, norms, power, or language of an organized society over a social system such as tribe, family, formal or informal organization. Yet how does this relate to the cloud? Well, very simple, it is still a process of interactions, however, defined by what a cloud provider deems necessary to keep costs, access to data, consistency, and deployments under control.

A Cloud provider like Microsoft, AWS, and Google can provide you with guidance regarding governance to manage costs, secure resources and access to data, and consistency in the deployment of resources – each provides frameworks for that:

The Google Adoption Framework whitepaper will mention governance regarding data, cost control, security, and cloud resources management. While AWS CAF has governance as one of its six perspectives. And Microsoft has a section of Govern in their Framework and a landing page.

Microsoft Cloud Adoption Framework

Source: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/overview

I now like to zoom further into governance on Microsoft Azure since I predominantly work as a (solution) architect (integration) on that Cloud platform. Furthermore, I will not look at the process extensively described in the CAF, yet more on some of the services and capabilities available in Azure and add some of my views and relevant resources I found.

Azure Resources

Microsoft provides policies on Azure to allow you to keep resources compliant. When a policy is assigned, it will, when it is triggered, evaluate if it adheres to a definition. You can use these policies to implement governance for resource consistency, regulatory compliance, security, cost, and management. For more details on Azure Policies, see Azure Policy on GitHub.

Next to policies tagging is another aspect of governance in Azure or any cloud platform. With tags, you can assign helpful information to any resource within your cloud infrastructure – usually information not included in the name of available in the overview of the resource. Tagging is critical for cost management, operations, and management of resources. More details on how to apply them are available in the decision guide.

If you work at a company with many subscriptions, or the customer you work for does, you can leverage management groups –a level of scope above subscriptions. It provides a way to organize subscriptions into containers and thus provide a logical structure. Moreover, you can apply specific governance conditions with management groups as each subscription in a group inherits them.

Diagram of a sample management group hierarchy.

Source: https://docs.microsoft.com/en-us/azure/governance/management-groups/overview

More details on management groups are available on the GitHub page.

Another intriguing service is the Azure Resource Graph, a capability in Azure to query, explore, and analyze your cloud resources. It includes an Explorer you can use in the Azure portal and can also be used programmatically through the Azure CLI, Azure PowerShell and Azure SDK for .NET.

You can use Graph Explorer to explore resources based on your governance requirements and assess the impact of applying policies in your environments. The query language is based on the Kusto query language used by Azure Data Explorer. More details are available on the GitHub page.

And lastly, Azure Blueprints can enable you to define a repeatable set of Azure resources that implements and adheres to an organization’s standards, patterns, and requirements. As a result, you can orchestrate the deployment of various resource templates and other artifacts such as the earlier mentioned policies, role assignments, ARM templates, and resource groups in a declarative way. With blueprints, you can consistently deploy predefined environments. Other public cloud providers offer blueprints as well: AWS Blueprints and GCP Blueprints. You can find more details on Azure blueprints on GitHub.

Cost Management

The cost management + billing service and features are available in any subscription in the Azure portal. It will allow you to do administrative tasks around billing, set spending thresholds, and proactively analyze azure cost generation. A key aspect is regarding cost control is to set up budgets at the beginning once a subscription before workloads land or resources are provisioned for the development of cloud solutions. Furthermore, once consumption of Azure resources starts, you can look at recommendations for cost optimizations. Moreover, Azure Advisor can help identify underutilized or unused resources to be optimized or shut down.

Example of the Subscription Overview tab showing Offer and Offer ID

Source: https://docs.microsoft.com/en-us/azure/cost-management-billing/costs/understand-cost-mgt-data

Security

An essential aspect of governance is security, for example, who gets access to what resource in Azure. A consistent way to set that up is by applying the earlier mentioned blueprint. Azure AD plays a role as well when you add accounts, service principles (an identity created for use with applications, hosted services, and automated tools to access Azure resources – similar to a service account on Windows), and app registrations (Application Object).

Azure AD is an Identity and Access solution with several features, such as conditional access, Multi-Factor Authentication (MFA), and Singel-SignOn (SSO) support. In addition, it is an essential service with regards to governance to provide access to the application (services) and people to Azure resources – and you want that consistent and accurate when it comes to who is responsible for what. And lastly, Microsoft provides best practices and guidance on this service you can look into.

Data Governance

Microsoft launched Purview into a public preview for data governance in December 2020, and it became generally available later in October 2021. With Azure Purview, the company delivers an Azure service that can help you understand what data your company has and provide means to manage the data’s compliance with privacy regulations and derive valuable insights.

Implementing Pipes and Filters Pattern using Azure Integrations Services

In my first blog post, I like to discuss implementing the pipes and filter pattern using Azure Integration Services (AIS).

A Pipe and Filter pattern uses multiple event queues and topics to streamline events’ flow across numerous cloud services. You can implement this by using queues or topics in a service bus namespace and Logic Apps or Azure Functions. Implementing this pattern with the mentioned services allows you to process messages in multiple stages like receiving, validating, processing, and delivering. Moreover, you can also opt for an event-driven approach using Logic Apps, Event Grid, and Functions.

The Pipe and Filter pattern is described in the Microsoft docs as a way to decompose complex processing into a series of separate elements that can be reused, providing improved performance, scalability, and reusability. Each element in the context of Azure Integration Services (AIS) can be either a Logic App or Azure Function and connected through a topic – which can be either a Service Bus Topic or an Event Grid Topic. The latter allows an event-driven approach to the pattern.

To process messages more timely, choosing the Event Grid is more efficient than using a service bus queue. Although you can select the queue or topic as a pipe with the pipe and filter pattern, having each filter subscribe and publish to them. However, it is less efficient as you will need to poll the queue or topic quite frequently.

Assume you receive multiple order files as a batch during the day, and these need to be processed as soon as possible in the shortest amount of time. You would need a way to receive or pick the files, validate them, process them, and subsequently deliver them to one or more sub-systems or locations. In the diagram below, we outlined the scenario.

Pipes and filter pattern implementation diagram

In the scenario, the Logic App function as on- and off-ramp receiving and sending data. The benefit of leveraging Logic Apps for data transport is its versatility in connectors. By leveraging the out-of-the-box connectors, you can consistently use a standard connector, in this case, Secure File Transport (SFTP).

Next, the functions will act as single processing units, i.e., one will store the data, including adding instance properties (context). Finally, another is triggered because the files are stored as a blob in a storage container (blobCreated Event). The chaining of the functions is done through the event mechanism – each storing of a blob results in an event (blobCreated) that another function can subscribe to. And lastly, a Logic App can be triggered to deliver the processed file. Moreover, multiple Logic Apps could be activated to deliver the file to various locations using various connectors if necessary.

The benefit of using functions is that you can have them scaled automatically (and thus also have elasticity) using Serverless mode (consumption) as a hosting option. Or, if you need more compute you can choose premium or dedicated plans. And by adding more compute, the performance of processing the files can be increased.

With the implementation of the pipes and filter pattern described above, you can see that you could easily reuse components (functions), add or remove components or shift around if necessary – hence you also have flexibility (agility). As described in the enterprise integration pipe and filter pattern with the context of the implementation:

Each filter (Function) exposes a straightforward interface (Binding): it receives messages on the inbound pipe (Event Grid Topic), processes the message, and publishes the results to the outbound pipe (Storage Container). The pipe (Event Grid Topic) connects one filter to the next, sending output messages from one filter to the next. Because all components use the same external interface, they can be composed into different solutions by connecting the components to other pipes. We can add new filters, omit existing ones or rearrange them into a new sequence — all without changing the filters themselves. The connection between filter and pipe is sometimes called port. Each filter component has one input port (Event Grid Topic) and one output port (Storage Account) in the basic form.

To summarize, the pipe and filter pattern brings the following benefits:

  • loose and flexible coupling of components (filters)
  • flexibility as it allows filters to be changed without modifications to other filters
  • parallel processing
  • reusability as each component (filter) can be called and used repeatedly

Note that it is less suitable when there are too many components (filters) concerning performance. Individually, the components – functions in our example can be tuned using the available hosting options – however, the sum of all the components determines the overall performance. Furthermore, the pattern is not suitable for interactive systems or long-running processes.

And lastly, with the pipe and filter pattern, there are also some challenges or considerations when implementing it according to the Microsoft documentation. For example, it mentions complexity when filters are distributed across servers. However, in our example, every component is presumably in the same Azure region, and messages are persisted in storage, benefiting from the built-in redundancy (reliability is available as well).

Finally, again in our example, when a message fails to reprocess, it is an option to reprocess it (retry); however, it depends on whether the message has to be reprocessed or dropped in a different container in the storage account. And when that happens, it is up to how monitoring and notifications are set up to troubleshoot and correct the error.