About steefjan1970

Steef-Jan Wiggers works in the Netherlands as a Technical Integration Architect at HSO and is one of InfoQ's senior cloud editors. His current technical expertise focuses on integration platform implementations, Azure DevOps, and Cloud Solution Architectures. Steef-Jan is a board member of the Dutch Azure User Group, a regular speaker at conferences and user groups, and he writes for InfoQ. Furthermore, Microsoft has recognized him as a Microsoft Azure MVP for the past fourteen years. Steef-Jan can be found on Twitter at @SteefJan.

Microsoft Foundry Citadel Platform Azure: Conversation Persistence with Cosmos DB

Posted on July 6, 2026 by steefjan1970

In the previous post, we connected a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure, routing every LLM call through the APIM governance hub in Sweden Central. The agent answered weather questions; the hub captured usage events in Cosmos DB; and Application Insights confirmed that both LLM calls were governed. The agent worked, but it had no memory. Every run started fresh, with no record of what was asked or answered.

This post adds conversation persistence to the Microsoft Foundry Citadel Platform on Azure. Every agent run now produces a structured document in the spoke’s Cosmos DB conversations container: the user’s question, the tool call made, the tool result, the agent’s answer, token counts, model version, and timestamp. With this, the agent gains a memory layer and the spoke’s data tier becomes active.

What We Build

Each agent run writes one document to the spoke Cosmos DB:

			
{
  "id": "run-20260625-143022-stockholm",
  "principal_id": "steefjan@msn.com",
  "timestamp": "2026-06-25T14:30:22.441Z",
  "question": "What is the weather like in Stockholm right now?",
  "tool_calls": [
    {
      "name": "get_weather",
      "arguments": {"location": "Stockholm"},
      "result": {
        "location": "Stockholm, Sweden",
        "temperature_celsius": 22.5,
        "wind_speed_kmh": 7.2,
        "condition": "Overcast"
      }
    }
  ],
  "answer": "The weather in Stockholm is overcast with a temperature of 22.5°C...",
  "model": "gpt-4o-2024-11-20",
  "prompt_tokens": 234,
  "completion_tokens": 67,
  "total_tokens": 301,
  "apim_gateway": "apim-wpvlimv4ngkns.azure-api.net"
}

		

The partition key is /principal_id, matching the container definition deployed by the spoke Bicep template. All conversations for a given user therefore land in the same logical partition, which keeps per-user history queries efficient.

The agent writes the document after completing the run, so a failed or incomplete run leaves no record. The result is clean, simple, and auditable.
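
As a minimal sketch of how such a document could be assembled before it is written, the snippet below uses only the standard library. The helper name build_conversation_document, the placeholder principal, and the gateway name are assumptions for illustration and are not part of the post's cosmos.py. It refuses to build a document without the partition key field and derives total_tokens from the two counts.

```python
import uuid
from datetime import datetime, timezone


def build_conversation_document(principal_id, question, tool_calls, answer,
                                model, prompt_tokens, completion_tokens,
                                apim_gateway, now=None):
    """Assemble one conversation document (hypothetical helper, for illustration)."""
    if not principal_id:
        # The container is partitioned on /principal_id, so the field must be set.
        raise ValueError("principal_id is required: it is the partition key")
    now = now or datetime.now(timezone.utc)
    # Timestamp plus a short random suffix keeps IDs readable and practically unique.
    doc_id = f"run-{now.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"
    return {
        "id": doc_id,
        "principal_id": principal_id,
        "timestamp": now.isoformat(),
        "question": question,
        "tool_calls": tool_calls,
        "answer": answer,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "apim_gateway": apim_gateway,
    }


doc = build_conversation_document(
    principal_id="user@example.com",          # placeholder principal
    question="What is the weather like in Stockholm right now?",
    tool_calls=[],
    answer="Overcast, 22.5°C.",
    model="gpt-4o",
    prompt_tokens=234,
    completion_tokens=67,
    apim_gateway="example.azure-api.net",     # placeholder gateway
    now=datetime(2026, 6, 25, 14, 30, 22, tzinfo=timezone.utc),
)
print(doc["id"], doc["total_tokens"])
```

Building the document in a pure function keeps the Cosmos DB write a single call at the end of the run, which is what makes "no record for a failed run" easy to reason about.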

Prerequisites

From the previous two posts you should have:

Hub deployed in rg-ai-hub-gateway-dev
Spoke deployed in rg-ai-spoke-dev with Cosmos DB cosmos-tggi2gmkw22w4, database cosmos-dbtggi2gmkw22w4, container conversations
App Config appcs-tggi2gmkw22w4 populated with COSMOS_DB_ENDPOINT and CONVERSATIONS_DATABASE_CONTAINER
agent.py, config.py, and tools.py from the previous post
Virtual environment activated with openai, azure-appconfiguration, azure-identity, and requests installed

Step 1 — Install the Cosmos DB SDK

With your virtual environment activated:

pip install azure-cosmos

Pitfall: Cosmos DB Public Network Access

If your Cosmos DB has firewall rules enabled (which the spoke Bicep template sets by default), your local IP needs to be in the allowed list or public access needs to be set to All networks for dev. Check via the portal: cosmos-tggi2gmkw22w4 (your instance) → Networking → Public access → All networks → Save. In production this would be networkIsolation=true with private endpoints only.

Step 2 — Extend Config to Read Cosmos DB Settings

The spoke App Config already contains COSMOS_DB_ENDPOINT and CONVERSATIONS_DATABASE_CONTAINER — populated automatically during deployment. Extend config.py to pull these:

			
$lines = @(
    "from azure.appconfiguration import AzureAppConfigurationClient",
    "from azure.identity import DefaultAzureCredential",
    "",
    "APP_CONFIG_ENDPOINT = 'https://appcs-tggi2gmkw22w4.azconfig.io'",
    "LABEL = 'ai-lz'",
    "",
    "def get_config() -> dict:",
    "    credential = DefaultAzureCredential()",
    "    client = AzureAppConfigurationClient(",
    "        base_url=APP_CONFIG_ENDPOINT,",
    "        credential=credential",
    "    )",
    "    keys = [",
    "        'AI_FOUNDRY_PROJECT_ENDPOINT',",
    "        'CHAT_DEPLOYMENT_NAME',",
    "        'APIM_GATEWAY_URL',",
    "        'APIM_SUBSCRIPTION_KEY',",
    "        'COSMOS_DB_ENDPOINT',",
    "        'CONVERSATIONS_DATABASE_CONTAINER',",
    "        'DATABASE_NAME',",
    "    ]",
    "    config = {}",
    "    for key in keys:",
    "        setting = client.get_configuration_setting(key=key, label=LABEL)",
    "        config[key] = setting.value",
    "    return config",
    "",
    "if __name__ == '__main__':",
    "    cfg = get_config()",
    "    for k, v in cfg.items():",
    "        print(f'{k}: {v[:40]}...')"
)
[System.IO.File]::WriteAllLines("$PWD\config.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python config.py

You should now see seven keys including COSMOS_DB_ENDPOINT pointing to https://cosmos-tggi2gmkw22w4.documents.azure.com:443/ and CONVERSATIONS_DATABASE_CONTAINER set to conversations.
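
Note that the test block in config.py prints the first 40 characters of every value, which includes APIM_SUBSCRIPTION_KEY. A minimal sketch of a masking helper, assuming a hypothetical function mask_value and an assumed naming rule that any key containing KEY, SECRET, or PASSWORD is sensitive:

```python
SENSITIVE_MARKERS = ("KEY", "SECRET", "PASSWORD")  # assumed naming convention


def mask_value(key, value, visible=4):
    """Return a printable form of a config value, hiding secrets (hypothetical helper)."""
    if value is None:
        return "<not set>"
    text = str(value)
    if any(marker in key.upper() for marker in SENSITIVE_MARKERS):
        # Show only the last few characters so the key can still be recognized.
        return "*" * 8 + text[-visible:]
    # Non-secret values are truncated the same way the original test block does.
    return text if len(text) <= 40 else text[:40] + "..."


sample = {
    "COSMOS_DB_ENDPOINT": "https://cosmos-tggi2gmkw22w4.documents.azure.com:443/",
    "APIM_SUBSCRIPTION_KEY": "0123456789abcdef",  # made-up value
    "DATABASE_NAME": None,
}
for k, v in sample.items():
    print(f"{k}: {mask_value(k, v)}")
```

The None branch also avoids the TypeError that slicing would raise if a setting exists in App Config without a value.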

Step 3 — Create cosmos.py

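Create a dedicated Cosmos DB module. The helper script below writes cosmos.py and, in the same pass, a compact copy of agent_with_memory.py; Step 4 rewrites that agent file in a more readable form: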

			
$code = 'import os
cosmos = """from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential
import uuid
from datetime import datetime, timezone
def get_cosmos_client(endpoint):
    return CosmosClient(url=endpoint, credential=DefaultAzureCredential())
def save_conversation(endpoint, database_name, container_name, principal_id, question, tool_calls, answer, model, prompt_tokens, completion_tokens, total_tokens, apim_gateway):
    container = get_cosmos_client(endpoint).get_database_client(database_name).get_container_client(container_name)
    now = datetime.now(timezone.utc)
    fmt = \"%Y%m%d-%H%M%S\"
    doc_id = f\"run-{now.strftime(fmt)}-{str(uuid.uuid4())[:8]}\"
    document = {\"id\": doc_id, \"principal_id\": principal_id, \"timestamp\": now.isoformat(), \"question\": question, \"tool_calls\": tool_calls, \"answer\": answer, \"model\": model, \"prompt_tokens\": prompt_tokens, \"completion_tokens\": completion_tokens, \"total_tokens\": total_tokens, \"apim_gateway\": apim_gateway}
    container.create_item(body=document)
    print(f\"Saved conversation: {doc_id}\")
    return document
def get_conversation_history(endpoint, database_name, container_name, principal_id, limit=5):
    container = get_cosmos_client(endpoint).get_database_client(database_name).get_container_client(container_name)
    query = f\"SELECT TOP {limit} c.id, c.timestamp, c.question, c.answer, c.total_tokens FROM c WHERE c.principal_id = @principal_id ORDER BY c._ts DESC\"
    return list(container.query_items(query=query, parameters=[{\"name\": \"@principal_id\", \"value\": principal_id}], partition_key=principal_id))
"""
agent = """import json
from openai import AzureOpenAI
from config import get_config
from tools import get_weather, WEATHER_TOOL_DEFINITION
from cosmos import save_conversation, get_conversation_history
PRINCIPAL_ID = \"steefjan@msn.com\"
def run_agent_with_memory(user_question):
    cfg = get_config()
    apim_base = cfg[\"APIM_GATEWAY_URL\"].rstrip(\"/\").replace(\"/openai\", \"\")
    client = AzureOpenAI(azure_endpoint=apim_base, api_key=cfg[\"APIM_SUBSCRIPTION_KEY\"], api_version=\"2024-02-01\")
    messages = [{\"role\": \"user\", \"content\": user_question}]
    print(f\"Sending request via APIM: {apim_base}\")
    response = client.chat.completions.create(model=cfg[\"CHAT_DEPLOYMENT_NAME\"], messages=messages, tools=[WEATHER_TOOL_DEFINITION], tool_choice=\"auto\")
    msg = response.choices[0].message
    messages.append(msg)
    tool_calls_log = []
    answer = msg.content or \"\"
    total_prompt = response.usage.prompt_tokens
    total_completion = response.usage.completion_tokens
    if msg.tool_calls:
        for tool_call in msg.tool_calls:
            args = json.loads(tool_call.function.arguments)
            print(f\"  -> Tool call: get_weather({args})\")
            result_str = get_weather(**args)
            result_json = json.loads(result_str)
            print(f\"  -> Tool result: {result_str}\")
            tool_calls_log.append({\"name\": tool_call.function.name, \"arguments\": args, \"result\": result_json})
            messages.append({\"role\": \"tool\", \"tool_call_id\": tool_call.id, \"content\": result_str})
        response2 = client.chat.completions.create(model=cfg[\"CHAT_DEPLOYMENT_NAME\"], messages=messages)
        answer = response2.choices[0].message.content
        total_prompt += response2.usage.prompt_tokens
        total_completion += response2.usage.completion_tokens
    save_conversation(endpoint=cfg[\"COSMOS_DB_ENDPOINT\"], database_name=cfg[\"DATABASE_NAME\"], container_name=cfg[\"CONVERSATIONS_DATABASE_CONTAINER\"], principal_id=PRINCIPAL_ID, question=user_question, tool_calls=tool_calls_log, answer=answer, model=cfg[\"CHAT_DEPLOYMENT_NAME\"], prompt_tokens=total_prompt, completion_tokens=total_completion, total_tokens=total_prompt+total_completion, apim_gateway=apim_base.replace(\"https://\", \"\"))
    return answer
if __name__ == \"__main__\":
    cfg = get_config()
    print(\"=== Recent conversation history ===\")
    history = get_conversation_history(cfg[\"COSMOS_DB_ENDPOINT\"], cfg[\"DATABASE_NAME\"], cfg[\"CONVERSATIONS_DATABASE_CONTAINER\"], PRINCIPAL_ID, 3)
    if history:
        for h in history:
            ts = h[\"timestamp\"]
            q = h[\"question\"][:60]
            print(f\"  [{ts}] Q: {q}...\")
    else:
        print(\"  No previous conversations found.\")
    print()
    question = \"What is the weather like in Amsterdam right now?\"
    print(f\"Question: {question}\")
    answer = run_agent_with_memory(question)
    print(f\"Answer: {answer}\")
"""
with open("cosmos.py", "w", encoding="utf-8") as f:
    f.write(cosmos)
with open("agent_with_memory.py", "w", encoding="utf-8") as f:
    f.write(agent)
print("Done")
'
[System.IO.File]::WriteAllText("$PWD\write_files.py", $code, [System.Text.UTF8Encoding]::new($false))
python write_files.py

		

Pitfall: Managed Identity RBAC for Cosmos DB

The CosmosClient with DefaultAzureCredential uses your Azure CLI identity locally. That identity needs the Cosmos DB Built-in Data Contributor role on the Cosmos DB account — not a standard Azure RBAC role, but a Cosmos DB data plane role. The spoke deployment should have assigned this automatically via the assignCosmosDBCosmosDbBuiltInDataContributorExecutor deployment. If you get a 403, verify:

			
az cosmosdb sql role assignment list `
  --account-name cosmos-tggi2gmkw22w4 `
  --resource-group rg-ai-spoke-dev `
  --output table

Your principal ID (8e856fa1-f4c4-4a02-91a5-a6ccc6afc6b3) should appear with role definition ID ending in 00000000-0000-0000-0000-000000000002 (Built-in Data Contributor). If not, assign it:

			
az cosmosdb sql role assignment create `
  --account-name cosmos-tggi2gmkw22w4 `
  --resource-group rg-ai-spoke-dev `
  --role-definition-id /subscriptions/dc0f4d72-3734-4b03-8884-ccfb9c2c4cc7/resourceGroups/rg-ai-spoke-dev/providers/Microsoft.DocumentDB/databaseAccounts/cosmos-tggi2gmkw22w4/sqlRoleDefinitions/00000000-0000-0000-0000-000000000002 `
  --principal-id 8e856fa1-f4c4-4a02-91a5-a6ccc6afc6b3 `
  --scope /subscriptions/dc0f4d72-3734-4b03-8884-ccfb9c2c4cc7/resourceGroups/rg-ai-spoke-dev/providers/Microsoft.DocumentDB/databaseAccounts/cosmos-tggi2gmkw22w4
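
The two long resource IDs in that command follow a fixed pattern, so they can be built rather than hand-edited. A small sketch, assuming hypothetical helper names cosmos_account_scope and role_definition_id; the subscription, resource group, and account values are the ones used in this post:

```python
# Built-in Data Contributor, the data plane role referenced in this post.
DATA_CONTRIBUTOR_ROLE = "00000000-0000-0000-0000-000000000002"


def cosmos_account_scope(subscription_id, resource_group, account):
    """Resource ID of the Cosmos DB account, used as the --scope value."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DocumentDB/databaseAccounts/{account}"
    )


def role_definition_id(subscription_id, resource_group, account,
                       role=DATA_CONTRIBUTOR_ROLE):
    """Full ID of a data plane role definition, used as --role-definition-id."""
    scope = cosmos_account_scope(subscription_id, resource_group, account)
    return f"{scope}/sqlRoleDefinitions/{role}"


subscription = "dc0f4d72-3734-4b03-8884-ccfb9c2c4cc7"
print(cosmos_account_scope(subscription, "rg-ai-spoke-dev", "cosmos-tggi2gmkw22w4"))
print(role_definition_id(subscription, "rg-ai-spoke-dev", "cosmos-tggi2gmkw22w4"))
```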

		

Pitfall: No Connection Strings

Never use Cosmos DB connection strings or account keys in the agent code. The pattern here uses DefaultAzureCredential throughout: locally it picks up your az login identity, in production it uses the spoke’s Managed Identity. This approach is compliant with the NEN 7510 and cVGZ security baselines.

Step 4 — Create agent_with_memory.py

			
$lines = @(
    "import json",
    "from openai import AzureOpenAI",
    "from config import get_config",
    "from tools import get_weather, WEATHER_TOOL_DEFINITION",
    "from cosmos import save_conversation, get_conversation_history",
    "",
    "PRINCIPAL_ID = 'steefjan@msn.com'",
    "",
    "def run_agent_with_memory(user_question: str) -> str:",
    "    cfg = get_config()",
    "    apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '')",
    "",
    "    client = AzureOpenAI(",
    "        azure_endpoint=apim_base,",
    "        api_key=cfg['APIM_SUBSCRIPTION_KEY'],",
    "        api_version='2024-02-01',",
    "    )",
    "",
    "    messages = [{'role': 'user', 'content': user_question}]",
    "    print(f'Sending request via APIM: {apim_base}')",
    "",
    "    # First LLM call - tool decision",
    "    response = client.chat.completions.create(",
    "        model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "        messages=messages,",
    "        tools=[WEATHER_TOOL_DEFINITION],",
    "        tool_choice='auto',",
    "    )",
    "",
    "    msg = response.choices[0].message",
    "    messages.append(msg)",
    "    first_usage = response.usage",
    "",
    "    tool_calls_log = []",
    "    answer = msg.content or ''",
    "    total_prompt_tokens = first_usage.prompt_tokens",
    "    total_completion_tokens = first_usage.completion_tokens",
    "",
    "    # Handle tool calls",
    "    if msg.tool_calls:",
    "        for tool_call in msg.tool_calls:",
    "            args = json.loads(tool_call.function.arguments)",
    "            print(f'  -> Tool call: get_weather({args})')",
    "            result_str = get_weather(**args)",
    "            result_json = json.loads(result_str)",
    "            print(f'  -> Tool result: {result_str}')",
    "",
    "            tool_calls_log.append({",
    "                'name': tool_call.function.name,",
    "                'arguments': args,",
    "                'result': result_json,",
    "            })",
    "",
    "            messages.append({",
    "                'role': 'tool',",
    "                'tool_call_id': tool_call.id,",
    "                'content': result_str,",
    "            })",
    "",
    "        # Second LLM call - synthesis",
    "        response2 = client.chat.completions.create(",
    "            model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "            messages=messages,",
    "        )",
    "        answer = response2.choices[0].message.content",
    "        total_prompt_tokens += response2.usage.prompt_tokens",
    "        total_completion_tokens += response2.usage.completion_tokens",
    "",
    "    total_tokens = total_prompt_tokens + total_completion_tokens",
    "",
    "    # Save to Cosmos DB conversations container in the spoke",
    "    save_conversation(",
    "        endpoint=cfg['COSMOS_DB_ENDPOINT'],",
    "        database_name=cfg['DATABASE_NAME'],",
    "        container_name=cfg['CONVERSATIONS_DATABASE_CONTAINER'],",
    "        principal_id=PRINCIPAL_ID,",
    "        question=user_question,",
    "        tool_calls=tool_calls_log,",
    "        answer=answer,",
    "        model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "        prompt_tokens=total_prompt_tokens,",
    "        completion_tokens=total_completion_tokens,",
    "        total_tokens=total_tokens,",
    "        apim_gateway=apim_base.replace('https://', ''),",
    "    )",
    "",
    "    return answer",
    "",
    "if __name__ == '__main__':",
    "    # Show last 3 conversations before running",
    "    cfg = get_config()",
    "    print('=== Recent conversation history ===')",
    "    history = get_conversation_history(",
    "        endpoint=cfg['COSMOS_DB_ENDPOINT'],",
    "        database_name=cfg['DATABASE_NAME'],",
    "        container_name=cfg['CONVERSATIONS_DATABASE_CONTAINER'],",
    "        principal_id=PRINCIPAL_ID,",
    "        limit=3,",
    "    )",
    "    if history:",
    "        for h in history:",
    "            # Pull the fields into locals so the f-string needs no nested quotes",
    "            ts = h['timestamp']",
    "            q = h['question'][:60]",
    "            print(f'  [{ts}] Q: {q}...')",
    "    else:",
    "        print('  No previous conversations found.')",
    "    print()",
    "",
    "    question = 'What is the weather like in Amsterdam right now?'",
    "    print(f'Question: {question}')",
    "    answer = run_agent_with_memory(question)",
    "    print(f'Answer: {answer}')"
)
[System.IO.File]::WriteAllLines("$PWD\agent_with_memory.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Run it:

python agent_with_memory.py

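A successful run prints the recent-history header (with “No previous conversations found.” on the first run), the question, the APIM endpoint the request is sent through, the tool call and its result, a “Saved conversation” line with the generated document ID, and finally the answer.

Run it a second time and the history section shows the previous exchange.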

Step 5 — Validate in Cosmos DB Data Explorer

Go to the Azure Portal → cosmos-tggi2gmkw22w4 → Data Explorer → cosmos-dbtggi2gmkw22w4 → conversations → Items.

You should see your conversation document with all fields populated. The partition key /principal_id should match steefjan@msn.com.

To query all conversations for a user:

SELECT * FROM c WHERE c.principal_id = 'steefjan@msn.com' ORDER BY c._ts DESC

To get a summary of all runs with token totals:

			
SELECT c.id, c.timestamp, c.question, c.total_tokens, c.model
FROM c
WHERE c.principal_id = 'steefjan@msn.com'
ORDER BY c._ts DESC
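
In code, the same summary query is better expressed with a query parameter instead of a hardcoded address. A minimal sketch, assuming a hypothetical helper build_history_query; the returned pair matches the query and parameters arguments that cosmos.py passes to query_items:

```python
def build_history_query(principal_id, limit=5):
    """Return (query, parameters) for a per-user summary query (hypothetical helper)."""
    # limit is interpolated into the query text, so coerce it to int first.
    limit = int(limit)
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    query = (
        f"SELECT TOP {limit} c.id, c.timestamp, c.question, c.total_tokens, c.model "
        "FROM c WHERE c.principal_id = @principal_id ORDER BY c._ts DESC"
    )
    # The user value travels as a parameter and never becomes part of the SQL text.
    parameters = [{"name": "@principal_id", "value": principal_id}]
    return query, parameters


query, parameters = build_history_query("user@example.com", limit=3)  # placeholder user
print(query)
print(parameters)
```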

Step 6 — What’s in the Document

Looking at a stored conversation document, every field serves a purpose:

Field	Purpose
`id`	Unique run identifier — traceable back to a specific agent invocation
`principal_id`	Partition key — enables per-user history queries and RBAC scoping
`timestamp`	ISO 8601 UTC — audit trail, correlatable with APIM logs
`question`	Original user input — searchable for pattern analysis
`tool_calls`	Full tool call log including arguments and results — debugging and audit
`answer`	Final agent response — quality review and feedback loops
`model`	Model version — tracks which model version answered which questions
`prompt_tokens` / `completion_tokens`	Cumulative across both LLM calls — accurate per-conversation cost
`total_tokens`	Sum of both calls — FinOps input per user per conversation
`apim_gateway`	Gateway used — identifies which hub instance served the request

The token counts here are cumulative across both LLM calls (tool decision and synthesis), yielding a true per-conversation cost rather than a per-call figure. This is more useful for FinOps reporting: you care about the cost of answering a question, not the cost of individual API calls within that answer.
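
To make the FinOps angle concrete, here is a rough sketch that turns stored token counts into a cost per conversation and per user. The function names and the two rates are assumptions: the rates are placeholders for illustration, not actual Azure OpenAI prices.

```python
def conversation_cost(doc, prompt_rate_per_1k, completion_rate_per_1k):
    """Cost of one stored conversation, given per-1,000-token rates."""
    prompt_cost = doc["prompt_tokens"] / 1000 * prompt_rate_per_1k
    completion_cost = doc["completion_tokens"] / 1000 * completion_rate_per_1k
    return prompt_cost + completion_cost


def cost_per_user(docs, prompt_rate_per_1k, completion_rate_per_1k):
    """Sum conversation costs per principal_id (the container's partition key)."""
    totals = {}
    for doc in docs:
        user = doc["principal_id"]
        cost = conversation_cost(doc, prompt_rate_per_1k, completion_rate_per_1k)
        totals[user] = totals.get(user, 0.0) + cost
    return totals


# Placeholder rates and documents, for illustration only.
PROMPT_RATE = 0.0025
COMPLETION_RATE = 0.01
docs = [
    {"principal_id": "user@example.com", "prompt_tokens": 234, "completion_tokens": 67},
    {"principal_id": "user@example.com", "prompt_tokens": 1000, "completion_tokens": 1000},
]
print(cost_per_user(docs, PROMPT_RATE, COMPLETION_RATE))
```

Prompt and completion tokens are kept separate in the document precisely because they are usually billed at different rates; total_tokens alone would not be enough for this calculation.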

Pitfalls Summary

Pitfall	Fix
Cosmos DB firewall blocks local IP	Portal → Networking → All networks for dev, or add specific IP
403 on Cosmos DB write	Assign `Cosmos DB Built-in Data Contributor` data plane role to your principal
`CosmosResourceNotFoundError`	Verify database name (`cosmos-dbtggi2gmkw22w4`) and container name (`conversations`) match exactly
Partition key mismatch	Container was created with `/principal_id` — every document must include this field
`DefaultAzureCredential` fails locally	Run `az login` and ensure the correct subscription is selected
Never use connection strings	Use `DefaultAzureCredential` throughout — locally via `az login`, in production via Managed Identity

What the Full Citadel Data Layer Now Looks Like

After this post, the spoke’s data tier is fully active:

Store	What it holds	Who writes it
Hub Cosmos DB `ai-usage-container`	Per-LLM-call usage events (tokens, model, gateway, IP)	APIM gateway automatically
Spoke Cosmos DB `conversations`	Per-run conversation documents (question, tools, answer, cumulative tokens)	Agent code explicitly
App Config `appcs-tggi2gmkw22w4`	All configuration keys for the spoke	Spoke deployment automatically

The hub’s ai-usage-container captures the infrastructure view of every API call, governed and logged. The spoke’s conversations container captures the application view of every user interaction, structured and queryable. Together, they give you both compliance evidence and application telemetry from a single agent run.

What’s Next

The next post in this series showcases the Citadel Kill Switch and explains how it stops a governed agent when necessary. It details how the five-layer containment system in APIM effectively shuts down the process without affecting the spoke or agent code. The conversation history you’ve created illustrates the clear before-and-after contrast: requests flow to Cosmos DB and then abruptly halt at the gateway layer.

Azure PaaS Integration for Architects: A Practitioner’s Map

Posted on July 2, 2026 by steefjan1970

If you’ve spent any time in the Azure portal lately, you’ll know the problem isn’t a lack of PaaS services; it’s too many of them, with overlapping capabilities and just enough marketing gloss to make every option look like the right one. App Service or Container Apps? Logic Apps or Functions? Service Bus or Event Grid? The Azure PaaS catalog has grown quickly, and for integration architects specifically, the decisions compound: pick the wrong option at the compute layer, and you’re fighting the platform every time you add a connector, a retry policy, or a compliance control.

This post is a map, not a comparison matrix. I’m grouping the Azure PaaS services that matter to integration work into four layers (compute, integration, data, and governance) and walking through the decision points that arise when designing for a regulated enterprise environment rather than a greenfield demo. If you’ve followed my Logic Apps Agent Loop series or the APIM for AI workloads series, this sits underneath both: it is the platform primer that the deeper posts assume you have already read.

Why Azure PaaS still matters to integration architects

IaaS provides you with a VM and asks you to manage everything above it. SaaS gives you the finished product and asks for nothing. PaaS sits in between: the platform owns patching, scaling, and availability, and you own the application logic and configuration. For integration workloads specifically, that trade-off is usually the right one: you rarely need to control the OS of a message broker, but you do need fine-grained control over routing, transformation, and policy enforcement.

The practical test I use: if a service requires you to think about instance sizing, OS patch cycles, or cluster upgrades, it’s leaning IaaS regardless of what the marketing page calls it. If it requires you to think about triggers, bindings, connectors, and scaling rules, it’s PaaS. AKS sits deliberately on that boundary; more on that below.

Diagram showing five stacked Azure PaaS layers for integration architects: Compute (App Service, Functions, Container Apps, AKS), Integration (Logic Apps, API Management, Service Bus, Event Grid), Data (Azure SQL Database, Cosmos DB, Cache for Redis), Governance and identity (Entra ID, Key Vault, Azure Policy, Monitor), and Governance and resilience, the agentic gap (per-action authorization, compensating actions, evaluation and drift detection).

The four core PaaS layers for integration work (compute, integration, data, and governance/identity) plus the fifth layer that agentic workloads expose: per-action authorization, compensating actions, and evaluation.

Layer 1: Compute in Azure PaaS Integration

Azure App Service

Still the default for web APIs and backend services that don’t need event-driven scaling. App Service gives you deployment slots, built-in autoscale, and managed TLS with minimal ceremony. For integration architects, the main use case is hosting synchronous REST APIs that front a backend system, the kind of thing that used to be a WCF service or an on-prem IIS site.

The limitation that catches people out: App Service scales on CPU/memory/queue-length rules, not on arbitrary event volume. If your workload is bursty and event-driven rather than steadily loaded, you’ll either overprovision or look elsewhere.

Azure Functions

The event-driven counterpart. Functions are the right choice when the unit of work is a discrete event: a message landing in a queue, a file arriving in Blob Storage, or an HTTP call that needs to fan out. The Consumption plan gives true scale-to-zero, which matters for cost in low-traffic integration scenarios; the Premium and Flex Consumption plans trade some of that elasticity for warm instances and VNet integration, which most enterprise integration platforms need anyway because you’re rarely allowed to expose a public endpoint without a private link in front of it.

Where Functions get uncomfortable: long-running orchestrations. A single function execution has a timeout, and while Durable Functions solves the orchestration problem, you’re now managing a stateful workflow engine on top of a stateless compute primitive. That’s usually the point where I ask whether Logic Apps would do the job with less code.

Azure Container Apps

The newer entrant, and increasingly my default recommendation for anything that needs to run a container without the operational overhead of Kubernetes. Container Apps gives you KEDA-based event-driven scaling, Dapr integration for service-to-service calls and pub/sub, and revision-based traffic splitting, all without you touching a node pool. For integration architects building agent-based or microservice-style integration components, this is often the sweet spot: you get container portability (useful if the workload might move, or if you’re standardizing on containers for other reasons) without inheriting cluster lifecycle management.

Azure Kubernetes Service (AKS)

Worth naming even though it’s not strictly Azure PaaS: Microsoft manages the control plane, yet you still own node pool upgrades, networking configuration, and workload scheduling. AKS earns its place when you have genuine Kubernetes-native requirements: custom operators, a multi-team platform where Kubernetes is the common substrate, or workloads that need capabilities Container Apps doesn’t expose yet. For most integration teams, reaching for AKS by default is over-engineering. Reach for it when a specific requirement forces your hand, not because it’s the more “serious” option.

Layer 2: Integration with Azure PaaS

Azure Logic Apps

The workflow orchestration layer of Azure PaaS, and still the most direct route to enterprise connectors: SAP, IBM MQ, mainframe hosts, and the long tail of line-of-business systems that lack a modern REST API. Standard Logic Apps (running on the single-tenant model) close most of the gaps that made Consumption Logic Apps hard to use in regulated environments: VNet integration, built-in state management, and per-workflow scaling.

The honest trade-off: Logic Apps designer-first workflows are fast to build and easy for less code-heavy teams to maintain, but they get harder to reason about and harder to code-review once a workflow grows past a certain complexity. I’ve found the practical ceiling is somewhere around “a dozen actions with a couple of branches.” Past that, either decompose into smaller workflows or move the logic into a Function.

Azure API Management

Not just a gateway: for integration architects, APIM is where governance actually gets enforced. Rate limiting, authentication, request/response transformation, and policy-based routing all live here, in front of whatever compute layer is doing the real work. If you’re building any platform where multiple consumers hit a shared set of backend capabilities, APIM is the control point that lets you change backend implementations without breaking consumers and enforce policy without touching application code.

The thing worth planning for early: policy authoring in APIM is a distinct skill, separate from the languages your team already knows. Budget time for the team to learn the policy XML dialect rather than treating it as an afterthought. Badly written policies are a common source of latency and hard-to-diagnose failures.

Azure Service Bus

The durable, ordered, transactional messaging backbone. Reach for Service Bus when you need guaranteed delivery, sessions for ordered processing, or transactional message handling across multiple operations. Topics and subscriptions give you pub/sub without standing up a separate broker.

Azure Event Grid

The lightweight, high-throughput event router. Where Service Bus is about reliable delivery of business messages, Event Grid is about routing high-volume, fire-and-forget events (resource state changes, custom application events, IoT telemetry) to whichever subscriber cares about them. The two are frequently used together: Event Grid fans out a notification, and a subscriber puts a durable message on Service Bus for guaranteed processing.

A rule of thumb I use with teams new to Azure integration: if losing a message would be a business incident, it belongs on Service Bus. If losing a message would just mean a missed notification, Event Grid is fine.

Layer 3: Data in Azure PaaS

Integration architecture lives and dies by what’s underneath it, and the PaaS data services matter as much as the compute and messaging layers.

Azure SQL Database remains the default for relational, transactional workloads with a need for strong consistency: think reference data, transactional state, anything with real foreign-key relationships. Azure Cosmos DB earns its place when you need global distribution, flexible schema, or the kind of horizontal scale that a single SQL instance won’t give you cheaply; it’s also increasingly the default choice for conversation and state storage in agentic workloads, given its low-latency reads and flexible document model. Azure Cache for Redis sits in front of both, absorbing read load and giving you a fast, ephemeral store for session state or short-lived coordination data.

The mistake I see most often: teams default to Cosmos DB because it’s the “modern” choice, then discover they actually needed relational integrity and end up hand-rolling consistency checks that SQL would have given them for free. Pick based on the access pattern, not the reputation.

Layer 4: Governance and identity for Azure PaaS

This is the layer that separates a proof of concept from something you can run in a regulated industry. Microsoft Entra ID and managed identities remove the need for connection strings and API keys scattered across configuration files. Each of the Azure PaaS services above should authenticate via managed identity. Key Vault holds what can’t be a managed identity (third-party API keys, certificates). Azure Policy and Microsoft Defender for Cloud provide the guardrails and posture visibility that an auditor or security team will ask for. Azure Monitor and Application Insights are non-negotiable for integration platforms, especially when a message fails somewhere in a chain of five services. Distributed tracing is the difference between a five-minute diagnosis and a day of log archaeology.

Layer 5: The gap that agentic workloads expose

Everything above holds for conventional integration platforms. Agentic AI workloads add a wrinkle, one that a recent LinkedIn discussion around an enterprise agent architecture diagram put well: the model is arguably the least differentiated part of a production agent deployment. Identity, permissions, observability, governance, and reliable orchestration are what separate a working demo from something you can run against real systems, and a few of those deserve to be called out specifically for integration architects, because they don’t map cleanly onto the governance layer above.

Diagram showing a caller validated at a green dashed identity perimeter (Entra ID / RBAC) before entering an agent's reasoning loop of plan, call tool, observe result, and act. A coral dashed arrow shows a poisoned result from an untrusted tool or RAG source entering the loop directly, bypassing the perimeter. A per-action authorization control sits inside the loop, asking whether each specific call is allowed for the tenant. — The identity perimeter validates the caller once, at the edge. A poisoned tool result enters the reasoning loop from the data the agent requested and never crosses that boundary. Per-action authorization is the control that reaches inside the loop where the threat actually lives.

Identity

Identity secures who the agent is, not what it does. Entra ID and managed identity answer “Is this caller who it claims to be?” They don’t address what happens when a poisoned tool result or a manipulated retrieved document changes the agent’s next action mid-reasoning loop. Prompt injection rides in through the RAG layer and tool outputs inside the loop, where identity checks at the perimeter don’t reach. The practical implication for PaaS design is that authorization needs to occur per action, not just per identity. This is exactly what APIM policy scoping and per-tool consent in Logic Apps and AI Foundry connectors are for: to treat each tool call as its own authorization decision, rather than an inherited privilege from a validated caller.
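To make the per-action idea concrete, here is a minimal Python sketch. The policy table, the tenant names, and the `authorize_tool_call` function are hypothetical and are not part of APIM, Logic Apps, or AI Foundry; in a real deployment the equivalent decision would sit in an APIM policy or a connector consent setting.

```python
from dataclasses import dataclass

# Hypothetical allow-list: which tools each tenant may call, and how often per run.
POLICY = {
    "tenant-a": {"get_weather": {"max_calls_per_run": 3}},
    "tenant-b": {
        "get_weather": {"max_calls_per_run": 1},
        "create_ticket": {"max_calls_per_run": 1},
    },
}


@dataclass
class Decision:
    allowed: bool
    reason: str


def authorize_tool_call(tenant: str, tool: str, calls_so_far: int) -> Decision:
    """Decide on one tool call; a validated caller does not imply an allowed action."""
    rule = POLICY.get(tenant, {}).get(tool)
    if rule is None:
        return Decision(False, f"{tool} is not permitted for {tenant}")
    if calls_so_far >= rule["max_calls_per_run"]:
        return Decision(False, f"{tool} call budget exhausted for {tenant}")
    return Decision(True, "ok")


if __name__ == "__main__":
    print(authorize_tool_call("tenant-a", "get_weather", 0))
    print(authorize_tool_call("tenant-a", "create_ticket", 0))
```

The point of the sketch is where the check runs: inside the loop, once per tool call, so a poisoned tool result that steers the agent toward an unexpected tool still hits a deny.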

Recovery

Recovery means compensating actions, not just retries. Agent actions have side effects across systems: a ticket got created, a record got updated, an email went out. A failed step three actions into an agent loop can’t just retry from the top; it needs a saga-style compensating action to undo what already happened. Service Bus sessions and Logic Apps’ native support for scoped try/catch-with-compensation are the building blocks here — but the compensation logic has to be designed in explicitly, because neither service provides it by default.

Evaluation

Evaluation and drift detection are a first-class layer, not an afterthought. Application Insights and Azure Monitor provide operational observability into latency, error rates, and throughput. They don’t tell you if the agent’s outputs are quietly degrading in quality over time. That’s a separate concern, and one worth budgeting for from the start rather than bolting on after the first bad production incident.
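As a rough illustration of what a drift check could look like, the sketch below compares a recent window of evaluation scores against a baseline window. The scores, the `quality_drift` name, and the 0.05 threshold are assumptions for the example; how the scores are produced (human review, an evaluator model, golden-set comparison) is a separate design decision.

```python
from statistics import mean
from typing import Sequence


def quality_drift(
    baseline: Sequence[float], recent: Sequence[float], max_drop: float = 0.05
) -> bool:
    """Return True when the recent mean score fell more than max_drop below baseline."""
    if not baseline or not recent:
        raise ValueError("both score windows must be non-empty")
    return mean(baseline) - mean(recent) > max_drop


if __name__ == "__main__":
    # Hypothetical scores in the range 0..1 from some evaluation job.
    print(quality_drift([0.90, 0.92, 0.88], [0.80, 0.79, 0.82]))
```

A mean comparison is the simplest possible signal; it is a starting point to budget for, not a full evaluation pipeline.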

The questions worth asking before an agent goes anywhere near a real system: what did it access, what tool did it call, why did it act, what policy constrained it, what happened when it failed, and who owns the outcome. If a PaaS architecture can’t answer all six, the gap isn’t in the model; it’s in the platform around it.

Azure PaaS Integration decisions: a framework, not a decision tree for integration architects

None of these layers are picked in isolation; the compute choice constrains the integration pattern, and the integration pattern constrains what the data layer needs to support. When I’m working through this with a team, the questions I ask in order are:

Is the trigger an event or a schedule/request? Event-driven points toward Functions or Container Apps with KEDA; request-driven points toward App Service or APIM-fronted compute.
Does a human or a low-code team need to maintain this workflow? If yes, Logic Apps earns serious consideration even if a Function would be more “elegant.”
What’s the cost of losing a message? Business-critical → Service Bus. Best-effort notification → Event Grid.
Does the data need strong relational integrity, or flexible scale? SQL for the former, Cosmos DB for the latter — and don’t let the “modern” label make the decision for you.
Is everything behind managed identity and traced end to end? If the answer is no anywhere in the chain, that’s the next thing to fix, not the last.

Decision flowchart for choosing Azure PaaS services: question one splits event-driven compute (Functions, Container Apps) from request-driven compute (App Service, APIM-fronted); question two routes low-code-maintained workflows to Logic Apps; question three splits business-critical messaging (Service Bus) from best-effort events (Event Grid); question four chooses Azure SQL for relational integrity or Cosmos DB for scale; question five is a gate requiring managed identity and end-to-end tracing before deployment. — The five questions as a branching flow: trigger type, workflow ownership, message-loss cost, data access pattern, and the managed-identity gate that comes before anything ships.

Azure PaaS Integration Conclusion

That’s the shape of it. In practice, most enterprise integration platforms end up using several of these services together: API Management fronting a mix of Logic Apps and Functions, backed by Service Bus for reliable delivery and Cosmos DB or SQL for state. The architecture work is less about picking a single winner than about drawing clean boundaries between them.

Microsoft Foundry Citadel Platform Azure: Connecting a Tool-Calling Agent

Posted on July 1, 2026 by steefjan1970

In the previous post we deployed a working Microsoft Foundry Citadel Platform on Azure Sweden Central, a Governance Hub built on Azure API Management and an Agent Spoke built on Azure AI Foundry. We validated the setup with a raw chat completion call through the APIM gateway. That proved the plumbing works. This post takes the next step: connecting a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure, using the Open-Meteo weather API as a tool, and showing that every LLM call flows through the hub’s governance layer.

The agent is built with the standard Azure OpenAI SDK pointed directly at the Citadel APIM gateway. It uses a custom function tool that calls the Open-Meteo API to retrieve real current weather data for any location. The governance hub intercepts all traffic: content safety policies fire, token usage is tracked, and telemetry flows into Application Insights. This is the Microsoft Foundry Citadel Platform doing what it is designed to do.

What We Build

The flow looks like this:

Flow diagram showing a Python agent using the OpenAI SDK making an initial request to the Citadel APIM gateway, which forwards to Azure OpenAI gpt-4o. The model requests a tool call to get_weather, the agent calls the Open-Meteo API for real weather data, submits the result back through APIM for a second LLM call, receives a grounded response, and telemetry flows to Application Insights and Cosmos DB. — End-to-end flow of a tool-calling agent on the Microsoft Foundry Citadel Platform in Azure Sweden Central: the Python agent routes both LLM calls through the APIM Governance Hub, executes the get_weather tool against Open-Meteo, and receives a grounded response, with all traffic captured in Application Insights and Cosmos DB.

Two LLM calls flow through APIM per agent run: the tool decision call and the synthesis call. Both are governed, appear in Application Insights, and contribute to Cosmos DB usage tracking.

Why Open-Meteo and Why the Standard OpenAI SDK

The original plan was to use the Azure AI Foundry Agent Service SDK with Bing Search grounding. Two blockers emerged:

Bing Search SKU eligibility: The Grounding with Bing Search resource (G1 SKU) requires Pay-As-You-Go or EA subscriptions and is not available on MVP or MSDN subscriptions.

AI Foundry Agent Service routing: The azure-ai-projects SDK routes LLM calls through the AI Foundry project’s internal endpoint (aif-tggi2gmkw22w4.openai.azure.com) rather than through APIM, bypassing the governance layer. In addition, even after adding APIM as a connected resource in the AI Foundry portal, the Agent Service does not honor it for model routing in the current preview version.

The solution, therefore, is to use the standard OpenAI Python SDK pointed directly at the APIM gateway endpoint. This guarantees that all traffic flows through the hub and that the governance telemetry is fully captured in Application Insights. The trade-off is that the tool-calling loop has to be implemented explicitly in Python.

Open-Meteo is a free, open-source weather API that requires no API key and returns structured JSON weather data. It serves as a clean stand-in for any external API your agents might call in production.

Prerequisites

From the previous post you should have:

Hub deployed in rg-ai-hub-gateway-dev with APIM gateway URL https://apim-wpvlimv4ngkns.azure-api.net and subscription key
Spoke deployed in rg-ai-spoke-dev with App Config appcs-tggi2gmkw22w4 containing APIM_GATEWAY_URL and APIM_SUBSCRIPTION_KEY
Your principal ID with App Configuration Data Reader role on the spoke App Config

For this post you additionally need Python 3.11 or later installed locally.

Step 1 — Set Up the Python Environment

			
mkdir citadel-agent && cd citadel-agent
python -m venv .venv
# Windows
.venv\Scripts\activate
pip install openai
pip install azure-appconfiguration
pip install azure-identity
pip install requests

		

Step 2 — Read Configuration from App Config

Create config.py using [System.IO.File]::WriteAllLines to avoid BOM issues on Windows:

			
$lines = @(
    "from azure.appconfiguration import AzureAppConfigurationClient",
    "from azure.identity import DefaultAzureCredential",
    "",
    "APP_CONFIG_ENDPOINT = 'https://appcs-tggi2gmkw22w4.azconfig.io'",
    "LABEL = 'ai-lz'",
    "",
    "def get_config() -> dict:",
    "    credential = DefaultAzureCredential()",
    "    client = AzureAppConfigurationClient(",
    "        base_url=APP_CONFIG_ENDPOINT,",
    "        credential=credential",
    "    )",
    "    keys = [",
    "        'AI_FOUNDRY_PROJECT_ENDPOINT',",
    "        'CHAT_DEPLOYMENT_NAME',",
    "        'APIM_GATEWAY_URL',",
    "        'APIM_SUBSCRIPTION_KEY',",
    "    ]",
    "    config = {}",
    "    for key in keys:",
    "        setting = client.get_configuration_setting(key=key, label=LABEL)",
    "        config[key] = setting.value",
    "    return config",
    "",
    "if __name__ == '__main__':",
    "    cfg = get_config()",
    "    for k, v in cfg.items():",
    "        print(f'{k}: {v[:30]}...')"
)
[System.IO.File]::WriteAllLines("$PWD\config.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python config.py

All four keys should return truncated values. If you get a 403, wait 2–5 minutes for role assignment propagation and retry.

Pitfall: Always Use WriteAllLines for Python Files on Windows

Out-File -Encoding utf8NoBOM and @"..."@ | Out-File both add a BOM on some Windows PowerShell versions, causing Python to throw SyntaxError: Non-UTF-8 code starting with '\xff'. Use [System.IO.File]::WriteAllLines with [System.Text.UTF8Encoding]::new($false) to write files without BOM.
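If you are unsure whether a generated file picked up a byte-order mark, a short Python check can tell you before the interpreter complains. This is a sketch using only the standard library; `detect_bom` is a hypothetical helper name. Note that `\xff` is the first byte of the UTF-16 little-endian BOM (`FF FE`), which is consistent with the error message quoted above.

```python
import codecs
from pathlib import Path

# Byte-order marks to look for. UTF-32 LE is checked before UTF-16 LE
# because its BOM starts with the same two bytes.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path: str):
    """Return the name of the BOM found at the start of the file, or None."""
    head = Path(path).read_bytes()[:4]
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    for candidate in ("config.py", "tools.py", "agent.py"):
        if Path(candidate).exists():
            print(candidate, detect_bom(candidate))
```

A result of `None` for all three files means they were written without a BOM.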

Step 3 — Define the Weather Tool

Create tools.py:

			
$lines = @(
    "import json",
    "import requests",
    "",
    "def get_weather(location: str) -> str:",
    "    try:",
    "        geo = requests.get(",
    "            'https://geocoding-api.open-meteo.com/v1/search',",
    "            params={'name': location, 'count': 1, 'language': 'en', 'format': 'json'},",
    "            timeout=10",
    "        )",
    "        geo.raise_for_status()",
    "        geo_data = geo.json()",
    "        if not geo_data.get('results'):",
    "            return json.dumps({'error': f'Location not found: {location}'})",
    "        r = geo_data['results'][0]",
    "        weather = requests.get(",
    "            'https://api.open-meteo.com/v1/forecast',",
    "            params={'latitude': r['latitude'], 'longitude': r['longitude'], 'current_weather': True, 'wind_speed_unit': 'kmh', 'timezone': 'auto'},",
    "            timeout=10",
    "        )",
    "        weather.raise_for_status()",
    "        c = weather.json()['current_weather']",
    "        codes = {0:'Clear sky',1:'Mainly clear',2:'Partly cloudy',3:'Overcast',45:'Foggy',61:'Slight rain',63:'Moderate rain',65:'Heavy rain',71:'Slight snow',80:'Showers',95:'Thunderstorm'}",
    "        return json.dumps({'location': f'{r[chr(110)+(chr(97)+chr(109)+chr(101))]}, {r.get(chr(99)+chr(111)+chr(117)+chr(110)+chr(116)+chr(114)+chr(121),chr(32))}', 'temperature_celsius': c['temperature'], 'wind_speed_kmh': c['windspeed'], 'wind_direction_degrees': c['winddirection'], 'condition': codes.get(c['weathercode'],'Unknown'), 'is_day': bool(c['is_day'])})",
    "    except Exception as e:",
    "        return json.dumps({'error': str(e)})",
    "",
    "WEATHER_TOOL_DEFINITION = {",
    "    'type': 'function',",
    "    'function': {",
    "        'name': 'get_weather',",
    "        'description': 'Get current weather for a location. Returns temperature in Celsius, wind speed, condition.',",
    "        'parameters': {",
    "            'type': 'object',",
    "            'properties': {'location': {'type': 'string', 'description': 'City name e.g. Stockholm'}},",
    "            'required': ['location']",
    "        }",
    "    }",
    "}"
)
[System.IO.File]::WriteAllLines("$PWD\tools.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python -c "from tools import get_weather; print(get_weather('Stockholm'))"

Step 4 — Create the Agent

Create agent.py using the standard openai SDK pointed directly at the APIM gateway:

			
$lines = @(
    "import json",
    "from openai import AzureOpenAI",
    "from config import get_config",
    "from tools import get_weather, WEATHER_TOOL_DEFINITION",
    "",
    "def run_agent(user_question: str) -> str:",
    "    cfg = get_config()",
    "",
    "    # Strip /openai suffix - AzureOpenAI SDK adds it automatically",
    "    apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '')",
    "",
    "    client = AzureOpenAI(",
    "        azure_endpoint=apim_base,",
    "        api_key=cfg['APIM_SUBSCRIPTION_KEY'],",
    "        api_version='2024-02-01',",
    "    )",
    "",
    "    messages = [{'role': 'user', 'content': user_question}]",
    "    print(f'Sending request via APIM: {apim_base}')",
    "",
    "    # First LLM call - agent decides whether to use the tool",
    "    response = client.chat.completions.create(",
    "        model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "        messages=messages,",
    "        tools=[WEATHER_TOOL_DEFINITION],",
    "        tool_choice='auto',",
    "    )",
    "",
    "    msg = response.choices[0].message",
    "    messages.append(msg)",
    "",
    "    # Handle tool calls if the agent decided to use get_weather",
    "    if msg.tool_calls:",
    "        for tool_call in msg.tool_calls:",
    "            args = json.loads(tool_call.function.arguments)",
    "            print(f'  -> Tool call: get_weather({args})')",
    "            result = get_weather(**args)",
    "            print(f'  -> Tool result: {result}')",
    "            messages.append({",
    "                'role': 'tool',",
    "                'tool_call_id': tool_call.id,",
    "                'content': result,",
    "            })",
    "",
    "        # Second LLM call - synthesise grounded response",
    "        response = client.chat.completions.create(",
    "            model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "            messages=messages,",
    "        )",
    "        return response.choices[0].message.content",
    "",
    "    return msg.content",
    "",
    "if __name__ == '__main__':",
    "    question = 'What is the weather like in Stockholm right now?'",
    "    print(f'Question: {question}')",
    "    answer = run_agent(question)",
    "    print(f'Answer: {answer}')"
)
[System.IO.File]::WriteAllLines("$PWD\agent.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Run it:

python agent.py

A successful run looks like this:

Pitfall: APIM Endpoint Format

The AzureOpenAI SDK constructs the full path as {azure_endpoint}/openai/deployments/{model}/chat/completions. If your APIM_GATEWAY_URL in App Config contains /openai at the end, strip it before passing to the client; otherwise, the SDK builds a doubled path (/openai/openai/...) that returns a 500 from APIM. The line apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '') handles this automatically.
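As a slightly stricter variant, the sketch below strips only a trailing `/openai` segment rather than every occurrence of the substring. The helper names and the `apim-example` host are hypothetical; `str.removesuffix` needs Python 3.9 or later, which the Python 3.11 prerequisite covers.

```python
def normalize_apim_base(url: str) -> str:
    """Drop trailing slashes and one trailing '/openai' segment, if present."""
    return url.rstrip("/").removesuffix("/openai")


def chat_completions_url(base: str, deployment: str) -> str:
    """Mirror the path shape described above (query string omitted)."""
    return f"{normalize_apim_base(base)}/openai/deployments/{deployment}/chat/completions"


if __name__ == "__main__":
    # Hypothetical gateway URLs, with and without the suffix.
    for raw in (
        "https://apim-example.azure-api.net/openai/",
        "https://apim-example.azure-api.net",
    ):
        print(chat_completions_url(raw, "chat"))
```

The difference from `.replace('/openai', '')` only matters if the URL contains `/openai` somewhere other than the end, for example a path such as `/openai-proxy`, which `replace` would mangle.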

After running the agent, check Application Insights in the hub:

			
az monitor app-insights query `
  --app <your Application Insights resource name> `
  --resource-group rg-ai-hub-gateway-dev `
  --analytics-query "requests | where timestamp > ago(10m) | project timestamp, name, resultCode, duration | order by timestamp desc" `
  --output table

		

Pitfall: CLI vs Portal Ingestion Lag

The CLI query hits the Log Analytics store, which has a 5–10-minute ingestion lag. In contrast, the Azure Portal Application Insights blade uses a live metrics path and shows results immediately. If the CLI returns an empty response, check the portal directly: go to the APIM instance → Performance to view requests in real time.

What Governed Traffic Looks Like in the Portal

The Application Insights Performance blade shows two operation types per agent run:

azure-openai-service-api:rev=1 - ChatCompletions_Create — the APIM policy-matched operation, showing the governed calls with content safety applied
POST /openai/deployments/chat/chat/completions — the raw endpoint calls

Each agent run generates two successful requests (tool decision + synthesis), both with response code 200 and latency around 900ms–1.2s for gpt-4o. Failed attempts from earlier endpoint format issues show as 500s and are clearly distinguishable.

Azure Application Insights Performance blade showing 9 requests to the Citadel APIM gateway including the azure-openai-service-api ChatCompletions_Create operation with 6 calls at 1.09 seconds average and POST /openai/deployments/chat/chat/completions with 3 calls, confirming the tool-calling agent traffic flows through the Microsoft Foundry Citadel governance hub in Sweden Central. — Application Insights Performance blade for the Citadel Governance Hub, confirming agent traffic routed through APIM: 9 requests captured, with the governed ChatCompletions_Create operation averaging 1.09 seconds, and all successful calls returning response code 200.

The Azure AI Foundry Agent Service SDK — What We Learned

For completeness, here is a summary of what we discovered when attempting to use the azure-ai-projects SDK before switching to the standard OpenAI SDK:

Issue	Detail
`FunctionTool` import path	Must import from `azure.ai.agents.models`, not `azure.ai.projects.models`
`create_thread` does not exist	Use `create_thread_and_process_run` instead
`list_messages` does not exist	Use `client.agents.messages.list(thread_id=...)`
`MessageRole.ASSISTANT` does not exist	Use the string `"assistant"` directly
`enable_auto_function_calls(toolset=...)` fails	Parameter is `tools=`, not `toolset=`
Function not found error	Call `client.agents.enable_auto_function_calls(tools=toolset)` before `create_agent`
Agent traffic bypasses APIM	AI Foundry Agent Service uses its own endpoint resolution — use standard OpenAI SDK pointed at APIM instead

The Agent Service SDK is in active beta development (azure-ai-agents==1.2.0b6 at the time of writing). Expect these APIs to stabilise and the APIM routing issue to be addressed in future versions.

Pitfalls Summary

Pitfall	Fix
Grounding with Bing Search G1 SKU not eligible	Requires Pay-As-You-Go or EA subscription
`Bing.Search.v7` CLI creation fails	Resource type moved to `Microsoft.Bing/accounts`
BOM in Python files on Windows	Use `[System.IO.File]::WriteAllLines` with `UTF8Encoding($false)`
APIM endpoint doubles `/openai` path	Strip `/openai` from URL before passing to `AzureOpenAI` client
App Config 403 on first run	Wait 2–5 minutes for role assignment propagation
CLI Application Insights query empty	5–10 minute ingestion lag — check portal Performance blade instead
AI Foundry Agent Service bypasses APIM	Use standard `openai` SDK pointed directly at APIM gateway

What the Full Citadel Loop Delivers

With the agent running through APIM, every LLM call in the tool-calling loop is governed:

Content Safety — both the user question and the synthesised response pass through Azure AI Content Safety policies configured in APIM.

Token tracking — each of the two LLM calls contributes to the token usage log in Cosmos DB, giving you per-call cost attribution by APIM subscription key. The Cosmos DB ai-usage-container in the hub captures a structured document for each LLM call, including the model version, token counts, gateway region, request IP, APIM subscription name, backend routing, and timestamp. In production, the productName field maps to the APIM subscription key. Aggregating documents by this field gives you direct FinOps reporting per AI initiative.

Azure Cosmos DB Data Explorer showing a usage event document in the ai-usage-container of the Citadel hub, with fields including model gpt-4o-2024-11-20, promptTokens 17, responseTokens 53, totalTokens 70, gatewayRegion Sweden Central, productName Portal-Admin, and timestamp 6/24/2026, confirming token tracking and cost attribution via the Microsoft Foundry Citadel APIM governance hub. — The Citadel hub Cosmos DB ai-usage-container showing a usage document captured from the tool-calling agent run: model gpt-4o-2024-11-20, 70 total tokens, gateway region Sweden Central, routed via apim-wpvlimv4ngkns. Every LLM call through APIM generates a document like this, which serves as the cost attribution and audit trail for enterprise AI governance.

Latency observability — Application Insights captures the duration of every call, making it easy to identify slow tool calls or model latency spikes.

Audit trail — every request is logged with timestamp, operation name, response code, and duration. For a healthcare or financial services context, this is your compliance evidence.
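The FinOps aggregation mentioned under token tracking can be sketched against an in-memory list of usage documents. The field names (`productName`, `promptTokens`, `responseTokens`, `totalTokens`) are the ones visible in the usage document above; the `tokens_by_product` function is hypothetical, and reading the documents from Cosmos DB is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable

TOKEN_FIELDS = ("promptTokens", "responseTokens", "totalTokens")


def tokens_by_product(docs: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Sum call counts and token counts per productName."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"calls": 0, **{field: 0 for field in TOKEN_FIELDS}}
    )
    for doc in docs:
        bucket = totals[doc.get("productName", "unknown")]
        bucket["calls"] += 1
        for field in TOKEN_FIELDS:
            bucket[field] += int(doc.get(field, 0))
    return dict(totals)


if __name__ == "__main__":
    # First document uses the values shown in the screenshot; the second is made up.
    sample = [
        {"productName": "Portal-Admin", "promptTokens": 17, "responseTokens": 53, "totalTokens": 70},
        {"productName": "Portal-Admin", "promptTokens": 10, "responseTokens": 20, "totalTokens": 30},
    ]
    print(tokens_by_product(sample))
```

At any real volume the same grouping would more likely be done with a query against the container, but the shape of the result is the same.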

What’s Next

This post wires a tool-calling agent to the Citadel hub using the standard OpenAI SDK. The natural next steps:

Azure AI Foundry Agent Service routing — as the SDK matures, the azure-ai-projects client will likely gain proper APIM gateway support. Watch the azure-ai-agents release notes for updates on connection-based routing.

Conversation persistence — store conversation history in the Cosmos DB conversations container already deployed in the spoke. The App Config key CONVERSATIONS_DATABASE_CONTAINER points to it.

Network isolation — re-enable networkIsolation=true in the spoke parameters to route all traffic through private endpoints.

Multiple tools — extend the agent with additional function tools (document lookup, product catalog, claims system) using the same pattern. The LLM calls that request each tool and consume its result flow through APIM and are governed identically.
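For the multiple-tools step, one way to extend the loop is a small registry that dispatches on the tool name instead of calling get_weather directly. This is a sketch: `lookup_document`, `TOOLS`, and `dispatch_tool` are hypothetical names, and the `get_weather` below is a canned stand-in for the real function in tools.py so the example runs offline.

```python
import json
from typing import Callable, Dict


def get_weather(location: str) -> str:
    """Stand-in for the real tool in tools.py; returns canned JSON."""
    return json.dumps({"location": location, "temperature_celsius": 0.0})


def lookup_document(doc_id: str) -> str:
    """Hypothetical second tool."""
    return json.dumps({"doc_id": doc_id, "title": "placeholder"})


# Registry mapping the tool name the model returns to the Python function.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "lookup_document": lookup_document,
}


def dispatch_tool(name: str, arguments_json: str) -> str:
    """Route one tool call by name; return a JSON error for unknown tools or bad arguments."""
    func = TOOLS.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        return func(**json.loads(arguments_json))
    except (TypeError, json.JSONDecodeError) as exc:
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(dispatch_tool("get_weather", '{"location": "Stockholm"}'))
    print(dispatch_tool("unknown_tool", "{}"))
```

In agent.py, the call inside the tool loop would then become `result = dispatch_tool(tool_call.function.name, tool_call.function.arguments)`, with one entry in the `tools=[...]` list per registered function.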

Conclusion

Connecting a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure requires three components: the standard OpenAI SDK configured to point to the APIM gateway, a function tool with a JSON schema definition, and an explicit tool-call-handling loop. Everything else (governance, content safety, token tracking, and cost attribution) is handled by the Citadel hub automatically.

The path to get here involved navigating several SDK beta rough edges and discovering that the AI Foundry Agent Service bypasses APIM in its current preview form. These are expected friction points with a platform in active development. The governance architecture underneath is sound, the APIM policies work, and the Application Insights telemetry confirms it.

Two LLM calls. Both governed. Both visible. That is what the Citadel hub delivers.

Azure Logic Apps Agent Loop Production Operations

Posted on June 29, 2026 by steefjan1970

Part 7 of 7 in the Logic Apps Agent Loop series

Part 6 covered the security stack for agentic workflows: Easy Auth, Managed Identity, and Key Vault. This final post closes the series with Azure Logic Apps agent loop production operations: how to monitor agent loops with Application Insights, what the pricing model looks like across Standard and Consumption, the key platform limits to be aware of, and how to deploy agentic workflows through a repeatable DevOps pipeline.

By the end of this post, you will have a complete picture of what it takes to run an agentic workflow in production, not just to build one.

Azure Logic Apps agent loop production monitoring with Application Insights

The run history you have used throughout this series is the starting point for understanding what an agent loop did and why. For production workloads you need more: aggregated metrics across multiple runs, structured log queries, alerting on failures, and tracing across distributed systems. Application Insights provides all of this for Standard logic apps.

Enabling Application Insights

If you did not enable Application Insights when you created la-agent-loop, you can add it after deployment:

In the Azure portal, open your la-agent-loop logic app resource
Navigate to Application Insights under Settings in the left sidebar
Click Turn on Application Insights
After the pane updates, click Apply → Yes
Click View Application Insights data to open the dashboard

Application Insights begins collecting telemetry from that point forward; it does not backfill historical run data.

What Application Insights captures for agent loops

For Standard agentic workflows, Application Insights captures enhanced telemetry beyond what the run history provides. Key data points include:

Requests — each workflow trigger appears as an incoming request, with duration, success/failure status, and HTTP response code.

Dependencies — each tool call the agent makes appears as a dependency call, with the target service, duration, and result. For an agent loop that invokes Azure OpenAI and Azure AI Search, you will see both as dependency entries, making it straightforward to identify which tool call is slowest.

Exceptions — any workflow failure surfaces as an exception with a full stack trace, correlated to the specific run and iteration where it occurred.

Custom metrics — Logic Apps emits custom metrics for agent loop iterations, token usage, and tool invocation counts. These are queryable via Kusto (KQL) in the Logs blade.

Useful KQL queries for agent loops

To query agent loop run durations over, say, the last 72 hours:

requests | where timestamp > ago(72h) | where name contains "agent" | summarize avg(duration), max(duration), count() by bin(timestamp, 1h) | render timechart

To identify failed agent loop runs:

requests | where timestamp > ago(72h) | where success == false | project timestamp, name, duration, resultCode, cloud_RoleInstance | order by timestamp desc

To track tool call durations:

dependencies | where timestamp > ago(24h) | where type == "HTTP" | summarize avg(duration), count() by target | order by avg_duration desc

Reading the run history for agent loops

The run history in the Logic Apps portal is the fastest way to debug a specific agent loop run. For agentic workflows it shows more than a conventional run history — each agent action expands to show its iterations, and each iteration shows the model’s reasoning, the tool calls it made, and the results it received.

The Agent activity tab is the most useful view for agentic workflows. It shows the conversation between the model and the tools in chronological order: every message the model generated, every tool it invoked, and every result it received. In effect, the agent loop reveals its chain of thought.

Key things to look for in the run history:

Iteration count — how many Think → Act → Observe cycles the loop ran. A loop that runs the maximum number of iterations (default 100) without completing is a signal that the instructions are ambiguous or the tools are not returning usable results.
Tool call inputs and outputs — expand each tool call to see exactly what the model passed as parameters and what the tool returned. This is the fastest way to diagnose a tool that is returning unexpected data.
Token usage — the metadata output of each agent action shows total tokens, prompt tokens, and completion tokens. High prompt token counts indicate the conversation history is growing large — consider enabling agent history reduction.

Azure Logic Apps agent loop production pricing: Standard versus Consumption

The pricing model for agentic workflows differs between Standard and Consumption, and it differs significantly from conventional Logic Apps pricing.

Standard

Standard logic apps use a fixed App Service Plan pricing model — you pay for the compute capacity whether the workflow is running or not. Agentic workflows on Standard do not incur extra charges beyond the base App Service Plan cost. However, every Azure OpenAI call the agent makes is billed separately against your Azure OpenAI resource at standard token rates.

For the la-agent-loop workflows in this series:

The Standard logic app itself: App Service Plan (Workflow Standard WS1 or higher)
Each GPT-4o call: billed to aoai-demo-ptu at your PTU reservation rate
Azure AI Search queries (if used): billed separately at Search tier rates

The practical implication is that Standard agentic workflow costs scale with model usage, not with workflow execution count. A loop that runs five iterations and calls GPT-4o five times costs at least five times as much in model tokens as a loop that resolves in one iteration, and usually more, because each iteration resends the accumulated conversation history.
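A back-of-the-envelope sketch shows how iteration count drives prompt tokens when the full history is resent on every call. The numbers are invented for illustration and `total_prompt_tokens` is a hypothetical helper; real growth depends on the size of each tool result and on whether history reduction is enabled.

```python
def total_prompt_tokens(iterations: int, base_prompt: int, added_per_iteration: int) -> int:
    """Sum prompt tokens when every iteration resends the full history so far."""
    return sum(base_prompt + i * added_per_iteration for i in range(iterations))


if __name__ == "__main__":
    # Assumed: a 500-token starting prompt and 300 tokens of new history per iteration.
    one = total_prompt_tokens(1, 500, 300)
    five = total_prompt_tokens(5, 500, 300)
    print(one, five, five / one)
```

With these assumed figures, five iterations consume 5,500 prompt tokens against 500 for a single iteration, which is why iteration count deserves as much attention as run count when estimating cost.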

Consumption

Consumption agentic workflows use a pay-as-you-go model. Agent loop pricing is based on the number of tokens each agent action uses and appears as Enterprise Units on your bill. This is a different billing unit from the standard Consumption action executions — each token consumed by the agent is metered separately.

The Consumption agent loop is also subject to throttling based on token usage — unlike Standard, which is constrained only by the App Service Plan compute capacity.

For production workloads with predictable, high-volume agent loop usage, Standard with a PTU Azure OpenAI deployment is the more cost-predictable option. For low-volume or experimental workloads, Consumption pay-as-you-go avoids the fixed App Service Plan cost.

Known limits for agentic workflows

Before going to production, be aware of the current platform limits:

Tool constraints — tools can only contain actions, not triggers. A tool must start with an action and always contains at least one action. Control flow actions (conditions, loops, switches) are not supported inside tools. A tool only works inside the agent loop where it is defined — it cannot be shared across agent actions.

Consumption-specific limits — Consumption agentic workflows can only be created in the Azure portal, not Visual Studio Code. The AI model can come from any region, so data residency for a specific region is not guaranteed for data the model handles. The agent action is throttled based on token usage.

Agent history — by default the agent loop accumulates the full conversation history across iterations. For long-running loops this can push the context length toward the model’s limit. Enable agent history reduction in the agent action’s Settings tab to manage this. The default strategy is token count reduction with a ceiling of 128,000 tokens — adjust this based on your model’s context window and your scenario’s complexity.
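To illustrate what a token-count ceiling does to a conversation, here is a minimal sketch that keeps the first message (the instructions) plus as many of the newest messages as fit. It is not the algorithm Logic Apps uses; `approx_tokens` is a crude four-characters-per-token assumption and `reduce_history` is a hypothetical name.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    """Very rough estimate (about four characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def reduce_history(messages: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Keep the first message plus the newest messages that fit within max_tokens."""
    if not messages:
        return []
    head, tail = messages[0], messages[1:]
    budget = max_tokens - approx_tokens(head["content"])
    kept: List[Dict[str, str]] = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for msg in reversed(tail):
        cost = approx_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return [head] + kept[::-1]


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a helpful agent." * 2},
        {"role": "user", "content": "A very long earlier question. " * 20},
        {"role": "assistant", "content": "A short answer."},
        {"role": "user", "content": "A short follow-up."},
    ]
    for m in reduce_history(history, 40):
        print(m["role"], approx_tokens(m["content"]))
```

One caveat with any trimming of this kind: with chat-completions-style APIs, a tool result message should not be kept without the assistant message that requested it, so a production version would trim in matched pairs.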

Deploying agentic workflows through a DevOps pipeline

Standard logic apps are built on the Azure Functions runtime and deploy the same way as any other Standard logic app — via zip deploy, Azure Pipelines, or GitHub Actions. The workflow definitions are JSON files on disk, making them version-controllable and deployable through standard CI/CD patterns.

What to include in source control

For an agentic workflow project, the key files to version-control are:

sequential-agents/workflow.json — the sequential agent loop definition
sample/workflow.json — the autonomous agent from Post 2
mcp-research/workflow.json — the MCP research workflow from Post 4
connections.json — connection references (without credentials — those go in Key Vault)
host.json — Logic Apps host configuration
local.settings.json — local development settings (excluded from source control, .gitignore)

Deploying with Azure CLI

The simplest production deployment from a CI/CD pipeline uses the Azure CLI:

# Zip the logic app project
zip -r la-agent-loop.zip . -x "*.git*" "local.settings.json"

# Deploy to Azure
az logicapp deployment source config-zip \
  --name la-agent-loop \
  --resource-group rg-ai-solutions \
  --src la-agent-loop.zip

Environment-specific configuration

Agent connections and app settings differ between development and production environments. Use Azure CLI or Bicep to set environment-specific app settings as part of the deployment pipeline:

az logicapp config appsettings set \
  --name la-agent-loop \
  --resource-group rg-ai-solutions \
  --settings \
    agent_openAIEndpoint="https://aoai-prod.openai.azure.com/" \
    OPENAI__endpoint="https://aoai-prod.openai.azure.com/"

This keeps environment-specific values out of source control and injects them at deploy time, which is the standard twelve-factor app pattern applied to Logic Apps.
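One way to drive this from a pipeline script is to keep the per-environment values in a small lookup and generate the CLI arguments from it. The sketch below assumes a hypothetical `ENVIRONMENTS` table and `appsettings_command` helper; the dev endpoint is a placeholder, and only the setting names come from the command above.

```python
from typing import Dict, List

# Hypothetical per-environment values; the dev endpoint is a placeholder.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "dev": {
        "agent_openAIEndpoint": "https://aoai-dev.openai.azure.com/",
        "OPENAI__endpoint": "https://aoai-dev.openai.azure.com/",
    },
    "prod": {
        "agent_openAIEndpoint": "https://aoai-prod.openai.azure.com/",
        "OPENAI__endpoint": "https://aoai-prod.openai.azure.com/",
    },
}


def appsettings_command(app: str, resource_group: str, env: str) -> List[str]:
    """Build the argument list for `az logicapp config appsettings set`."""
    settings = ENVIRONMENTS[env]
    pairs = [f"{key}={value}" for key, value in sorted(settings.items())]
    return [
        "az", "logicapp", "config", "appsettings", "set",
        "--name", app,
        "--resource-group", resource_group,
        "--settings", *pairs,
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing arguments as a list rather than a single string avoids shell quoting problems with the URLs.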

Closing the series

This post closes a seven-part series on Azure Logic Apps agent loop production operations, from first principles through to observability, pricing, and DevOps deployment. The series covered:

Why the agent loop is a different design paradigm from conventional workflow automation
The anatomy of a single agent loop — trigger, instructions, model, and tools
Autonomous versus conversational agentic workflows: when to use each
Building tools: connectors, custom connectors, and MCP servers
Multi-agent patterns: prompt chaining, routing, handoff, and orchestrator-workers
Securing agentic workflows: Easy Auth, Managed Identity, and Key Vault
Observability, pricing, and production operations — this post

The agent loop is still a rapidly evolving capability in Azure Logic Apps. The platform limitations documented throughout this series (Foundry Models connection persistence, API Center MCP wizard regional constraints, and Foundry OpenAPI tool network restrictions) are likely to be addressed in future platform releases. The architectural patterns, however, are stable: the four building blocks of an agent loop, the three tooling layers, the four multi-agent patterns, and the two-concern security model will remain the right mental model for this platform regardless of how the surface-level tooling evolves.

Azure Logic Apps Agentic Workflow Security in Production

Posted on June 26, 2026 by steefjan1970

Part 6 of 7 in the Logic Apps Agent Loop series

Part 5 covered multi-agent patterns in the Azure Logic Apps agentic workflow series. Each pattern extends your agent’s reach, but that reach comes with a security cost. The more capable and connected your agent, the more important it is to understand who can call it and under what conditions. This post covers the expanded caller surface, the developer key’s limitations, and the full production security stack.

Conventional Logic Apps workflows have a bounded caller surface. The callers are known systems: a scheduler, a service bus, and an HTTP client you control. The authentication model is straightforward: SAS tokens, Managed Identity, and IP filtering. Agentic workflows fundamentally change this, particularly conversational ones. When you expose a chat interface to external callers, those callers can be people, other agents, MCP servers, or automation clients from networks you do not control. The security model has to change with the threat model.

Two-column diagram showing the security model for Azure Logic Apps agentic workflows. Left column shows the caller surface: human users via external chat client, external agents with dynamic unknown callers, MCP servers on untrusted networks, automation clients for CI/CD, and a developer key marked as portal testing only and not for production. Arrows from all caller types point toward the right column. Right column shows the security stack from top to bottom: Entry via Easy Auth with Microsoft Entra ID and Conditional Access, Logic app Standard running agentic workflows and agent loops, Managed Identity for backend authentication to Azure OpenAI, AI Search, and Storage, Azure Key Vault for secrets that cannot use Managed Identity, and Consumption OAuth 2.0 with Entra ID agent auth policy at the bottom. Legend shows teal for auth layers, purple for workflow and caller, coral for avoid in production. — Figure 1 — The two security concerns for Azure Logic Apps agentic workflows. The caller surface (left) expands significantly compared to conventional workflows. Human users, external agents, MCP servers, and automation clients can all reach the workflow endpoint from networks you do not control. The developer key used during portal development is explicitly not suitable for any of these caller types. The security stack (right) addresses the expanded surface area in two directions: Easy Auth with Microsoft Entra ID secures who can invoke the workflow, while Managed Identity and Key Vault secure what the workflow can call, without storing credentials in app settings.

The expanded caller surface

The shift from nonagentic to agentic workflows introduces a qualitatively different caller population. In a nonagentic workflow the trigger is called by a known system at a known time for a known reason. In a conversational agentic workflow the trigger is called by:

Human users interacting through an external chat client
External agents invoking the workflow as a tool
MCP servers routing requests through the workflow
Automation clients from untrusted or unknown networks

Each of these caller types introduces different identity, trust, and access control requirements. A billing system calling a webhook is easy to reason about. An external agent calling your workflow from an unknown network at unpredictable intervals is not.

This expanded surface area is why Microsoft’s documentation draws a sharp distinction between the developer key used during design and testing in the Azure portal and proper production authentication. Understanding that distinction is the starting point for securing any agentic workflow.

The developer key: what it is and what it is not

When you test a conversational agentic workflow in the Logic Apps designer, the Azure portal authenticates your test calls using a developer key. The developer key is a convenience mechanism that lets you skip manual authentication setup during development. The portal applies it automatically when you run a workflow, call a Request trigger, or interact with the integrated chat interface.

The developer key has five hard limitations that make it unsuitable for production:

It is not a substitute for Easy Auth, Managed Identity, federated credentials, or signed SAS callback URLs.
It is not designed for large or untrusted caller populations, agent tools, or automation clients.
It is not a per-user authorization mechanism; it has no granular scopes or roles.
It is not governed by Conditional Access policies at the request execution layer, only at the portal sign-in layer.
It is not intended for programmatic or CI/CD usage.

The developer key is linked to a specific user and tenant based on an Azure Resource Manager bearer token. Because of that binding, you cannot distribute it externally. It is, in the Microsoft documentation’s own framing, a mechanism for quick testing before you formalize authentication, not a path to production.

Azure Logic Apps agentic workflow security: Standard versus Consumption

The right production authentication mechanism depends on your Logic Apps hosting model.

Setting up Managed Identity for backend connections

Easy Auth secures who can call your agentic workflow. Managed Identity secures what your workflow can call. These are two distinct security concerns and both need to be addressed in production.

When your agent invokes a tool that reaches a backend service (Azure OpenAI, Azure AI Search, a storage account, or a Service Bus namespace), that call needs to be authenticated. The default approach during development is often to store an API key or connection string in app settings. In production, replace these with Managed Identity connections wherever possible. This removes credentials from app settings entirely. The logic app authenticates to backend services using its Microsoft Entra ID identity, which is governed by RBAC, auditable, and revocable without rotating keys.

Go to your la-agent-loop resource → Identity → System assigned → turn Status to On
Save — Azure assigns a service principal to the logic app
In each target resource (Azure OpenAI, AI Search, Storage), go to Access control (IAM) → Add role assignment
Assign the appropriate role to the logic app’s Managed Identity:
- Azure OpenAI: Cognitive Services OpenAI User
- Azure AI Search: Search Index Data Reader
- Azure Blob Storage: Storage Blob Data Reader
In the Logic Apps connections, switch from API key authentication to Managed Identity for each backend service where possible.
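The role assignments in the steps above can also be scripted. The following sketch generates one `az role assignment create` command per backend resource. The role names come from the list above; the scopes, the subscription and resource names inside them, and the `role_assignment_commands` helper are placeholders for illustration.

```python
from typing import Dict, List

# Placeholder scopes; replace with the resource IDs of your own backends.
ROLE_BY_SCOPE: Dict[str, str] = {
    "/subscriptions/SUBSCRIPTION_ID/resourceGroups/rg-ai-solutions/providers/"
    "Microsoft.CognitiveServices/accounts/OPENAI_ACCOUNT": "Cognitive Services OpenAI User",
    "/subscriptions/SUBSCRIPTION_ID/resourceGroups/rg-ai-solutions/providers/"
    "Microsoft.Search/searchServices/SEARCH_SERVICE": "Search Index Data Reader",
    "/subscriptions/SUBSCRIPTION_ID/resourceGroups/rg-ai-solutions/providers/"
    "Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT": "Storage Blob Data Reader",
}


def role_assignment_commands(principal_id: str,
                             role_by_scope: Dict[str, str]) -> List[List[str]]:
    """Build one `az role assignment create` command per scope."""
    return [
        [
            "az", "role", "assignment", "create",
            "--assignee-object-id", principal_id,
            "--assignee-principal-type", "ServicePrincipal",
            "--role", role,
            "--scope", scope,
        ]
        for scope, role in role_by_scope.items()
    ]
```

The `principal_id` argument is the object (principal) ID shown on the Identity blade of the logic app.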

Note: Managed Identity authentication for the agent model connection is only supported when the model type is AzureOpenAI. If your workflows use the MicrosoftFoundry model type, as in this series, the agent connection must use Key authentication. Managed Identity remains the right choice for all other backend connections such as Azure AI Search, Blob Storage, and Service Bus.

Azure portal Identity blade for the la-agent-loop Standard logic app. The System assigned tab is selected, Status is set to On, and the Object principal ID is shown as 8db32242-e936-4d84-a44a-6b39d37f24f7. An Azure role assignments button is visible under Permissions. — Figure 2 — System-assigned Managed Identity enabled on the `la-agent-loop` Standard logic app. Once enabled, Azure registers the logic app as a service principal in Microsoft Entra ID. Click **Azure role assignments** to assign the appropriate RBAC roles to each backend resource: Cognitive Services OpenAI User for Azure OpenAI and Search Index Data Reader for Azure AI Search, so the agent can authenticate to those services without storing any credentials in app settings.

Setting up Easy Auth for your Azure Logic Apps agentic workflow

For Standard logic apps, the production authentication path is Easy Auth, also known as App Service Authentication. Easy Auth is an App Service platform feature that sits in front of your logic app and enforces identity-based authentication on every incoming request before it reaches your workflow.

When you enable Easy Auth on a Standard logic app, external callers, whether human users, external agents, or MCP servers, must present a valid identity token. Easy Auth validates the token against Microsoft Entra ID before allowing the request through. This gives you Conditional Access policy enforcement, per-user identity, token revocation, and audit logging: the full production security stack.

To set up Easy Auth on a Standard logic app:

In the Azure portal, open your la-agent-loop logic app resource
Navigate to Authentication in the left sidebar under Settings
Click Add identity provider
Select Microsoft as the identity provider
Under App registration, select an existing registration or choose Create new app registration and name it la-agent-loop-auth
Under Supported account types, select Current tenant — single tenant for internal workloads
Set Unauthenticated requests to HTTP 401 Unauthorized: recommended for APIs
Leave Token store enabled
Click Add

Note: Easy Auth operates at the App Service host level, before the Logic Apps runtime processes the request. Authentication failures are rejected at the infrastructure layer with a 401; the workflow never executes, and no run history entry is created for unauthenticated calls.

Azure portal Authentication blade for the la-agent-loop Standard logic app. Authentication settings show App Service authentication as Enabled, Restrict access set to Require authentication, and Unauthenticated requests set to Return HTTP 401 Unauthorized. The Identity provider section shows Microsoft with app registration la-agent-loop-auth and client ID bc8d8407-a79e-4a18-be48-f0fc54fa4966. — Figure 3 — Easy Auth configured on the `la-agent-loop` Standard logic app. App Service authentication is enabled, unauthenticated requests return HTTP 401 Unauthorized, and Microsoft Entra ID is registered as the identity provider via the `la-agent-loop-auth` app registration. Any external caller, whether a human user, an external agent, or an MCP server, must now present a valid Entra ID token before the Logic Apps runtime processes the request.

Consumption: OAuth 2.0 with Microsoft Entra ID

For Consumption logic apps, configure an agent authorization policy on the logic app resource using OAuth 2.0 with Microsoft Entra ID. This provides equivalent identity enforcement to Easy Auth for the Consumption hosting model. For the full configuration steps, see Create conversational agent workflows in Azure Logic Apps on Microsoft Learn.

Key Vault for secrets that cannot use Managed Identity

Not every connection in an Azure Logic Apps agentic workflow supports Managed Identity. Where API keys or connection strings are unavoidable, store them in Azure Key Vault and reference them from Logic Apps app settings using the Key Vault reference syntax:

			
@Microsoft.KeyVault(SecretUri=https://your-keyvault.vault.azure.net/secrets/your-secret/)

This keeps credentials out of app settings in plain text, provides centralized rotation, and gives you audit logs of every secret access. The Standard logic app accesses Key Vault using its Managed Identity; no separate credentials are needed for the vault itself.
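A pipeline can check that sensitive settings really are Key Vault references before deployment. Here is a small sketch of that check; the `key_vault_reference` and `secret_uri` helpers are hypothetical, and the pattern only covers the SecretUri form of the reference shown above.

```python
import re
from typing import Optional

# Matches only the SecretUri form of a Key Vault reference.
_REFERENCE = re.compile(
    r"^@Microsoft\.KeyVault\(SecretUri=(?P<uri>https://[^)]+)\)$"
)


def key_vault_reference(vault_name: str, secret_name: str) -> str:
    """Format an app setting value that points at a Key Vault secret."""
    uri = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}/"
    return f"@Microsoft.KeyVault(SecretUri={uri})"


def secret_uri(setting_value: str) -> Optional[str]:
    """Return the secret URI if the value is a reference, else None."""
    match = _REFERENCE.match(setting_value)
    return match.group("uri") if match else None
```

A setting whose value makes `secret_uri` return None is stored in plain text and is a candidate for moving into the vault.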

Network controls for Standard workflows

Standard logic apps run on the App Service infrastructure, which gives you network-level controls that Consumption workflows do not have:

Private endpoints allow your logic app to receive inbound traffic only from within a virtual network, removing public internet exposure entirely. This is the recommended configuration for production agentic workflows that serve internal users or agents.

VNet integration allows your logic app to make outbound calls to services within a virtual network, including on-premises systems, private Azure services, and internal APIs, without exposing those services to the internet.

IP access restrictions let you restrict inbound traffic to specific IP ranges at the App Service level, providing a lighter-weight alternative to private endpoints for scenarios where full network isolation is not required.

For production agentic workflows processing sensitive data (patient records, financial data, internal business intelligence), private endpoints combined with VNet integration are the right starting point.

Azure Logic Apps agentic workflow security checklist

Before going live with any agentic workflow:

Easy Auth configured with Microsoft Entra ID (Standard) or OAuth 2.0 agent authorization policy (Consumption)
Developer key not used or referenced in any production caller
Managed Identity enabled on the logic app and assigned to all backend services
API keys and connection strings moved to Key Vault references
Private endpoints configured for Standard workflows handling sensitive data
Conditional Access policies applied to the Entra ID app registration backing Easy Auth
Run history access restricted to authorized operations personnel

What comes next

The final post in this series concludes with operations: Application Insights integration, agent loop pricing, run history analysis, and deployment of agentic workflows through a CI/CD pipeline. Part 7 covers everything you need to run agent loops confidently in production.

Microsoft Foundry Citadel Platform Azure: A Practitioner’s Deployment Guide

Posted on June 24, 2026 by steefjan1970

Microsoft Foundry Citadel Platform on Azure is a layered AI governance architecture that delivers production-ready agent deployments with unified governance, end-to-end observability, and centralized policy enforcement via Azure API Management. It is still in preview, and the documentation assumes a degree of familiarity with Azure infrastructure that not everyone has on day one. This post walks through what it actually takes to get a working hub-and-spoke running in Sweden Central, including the pitfalls, so you can decide whether it is a viable starting point for your own AI platform journey.

What Citadel Is (and Is Not)

Before touching the tooling, it helps to understand what Citadel actually deploys. The architecture has four layers:

Layer 1, the Governance Hub, is the runtime enforcement plane: Azure API Management as a centralized AI gateway, Azure API Center as a model registry, and supporting services for content safety, PII detection, cost attribution, and usage telemetry.

Layer 2, the AI Control Plane, provides observability via the Foundry Control Plane: agent-level execution traces, AI evaluations in development and production, red-teaming, drift monitoring, and fleet dashboards.

Layer 3, Agent Identity, transforms agents into managed enterprise assets via Microsoft Entra ID, with lifecycle management, sponsorship models for human accountability, and shadow AI discovery.

Layer 4, the Security Fabric, weaves Defender, Purview, and Entra across the other three layers for real-time threat intelligence, data governance, and compliance automation.

For this guide, we deploy Layer 1 (the Governance Hub via the AI Hub Gateway Solution Accelerator) and a Layer 1/2 spoke (via the AI Landing Zone Bicep). Layers 3 and 4 reference existing Azure services (Entra ID, Defender, Purview) that you integrate separately.

Important: Citadel is currently in preview. The repos, parameter schemas, and CLI commands will change. Treat everything in this post as a starting point, not a stable reference.

Prerequisites

Before you start, make sure you have:

An Azure subscription with Azure OpenAI access approved (aka.ms/oaiapply)
Microsoft.Authorization/roleAssignments/write on the subscription (Owner or User Access Administrator role)
Azure CLI installed and authenticated (az login)
Azure Developer CLI (azd) installed
Node.js — use v20 LTS, not v24. Node 24 on Windows has a known issue where npm bundles are incomplete, causing MODULE_NOT_FOUND errors on npm-cli.js and npm-prefix.js when azd tries to package Logic App components

If you run into npm issues on Windows, the cleanest workaround is Azure Cloud Shell, where Node, npm, az, and azd are all pre-installed and healthy.

Part 1: Deploying the Microsoft Foundry Citadel Governance Hub

Clone the AI Hub Gateway Solution Accelerator:

			
git clone https://github.com/Azure-Samples/ai-hub-gateway-solution-accelerator.git
cd ai-hub-gateway-solution-accelerator

Create your azd environment:

			
azd auth login
azd env new ai-hub-gateway-dev
azd env set AZURE_LOCATION swedencentral

Create a parameters file at infra/main.parameters.json. The key decisions:

Model versions matter. At the time of writing, gpt-4o-mini versions 2024-07-18 and 2024-10-18 are retired. Use gpt-4o version 2024-11-20 with GlobalStandard SKU. Always verify current model availability at aka.ms/aoai-regions before deploying; these change frequently.

			
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "environmentName": { "value": "ai-hub-gateway-dev" },
    "location": { "value": "swedencentral" },
    "apimSku": { "value": "Developer" },
    "openAiInstances": {
      "value": {
        "openAi1": {
          "name": "openai1",
          "location": "swedencentral",
          "deployments": [
            {
              "name": "chat",
              "model": { "format": "OpenAI", "name": "gpt-4o", "version": "2024-11-20" },
              "sku": { "name": "GlobalStandard", "capacity": 20 }
            },
            {
              "name": "embedding",
              "model": { "format": "OpenAI", "name": "text-embedding-3-large", "version": "1" },
              "sku": { "name": "Standard", "capacity": 20 }
            }
          ]
        }
      }
    },
    "provisionFunctionApp": { "value": false },
    "createAppInsightsDashboard": { "value": false },
    "enableAIGatewayPiiRedaction": { "value": true },
    "enableAIModelInference": { "value": true }
  }
}
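Because retired model versions fail late in a long deployment, it can help to scan the parameters file first. The sketch below walks the openAiInstances structure shown above and reports deployments pinned to a retired version. The `RETIRED` set only lists the versions named in this post, and the `retired_deployments` helper is hypothetical; maintain the set yourself from the current availability page.

```python
from typing import Dict, List, Set, Tuple

# Versions this post reports as retired at the time of writing; extend as needed.
RETIRED: Set[Tuple[str, str]] = {
    ("gpt-4o-mini", "2024-07-18"),
    ("gpt-4o-mini", "2024-10-18"),
}


def retired_deployments(parameter_file: Dict,
                        retired: Set[Tuple[str, str]] = RETIRED) -> List[str]:
    """Return 'instance/deployment' names that use a retired model version."""
    found: List[str] = []
    instances = parameter_file["parameters"]["openAiInstances"]["value"]
    for instance_key, instance in instances.items():
        for deployment in instance["deployments"]:
            model = deployment["model"]
            if (model["name"], model["version"]) in retired:
                found.append(f"{instance_key}/{deployment['name']}")
    return found
```

Load the file with `json.load` and fail the pipeline if the returned list is not empty.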

		

Deploy:

azd up

Expect 45–90 minutes. APIM Developer SKU is the slow component. If the deployment fails partway through, re-run azd up; it is idempotent and will pick up where it left off.

Azure CLI output showing successful deployment of the Microsoft Foundry Citadel Governance Hub including APIM, Azure OpenAI chat and embedding model deployments, private endpoints, and Logic App in Sweden Central. — The AI Hub Gateway Solution Accelerator was deployed successfully in Azure Sweden Central after 21 hours and 31 minutes, provisioning APIM, Azure OpenAI, Content Safety, Application Insights, private endpoints, and the usage processing Logic App.

Pitfall: Managed Identity Race Condition

You will likely see this error on first attempt:

BadRequest: The provided principal ID was not found in the AAD tenant(s)

This is a known race condition — the Managed Identity is created but has not yet propagated in Entra ID before the role assignment fires. Re-run azd up without any changes and it will succeed.
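In an unattended pipeline, the re-run can be automated. The sketch below retries a command only when its output contains the race-condition message and stops on any other failure. The `run_with_retry` helper, the retry count, and the delay are assumptions, not part of azd.

```python
import subprocess
import time
from typing import Callable, List, Tuple

RACE_MARKER = "The provided principal ID was not found"


def _run(command: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stdout + completed.stderr


def run_with_retry(command: List[str], attempts: int = 3,
                   delay_seconds: float = 60.0,
                   runner: Callable[[List[str]], Tuple[int, str]] = _run,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry only on the identity propagation error; return the attempt used."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        code, output = runner(command)
        if code == 0:
            return attempt
        # Any other failure, or the final attempt, is not retried.
        if RACE_MARKER not in output or attempt == attempts:
            raise RuntimeError(f"{' '.join(command)} failed on attempt {attempt}")
        sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Calling `run_with_retry(["azd", "up"])` captures the command output instead of streaming it, so this suits unattended runs better than interactive ones.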

Validate the Hub

Once deployed, run:

azd env get-values | grep APIM

You will get your APIM gateway URL. Test it with a chat completion:

			
$headers = @{
  "Content-Type" = "application/json"
  "api-key" = "<YOUR_APIM_SUBSCRIPTION_KEY>"
}
$body = '{"messages":[{"role":"user","content":"Hello from the AI Hub Gateway!"}],"max_tokens":100}'
Invoke-RestMethod `
  -Uri "https://<your-apim>.azure-api.net/openai/deployments/chat/chat/completions?api-version=2024-02-01" `
  -Method POST -Headers $headers -Body $body

		

PowerShell output showing a successful chat completion response from the Microsoft Foundry Citadel APIM gateway in Azure Sweden Central, with content filter results, prompt filter results, and token usage confirmed. — Validating the Citadel Governance Hub by calling the APIM gateway endpoint via PowerShell, the response confirms gpt-4o-2024-11-20 routing, Content Safety filtering, PII redaction, and token usage tracking are all active.

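A successful response with content_filter_results and prompt_filter_results confirms that content filtering ran on both the prompt and the completion. The usage block in the response carries the token counts that the hub records in Cosmos DB for cost attribution; to confirm the usage event was actually written, query Cosmos DB directly.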
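The same check can run in a smoke test. This sketch assumes the response has already been parsed into a dictionary and follows the usual chat completion layout, with content_filter_results on each choice and prompt_filter_results and usage at the top level. The `governance_signals` helper is hypothetical.

```python
from typing import Dict


def governance_signals(response: Dict) -> Dict[str, bool]:
    """Report which governance-related fields are present in a response."""
    choices = response.get("choices", [])
    return {
        "content_filter_results": any(
            "content_filter_results" in choice for choice in choices
        ),
        "prompt_filter_results": bool(response.get("prompt_filter_results")),
        "usage": "total_tokens" in response.get("usage", {}),
    }
```

A smoke test can then assert that every value in the returned dictionary is true.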

Part 2: Deploying a Citadel Platform Agent Spoke on Azure

The spoke is deployed from the AI Landing Zone Bicep repo. Download it as a ZIP (no GitHub account required):

			
https://github.com/Azure/bicep-ptn-aiml-landing-zone/archive/refs/heads/main.zip

Extract and navigate to the folder. Create a resource group for the spoke:

az group create --name rg-ai-spoke-dev --location swedencentral

Create a spoke.parameters.json file. Several things to know upfront:

The parameter schema is not the same as the Citadel README suggests. The actual template parameters differ from the example file. Key differences discovered in practice: aiFoundryLocation does not exist as a separate parameter; deployMcp, greenFieldDeployment, deployPostgres, and useCMK are not in this version of the template; and solutionStorageAccountName is simply storageAccountName.

The modelDeploymentList uses nested objects, not flat properties:

			
"modelDeploymentList": {
  "value": [
    {
      "name": "chat",
      "model": { "format": "OpenAI", "name": "gpt-4o", "version": "2024-11-20" },
      "sku": { "name": "GlobalStandard", "capacity": 20 },
      "canonical_name": "CHAT_DEPLOYMENT_NAME",
      "apiVersion": "2025-04-01-preview"
    },
    {
      "name": "text-embedding",
      "model": { "format": "OpenAI", "name": "text-embedding-3-large", "version": "1" },
      "sku": { "name": "Standard", "capacity": 10 },
      "canonical_name": "EMBEDDING_DEPLOYMENT_NAME",
      "apiVersion": "2025-04-01-preview"
    }
  ]
}

		

containerAppsList cannot be an empty array. The template references containerApps[0] internally and will fail validation if the array is empty. Pass at least one placeholder entry.
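Both schema pitfalls can be caught before the deployment starts. The sketch below is a hypothetical preflight check for a spoke parameters file. It assumes the standard ARM parameters layout, with each parameter under parameters and its content under value, and only checks the two issues described above.

```python
from typing import Dict, List


def spoke_parameter_problems(parameter_file: Dict) -> List[str]:
    """Return readable problems; an empty list means both checks passed."""
    parameters = parameter_file.get("parameters", {})
    problems: List[str] = []

    # The template indexes the first entry, so the list must not be empty.
    container_apps = parameters.get("containerAppsList", {}).get("value", [])
    if not container_apps:
        problems.append("containerAppsList must contain at least one entry")

    # model and sku must be nested objects, not flat properties.
    for deployment in parameters.get("modelDeploymentList", {}).get("value", []):
        name = deployment.get("name", "unnamed")
        for nested in ("model", "sku"):
            if not isinstance(deployment.get(nested), dict):
                problems.append(f"{name}: '{nested}' must be a nested object")
    return problems
```

Run it against the parsed spoke.parameters.json and stop the pipeline if it returns anything.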

Deploy:

			
az deployment group create `
  --resource-group rg-ai-spoke-dev `
  --template-file main.bicep `
  --parameters @spoke.parameters.json

Pitfalls in the Spoke Deployment

AI Search Standard SKU capacity exhaustion. Sweden Central frequently runs out of AI Search Standard SKU capacity. You will see ResourcesForSkuUnavailable. This affects both the standalone Search Service and the AI Foundry Agent Service’s internal Search instance. Disable both:

			
"deploySearchService": { "value": false },
"deployAAfAgentSvc": { "value": false }

You can re-enable them later once capacity is available, or deploy Search in a different region.

Soft-deleted resources block redeployment. Azure retains soft-deleted Cognitive Services accounts, Key Vaults, and App Configuration stores for up to 90 days. If you delete a resource group and redeploy, the deployment will fail with FlagMustBeSetForRestore or NameUnavailable. Purge them explicitly before redeploying:

			
# List and purge soft-deleted resources
az keyvault list-deleted --subscription <sub-id> -o table
az keyvault purge --name <name> --location swedencentral
az appconfig list-deleted --subscription <sub-id> -o table
az appconfig purge --name <name> --location swedencentral --yes
az cognitiveservices account list-deleted --subscription <sub-id> -o table
az cognitiveservices account purge --name <name> --resource-group <resource-group> --location swedencentral

		

Key Vault purges are slow — allow 2–5 minutes per vault.

Bastion subnet ID resolution fails with networkIsolation=false. When you disable network isolation, the template passes a relative subnet ID to Bastion instead of a fully qualified resource ID. Disable Bastion, Jump VM, and NAT Gateway for the dev spoke:

			
"deployBastion": { "value": false },
"deployJumpbox": { "value": false },
"deployVM": { "value": false },
"deployNatGateway": { "value": false }

Write parameters files without BOM. In Windows PowerShell 5.1, Out-File -Encoding utf8 adds a Byte Order Mark that causes az deployment to fail with Unable to parse parameter. Use either of the following (the utf8NoBOM value requires PowerShell 6 or later):

			
$content | Out-File -FilePath "spoke.parameters.json" -Encoding utf8NoBOM
# or
[System.IO.File]::WriteAllText("spoke.parameters.json", $content, [System.Text.UTF8Encoding]::new($false))
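If the parameters file is generated by a script, the same precaution applies. Here is a short Python sketch that writes the file without a BOM and detects or strips one from an existing file. The helper names are hypothetical; the encoding behavior is standard Python.

```python
import codecs
import json
from pathlib import Path


def has_bom(path: str) -> bool:
    """True if the file starts with the UTF-8 Byte Order Mark."""
    return Path(path).read_bytes().startswith(codecs.BOM_UTF8)


def write_parameters(path: str, parameters: dict) -> None:
    """Write JSON as UTF-8; the 'utf-8' codec never emits a BOM."""
    Path(path).write_text(json.dumps(parameters, indent=2), encoding="utf-8")


def strip_bom(path: str) -> None:
    """Remove a leading BOM in place, leaving other content untouched."""
    data = Path(path).read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
```

Checking `has_bom` on every parameters file in a pipeline step gives a clearer failure than the parse error from the deployment.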

Part 3: Wiring the Citadel Spoke to the Azure APIM Hub

Add the hub’s APIM gateway URL and subscription key to the spoke’s App Configuration:

			
az appconfig kv set `
  --name <spoke-appconfig-name> `
  --key "APIM_GATEWAY_URL" `
  --label "ai-lz" `
  --value "https://<your-apim>.azure-api.net/openai" `
  --yes
az appconfig kv set `
  --name <spoke-appconfig-name> `
  --key "APIM_SUBSCRIPTION_KEY" `
  --label "ai-lz" `
  --value "<YOUR_APIM_KEY>" `
  --yes

		

Note: az cognitiveservices account connection create with a YAML file for creating an APIM connection in AI Foundry has known bugs in the current CLI version and will throw NoneType or codec errors. Create this connection via the Azure AI Foundry portal UI instead.

Validate End-to-End

			
$headers = @{
  "Content-Type" = "application/json"
  "api-key" = "<YOUR_APIM_KEY>"
}
$body = '{"messages":[{"role":"user","content":"Hello from the Citadel spoke!"}],"max_tokens":50}'
Invoke-RestMethod `
  -Uri "https://<your-apim>.azure-api.net/openai/deployments/chat/chat/completions?api-version=2024-02-01" `
  -Method POST -Headers $headers -Body $body

		

A successful response with content_filter_results, prompt_filter_results, and usage confirms the full Citadel loop: spoke → APIM gateway → Azure OpenAI → governance telemetry.

PowerShell output showing a successful end-to-end chat completion from the Citadel agent spoke through the Azure APIM Governance Hub, confirming spoke to hub routing, content filter results, and token usage tracking in Sweden Central. — End-to-end validation of the Citadel hub-and-spoke setup: a request from the agent spoke routes through the APIM Governance Hub in Sweden Central, returning a successful gpt-4o response, with Content Safety filtering and token usage tracking confirmed.

What the Microsoft Foundry Citadel Platform Deploys

After following this guide, your rg-ai-hub-gateway-dev resource group contains:

APIM gateway with content safety, PII redaction, token rate limiting, and cost attribution policies
Azure OpenAI with gpt-4o and text-embedding-3-large
Cosmos DB for usage event logging
Logic App for usage processing
Application Insights for gateway telemetry

Your rg-ai-spoke-dev resource group contains:

AI Foundry account and project
gpt-4o and text-embedding-3-large deployments
Cosmos DB with a conversations container
Key Vault, App Configuration, Storage Account, Application Insights, Log Analytics

App Configuration is fully populated with canonical keys (CHAT_DEPLOYMENT_NAME, AI_FOUNDRY_PROJECT_ENDPOINT, COSMOS_DB_ENDPOINT, and more) ready for agent applications to consume.

This Is a Dev Setup — Here Is What Changes for Non-Prod and Production

The configuration above is a starting point, not a production blueprint. Key differences when moving up the environment stack:

APIM SKU. Developer SKU has no SLA and is intended for evaluation, not production traffic. Switch to Premium SKU for non-prod and production. This significantly increases cost and deployment time but adds an SLA and enables private networking, multi-region, and availability zones.

Network isolation. For production, set networkIsolation=true and wire the spoke VNet to your hub VNet via peering (hubIntegrationHubVnetResourceId). This requires coordinating private DNS zones across the hub and spoke. The template supports bringing existing DNS zones via the existingPrivateDnsZone* parameters.

AI Search. Re-enable deploySearchService and deployAAfAgentSvc for non-prod and production. If Sweden Central remains capacity-constrained on Standard SKU, deploy Search to another region (East US 2 works well) using the searchServiceLocation parameter.

Bastion and Jump VM. For production with networkIsolation=true, re-enable deployBastion and deployJumpbox so operators can access resources inside the private VNet without public endpoints.

Separate parameter files per environment. Maintain spoke.parameters.dev.json, spoke.parameters.nonprod.json, and spoke.parameters.prod.json with environment-specific values. Use a deployment pipeline (GitHub Actions or Azure DevOps) to apply them consistently.
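To keep the per-environment files from drifting apart, one option is to generate them from a shared base plus a small set of overrides. The sketch below illustrates that approach. The `render_environment` helper and the override values are assumptions; only the parameter names come from this guide.

```python
import copy
from typing import Any, Dict

# Hypothetical overrides per environment; parameter names are from this guide.
OVERRIDES: Dict[str, Dict[str, Any]] = {
    "dev": {"networkIsolation": False, "deploySearchService": False},
    "prod": {"networkIsolation": True, "deploySearchService": True,
             "deployBastion": True, "deployJumpbox": True},
}


def render_environment(base: Dict, overrides: Dict[str, Any]) -> Dict:
    """Return a copy of the base parameters file with overrides applied."""
    merged = copy.deepcopy(base)
    for name, value in overrides.items():
        merged["parameters"][name] = {"value": value}
    return merged
```

Write each result to its own file, for example spoke.parameters.prod.json, and review the generated files in source control like any other change.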

Model versions. Pin specific model versions in parameters files and validate availability in your target region before each deployment. Azure OpenAI model lifecycle moves fast; versions retire on 18-month cycles, and regional availability varies.

Preview Caveats

Citadel is in active development. Several things you should expect to change:

The parameter schemas for both the hub and spoke accelerators will evolve. Parameters discovered missing or renamed in this guide will likely be reorganized again as the repos mature. Always check the actual main.bicep parameter definitions rather than relying on example files.

The az cognitiveservices account connection create CLI command for AI Foundry connections is incomplete at the time of writing. This will improve as the Foundry CLI surface area matures.

The citadel-v1 branch in the AI Hub Gateway repo is flagged as the recommended path for new deployments. By the time you read this, it may have become the default branch with a cleaner deployment experience.

Regional capacity for AI Search Standard SKU fluctuates. Sweden Central is a high-demand region for AI workloads; plan for capacity constraints in any SKU beyond Basic for dev scenarios.

Conclusion

Citadel gives you a credible, opinionated starting point for enterprise AI governance on Azure: APIM as the AI gateway, AI Foundry as the agent runtime, Cosmos DB for conversation state, and App Configuration as the configuration backbone. Getting it running today requires navigating several rough edges: parameter schema inconsistencies, soft-delete cascades, model version deprecations, regional capacity constraints, and Windows-specific tooling issues.

None of these are blockers. They are the expected friction of working with a platform in active preview. The underlying architecture is sound, and the pieces that do work (APIM governance policies, Content Safety integration, App Config population, and AI Foundry project wiring) deliver real value immediately.

If you are building an AI platform for your organization, a Citadel dev setup is a reasonable first step. Treat it as a learning environment to understand the architecture, validate the tooling, and build the parameter files you will need for non-prod and production. Then evolve it deliberately: add network isolation, re-enable Search and Agent Services as capacity allows, and adopt the Citadel contracts (AI Access Contract, AI Publish Contract) to formalize the hub-spoke integration as your agent portfolio grows.

The governance-velocity paradox Citadel sets out to solve is real. Now, while the platform is still in preview and the patterns are malleable, is the right time to get the foundation right.

Final note: This post reflects a hands-on deployment performed in June 2026. Given the pace of change in this space, verify all CLI commands, parameter schemas, and model versions against current documentation before applying them in your own environment.

Multi-Agent Patterns in Azure Logic Apps: Handoffs, Orchestrators, and Sequential Loops

Posted on June 22, 2026 by steefjan1970

Part 5 of 7 in the Logic Apps Agent Loop series

Part 4 covered the three tooling layers available to an Azure Logic Apps agent. A single agent with well-defined tools handles a wide range of integration scenarios, but some workloads are too complex for one agent to handle well. Azure Logic Apps multi-agent patterns let you compose multiple agent loops into a coordinated system, where each agent has a single focused responsibility and the output of one feeds directly into the next.

This post covers the four patterns Microsoft has defined for multi-agent composition in Azure Logic Apps (prompt chaining, routing, handoff, and orchestrator-workers) and includes a working demo that builds a two-agent sequential loop: an extract agent that pulls the key facts from a business report and a summarize agent that turns them into an executive summary.

Why Azure Logic Apps multi-agent patterns matter

A single agent loop works well when the task is bounded and the instructions can cover every case. The problem comes when a task has multiple distinct phases that require different expertise, different tools, or different models. Packing all of that into one agent’s instructions creates a sprawling, hard-to-maintain prompt. The model has to context-switch between roles in a single loop, which degrades quality and makes the run history harder to interpret.

Multi-agent patterns solve this by giving each agent a single, clear responsibility. The agents are composed at the workflow level: one agent’s output becomes another agent’s input, and each agent can have its own model, its own tools, and its own focused instructions.

The four Azure Logic Apps multi-agent patterns explained

Microsoft’s documentation defines four patterns for multi-agent composition in Logic Apps. They are ordered by complexity.

Prompt chaining

The simplest pattern. A sequence of agent loops runs one after another, where the output of each loop becomes the input to the next. Each agent has a single focused task: extract, then format, then sort, then summarise. The chain is linear and predictable.

Use prompt chaining when the workload can be decomposed into sequential steps with clear handover points and when the output of each step is well-defined. A business report processing chain (raw data in, executive summary out) is the canonical example from the Microsoft documentation.
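Stripped of the designer, prompt chaining is function composition over text. The sketch below illustrates the shape of the pattern only: the `extract` and `summarise` functions are hypothetical, deterministic stand-ins for model-backed agent loops, not Logic Apps APIs.

```python
from typing import Callable

Agent = Callable[[str], str]


def run_chain(agents: list[Agent], initial_input: str) -> str:
    """Feed the output of each agent into the next one, in order."""
    current = initial_input
    for agent in agents:
        current = agent(current)
    return current


# Stub agents standing in for model calls (hypothetical and deterministic).
def extract(text: str) -> str:
    return "\n".join(f"- {part.strip()}" for part in text.split(".") if part.strip())


def summarise(facts: str) -> str:
    return f"Summary of {len(facts.splitlines())} facts."


result = run_chain([extract, summarise], "Revenue rose. Churn fell.")
```

Because every link has the same text-in, text-out contract, adding a format or sort step means adding one more entry to the list.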

Routing

A classification agent examines the incoming request and routes it to one of several specialist agent loops based on what it finds. The routing agent does not do the work itself; it decides which agent should do the work and passes control there.

Use routing when incoming requests fall into distinct categories that need different handling: a customer service triage agent that routes billing queries to a billing agent loop, technical questions to a technical support agent loop, and general inquiries to a general response agent loop. The routing pattern prevents optimization conflicts, allowing a billing specialist agent to be tuned for billing tasks without being distracted by technical support scenarios.
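The routing decision can be sketched as one classification step followed by a lookup. In the illustration below the `classify` function is a hypothetical keyword check standing in for the classifier agent, and the handlers are placeholders for the specialist agent loops.

```python
from typing import Callable

Handler = Callable[[str], str]


def classify(request: str) -> str:
    """Stand-in for the classifier agent: a keyword check instead of a model call."""
    text = request.lower()
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


def route(request: str, handlers: dict[str, Handler]) -> str:
    """Classify once up front, then pass control to the matching specialist."""
    category = classify(request)
    return handlers[category](request)


handlers: dict[str, Handler] = {
    "billing": lambda r: f"[billing agent] {r}",
    "technical": lambda r: f"[technical agent] {r}",
    "general": lambda r: f"[general agent] {r}",
}

reply = route("I need a refund for my last invoice", handlers)
```

The classifier returns only a category and never an answer, which is what keeps each specialist independently tunable.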

Handoff

Similar to routing but more dynamic. Instead of a central classifier making an upfront routing decision, each agent loop decides during its own execution whether it needs to hand off to another agent. The handoff preserves conversation context and state across the transition: the receiving agent knows the full history of what the previous agent did and said.

Use handoff when the trigger for transferring control depends on what emerges during the conversation: a general support agent that escalates to a technical specialist when it detects a complex issue, or a research agent that hands off to a writer agent once it has gathered enough material. The handoff pattern mimics human escalation patterns: a front-line agent handles what it can and passes on what it cannot.
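What separates handoff from routing is that the decision is made inside the first agent and the history travels with the conversation. The sketch below is a simplified illustration under those assumptions; the `Conversation` class, the agent functions, and the "stack trace" trigger are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    """Shared state that travels with the conversation across a handoff."""
    history: list[str] = field(default_factory=list)


def general_agent(message: str, convo: Conversation) -> Optional[str]:
    """Answer simple messages; return None to signal that a handoff is needed."""
    convo.history.append(f"user: {message}")
    if "stack trace" in message.lower():  # stand-in for the model's own judgement
        convo.history.append("general: escalating to specialist")
        return None
    reply = "general: happy to help with that"
    convo.history.append(reply)
    return reply


def specialist_agent(convo: Conversation) -> str:
    """Receives the full history, so the user does not have to repeat anything."""
    reply = f"specialist: picking up after {len(convo.history)} earlier entries"
    convo.history.append(reply)
    return reply


def handle(message: str, convo: Conversation) -> str:
    reply = general_agent(message, convo)
    return reply if reply is not None else specialist_agent(convo)


convo = Conversation()
first = handle("How do I reset my password?", convo)
second = handle("Here is the stack trace from the crash", convo)
```

The second call escalates mid-conversation, and the specialist sees the password question as well as the crash report.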

Orchestrator-workers

The most sophisticated pattern. A central orchestrator agent dynamically decomposes a task into subtasks and delegates each subtask to a worker agent loop. The worker agents operate as tools that the orchestrator can invoke, exactly the tool provider pattern from Part 4, applied to agents rather than connectors.

Use orchestrator-workers when you cannot predict the required subtasks in advance. A coding agent that needs to make changes to an unpredictable number of files, a research agent that gathers information from multiple dynamic sources, or a content pipeline with a writer, reviewer, and publisher working together: these are all orchestrator-worker scenarios. The orchestrator dynamically determines what needs to be done; the workers execute it.
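The defining trait of this pattern is that the list of subtasks is computed at run time instead of being fixed in the workflow. The sketch below illustrates that idea with a hypothetical `plan` function that splits on the word "and"; a real orchestrator would use a model call to decompose the task.

```python
from typing import Callable

Worker = Callable[[str], str]


def plan(task: str) -> list[tuple[str, str]]:
    """Stand-in for the orchestrator model: split a task into (worker, subtask) pairs."""
    return [("research", part.strip()) for part in task.split(" and ") if part.strip()]


def orchestrate(task: str, workers: dict[str, Worker]) -> str:
    """Decompose the task at run time, delegate each piece, then combine the results."""
    results = [workers[name](subtask) for name, subtask in plan(task)]
    return " | ".join(results)


workers: dict[str, Worker] = {
    "research": lambda subtask: f"notes on {subtask}",
}

answer = orchestrate("pricing and competitors and churn", workers)
```

The number of worker invocations follows from the input, so the same code handles one subtask or ten.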

Demo: Building a sequential agent loop — Extract and Summarise

This demo builds a two-agent prompt chaining workflow in a new sequential-agents workflow inside la-agent-loop. The scenario is a business report processing chain: Agent 1 extracts key facts and metrics from a raw text input, Agent 2 takes those facts and writes a concise executive summary. The output of Agent 1 feeds directly into Agent 2 — this is the prompt chaining pattern in its simplest form.

Prerequisites

The la-agent-loop Standard logic app from previous posts
An Azure OpenAI / Foundry Models connection already configured

Step 1: Create the workflow

In la-agent-loop, click Create and name the workflow sequential-agents. Select Autonomous Agents as the workflow type. Logic Apps creates the workflow with an HTTP trigger and an empty Agent action.

Step 2: Configure the HTTP trigger

Click the When an HTTP request is received trigger and paste this request body schema:

{ "type": "object", "properties": { "report": { "type": "string" } }, "required": ["report"] }

Step 3: Configure the Extract Agent

Click the first Agent action and rename it Extract Agent. Configure it:

AI model: your GPT-4o / Foundry Models connection
Instructions: You are a data extraction specialist. Extract all numerical values, metrics, and key facts from the provided text. Return them as a clean bulleted list. Do not summarise or interpret — only extract.
User instructions item – 1: select report from the HTTP trigger dynamic content

Step 4: Add a Compose action

This is a critical step. The Extract Agent output is a JSON object containing a messages array — not a plain string. The Summarize Agent cannot process it directly. A Compose action between the two agents extracts the plain text content.

Click + below the Extract Agent container and select Add an action → Simple Operations → Compose. Set the Inputs expression to:

outputs('Extract_Agent')?['body']?['messages'][0]['content']

This extracts the bulleted list text from the Extract Agent’s output object and passes it as a clean string to the next agent.
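For readers who think in code rather than workflow expressions, the same lookup can be mirrored in Python. This is a sketch under stated assumptions: the sample output shape follows the description in this step (a body with a messages array), the `role` field is a guess, and the function is slightly more defensive than the expression because it returns None for any missing level.

```python
from typing import Any, Optional


def extract_content(action_outputs: Optional[dict[str, Any]]) -> Optional[str]:
    """Return body.messages[0].content, or None if any level is missing."""
    body = (action_outputs or {}).get("body") or {}
    messages = body.get("messages") or []
    if not messages:
        return None
    return messages[0].get("content")


# Assumed shape of the Extract Agent output, for illustration only.
sample_outputs = {
    "body": {
        "messages": [
            {"role": "assistant", "content": "- Revenue: €4.2M\n- Churn: 2.3%"}
        ]
    }
}

facts = extract_content(sample_outputs)
```

Checking the run history for the actual output shape before relying on the index is worthwhile, since the position of the final message is an assumption here.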

Step 5: Add the Summarize Agent

Click + below the Compose action and select Add an agent. Rename it Summarize Agent. Configure it:

AI model: your GPT-4o / Foundry Models connection
Instructions: You are an executive communications specialist. Take the provided list of facts and metrics and write a concise three-sentence executive summary suitable for a board report. Be professional and direct.
User instructions item – 1: select the Outputs of the Compose action from the dynamic content picker

Step 6: Add a Response action

Click + below the Summarize Agent container and add a Response action:

Status Code: 200
Content-Type header: application/json
Body: set the expression to outputs('Summarize_Agent')?['body']?['messages'][0]['content']

Azure Logic Apps designer showing the sequential-agents workflow. An HTTP request trigger connects to an Extract Agent action, followed by a Compose action that extracts the agent output content, then a Summarize Agent action, and finally a Response action that returns the executive summary to the caller. — Figure 1 — The complete sequential agent loop workflow in the Logic Apps designer. The Extract Agent receives the raw report text from the HTTP trigger and returns a bulleted list of facts. A Compose action bridges the two agents by extracting the plain text content from the Extract Agent’s JSON output object — a required intermediate step since Agent actions do not expose their output as a typed string in the dynamic content picker. The Summarize Agent receives the extracted facts and produces a three-sentence executive summary, which the Response action returns as a 200 OK.

Step 7: Save and test

Save the workflow and POST this to the trigger URL:

{ "report": "Q3 revenue was €4.2M, up 18% year on year. Customer acquisition cost dropped to €142, down from €198. Net promoter score reached 67. Headcount grew from 43 to 51. Churn rate fell to 2.3%." }
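If you prefer a script over a REST client, the request can be built with the Python standard library. This is a minimal sketch: `TRIGGER_URL` is a placeholder for the callback URL copied from the trigger, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Placeholder: copy the real callback URL from the trigger in the designer.
TRIGGER_URL = "https://example.invalid/sequential-agents"


def build_request(report: str, url: str = TRIGGER_URL) -> urllib.request.Request:
    """Build the POST request; the body matches the trigger's request schema."""
    payload = json.dumps({"report": report}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Q3 revenue was €4.2M, up 18% year on year.")

# Uncomment to send; the timeout is generous because two model calls run in sequence.
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.read().decode("utf-8"))
```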

The workflow runs in a few seconds (7.37 seconds in the run shown in Figure 2) and returns a clean executive summary:

In Q3, revenue reached €4.2M, reflecting an 18% year-on-year increase, supported by a significant reduction in customer acquisition cost from €198 to €142. The company saw operational growth with headcount rising from 43 to 51, while maintaining strong customer satisfaction, evidenced by a Net Promoter Score of 67 and a low churn rate of 2.3%. These metrics highlight sustained growth and improved efficiency across key areas.

The run history shows two distinct agent iterations, Extract Agent and Summarize Agent, each with their own Think → Observe cycle, confirming the prompt chaining pattern is working end to end.

Logic Apps run history for the sequential-agents workflow completed in 7.37 seconds. The log shows the HTTP trigger, Extract Agent completing in 3.1 seconds, a Compose action at 0 seconds, Summarize Agent completing in 4 seconds, and a Response action at 0 seconds. The canvas on the right shows all five steps with green success indicators, with both agent actions showing iteration 1 of 2. — Figure 2 — The run history of the sequential agent loop, completed in 7.37 seconds. The Extract Agent ran for 3.1 seconds and passed its output to the Summarize Agent via the Compose action, which completed in 4 seconds. Both agent actions show iteration 1 of 2 on the canvas, confirming that each ran its own Think → Observe cycle independently. The Compose action completed in 0 seconds, serving purely as a data-transformation bridge between the two agent outputs.

Practitioner note: The Compose action between the two agents is not optional. Logic Apps Agent actions return a structured JSON object, not a plain string, so the second agent cannot consume the first agent’s output directly from dynamic content. The Compose expression outputs('Extract_Agent')?['body']?['messages'][0]['content'] bridges this gap. This is not documented clearly by Microsoft at the time of writing and is the most common point of failure when building sequential agent loops.

Choosing the right pattern

Pattern	Complexity	Use when
Prompt chaining	Low	Sequential steps with clear handover points
Routing	Low–medium	Distinct input categories needing different handling
Handoff	Medium	Dynamic escalation based on conversation content
Orchestrator-workers	High	Unpredictable subtasks requiring dynamic decomposition

The patterns are not mutually exclusive. A production customer service system might use routing to direct initial requests, handoff for mid-conversation escalations, and prompt chaining within each specialist agent to process the request through multiple steps.

Diagram showing four Azure Logic Apps multi-agent patterns arranged in rows. Row 1: prompt chaining — Agent 1 Extract, Agent 2 Format, Agent 3 Summarise, Output. Row 2: routing — a Classifier triage agent routes to either a Billing agent or a Technical agent. Row 3: handoff — a General agent detects escalation and passes context via a dashed arrow to a Specialist agent with full history. Row 4: orchestrator-workers — an Orchestrator with dynamic breakdown fans out to Worker A, Worker B, and Worker C, which converge into a Synthesised output. Legend shows teal for agent/worker, purple for orchestrator/classifier, coral for specialist. — Figure 3 — The four multi-agent patterns available in Azure Logic Apps, ordered by complexity. Prompt chaining (top) runs agents sequentially, with each output feeding the next, as demonstrated in this post’s demo. Routing uses a classifier agent to direct requests to the right specialist. Handoff transfers control dynamically mid-conversation, preserving the full conversation history across the transition. Orchestrator-workers (bottom) is the most advanced pattern: a central orchestrator dynamically decomposes tasks and delegates them to worker agents, synthesizing their results into a final output.

What comes next

Part 6 covers securing agentic workflows: the expanded caller surface introduced by multi-agent and conversational patterns, Easy Auth setup for production, and Managed Identity for backend connections.

Azure API Management Build 2026 AI Gateway: What’s New

Posted on June 18, 2026 by steefjan1970

The Azure API Management Build 2026 AI gateway announcements mark a significant expansion of APIM’s control plane capabilities. Microsoft shipped three headline additions: a Unified Model API that lets clients standardize on one format while APIM transforms requests to Anthropic, Google Vertex AI, and other backends; content safety policies extended to cover MCP tool calls and agent-to-agent traffic; and expanded token metrics that now track reasoning, cached, and audio tokens across providers. This post explains what each change means in practice for teams building enterprise AI workloads on Azure.

Azure API Management Build 2026 AI Gateway: Three Headline Changes

The biggest announcement is the Unified Model API, now in public preview. It lets clients standardize on a single API format, currently OpenAI Chat Completions, while APIM transparently converts requests to the backend provider’s native format, whether that is Anthropic’s Messages API, Google Vertex AI, or another provider.

For teams running multi-model architectures, this is significant. Until now, switching providers or adding a new model required client-side changes. With the Unified Model API, the routing decision moves entirely to APIM. Teams can swap backends, add providers, or route traffic based on cost or latency without touching client code.

Diagram showing a client sending requests in OpenAI Chat Completions format to Azure API Management. APIM's Unified Model API layer transforms the request to each provider's native format — Azure OpenAI natively, Anthropic Messages API, and Google Vertex AI format — while applying governance policies and unified token metrics uniformly across all backends. A caption notes that client code is unchanged when swapping providers. — The APIM Unified Model API transformation layer. Clients standardize on a single API format, while APIM handles per-provider translation transparently. All governance policies, rate limits, content safety, and token metrics apply uniformly regardless of which provider handles inference. Teams can swap backends or add providers without touching client code.

From an architecture perspective, this strengthens the case for APIM as the single AI control plane. Every governance policy, rate limit, content safety check, and token metric applies consistently regardless of which provider handles inference. There is no need for a parallel governance stack per provider.

One practical implication: the three-layer auth model from Part 2 of this series applies uniformly across all providers. Managed Identity to backend is the cleanest approach, but the provider must support it. For Anthropic and Vertex AI, check the current authentication requirements before assuming token-based auth transfers directly.

Content Safety for MCP and A2A: The Gap That Needed Closing

Extending the llm-content-safety policy to MCP tool calls and agent-to-agent payloads is the most architecturally significant change. Until now, content safety only covered LLM completions traffic. MCP tool-call arguments and A2A messages were ungoverned at the gateway layer.

This matters because prompt injection attacks do not only arrive via the user-facing chat interface. A malicious payload embedded in a tool response from an external MCP server, for example, can propagate through an agentic pipeline if there is no inspection at the gateway layer. The shield-prompt attribute specifically addresses this by checking for adversarial prompt-injection patterns in MCP and A2A traffic, not just in LLM input.

Side-by-side comparison diagram. On the left, before Build 2026, Azure API Management content safety covers only LLM completions traffic. MCP tool calls, agent-to-agent traffic, and prompt injection via tool responses are shown in red as ungoverned. On the right, after Build 2026, all four traffic types are shown in teal as covered — MCP tool call arguments, A2A agent payloads, and prompt injection attacks are now scanned by the llm-content-safety policy with the shield-prompt attribute enforced. — Content safety coverage before and after Build 2026. Prior to the announcement, the llm-content-safety policy only applied to LLM completions traffic. MCP tool-call arguments, agent-to-agent payloads, and prompt injection attacks arriving via tool responses were ungoverned at the gateway layer. The Build 2026 update closes all three gaps with the same policy, extended to cover MCP and A2A traffic.

One implementation detail worth calling out: the policy behaves differently for streaming responses. In non-streaming mode, a violation returns a clean 403. In streaming mode, the policy buffers events in a sliding window and stops forwarding without returning an explicit error code. Agents consuming streaming completions need to handle an abrupt stop gracefully. If you are designing agentic pipelines that use streaming, build in a timeout and an explicit error handling path for this case.
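On the client side, one way to make an abrupt stop visible is to insist on the terminal event. The sketch below assumes the stream uses the OpenAI Chat Completions server-sent-event framing, where a normal stream ends with `data: [DONE]`, and that a policy-triggered stop shows up as a stream ending without that sentinel; the second assumption should be verified against your own gateway. `StreamTruncated` is a hypothetical exception name, and the read timeout belongs on the HTTP client, which is not shown.

```python
import json
from typing import Iterable


class StreamTruncated(Exception):
    """Raised when the stream ends without the terminal [DONE] event."""


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate content deltas and insist on seeing the [DONE] sentinel."""
    parts: list[str] = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and comments
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            return "".join(parts)
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue  # some chunks carry metadata only
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    # The iterator ran dry without a sentinel: treat it as an error, not a result.
    raise StreamTruncated("".join(parts))


complete = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
truncated = complete[:-1]

text = collect_stream(complete)
```

Raising with the partial text attached lets the caller decide whether to discard it, log it, or retry without streaming.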

The two new attributes — window-size and window-overlap-size — let you tune how content exceeding Azure Content Safety’s 10,000 character limit is split for evaluation. For agentic pipelines with large tool responses, these will need tuning based on your typical payload sizes.
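To build intuition for the two settings, the sketch below splits a long string into overlapping windows. It illustrates the general sliding-window idea only; it is not APIM's implementation, and treating both values as character counts is an assumption.

```python
def split_into_windows(text: str, window_size: int, overlap: int) -> list[str]:
    """Split text into windows of at most window_size characters.

    Consecutive windows share `overlap` characters, so content that straddles
    a boundary still appears intact in at least one window.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be between 0 and window_size - 1")
    if not text:
        return []
    step = window_size - overlap
    windows: list[str] = []
    start = 0
    while True:
        windows.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break
        start += step
    return windows


# A 25,000-character tool response, evaluated in 10,000-character windows.
windows = split_into_windows("x" * 25_000, window_size=10_000, overlap=500)
```

A larger overlap lowers the chance that a harmful phrase is cut in half at a boundary, at the cost of more evaluation calls per payload.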

Expanded Token Metrics: Catching What Was Missing

The token metric policy from Part 4 of this series now logs reasoning tokens, cached tokens, and audio tokens to Application Insights. This is a meaningful improvement for FinOps visibility.

Reasoning models like o1 and o3 consume significant token budgets in their internal reasoning chain before producing output. Without reasoning token tracking, cross-charging dashboards systematically undercount consumption from teams using these models. The expanded metrics fix this.

Matrix diagram with token types as rows and AI providers as columns. Prompt tokens and completion tokens are tracked across all five providers: Azure OpenAI, Anthropic, Google Vertex AI, Amazon Bedrock, and Microsoft Foundry. Three new token types added at Build 2026 are highlighted in amber: reasoning tokens, tracked for Azure OpenAI, Anthropic, and Microsoft Foundry; cached tokens, tracked for Azure OpenAI, Anthropic, Google Vertex AI, and Microsoft Foundry; and audio tokens, tracked for Azure OpenAI only. Grey cells indicate token types not reported by a given provider. All data flows to Application Insights for FinOps dashboards and budget alerts. — Token metric coverage in Application Insights after Build 2026. The three amber rows — reasoning, cached, and audio tokens — are new additions. Reasoning token tracking is particularly significant for FinOps teams using o1 or o3 models, where the internal reasoning chain can consume a substantial portion of the total token budget that earlier metrics did not capture. Grey cells indicate that a provider does not expose that token type in its API response.

Cached token tracking is equally important for cost optimization. Azure OpenAI’s prompt caching reduces the cost of repeated prompt prefixes. Tracking cached vs. uncached tokens separately lets you measure the actual cache hit rate and tune your prompt structure accordingly.
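The hit rate itself is simple arithmetic once the counts are available. The sketch below assumes usage objects shaped like the Chat Completions response, with `prompt_tokens` and `prompt_tokens_details.cached_tokens`; the sample numbers are invented, and the field names in your Application Insights data may differ.

```python
from typing import Any


def cache_hit_rate(usages: list[dict[str, Any]]) -> float:
    """Share of prompt tokens served from the prompt cache across a set of calls."""
    prompt_total = 0
    cached_total = 0
    for usage in usages:
        prompt_total += usage.get("prompt_tokens", 0)
        details = usage.get("prompt_tokens_details") or {}
        cached_total += details.get("cached_tokens", 0)
    if prompt_total == 0:
        return 0.0
    return cached_total / prompt_total


# Invented sample: the first call misses the cache, the second reuses the prefix.
sample_usages = [
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 0}},
    {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 1536}},
]

rate = cache_hit_rate(sample_usages)
```

Weighting by tokens instead of by call count keeps a handful of very long prompts from being hidden behind many short ones.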

The multi-provider coverage, spanning Microsoft Foundry, OpenAI, Amazon Bedrock, and Google Vertex AI, means the FinOps dashboard built in Part 4 now works across your entire model estate, not just Azure OpenAI.

API Center MCP Server: Enterprise Discovery at GA

The Azure API Center data plane MCP server reached general availability. It acts as a unified discovery endpoint: agents and developer tools can find registered MCP servers, tools, APIs, and AI assets through a single MCP connection. When a team registers a new MCP server in API Center, it becomes automatically discoverable without requiring individual client reconfigurations.

This is the enterprise catalogue layer that makes the MCP gateway story from Part 7 operationally sustainable at scale. Without it, discovery is a manual configuration problem. With it, the control plane extends automatically as new capabilities are registered.

Where This Leaves the Control Plane

Looking at the Build announcements together, the pattern is consistent with what the series argued: APIM is becoming the governance layer for all AI traffic, not just LLM completions. The Unified Model API extends it across providers. Content safety for MCP and A2A extends it across protocols. The API Center MCP server extends discovery to the enterprise catalogue layer.

The competitive context is worth noting. AWS Bedrock Guardrails handles content filtering but has no equivalent to the Unified Model API or MCP/A2A coverage. Google Apigee has added AI gateway features, but not at this protocol breadth. Cloudflare’s AI Gateway focuses on spend limits and caching. APIM’s position, that the API gateway is the natural control plane for AI workloads, is increasingly defensible.

For teams that have followed the series and implemented the seven patterns, the Build announcements are additive rather than disruptive. The policy pipeline you built still works. The new capabilities slot in: swap your backend URL configuration to use the Unified Model API, add the llm-content-safety policy to your MCP server inbound pipeline, and update your Application Insights queries to include reasoning and cached token dimensions.

Lastly, the Microsoft AI Gateway labs, with 30+ Jupyter notebooks and deployable Bicep templates, are worth bookmarking if you are implementing any of these patterns.

Building Azure Logic Apps Agent Tools: Connectors and MCP

Posted on June 16, 2026 by steefjan1970

Part 4 of 7 in the Logic Apps Agent Loop series

Part 3 covered the two agentic workflow patterns in Azure Logic Apps, autonomous and conversational, and how to choose between them. Both patterns rely on the same mechanism for getting work done: tools. An Azure Logic Apps agent loop tool is the means by which the model reaches out to the world to query a database, send an email, call an API, or retrieve a document. Without tools, the agent can only reason over what the model already knows.

This post is the most hands-on in the series. It covers the three layers of the Azure Logic Apps tooling model: built-in connectors, custom connectors, and MCP servers. It also includes a demo showing how to expose a Logic Apps workflow as a tool provider that can be called by an external agent in Azure AI Foundry.

Choosing the right Azure Logic Apps agent tools layer

Before building anything, it helps to understand what a tool actually is in Logic Apps terms. A tool is a sequence of one or more connector actions that the agent can choose to invoke during a loop iteration. The model decides which tool to call based on the tool’s name and description, so naming and describing tools clearly is one of the most important decisions you will make when building an agentic workflow.

Logic Apps offers three layers of tooling, each adding capability and complexity.

Layer 1: Built-in and managed connectors

The foundation layer is the 1,400+ connector library that Logic Apps has always offered. For agent tools, the most relevant connectors are those that give the agent access to data and services: Azure OpenAI, Azure AI Search, Azure Blob Storage, Office 365 Outlook, SharePoint, SQL Server, HTTP, and Service Bus among them.

You build a tool by adding one or more of these connector actions inside the tool container within the agent action. Each tool gets a name and a description. The model reads these at runtime to decide whether to invoke the tool and what arguments to pass. You then create agent parameters for any action inputs that the model should supply dynamically: a city name for a weather lookup, a query string for a search, a recipient address for an email.

Agent parameters differ from standard Logic Apps parameters in important ways. They are scoped to the tool where you define them; they cannot be shared across tools. They also receive their values only when the agent invokes the tool, not at workflow start time. You can call the same tool multiple times in a single loop using different parameter values: for example, you could invoke a weather tool for both Amsterdam and London in the same run.

Layer 2: Custom connectors

Where the built-in connector library has gaps, custom connectors fill them. A custom connector in Logic Apps is an OpenAPI-described wrapper around any REST API, internal or external. Once you register it, it appears in the connector gallery just like a managed connector, and you can use it inside a tool in the same way.

For enterprise integration architects, custom connectors are the bridge between the agent loop and any internal system that does not have a first-party Logic Apps connector: an internal HR system, a legacy claims processing API, a proprietary data platform. The investment in defining the OpenAPI specification pays off because the connector becomes reusable across all workflows in the tenant, not just the agentic ones.

Building a custom connector for use in an agent tool follows the standard Logic Apps custom connector creation process (define the API, specify authentication, and configure the operations) with one addition: write clear operation descriptions, because the model uses these descriptions to decide when to invoke the connector.
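A quick lint over the OpenAPI document can catch operations that ship without a description. The sketch below is a minimal illustration that assumes the spec has already been loaded into a dictionary; the sample spec and its paths are hypothetical.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def operations_missing_descriptions(spec: dict[str, Any]) -> list[str]:
    """List operations whose description is absent or blank."""
    missing: list[str] = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            if not (operation.get("description") or "").strip():
                missing.append(f"{method.upper()} {path}")
    return sorted(missing)


# Hypothetical spec for an internal claims API, for illustration only.
sample_spec = {
    "paths": {
        "/claims/{id}": {
            "parameters": [{"name": "id", "in": "path"}],
            "get": {"description": "Return the status and amount of one claim."},
            "delete": {"summary": "Delete claim"},
        },
        "/claims": {"post": {"description": "  "}},
    }
}

missing = operations_missing_descriptions(sample_spec)
```

Running a check like this in the connector's build pipeline keeps undescribed operations from reaching the agent in the first place.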

Layer 3: MCP servers

The third layer is the newest and the most architecturally significant. Azure Logic Apps can serve as the backend for a Model Context Protocol (MCP) server, exposing connector actions as a structured, discoverable toolset that external agents and models can call over a standard protocol.

MCP is an open standard that defines how AI components discover and invoke tools. An MCP server acts as a bridge between an AI agent and the tools it can use. This is a significant shift from the previous two layers. Built-in and custom connectors are tools that the agent in your Logic Apps workflow invokes. An MCP server inverts the relationship: your Logic Apps workflow becomes the tool provider, and the calling agent lives somewhere else entirely.

Structural diagram showing three tooling layers for Azure Logic Apps agentic workflows. Layer 1 contains built-in and managed connectors including Azure OpenAI, Azure AI Search, Office 365, HTTP, and 1,400 more. Layer 2 shows custom connectors wrapping internal REST APIs such as HR, claims, and ERP systems. Both layers sit inside the Logic App boundary with an agent parameters note. Layer 3 sits below as a separate MCP server section, showing an external agent connecting via Azure API Center to MCP tools backed by Logic Apps connectors. — Figure 1 — The three tooling layers available to an Azure Logic Apps agent. Layer 1 (purple) covers the 1,400+ built-in and managed connectors packaged as tools directly inside the agent action. Layer 2 (coral) adds custom connectors that wrap internal REST APIs not covered by first-party connectors, reusable across the tenant. Both layers follow the same pattern: the agent in your workflow calls the tool. Layer 3 (purple, below) inverts the relationship — your Standard logic app becomes the tool provider, registered through Azure API Center and callable by any external MCP-compatible agent. Agent parameters apply across all three layers: the model supplies tool input values at runtime, scoped per tool.

A note on the demo: real-world limitations of the tooling preview

For this post I set out to build a working end-to-end demo showing a Logic Apps workflow exposed as an MCP tool provider callable by an Azure AI Foundry agent. The concept is sound and the architecture is correct, but two practical blockers prevented a clean demo at the time of writing.

API Center MCP wizard limitations. The registration wizard in Azure API Center is in active preview. The connector picker surfaces only managed connectors, so the built-in HTTP action from Part 2 is unavailable. The logic app dropdown is also filtered by region: a logic app in West Europe will not appear in an API Center resource deployed to a different region.

Foundry OpenAPI tool network restrictions. Azure AI Foundry’s OpenAPI tool sandbox cannot reach azurewebsites.net endpoints directly. Calls from the Foundry playground return an Unknown error regardless of the spec configuration. The workaround is to front the Logic Apps endpoint with Azure API Management, which Foundry can reach; however, that adds infrastructure complexity beyond the scope of this post.

Both limitations are preview-stage issues that Microsoft will likely resolve. The OpenAPI spec, the Foundry agent configuration, and the mcp-research workflow pattern described above are all correct and will work once network access between Foundry and Logic Apps endpoints is available, or when the endpoint is fronted by an APIM gateway.

The Layer 3 pattern, with your Logic App acting as a tool provider for any external MCP-compatible agent, remains the most architecturally significant development in this series. Part 6 picks up the security implications of that expanded caller surface.

Choosing the right tooling layer

The table below summarises how Azure Logic Apps agentic workflows differ across the three tooling layers.

	Built-in connectors	Custom connectors	MCP server
Who calls the tool	Agent in your workflow	Agent in your workflow	Any external MCP-compatible agent
Setup complexity	Low	Medium	Medium–high
Reusability	Within the workflow	Across the tenant	Across agents and platforms
Best for	Standard integrations	Internal APIs without a connector	Multi-agent, cross-platform tooling

The three layers are not mutually exclusive. A production agentic workflow will typically use built-in connectors for standard integrations, custom connectors for internal systems, and an MCP server where the toolset needs to be shared across multiple agents or platforms.

What comes next

The next post moves from individual tools to multi-agent composition. Part 5 covers orchestrator-worker topologies, agent handoffs, and how to build sequential agent loops.

Autonomous vs Conversational Agentic Workflows in Logic Apps

Posted on June 9, 2026 by steefjan1970

Part 3 of 7 in the Logic Apps Agent Loop series

Part 2 walked through the anatomy of an Azure Logic Apps agent loop and built a minimal autonomous agent from scratch. Before opening the designer, though, there is a design decision to make. Azure Logic Apps agentic workflows come in two patterns, autonomous and conversational, and the choice shapes the trigger, the prompt source, the output destination, and the authentication you need before going to production. This post covers both patterns and helps you decide which fits your scenario.

Two Azure Logic Apps agentic workflow patterns, one agent loop

Both autonomous and conversational agentic workflows use the same Azure Logic Apps agent loop under the hood: the same Think, Act, Observe cycle from Post 2, the same connected model, and the same tools built from connector actions. The differences arise from how the workflow starts, who supplies the prompts, and how the results get delivered.
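
To make the shared cycle concrete, here is a minimal sketch of a Think, Act, Observe loop in plain Python. It is not how Logic Apps implements the agent loop internally. The names run_agent_loop, stub_model, Decision, and the lookup_order tool are all hypothetical, and the model is a hard-coded stub so the example runs without any Azure resources.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    """Outcome of one Think step (hypothetical shape, for illustration only)."""
    tool: Optional[str] = None  # tool to call next, or None when the agent is done
    argument: str = ""          # argument passed to the tool
    answer: str = ""            # final answer when no further tool call is needed


def stub_model(prompt: str, observations: List[str]) -> Decision:
    """Stand-in for the connected model: call one tool, then answer."""
    if not observations:
        return Decision(tool="lookup_order", argument=prompt)
    return Decision(answer=f"Done: {observations[-1]}")


def run_agent_loop(
    prompt: str,
    model: Callable[[str, List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    observations: List[str] = []
    for _ in range(max_iterations):
        decision = model(prompt, observations)            # Think
        if decision.tool is None:
            return decision.answer                        # nothing left to do
        result = tools[decision.tool](decision.argument)  # Act
        observations.append(result)                       # Observe
    raise RuntimeError("agent did not finish within the iteration limit")


# Hypothetical tool, standing in for a connector action.
TOOLS = {"lookup_order": lambda order_id: f"order {order_id} is shipped"}

if __name__ == "__main__":
    print(run_agent_loop("A-1001", stub_model, TOOLS))
```

Both patterns described below would wrap this same loop. Only the source of prompt and the destination of the returned answer change.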

Autonomous agentic workflows

An autonomous agentic workflow can start from any supported Logic Apps trigger, such as an HTTP request, a timer, a Service Bus message, a new file in Blob Storage, or an email arriving in an inbox. The trigger fires, its output supplies the agent’s prompt, the loop runs, and the workflow returns the result to the caller or forwards it to a downstream system. No human is in the loop during execution.

This is the pattern from Post 2. It works well in scenarios where the input is clear and the agent’s task is specific: summarize this document, classify this support ticket, extract these fields from this invoice, or route this order based on its contents. The workflow runs unattended, potentially thousands of times a day, without any human interaction between trigger and result.

The key design characteristic of an autonomous workflow is that the prompt comes from the system, not from a person. The trigger output, whether a message body, a file name, or a queue payload, is what the agent reasons over. The instructions you write in the agent’s configuration pane define the agent’s role for every run.
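
The split between fixed instructions and a per-run, system-supplied prompt can be sketched as follows. This is an illustration under assumptions: the role/content message shape is a common chat-model convention rather than the Logic Apps internal format, and AGENT_INSTRUCTIONS, build_user_prompt, build_messages, and the trigger payload are hypothetical.

```python
import json
from typing import Dict, List

# Fixed for every run: the equivalent of the instructions in the agent's
# configuration pane (hypothetical example text).
AGENT_INSTRUCTIONS = (
    "You classify support tickets as 'billing', 'technical', or 'other'."
)


def build_user_prompt(trigger_output: Dict) -> str:
    """Turn the trigger output (here an assumed HTTP body) into the prompt."""
    body = trigger_output.get("body", {})
    return "Classify this ticket:\n" + json.dumps(body, sort_keys=True)


def build_messages(trigger_output: Dict) -> List[Dict[str, str]]:
    """Combine the static role definition with the per-run system input."""
    return [
        {"role": "system", "content": AGENT_INSTRUCTIONS},
        {"role": "user", "content": build_user_prompt(trigger_output)},
    ]


if __name__ == "__main__":
    sample_trigger = {"body": {"subject": "Invoice wrong", "ticket_id": 42}}
    for message in build_messages(sample_trigger):
        print(message["role"], "->", message["content"])
```

No person appears anywhere in this path: the same instructions are reused on every run, and only the trigger payload changes.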

Conversational agentic workflows

A conversational agentic workflow introduces a human in the loop. Instead of firing from a system trigger, it always starts with the “When a chat session starts” trigger, the only trigger supported for this pattern. From there, the agent receives prompts through an integrated chat interface: a person types a message, the agent reasons over it, invokes tools if needed, and responds. The conversation continues turn by turn until the session ends.

This pattern suits scenarios that require dialogue: a support agent that asks clarifying questions, a guided data-entry flow, a research assistant that refines its output based on feedback, or any situation where the right response depends on what the user says next. The agent maintains session state across turns, so each prompt it receives includes the history of the conversation so far.
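
A rough sketch of what "maintains session state across turns" means in practice: every new prompt is appended to a history, and the model sees the whole history on each turn. Logic Apps handles this for you, so the ChatSession class and the echo_model stub below are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ChatSession:
    """Keeps the running conversation for one chat session (illustrative)."""

    def __init__(self, instructions: str) -> None:
        self.history: List[Message] = [
            {"role": "system", "content": instructions}
        ]

    def send(self, user_message: str, model: Callable[[List[Message]], str]) -> str:
        # The model receives the full history, not just the latest message.
        self.history.append({"role": "user", "content": user_message})
        reply = model(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_model(history: List[Message]) -> str:
    """Stub model that proves it can see earlier turns by counting them."""
    user_turns = [m for m in history if m["role"] == "user"]
    return f"Turn {len(user_turns)}: you said '{user_turns[-1]['content']}'"


if __name__ == "__main__":
    session = ChatSession("You are a helpful research assistant.")
    print(session.send("Summarize the report", echo_model))
    print(session.send("Make it shorter", echo_model))
```

An autonomous run, by contrast, would correspond to creating a fresh session for every trigger and discarding it afterwards.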

The integrated chat interface is accessible directly from the Logic Apps designer in the Azure portal during development. For production use, conversational workflows also support an external chat client that people outside the portal can access, which introduces authentication requirements covered later in this post.

Choosing the right Azure Logic Apps agentic workflow pattern

The decision comes down to one question: does the workflow need a human in the loop during execution?

If the input is fully available at trigger time and the task can be completed without further human input, use the autonomous pattern. If the workflow needs to ask questions, receive feedback, or maintain a conversation across multiple turns, use the conversational pattern.

A few other factors are worth considering:

Trigger flexibility. Autonomous workflows support any Logic Apps trigger, drawing on the full library of 1,400+ connectors. Conversational workflows are locked to the When a chat session starts trigger. If your scenario requires a scheduled run, a queue-based trigger, or any event-driven start, autonomous is your only option.

Output destination. Autonomous agents return results to the workflow caller or pass them to a downstream action such as an email, a queue message, or a database write. Conversational agents respond through the chat interface. If the output needs to go somewhere other than a chat window, autonomous is the right fit.

Authentication complexity. Autonomous workflows authenticate using the same patterns as any other Logic Apps workflow: Managed Identity, SAS tokens, and Easy Auth. Conversational workflows that expose an external chat client face a broader authentication challenge: callers can come from dynamic, unknown, or untrusted networks, and every external caller must be authenticated and authorized before going to production. During development, the Azure portal provides a developer key for quick testing in the designer, but this key is explicitly not suitable for production use.

State management. Conversational workflows maintain conversation history across turns automatically. Autonomous workflows have no concept of a session — each run is independent. If your scenario needs memory across multiple interactions, the conversational pattern handles this natively.
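
The decision rules above can be condensed into a small helper. This is only a sketch of the reasoning in this post, not an official decision tool. The function name choose_pattern and its three flags are hypothetical, and raising an error on a conflicting scenario is an assumption made for the example.

```python
def choose_pattern(
    needs_dialogue: bool,
    event_driven_start: bool,
    output_outside_chat: bool,
) -> str:
    """Return 'autonomous' or 'conversational' for a scenario.

    needs_dialogue:      the agent must ask questions or keep multi-turn state
    event_driven_start:  the run starts from a schedule, queue, or other event
    output_outside_chat: the result must go somewhere other than a chat window
    """
    requires_autonomous = event_driven_start or output_outside_chat
    if requires_autonomous and needs_dialogue:
        # Conversational workflows only support the chat trigger and chat
        # output, so one workflow cannot satisfy both sets of requirements.
        raise ValueError(
            "conflicting requirements: dialogue needs the conversational "
            "pattern, but the trigger or output needs the autonomous pattern"
        )
    if requires_autonomous:
        return "autonomous"
    return "conversational" if needs_dialogue else "autonomous"


if __name__ == "__main__":
    # Nightly invoice extraction written to a database.
    print(choose_pattern(False, True, True))
    # Support assistant that asks clarifying questions in a chat window.
    print(choose_pattern(True, False, False))
```

When the helper flags a conflict, the scenario likely needs to be reconsidered rather than forced into one pattern.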

What changes in the designer for Logic Apps

Setting up Azure Logic Apps agentic workflows in the designer follows the same steps for both patterns, with two key differences.

The first difference is the workflow type. When you create a new workflow, select Conversational Agents instead of Autonomous Agents. Logic Apps creates the workflow with the When a chat session starts trigger already in place and an empty agent action connected to it.

The second difference is the chat interface itself. Once the workflow is saved, a chat panel is accessible from the designer toolbar. During development, this is where you test the agent interactively: type a prompt, read the response, and continue the conversation. The run history records each turn as a separate agent iteration, giving you the same visibility into the loop’s behaviour as in an autonomous workflow.

Authentication for conversational workflows in production

The developer key that the Azure portal uses during design and testing is a convenience mechanism tied to your portal session. It is not a substitute for production authentication. The developer key is not designed for large or untrusted caller populations, is not governed by Conditional Access policies at the request execution layer, and cannot be distributed externally.

For production conversational agentic workflows, you need to set up Easy Auth on the Logic App. This applies to external callers, meaning individuals or agents that access the chat endpoint from outside the Azure portal, and they must use proper identity-based authentication. Post 6 of this series covers the complete security landscape for agentic workflows in more depth: setting up Easy Auth, using Managed Identity for backend connections, and evaluating the broader threat model for conversational workflows.

Choosing the right pattern: a quick reference

	Autonomous	Conversational
Trigger	Any supported trigger	When a chat session starts only
Human interaction	None during execution	Turn-by-turn via chat interface
Prompt source	Trigger or preceding action output	Human input through chat
Output destination	Caller, downstream action, or system	Chat interface response
Session state	None — each run is independent	Maintained across turns
External access	Standard Logic Apps auth	Requires Easy Auth for production
Best for	Unattended, event-driven tasks	Dialogue, guided flows, multi-turn tasks

Figure 1: Autonomous agentic workflows (left) accept input from any supported Logic Apps trigger and run without human interaction, returning results to a caller or downstream system. Conversational agentic workflows (right) always start with the When a chat session starts trigger, receive prompts from a human through the integrated chat interface, and maintain session state across turns. Both patterns use the same agent loop mechanics (Think, Act, Observe) but differ in trigger flexibility, prompt source, output destination, and production authentication requirements.

What comes next

The next post moves from pattern selection to tooling. Part 4 covers how to build tools for the agent, from built-in and custom connectors to MCP servers as tool providers, and includes the most hands-on demo in the series.
