Citadel APIM Kill Switch: Stop a Governed Agent Cold

Posted on July 14, 2026 by steefjan1970

In the previous post we added conversation persistence to the Microsoft Foundry Citadel Platform on Azure. As a result, every agent run now produces a structured document in the spoke’s Cosmos DB conversations container. The agent is fully operational: it routes through the APIM governance hub, executes tool calls, stores its history, and returns grounded responses. However, the question that every enterprise AI architect eventually faces remains: what happens when it needs to stop?

Not a graceful shutdown. Not a redeployment. An immediate, operator-triggered containment the kind you need when an agent is behaving unexpectedly, consuming runaway tokens, or has been flagged by your security team. In a Microsoft Foundry Citadel Platform on Azure deployment, the answer is the Kill Switch: a layered containment system built into the APIM hub that stops agent traffic cold without touching the agent code, the spoke, or the Azure OpenAI deployment.

This post implements three of the five Citadel kill switch layers against the hub we deployed in Sweden Central:

Layer 1 — Named Value flip: instant global block via a single boolean
Layer 2 — JWT claim block: identity-based containment via header validation
Layer 3 — Agent ID blocklist: surgical per-agent blocking

The Scenario

The weather agent (agent_with_memory.py) is running in production. Specifically, it is routing through apim-wpvlimv4ngkns.azure-api.net, storing conversations in the spoke Cosmos DB, and generating token usage events in the hub Cosmos DB. Everything is working. Then your security team flags it. The agent needs to stop immediately while the incident is investigated. You have seconds, not minutes. For example, redeploying the spoke takes too long. Rotating the APIM subscription key is irreversible and affects all consumers. Therefore, the Kill Switch is the right tool.

The Kill Switch is the right tool. The APIM hub has built-in pre-wiring, requires no code changes, and can trigger actions in under 30 seconds.

To ensure reliability, always pre-wire the kill switch as Layer 1 before you need it. Remember, you can’t flip a Named Value that doesn’t exist. In addition, the inbound policy must already be in place, checking the Named Value on every request, before any incident occurs.

Prerequisites

From the previous posts you should have:

Hub deployed in rg-ai-hub-gateway-dev with APIM instance apim-wpvlimv4ngkns
agent_with_memory.py running and saving to Cosmos DB
Azure CLI authenticated

Citadel Kill Switch Layer 1 — Named Value Flip

How It Works

A Named Value called kill-switch-enabled is created in APIM and set to false. An inbound policy on the OpenAI API checks this value on every request. When the value is flipped to true, all requests through the gateway immediately return HTTP 403 — no code changes, no redeployment, no spoke involvement.

Step 1.1 — Create the Named Value

			
az apim nv create `
  --resource-group rg-ai-hub-gateway-dev `
  --service-name apim-wpvlimv4ngkns `
  --named-value-id kill-switch-enabled `
  --display-name "kill-switch-enabled" `
  --value "false" `
  --secret false

		

Verify it was created:

			
az apim nv show `
  --resource-group rg-ai-hub-gateway-dev `
  --service-name apim-wpvlimv4ngkns `
  --named-value-id kill-switch-enabled `
  --query "value" -o tsv

		

Should return false.

Step 1.2 — Add the Inbound Policy

In the Azure Portal:

Navigate to apim-wpvlimv4ngkns → APIs → Azure OpenAI Service API → All operations
Click Policies → Inbound processing → Edit
Add this policy inside the <inbound> section, before any other policies:

			
<!-- Kill Switch Layer 1: Named Value flip -->
<set-variable name="killSwitchActive" value="@("{{kill-switch-enabled}}" == "true")" />
<choose>
    <when condition="@((bool)context.Variables["killSwitchActive"])">
        <return-response>
            <set-status code="403" reason="Agent Suspended" />
            <set-header name="Content-Type" exists-action="override">
                <value>application/json</value>
            </set-header>
            <set-header name="x-kill-switch-layer" exists-action="override">
                <value>1-named-value</value>
            </set-header>
            <set-body>{"error": {"code": "KillSwitchActive", "message": "Agent access has been suspended by the governance hub.", "layer": 1}}</set-body>
        </return-response>
    </when>
</choose>

		

The dedicated Named Value check policy provides a cleaner approach.

			
<!-- Kill Switch Layer 1: Named Value flip -->
<!-- Pre-wire this BEFORE any incident. Set kill-switch-enabled=true to activate. -->
<set-variable name="killSwitchActive" value="@("{{kill-switch-enabled}}" == "true")" />
<choose>
  <when condition="@((bool)context.Variables["killSwitchActive"])">
    <return-response>
      <set-status code="403" reason="Agent Suspended" />
      <set-header name="Content-Type" exists-action="override">
        <value>application/json</value>
      </set-header>
      <set-header name="x-kill-switch-layer" exists-action="override">
        <value>1-named-value</value>
      </set-header>
      <set-body>{"error": {"code": "KillSwitchActive", "message": "Agent access has been suspended by the governance hub. Contact your administrator.", "layer": 1}}</set-body>
    </return-response>
  </when>
</choose>

		

Click Save.

Step 1.3 — Confirm Agent Runs Normally

With kill-switch-enabled set to false, the agent should still work:

python agent_with_memory.py

Expected output: normal run, conversation saved, answer returned.

Step 1.4 — Trigger the Kill Switch

			
az apim nv update `
  --resource-group rg-ai-hub-gateway-dev `
  --service-name apim-wpvlimv4ngkns `
  --named-value-id kill-switch-enabled `
  --value "true"

		

Now run the agent:

python agent_with_memory.py

Expected output:

The agent stops. No spoke changes occur. No code changes happen. One CLI command executes.

Step 1.5 — Reset

			
az apim nv update `
  --resource-group rg-ai-hub-gateway-dev `
  --service-name apim-wpvlimv4ngkns `
  --named-value-id kill-switch-enabled `
  --value "false"

		

Citadel Kill Switch Layer 2 — Agent Approval Header

How It Works

The agent is required to pass a custom header x-agent-token containing a signed JWT with a specific claim (agt-approved: true). The APIM inbound policy validates this claim. If the claim is absent or the token is invalid, the system blocks the request with a 401 status. This action simulates identity-based containment, revoking the agent’s token or invalidating its claim at the identity provider level.

Step 2.1 — Update the Agent to Send a Header

Add the x-agent-token header to agent_with_memory.py. In this demo, we simulate the token by using a simple header value. IIn a production environment, Entra ID issues a JWT.

Modify the AzureOpenAI client creation in agent_with_memory.py:

			
client = AzureOpenAI(
    azure_endpoint=apim_base,
    api_key=cfg["APIM_SUBSCRIPTION_KEY"],
    api_version="2024-02-01",
    default_headers={
        "x-agent-id": "citadel-weather-agent-v1"
    }
)

		

Step 2.2 — Add the JWT Claim Check Policy

In the portal, add this policy after the Layer 1 block in the inbound section:

			
<!-- Kill Switch Layer 2: Agent approval header check -->
<choose>
  <when condition="@(context.Request.Headers.GetValueOrDefault("x-agent-approved", "false") != "true")">
    <return-response>
      <set-status code="401" reason="Agent Not Approved" />
      <set-header name="Content-Type" exists-action="override">
        <value>application/json</value>
      </set-header>
      <set-header name="x-kill-switch-layer" exists-action="override">
        <value>2-agent-approval</value>
      </set-header>
      <set-body>{"error": {"code": "AgentNotApproved", "message": "Agent identity could not be verified. Approval header missing or invalid.", "layer": 2}}</set-body>
    </return-response>
  </when>
</choose>

		

Step 2.3 — Trigger Layer 2

Remove the x-agent-approved header from the agent (or set it to false) and run:

python agent_with_memory.py

Expected output:

Note the response header x-kill-switch-layer: 2-agent-approval this indicates which containment layer fired and is critical for incident triage.

Pitfall: Policy Order Matters

Layer 1 must appear before Layer 2 in the policy document. APIM evaluates inbound policies top to bottom and stops at the first <return-response>. If Layer 2 appears before Layer 1, a globally suspended agent would return a 401 (identity error) instead of a 403 (suspended), obscuring the true containment reason in your incident log.

Citadel Kill Switch Layer 3 — Agent ID Blocklist in APIM

How It Works

A Named Value called blocked-agent-ids holds a comma-separated list of agent IDs. The inbound policy checks the x-agent-id header against this list. When agents match, the system blocks them with a 403 status code. Non-matching agents continue operating normally. This approach allows for surgical containment, stopping one specific agent while allowing all others to function.

Step 3.1 — Create the Blocklist Named Value

			
az apim nv create `
  --resource-group rg-ai-hub-gateway-dev `
  --service-name apim-wpvlimv4ngkns `
  --named-value-id blocked-agent-ids `
  --display-name "blocked-agent-ids" `
  --value "none" `
  --secret false

		

Start with an empty value — no agents blocked.

Step 3.2 — Add the Blocklist Policy

Add this policy after Layer 2 in the inbound section:

			
<!-- Kill Switch Layer 3: Agent ID blocklist -->
<set-variable name="agentId" value="@(context.Request.Headers.GetValueOrDefault("x-agent-id", ""))" />
<set-variable name="blockedIds" value="@("{{blocked-agent-ids}}")" />
<choose>
    <when condition="@{
        var agentId = (string)context.Variables["agentId"];
        var blockedIds = (string)context.Variables["blockedIds"];
        if (string.IsNullOrEmpty(agentId) || string.IsNullOrEmpty(blockedIds)) { return false; }
        return blockedIds.Split(',').Any(id => id.Trim() == agentId.Trim());
    }">
        <return-response>
            <set-status code="403" reason="Agent Blocked" />
            <set-header name="Content-Type" exists-action="override">
                <value>application/json</value>
            </set-header>
            <set-header name="x-kill-switch-layer" exists-action="override">
                <value>3-agent-blocklist</value>
            </set-header>
            <set-body>{"error": {"code": "AgentBlocked", "message": "Agent has been added to the governance blocklist.", "layer": 3}}</set-body>
        </return-response>
    </when>
</choose>

		

Step 3.3 — Add the Agent to the Blocklist

			
az apim nv update `
  --resource-group rg-ai-hub-gateway-dev `
  --service-name apim-wpvlimv4ngkns `
  --named-value-id blocked-agent-ids `
  --value "citadel-weather-agent-v1"

		

Run the agent:

python agent_with_memory.py

Expected output:

Step 3.4 — Surgical Validation

The power of Layer 3 is specificity. If you had a second agent with a different x-agent-id say citadel-docs-agent-v1 it would pass through Layer 3 unaffected while citadel-weather-agent-v1 remains blocked. One agent stopped, all others running. This is the enterprise AI governance pattern: granular control without broad disruption.

Remove the agent from the blocklist:

			
az apim nv update --resource-group rg-ai-hub-gateway-dev
--service-name apim-wpvlimv4ngkns --named-value-id blocked-agent-ids
--value "none"

Validating the Citadel Kill Switch in Application Insights

Rather than using the CLI — which has a 5–10 minute Log Analytics ingestion lag — go directly to Application Insights in the portal for immediate results:

Portal → appi-apim-wpvlimv4ngkns in rg-ai-hub-gateway-dev
Left sidebar → Logs
Paste and run this query:

			
requests
| where timestamp > ago(2h)
| where resultCode in ("200", "401", "403")
| project timestamp, resultCode, duration, name
| order by timestamp desc

		

The results table tells the complete kill switch story in two columns — resultCode and duration:

Azure Application Insights Logs results table showing POST /openai/deployments/chat/chat/completions requests — two 401 responses at 41ms and 0.9ms from Layer 2 kill switch activation, and four 200 responses at 683ms to 2010ms from normal governed agent runs through the Microsoft Foundry Citadel APIM hub in Sweden Central. — Application Insights Logs query on the Citadel APIM hub showing the kill switch in action, 401 responses at under 1ms confirm Layer 2 (agent approval header) blocking requests at the gateway before any LLM call is made, contrasted with normal 200 responses taking 683ms–2010ms for a full Azure OpenAI round trip.

The duration contrast is the definitive proof that the kill switch works as designed. The 401s and 403s resolve in under 50ms, stopped cold at the APIM inbound policy before a single token is sent to Azure OpenAI. The 200s take 683ms–2010ms because they made the full round trip through the governance hub to Azure OpenAI and back.

Zero tokens consumed on blocked requests, zero cost, and zero Cosmos DB writes in the spoke. The agent is stopped at the perimeter.

For a sharper view that highlights exactly which kill switch layer fired on each blocked request, add the response header to the query. Unfortunately APIM response headers are not automatically projected into the requests table in Application Insights — but you can distinguish the layers by combining result code and timing:

			
requests
| where timestamp > ago(2h)
| where resultCode in ("200", "401", "403")
| extend killSwitchLayer = case(
    resultCode == "401", "Layer 2 — agent approval",
    resultCode == "403" and duration < 10, "Layer 1 or 3 — gateway block",
    resultCode == "200", "Normal — LLM call completed",
    "Unknown"
  )
| project timestamp, resultCode, duration, killSwitchLayer
| order by timestamp desc

		

Azure Application Insights Logs results table for the Citadel APIM hub showing six requests — two 401 responses labeled Layer 2 agent approval at 41ms and 0.9ms duration, and four 200 responses labeled Normal LLM call completed at 683ms to 2010ms duration, confirming the Microsoft Foundry Citadel kill switch blocks requests at the gateway before any Azure OpenAI call is made. — Application Insights Logs query on the Citadel APIM hub showing the kill switch incident log — Layer 2 agent approval blocks resolving in under 1ms with zero LLM calls made, contrasted with normal governed runs completing in 683ms–2010ms. The killSwitchLayer column identifies exactly which containment layer fired on each request.

This gives you a readable incident log showing which containment layer was active at each point in time, directly useful for DORA incident post-mortem documentation and EU AI Act Article 17 risk management records.

The Complete Three-Layer Kill Switch Policy

Here is the complete inbound policy block containing all three layers, ready to paste into APIM:

 <!-- Kill Switch Layer 1: Named Value flip -->
        <set-variable name="killSwitchActive" value="@("{{kill-switch-enabled}}" == "true")" />
        <choose>
            <when condition="@((bool)context.Variables["killSwitchActive"])">
                <return-response>
                    <set-status code="403" reason="Agent Suspended" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-header name="x-kill-switch-layer" exists-action="override">
                        <value>1-named-value</value>
                    </set-header>
                    <set-body>{"error": {"code": "KillSwitchActive", "message": "Agent access has been suspended by the governance hub.", "layer": 1}}</set-body>
                </return-response>
            </when>
        </choose>
        <!-- Kill Switch Layer 2: Agent approval header -->
        <choose>
            <when condition="@(context.Request.Headers.GetValueOrDefault("x-agent-approved", "false") != "true")">
                <return-response>
                    <set-status code="401" reason="Agent Not Approved" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-header name="x-kill-switch-layer" exists-action="override">
                        <value>2-agent-approval</value>
                    </set-header>
                    <set-body>{"error": {"code": "AgentNotApproved", "message": "Agent identity could not be verified. Approval header missing or invalid.", "layer": 2}}</set-body>
                </return-response>
            </when>
        </choose>
        <!-- Kill Switch Layer 3: Agent ID blocklist -->
        <set-variable name="agentId" value="@(context.Request.Headers.GetValueOrDefault("x-agent-id", ""))" />
        <set-variable name="blockedIds" value="@("{{blocked-agent-ids}}")" />
        <choose>
            <when condition="@{
        var agentId = (string)context.Variables["agentId"];
        var blockedIds = (string)context.Variables["blockedIds"];
        if (string.IsNullOrEmpty(agentId) || string.IsNullOrEmpty(blockedIds)) { return false; }
        return blockedIds.Split(',').Any(id => id.Trim() == agentId.Trim());
    }">
                <return-response>
                    <set-status code="403" reason="Agent Blocked" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-header name="x-kill-switch-layer" exists-action="override">
                        <value>3-agent-blocklist</value>
                    </set-header>
                    <set-body>{"error": {"code": "AgentBlocked", "message": "Agent has been added to the governance blocklist.", "layer": 3}}</set-body>
                </return-response>
            </when>
        </choose>

Pitfalls Summary

Pitfall	Fix
Named Value doesn’t exist at incident time	Pre-wire Layer 1 during normal operations — never during an incident
Policy evaluation error on `{{kill-switch-enabled}}`	Named Value must exist before the policy referencing it is saved
Layer 2 fires before Layer 1 in policy	Policy order matters — Layer 1 must be first in the inbound block
Agent ID header not sent	Add `x-agent-id` to `default_headers` in `AzureOpenAI` client
Blocklist with trailing spaces blocks nothing	Use `.Trim()` in the policy C# expression when splitting
Kill switch left active after test	Always reset Named Values after testing — `kill-switch-enabled=false`, `blocked-agent-ids=""`

What the Kill Switch Demonstrates About Citadel

The three layers reveal something important about the Citadel architecture: governance lives in the hub, not the agent. The agent code has no knowledge of the kill switch. The spoke has no kill switch configuration. The Azure OpenAI deployment is untouched. All containment logic is in the APIM hub’s inbound policy — one place, centrally managed, instantly effective.

This is the enterprise AI control plane pattern in practice. When an incident occurs:

Layer 1 stops everything immediately while you triage
Layer 2 enforces identity verification once normal operations resume
Layer 3 surgically targets the offending agent while other agents continue

The x-kill-switch-layer response header ensures your incident log captures exactly which containment mechanism fired, giving you a clean audit trail for post-mortem analysis — directly relevant for DORA incident reporting and EU AI Act Article 17 risk management documentation.

What’s Next

The next post in this series takes the dev setup and hardens it for non-prod: networkIsolation=true, APIM Premium SKU, per-spoke subscription keys with independent quotas, and Azure Policy at the management group level. The kill switch policies we built here carry forward unchanged governance in the hub environment, which is environment-agnostic.

Microsoft Foundry Citadel Platform Azure: Connecting a Tool-Calling Agent

Posted on July 1, 2026 by steefjan1970

In the previous post we deployed a working Microsoft Foundry Citadel Platform on Azure Sweden Central, a Governance Hub built on Azure API Management and an Agent Spoke built on Azure AI Foundry. We validated the setup with a raw chat completion call through the APIM gateway. That proved the plumbing works. This post takes the next step: connecting a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure, using the Open-Meteo weather API as a tool, and showing that every LLM call flows through the hub’s governance layer.

The agent is built with the standard Azure OpenAI SDK pointed directly at the Citadel APIM gateway. It uses a custom function tool that calls the Open-Meteo API to retrieve real current weather data for any location. The governance hub intercepts all traffic: content safety policies fire, token usage is tracked, and telemetry flows into Application Insights. This is the Microsoft Foundry Citadel Platform doing what it is designed to do.

What We Build

The flow looks like this:

Flow diagram showing a Python agent using the OpenAI SDK making an initial request to the Citadel APIM gateway, which forwards to Azure OpenAI gpt-4o. The model requests a tool call to get_weather, the agent calls the Open-Meteo API for real weather data, submits the result back through APIM for a second LLM call, receives a grounded response, and telemetry flows to Application Insights and Cosmos DB. — End-to-end flow of a tool-calling agent on the Microsoft Foundry Citadel Platform in Azure Sweden Central, the Python agent routes both LLM calls through the APIM Governance Hub, executes the get_weather tool against Open-Meteo, and receives a grounded response, with all traffic captured in Application Insights and Cosmos DB.

Two LLM calls flow through APIM per agent run: the tool decision call and the synthesis call. Both are governed, appear in Application Insights, and contribute to Cosmos DB usage tracking.

Why Open-Meteo and Why the Standard OpenAI SDK

The original plan was to use the Azure AI Foundry Agent Service SDK with Bing Search grounding. Two blockers emerged:

Bing Search SKU eligibility: The Grounding with Bing Search resource (G1 SKU) requires Pay-As-You-Go or EA subscriptions and is not available on MVP or MSDN subscriptions.

AI Foundry Agent Service routing: The azure-ai-projects SDK routes LLM calls through the AI Foundry project’s internal endpoint (aif-tggi2gmkw22w4.openai.azure.com) rather than through APIM, bypassing the governance layer. In addition, even after adding APIM as a connected resource in the AI Foundry portal, the Agent Service does not honor it for model routing in the current preview version.

The solution, therefore, is to use the standard OpenAI Python SDK pointed directly at the APIM gateway endpoint. This guarantees that all traffic flows through the hub; consequently, the tool-calling loop is implemented explicitly in Python, and the governance telemetry is fully captured in Application Insights.

Open-Meteo is a free, open-source weather API; therefore, it requires no API key and returns structured JSON weather data. Additionally, it serves as a clean stand-in for any external API your agents might call in production.

Prerequisites

From the previous post you should have:

Hub deployed in rg-ai-hub-gateway-dev with APIM gateway URL https://apim-wpvlimv4ngkns.azure-api.net and subscription key
Spoke deployed in rg-ai-spoke-dev with App Config appcs-tggi2gmkw22w4 containing APIM_GATEWAY_URL and APIM_SUBSCRIPTION_KEY
Your principal ID with App Configuration Data Reader role on the spoke App Config

For this post you additionally need Python 3.11 or later installed locally.

Step 1 — Set Up the Python Environment

			
mkdir citadel-agent && cd citadel-agent
python -m venv .venv
# Windows
.venv\Scripts\activate
pip install openai
pip install azure-appconfiguration
pip install azure-identity
pip install requests

		

Step 2 — Read Configuration from App Config

Create config.py using Set-Content to avoid BOM issues on Windows:

			
$lines = @(
    "from azure.appconfiguration import AzureAppConfigurationClient",
    "from azure.identity import DefaultAzureCredential",
    "",
    "APP_CONFIG_ENDPOINT = 'https://appcs-tggi2gmkw22w4.azconfig.io'",
    "LABEL = 'ai-lz'",
    "",
    "def get_config() -> dict:",
    "    credential = DefaultAzureCredential()",
    "    client = AzureAppConfigurationClient(",
    "        base_url=APP_CONFIG_ENDPOINT,",
    "        credential=credential",
    "    )",
    "    keys = [",
    "        'AI_FOUNDRY_PROJECT_ENDPOINT',",
    "        'CHAT_DEPLOYMENT_NAME',",
    "        'APIM_GATEWAY_URL',",
    "        'APIM_SUBSCRIPTION_KEY',",
    "    ]",
    "    config = {}",
    "    for key in keys:",
    "        setting = client.get_configuration_setting(key=key, label=LABEL)",
    "        config[key] = setting.value",
    "    return config",
    "",
    "if __name__ == '__main__':",
    "    cfg = get_config()",
    "    for k, v in cfg.items():",
    "        print(f'{k}: {v[:30]}...')"
)
[System.IO.File]::WriteAllLines("$PWD\config.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python config.py

All four keys should return truncated values. If you get a 403, wait 2–5 minutes for role assignment propagation and retry.

Pitfall: Always Use WriteAllLines for Python Files on Windows

Out-File -Encoding utf8NoBOM and @"..."@ | Out-File both add a BOM on some Windows PowerShell versions, causing Python to throw SyntaxError: Non-UTF-8 code starting with '\xff'. Use [System.IO.File]::WriteAllLines with [System.Text.UTF8Encoding]::new($false) to write files without BOM.

Step 3 — Define the Weather Tool

Create tools.py:

			
$lines = @(
    "import json",
    "import requests",
    "",
    "def get_weather(location: str) -> str:",
    "    try:",
    "        geo = requests.get(",
    "            'https://geocoding-api.open-meteo.com/v1/search',",
    "            params={'name': location, 'count': 1, 'language': 'en', 'format': 'json'},",
    "            timeout=10",
    "        )",
    "        geo.raise_for_status()",
    "        geo_data = geo.json()",
    "        if not geo_data.get('results'):",
    "            return json.dumps({'error': f'Location not found: {location}'})",
    "        r = geo_data['results'][0]",
    "        weather = requests.get(",
    "            'https://api.open-meteo.com/v1/forecast',",
    "            params={'latitude': r['latitude'], 'longitude': r['longitude'], 'current_weather': True, 'wind_speed_unit': 'kmh', 'timezone': 'auto'},",
    "            timeout=10",
    "        )",
    "        weather.raise_for_status()",
    "        c = weather.json()['current_weather']",
    "        codes = {0:'Clear sky',1:'Mainly clear',2:'Partly cloudy',3:'Overcast',45:'Foggy',61:'Slight rain',63:'Moderate rain',65:'Heavy rain',71:'Slight snow',80:'Showers',95:'Thunderstorm'}",
    "        return json.dumps({'location': f'{r[chr(110)+(chr(97)+chr(109)+chr(101))]}, {r.get(chr(99)+chr(111)+chr(117)+chr(110)+chr(116)+chr(114)+chr(121),chr(32))}', 'temperature_celsius': c['temperature'], 'wind_speed_kmh': c['windspeed'], 'wind_direction_degrees': c['winddirection'], 'condition': codes.get(c['weathercode'],'Unknown'), 'is_day': bool(c['is_day'])})",
    "    except Exception as e:",
    "        return json.dumps({'error': str(e)})",
    "",
    "WEATHER_TOOL_DEFINITION = {",
    "    'type': 'function',",
    "    'function': {",
    "        'name': 'get_weather',",
    "        'description': 'Get current weather for a location. Returns temperature in Celsius, wind speed, condition.',",
    "        'parameters': {",
    "            'type': 'object',",
    "            'properties': {'location': {'type': 'string', 'description': 'City name e.g. Stockholm'}},",
    "            'required': ['location']",
    "        }",
    "    }",
    "}"
)
[System.IO.File]::WriteAllLines("$PWD\tools.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python -c "from tools import get_weather; print(get_weather('Stockholm'))"

Step 4 — Create the Agent

Create agent.py using the standard openai SDK pointed directly at the APIM gateway:

			
$lines = @(
    "import json",
    "from openai import AzureOpenAI",
    "from config import get_config",
    "from tools import get_weather, WEATHER_TOOL_DEFINITION",
    "",
    "def run_agent(user_question: str) -> str:",
    "    cfg = get_config()",
    "",
    "    # Strip /openai suffix - AzureOpenAI SDK adds it automatically",
    "    apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '')",
    "",
    "    client = AzureOpenAI(",
    "        azure_endpoint=apim_base,",
    "        api_key=cfg['APIM_SUBSCRIPTION_KEY'],",
    "        api_version='2024-02-01',",
    "    )",
    "",
    "    messages = [{'role': 'user', 'content': user_question}]",
    "    print(f'Sending request via APIM: {apim_base}')",
    "",
    "    # First LLM call - agent decides whether to use the tool",
    "    response = client.chat.completions.create(",
    "        model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "        messages=messages,",
    "        tools=[WEATHER_TOOL_DEFINITION],",
    "        tool_choice='auto',",
    "    )",
    "",
    "    msg = response.choices[0].message",
    "    messages.append(msg)",
    "",
    "    # Handle tool calls if the agent decided to use get_weather",
    "    if msg.tool_calls:",
    "        for tool_call in msg.tool_calls:",
    "            args = json.loads(tool_call.function.arguments)",
    "            print(f'  -> Tool call: get_weather({args})')",
    "            result = get_weather(**args)",
    "            print(f'  -> Tool result: {result}')",
    "            messages.append({",
    "                'role': 'tool',",
    "                'tool_call_id': tool_call.id,",
    "                'content': result,",
    "            })",
    "",
    "        # Second LLM call - synthesise grounded response",
    "        response = client.chat.completions.create(",
    "            model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "            messages=messages,",
    "        )",
    "        return response.choices[0].message.content",
    "",
    "    return msg.content",
    "",
    "if __name__ == '__main__':",
    "    question = 'What is the weather like in Stockholm right now?'",
    "    print(f'Question: {question}')",
    "    answer = run_agent(question)",
    "    print(f'Answer: {answer}')"
)
[System.IO.File]::WriteAllLines("$PWD\agent.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Run it:

python agent.py

A successful run looks like this:

Pitfall: APIM Endpoint Format

The AzureOpenAI SDK constructs the full path as {azure_endpoint}/openai/deployments/{model}/chat/completions. If your APIM_GATEWAY_URL in App Config contains /openai at the end, strip it before passing to the client; otherwise, the SDK builds a doubled path (/openai/openai/...) that returns a 500 from APIM. The line apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '') handles this automatically.

After running the agent, check Application Insights in the hub:

			
az monitor app-insights query `
  --app <Your APIM instance Name> `
  --resource-group rg-ai-hub-gateway-dev `
  --analytics-query "requests | where timestamp > ago(10m) | project timestamp, name, resultCode, duration | order by timestamp desc" `
  --output table

		

Pitfall: CLI vs Portal Ingestion Lag

The CLI query hits the Log Analytics store; however, it has a 5–10-minute ingestion lag. In contrast, the Azure Portal Application Insights blade uses a live metrics path and shows results immediately. Therefore, if the CLI returns an empty response, it’s a good idea to check the portal directly, go to the APIM instance → Performance to view requests in real time.

What Governed Traffic Looks Like in the Portal

The Application Insights Performance blade shows two operation types per agent run:

azure-openai-service-api:rev=1 - ChatCompletions_Create — the APIM policy-matched operation, showing the governed calls with content safety applied
POST /openai/openai/deployments/chat/chat/completions — the raw endpoint calls

Each agent run generates two successful requests (tool decision + synthesis), both with response code 200 and latency around 900ms–1.2s for gpt-4o. Failed attempts from earlier endpoint format issues show as 500s and are clearly distinguishable.

Azure Application Insights Performance blade showing 9 requests to the Citadel APIM gateway including the azure-openai-service-api ChatCompletions_Create operation with 6 calls at 1.09 seconds average and POST /openai/deployments/chat/chat/completions with 3 calls, confirming the tool-calling agent traffic flows through the Microsoft Foundry Citadel governance hub in Sweden Central. — Application Insights Performance blade for the Citadel Governance Hub, confirming agent traffic routed through APIM: 9 requests captured, with the governed ChatCompletions_Create operation averaging 1.09 seconds, and all successful calls returning response code 200.

The Azure AI Foundry Agent Service SDK — What We Learned

For completeness, here is a summary of what we discovered when attempting to use the azure-ai-projects SDK before switching to the standard OpenAI SDK:

Issue	Detail
`FunctionTool` import path	Must import from `azure.ai.agents.models`, not `azure.ai.projects.models`
`create_thread` does not exist	Use `create_thread_and_process_run` instead
`list_messages` does not exist	Use `client.agents.messages.list(thread_id=...)`
`MessageRole.ASSISTANT` does not exist	Use the string `"assistant"` directly
`enable_auto_function_calls(toolset=...)` fails	Parameter is `tools=`, not `toolset=`
Function not found error	Call `client.agents.enable_auto_function_calls(tools=toolset)` before `create_agent`
Agent traffic bypasses APIM	AI Foundry Agent Service uses its own endpoint resolution — use standard OpenAI SDK pointed at APIM instead

The Agent Service SDK is in active beta development (azure-ai-agents==1.2.0b6 at the time of writing). Expect these APIs to stabilise and the APIM routing issue to be addressed in future versions.

Pitfalls Summary

Pitfall	Fix
Grounding with Bing Search G1 SKU not eligible	Requires Pay-As-You-Go or EA subscription
`Bing.Search.v7` CLI creation fails	Resource type moved to `Microsoft.Bing/accounts`
BOM in Python files on Windows	Use `[System.IO.File]::WriteAllLines` with `UTF8Encoding($false)`
APIM endpoint doubles `/openai` path	Strip `/openai` from URL before passing to `AzureOpenAI` client
App Config 403 on first run	Wait 2–5 minutes for role assignment propagation
CLI Application Insights query empty	5–10 minute ingestion lag — check portal Performance blade instead
AI Foundry Agent Service bypasses APIM	Use standard `openai` SDK pointed directly at APIM gateway

What the Full Citadel Loop Delivers

With the agent running through APIM, every LLM call in the tool-calling loop is governed:

Content Safety — both the user question and the synthesised response pass through Azure AI Content Safety policies configured in APIM.

Token tracking — each of the two LLM calls contributes to the token usage log in Cosmos DB, giving you per-call cost attribution by APIM subscription key. The Cosmos DB ai-usage-container in the hub captures a structured document for each LLM call, including the model version, token counts, gateway region, request IP, APIM subscription name, backend routing, and timestamp. In production, the productName field maps to the APIM subscription key. Aggregating documents by this field gives you direct FinOps reporting per AI initiative.

Azure Cosmos DB Data Explorer showing a usage event document in the ai-usage-container of the Citadel hub, with fields including model gpt-4o-2024-11-20, promptTokens 17, responseTokens 53, totalTokens 70, gatewayRegion Sweden Central, productName Portal-Admin, and timestamp 6/24/2026, confirming token tracking and cost attribution via the Microsoft Foundry Citadel APIM governance hub. — The Citadel hub Cosmos DB ai-usage-container showing a usage document captured from the tool-calling agent run model gpt-4o-2024-11-20, 70 total tokens, gateway region Sweden Central, routed via apim-wpvlimv4ngkns. Every LLM call through APIM generates a document like this, which serves as the cost attribution and audit trail for enterprise AI governance.

Latency observability — Application Insights captures the duration of every call, making it easy to identify slow tool calls or model latency spikes.

Audit trail — every request is logged with timestamp, operation name, response code, and duration. For a healthcare or financial services context, this is your compliance evidence.

What’s Next

This post wires a tool-calling agent to the Citadel hub using the standard OpenAI SDK. The natural next steps:

Azure AI Foundry Agent Service routing — as the SDK matures, the azure-ai-projects client will likely gain proper APIM gateway support. Watch the azure-ai-agents release notes for updates on connection-based routing.

Conversation persistence — store conversation history in the Cosmos DB conversations container already deployed in the spoke. The App Config key CONVERSATIONS_DATABASE_CONTAINER points to it.

Network isolation — re-enable networkIsolation=true in the spoke parameters to route all traffic through private endpoints.

Multiple tools — extend the agent with additional function tools (document lookup, product catalog, claims system) using the same pattern. Each tool call flows through APIM and is governed identically.

Conclusion

Connecting a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure requires three components: the standard OpenAI SDK configured to point to the APIM gateway, a function tool with a JSON schema definition, and an explicit tool-call-handling loop. Everything else, governance, content safety, token tracking, and cost attribution, is handled by the Citadel hub automatically.

The path to get here involved navigating several SDK beta rough edges and discovering that the AI Foundry Agent Service bypasses APIM in its current preview form. These are expected friction points with a platform in active development. The governance architecture underneath is sound, the APIM policies work, and the Application Insights telemetry confirms it.

Two LLM calls. Both governed. Both visible. That is what the Citadel hub delivers.

Cloud Perspectives

Steef-Jan Wiggers

Tag Archives: Citadel