AI Architecture Archives - Cloud Perspectives

In the previous post we deployed a working Microsoft Foundry Citadel Platform on Azure Sweden Central: a Governance Hub built on Azure API Management and an Agent Spoke built on Azure AI Foundry. We validated the setup with a raw chat completion call through the APIM gateway. That proved the plumbing works. This post takes the next step: connecting a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure, using the Open-Meteo weather API as a tool, and showing that every LLM call flows through the hub’s governance layer.

The agent is built with the standard Azure OpenAI SDK pointed directly at the Citadel APIM gateway. It uses a custom function tool that calls the Open-Meteo API to retrieve real current weather data for any location. The governance hub intercepts all traffic: content safety policies fire, token usage is tracked, and telemetry flows into Application Insights. This is the Microsoft Foundry Citadel Platform doing what it is designed to do.

What We Build

The flow looks like this:

Figure: End-to-end flow of a tool-calling agent on the Microsoft Foundry Citadel Platform in Azure Sweden Central. The Python agent sends an initial request through the Citadel APIM gateway to Azure OpenAI gpt-4o, the model requests a get_weather tool call, the agent calls the Open-Meteo API for real weather data and submits the result back through APIM for a second LLM call, and it receives a grounded response. Telemetry flows to Application Insights and Cosmos DB.

Two LLM calls flow through APIM per agent run: the tool decision call and the synthesis call. Both are governed, appear in Application Insights, and contribute to Cosmos DB usage tracking.

Why Open-Meteo and Why the Standard OpenAI SDK

The original plan was to use the Azure AI Foundry Agent Service SDK with Bing Search grounding. Two blockers emerged:

Bing Search SKU eligibility: The Grounding with Bing Search resource (G1 SKU) requires Pay-As-You-Go or EA subscriptions and is not available on MVP or MSDN subscriptions.

AI Foundry Agent Service routing: The azure-ai-projects SDK routes LLM calls through the AI Foundry project’s internal endpoint (aif-tggi2gmkw22w4.openai.azure.com) rather than through APIM, bypassing the governance layer. In addition, even after adding APIM as a connected resource in the AI Foundry portal, the Agent Service does not honor it for model routing in the current preview version.

The solution, therefore, is to use the standard OpenAI Python SDK pointed directly at the APIM gateway endpoint. This guarantees that all traffic flows through the hub and that the governance telemetry is fully captured in Application Insights. The trade-off is that the tool-calling loop must be implemented explicitly in Python.

Open-Meteo is a free, open-source weather API that requires no API key and returns structured JSON weather data. It serves as a clean stand-in for any external API your agents might call in production.

Prerequisites

From the previous post you should have:

Hub deployed in rg-ai-hub-gateway-dev with APIM gateway URL https://apim-wpvlimv4ngkns.azure-api.net and subscription key
Spoke deployed in rg-ai-spoke-dev with App Config appcs-tggi2gmkw22w4 containing APIM_GATEWAY_URL and APIM_SUBSCRIPTION_KEY
Your principal ID with App Configuration Data Reader role on the spoke App Config

For this post you additionally need Python 3.11 or later installed locally.

Step 1 — Set Up the Python Environment

			
mkdir citadel-agent; cd citadel-agent
python -m venv .venv
# Windows
.venv\Scripts\activate
pip install openai
pip install azure-appconfiguration
pip install azure-identity
pip install requests

		

Step 2 — Read Configuration from App Config

Create config.py using [System.IO.File]::WriteAllLines to avoid BOM issues on Windows:

			
$lines = @(
    "from azure.appconfiguration import AzureAppConfigurationClient",
    "from azure.identity import DefaultAzureCredential",
    "",
    "APP_CONFIG_ENDPOINT = 'https://appcs-tggi2gmkw22w4.azconfig.io'",
    "LABEL = 'ai-lz'",
    "",
    "def get_config() -> dict:",
    "    credential = DefaultAzureCredential()",
    "    client = AzureAppConfigurationClient(",
    "        base_url=APP_CONFIG_ENDPOINT,",
    "        credential=credential",
    "    )",
    "    keys = [",
    "        'AI_FOUNDRY_PROJECT_ENDPOINT',",
    "        'CHAT_DEPLOYMENT_NAME',",
    "        'APIM_GATEWAY_URL',",
    "        'APIM_SUBSCRIPTION_KEY',",
    "    ]",
    "    config = {}",
    "    for key in keys:",
    "        setting = client.get_configuration_setting(key=key, label=LABEL)",
    "        config[key] = setting.value",
    "    return config",
    "",
    "if __name__ == '__main__':",
    "    cfg = get_config()",
    "    for k, v in cfg.items():",
    "        print(f'{k}: {v[:30]}...')"
)
[System.IO.File]::WriteAllLines("$PWD\config.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python config.py

All four keys should return truncated values. If you get a 403, wait 2–5 minutes for role assignment propagation and retry.
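The wait-and-retry advice can also be automated. The following is a minimal sketch rather than part of the original setup: retry_on is a hypothetical helper, and the predicate that recognises a 403 is an assumption (the Azure SDK raises azure.core.exceptions.HttpResponseError, which exposes a status_code attribute).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on(
    fn: Callable[[], T],
    should_retry: Callable[[Exception], bool],
    attempts: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until it succeeds, retrying only errors that should_retry accepts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise on the final attempt or when the error is not transient.
            if attempt == attempts or not should_retry(exc):
                raise
            sleep(delay_seconds)
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Assumed usage with get_config from config.py (needs Azure access, so commented out):
# cfg = retry_on(get_config, lambda e: getattr(e, "status_code", None) == 403)
```

The sleep function is injectable so the helper can be exercised in a test without actually waiting.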

Pitfall: Always Use WriteAllLines for Python Files on Windows

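Out-File -Encoding utf8NoBOM and @"..."@ | Out-File can both produce files with a byte-order mark (BOM) on some Windows PowerShell versions, causing Python to throw SyntaxError: Non-UTF-8 code starting with '\xff'. The \xff byte is the start of a UTF-16 LE BOM, which is the default Out-File encoding in Windows PowerShell 5.1; the utf8NoBOM value is only available in PowerShell 6 and later. Use [System.IO.File]::WriteAllLines with [System.Text.UTF8Encoding]::new($false) to write UTF-8 files without a BOM.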
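If you are unsure whether a generated file carries a BOM, a quick check from Python can confirm it before you run the script. This is a small sketch using only the standard library; detect_bom is a hypothetical helper name, not something from the Citadel repository.

```python
import codecs
from pathlib import Path
from typing import Optional

# UTF-32 marks are checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path: str) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if the file has none."""
    head = Path(path).read_bytes()[:4]
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

For a file written with WriteAllLines and UTF8Encoding($false), detect_bom('config.py') should return None.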

Step 3 — Define the Weather Tool

Create tools.py:

			
$lines = @(
    "import json",
    "import requests",
    "",
    "def get_weather(location: str) -> str:",
    "    try:",
    "        geo = requests.get(",
    "            'https://geocoding-api.open-meteo.com/v1/search',",
    "            params={'name': location, 'count': 1, 'language': 'en', 'format': 'json'},",
    "            timeout=10",
    "        )",
    "        geo.raise_for_status()",
    "        geo_data = geo.json()",
    "        if not geo_data.get('results'):",
    "            return json.dumps({'error': f'Location not found: {location}'})",
    "        r = geo_data['results'][0]",
    "        weather = requests.get(",
    "            'https://api.open-meteo.com/v1/forecast',",
    "            params={'latitude': r['latitude'], 'longitude': r['longitude'], 'current_weather': True, 'wind_speed_unit': 'kmh', 'timezone': 'auto'},",
    "            timeout=10",
    "        )",
    "        weather.raise_for_status()",
    "        c = weather.json()['current_weather']",
    "        codes = {0:'Clear sky',1:'Mainly clear',2:'Partly cloudy',3:'Overcast',45:'Foggy',61:'Slight rain',63:'Moderate rain',65:'Heavy rain',71:'Slight snow',80:'Showers',95:'Thunderstorm'}",
    "        # Pull the values out first so the f-string needs no nested quotes",
    "        name = r['name']",
    "        country = r.get('country', ' ')",
    "        return json.dumps({",
    "            'location': f'{name}, {country}',",
    "            'temperature_celsius': c['temperature'],",
    "            'wind_speed_kmh': c['windspeed'],",
    "            'wind_direction_degrees': c['winddirection'],",
    "            'condition': codes.get(c['weathercode'], 'Unknown'),",
    "            'is_day': bool(c['is_day']),",
    "        })",
    "    except Exception as e:",
    "        return json.dumps({'error': str(e)})",
    "",
    "WEATHER_TOOL_DEFINITION = {",
    "    'type': 'function',",
    "    'function': {",
    "        'name': 'get_weather',",
    "        'description': 'Get current weather for a location. Returns temperature in Celsius, wind speed, condition.',",
    "        'parameters': {",
    "            'type': 'object',",
    "            'properties': {'location': {'type': 'string', 'description': 'City name e.g. Stockholm'}},",
    "            'required': ['location']",
    "        }",
    "    }",
    "}"
)
[System.IO.File]::WriteAllLines("$PWD\tools.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Test it:

python -c "from tools import get_weather; print(get_weather('Stockholm'))"
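The codes dictionary in tools.py covers only a subset of weather codes, so anything else comes back as 'Unknown'. The sketch below shows a fuller lookup based on the WMO weather interpretation codes that Open-Meteo returns. The labels are our own wording and describe_weather_code is a hypothetical helper, so verify the table against the Open-Meteo documentation before relying on it.

```python
# WMO weather interpretation codes as used by Open-Meteo (labels are paraphrased).
WMO_CONDITIONS = {
    0: "Clear sky",
    1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast",
    45: "Fog", 48: "Depositing rime fog",
    51: "Light drizzle", 53: "Moderate drizzle", 55: "Dense drizzle",
    56: "Light freezing drizzle", 57: "Dense freezing drizzle",
    61: "Slight rain", 63: "Moderate rain", 65: "Heavy rain",
    66: "Light freezing rain", 67: "Heavy freezing rain",
    71: "Slight snow", 73: "Moderate snow", 75: "Heavy snow",
    77: "Snow grains",
    80: "Slight rain showers", 81: "Moderate rain showers", 82: "Violent rain showers",
    85: "Slight snow showers", 86: "Heavy snow showers",
    95: "Thunderstorm",
    96: "Thunderstorm with slight hail", 99: "Thunderstorm with heavy hail",
}


def describe_weather_code(code) -> str:
    """Map a weather code to a readable condition, keeping unknown codes visible."""
    try:
        return WMO_CONDITIONS.get(int(code), f"Unknown (code {code})")
    except (TypeError, ValueError):
        # A missing or non-numeric code should not break the tool result.
        return "Unknown"
```

Returning the raw code for unmapped values gives the model, and anyone reading the logs, something to work with instead of a bare 'Unknown'.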

Step 4 — Create the Agent

Create agent.py using the standard openai SDK pointed directly at the APIM gateway:

			
$lines = @(
    "import json",
    "from openai import AzureOpenAI",
    "from config import get_config",
    "from tools import get_weather, WEATHER_TOOL_DEFINITION",
    "",
    "def run_agent(user_question: str) -> str:",
    "    cfg = get_config()",
    "",
    "    # Strip /openai suffix - AzureOpenAI SDK adds it automatically",
    "    apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '')",
    "",
    "    client = AzureOpenAI(",
    "        azure_endpoint=apim_base,",
    "        api_key=cfg['APIM_SUBSCRIPTION_KEY'],",
    "        api_version='2024-02-01',",
    "    )",
    "",
    "    messages = [{'role': 'user', 'content': user_question}]",
    "    print(f'Sending request via APIM: {apim_base}')",
    "",
    "    # First LLM call - agent decides whether to use the tool",
    "    response = client.chat.completions.create(",
    "        model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "        messages=messages,",
    "        tools=[WEATHER_TOOL_DEFINITION],",
    "        tool_choice='auto',",
    "    )",
    "",
    "    msg = response.choices[0].message",
    "    messages.append(msg)",
    "",
    "    # Handle tool calls if the agent decided to use get_weather",
    "    if msg.tool_calls:",
    "        for tool_call in msg.tool_calls:",
    "            args = json.loads(tool_call.function.arguments)",
    "            print(f'  -> Tool call: get_weather({args})')",
    "            result = get_weather(**args)",
    "            print(f'  -> Tool result: {result}')",
    "            messages.append({",
    "                'role': 'tool',",
    "                'tool_call_id': tool_call.id,",
    "                'content': result,",
    "            })",
    "",
    "        # Second LLM call - synthesise grounded response",
    "        response = client.chat.completions.create(",
    "            model=cfg['CHAT_DEPLOYMENT_NAME'],",
    "            messages=messages,",
    "        )",
    "        return response.choices[0].message.content",
    "",
    "    return msg.content",
    "",
    "if __name__ == '__main__':",
    "    question = 'What is the weather like in Stockholm right now?'",
    "    print(f'Question: {question}')",
    "    answer = run_agent(question)",
    "    print(f'Answer: {answer}')"
)
[System.IO.File]::WriteAllLines("$PWD\agent.py", $lines, [System.Text.UTF8Encoding]::new($false))

		

Run it:

python agent.py

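A successful run prints the question, the APIM endpoint in use, the get_weather tool call with its arguments, the JSON tool result, and the final grounded answer.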

Pitfall: APIM Endpoint Format

The AzureOpenAI SDK constructs the full path as {azure_endpoint}/openai/deployments/{model}/chat/completions. If your APIM_GATEWAY_URL in App Config contains /openai at the end, strip it before passing to the client; otherwise, the SDK builds a doubled path (/openai/openai/...) that returns a 500 from APIM. The line apim_base = cfg['APIM_GATEWAY_URL'].rstrip('/').replace('/openai', '') handles this automatically.
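One caveat: str.replace removes every occurrence of /openai, not just a trailing one, so a gateway URL with /openai elsewhere in its path would be mangled. A suffix-only variant is sketched below; normalize_apim_endpoint is a hypothetical helper name, and the example.net URL in the comment is made up for illustration.

```python
def normalize_apim_endpoint(url: str) -> str:
    """Strip trailing slashes and a single trailing '/openai' segment, nothing else."""
    suffix = "/openai"
    base = url.strip().rstrip("/")
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    return base


# Hypothetical URL where replace('/openai', '') would also damage the middle of the path:
# normalize_apim_endpoint('https://example.net/openai-proxy/openai')
```

The result can be passed to AzureOpenAI as azure_endpoint in the same way as apim_base in agent.py.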

After running the agent, check Application Insights in the hub:

			
az monitor app-insights query `
  --app <your Application Insights resource name> `
  --resource-group rg-ai-hub-gateway-dev `
  --analytics-query "requests | where timestamp > ago(10m) | project timestamp, name, resultCode, duration | order by timestamp desc" `
  --output table

		

Pitfall: CLI vs Portal Ingestion Lag

The CLI query hits the Log Analytics store, which has a 5–10-minute ingestion lag. In contrast, the Azure Portal Application Insights blade uses a live metrics path and shows results immediately. If the CLI returns an empty response, check the portal directly: go to the APIM instance → Performance to view requests in real time.

What Governed Traffic Looks Like in the Portal

The Application Insights Performance blade shows two operation types per agent run:

azure-openai-service-api:rev=1 - ChatCompletions_Create: the APIM policy-matched operation, showing the governed calls with content safety applied
POST /openai/deployments/chat/chat/completions: the raw endpoint calls

Each agent run generates two successful requests (tool decision + synthesis), both with response code 200 and latency around 900ms–1.2s for gpt-4o. Failed attempts from earlier endpoint format issues show as 500s and are clearly distinguishable.

Figure: Application Insights Performance blade for the Citadel Governance Hub, confirming that agent traffic is routed through APIM. It shows 9 requests: 6 calls to the governed azure-openai-service-api ChatCompletions_Create operation, averaging 1.09 seconds, and 3 calls to POST /openai/deployments/chat/chat/completions. All successful calls return response code 200.

The Azure AI Foundry Agent Service SDK — What We Learned

For completeness, here is a summary of what we discovered when attempting to use the azure-ai-projects SDK before switching to the standard OpenAI SDK:

Issue	Detail
`FunctionTool` import path	Must import from `azure.ai.agents.models`, not `azure.ai.projects.models`
`create_thread` does not exist	Use `create_thread_and_process_run` instead
`list_messages` does not exist	Use `client.agents.messages.list(thread_id=...)`
`MessageRole.ASSISTANT` does not exist	Use the string `"assistant"` directly
`enable_auto_function_calls(toolset=...)` fails	Parameter is `tools=`, not `toolset=`
Function not found error	Call `client.agents.enable_auto_function_calls(tools=toolset)` before `create_agent`
Agent traffic bypasses APIM	AI Foundry Agent Service uses its own endpoint resolution — use standard OpenAI SDK pointed at APIM instead

The Agent Service SDK is in active beta development (azure-ai-agents==1.2.0b6 at the time of writing). Expect these APIs to stabilise and the APIM routing issue to be addressed in future versions.

Pitfalls Summary

Pitfall	Fix
Grounding with Bing Search G1 SKU not eligible	Requires Pay-As-You-Go or EA subscription
`Bing.Search.v7` CLI creation fails	Resource type moved to `Microsoft.Bing/accounts`
BOM in Python files on Windows	Use `[System.IO.File]::WriteAllLines` with `UTF8Encoding($false)`
APIM endpoint doubles `/openai` path	Strip `/openai` from URL before passing to `AzureOpenAI` client
App Config 403 on first run	Wait 2–5 minutes for role assignment propagation
CLI Application Insights query empty	5–10 minute ingestion lag — check portal Performance blade instead
AI Foundry Agent Service bypasses APIM	Use standard `openai` SDK pointed directly at APIM gateway

What the Full Citadel Loop Delivers

With the agent running through APIM, every LLM call in the tool-calling loop is governed:

Content Safety — both the user question and the synthesised response pass through Azure AI Content Safety policies configured in APIM.

Token tracking — each of the two LLM calls contributes to the token usage log in Cosmos DB, giving you per-call cost attribution by APIM subscription key. The Cosmos DB ai-usage-container in the hub captures a structured document for each LLM call, including the model version, token counts, gateway region, request IP, APIM subscription name, backend routing, and timestamp. In production, the productName field maps to the APIM subscription key. Aggregating documents by this field gives you direct FinOps reporting per AI initiative.

Figure: Usage document in the Citadel hub's Cosmos DB ai-usage-container, captured from the tool-calling agent run and shown in Data Explorer: model gpt-4o-2024-11-20, promptTokens 17, responseTokens 53, totalTokens 70, gatewayRegion Sweden Central, productName Portal-Admin, timestamp 6/24/2026, routed via apim-wpvlimv4ngkns. Every LLM call through APIM generates a document like this, which serves as the cost attribution and audit trail for enterprise AI governance.
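To make the FinOps idea concrete, here is a minimal sketch of aggregating usage documents by productName. It works on plain dictionaries that you have already exported or queried from the container. The field names follow the usage document described above; tokens_by_product is a hypothetical helper, and the second sample document is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

TOKEN_FIELDS = ("promptTokens", "responseTokens", "totalTokens")


def tokens_by_product(documents: Iterable[Mapping]) -> Dict[str, Dict[str, int]]:
    """Sum token counts and call counts per productName."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"calls": 0, **{field: 0 for field in TOKEN_FIELDS}}
    )
    for doc in documents:
        bucket = totals[doc.get("productName", "unknown")]
        bucket["calls"] += 1
        for field in TOKEN_FIELDS:
            # Treat missing or null counts as zero rather than failing the report.
            bucket[field] += int(doc.get(field) or 0)
    return dict(totals)


sample_docs = [
    # Values taken from the usage document described above.
    {"productName": "Portal-Admin", "promptTokens": 17, "responseTokens": 53, "totalTokens": 70},
    # Hypothetical second call, for illustration only.
    {"productName": "Portal-Admin", "promptTokens": 120, "responseTokens": 40, "totalTokens": 160},
]
report = tokens_by_product(sample_docs)
```

In a real deployment the same grouping could be pushed into the Cosmos DB query itself; doing it client-side keeps the sketch independent of any SDK.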

Latency observability — Application Insights captures the duration of every call, making it easy to identify slow tool calls or model latency spikes.

Audit trail — every request is logged with timestamp, operation name, response code, and duration. For a healthcare or financial services context, this is your compliance evidence.

What’s Next

This post wires a tool-calling agent to the Citadel hub using the standard OpenAI SDK. The natural next steps:

Azure AI Foundry Agent Service routing — as the SDK matures, the azure-ai-projects client will likely gain proper APIM gateway support. Watch the azure-ai-agents release notes for updates on connection-based routing.

Conversation persistence — store conversation history in the Cosmos DB conversations container already deployed in the spoke. The App Config key CONVERSATIONS_DATABASE_CONTAINER points to it.

Network isolation — re-enable networkIsolation=true in the spoke parameters to route all traffic through private endpoints.

Multiple tools — extend the agent with additional function tools (document lookup, product catalog, claims system) using the same pattern. Each tool call flows through APIM and is governed identically.
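Once there is more than one tool, the loop in agent.py can no longer call get_weather unconditionally; it needs to dispatch on the function name the model asked for. One way to do that is a small registry, sketched below. dispatch_tool_call and lookup_document are hypothetical names, and the stub tool only exists so the example runs offline.

```python
import json
from typing import Callable, Dict

ToolFn = Callable[..., str]


def lookup_document(doc_id: str) -> str:
    """Hypothetical second tool, stubbed so the sketch runs without any backend."""
    return json.dumps({"doc_id": doc_id, "title": "placeholder"})


# In the real agent you would also register get_weather from tools.py here.
TOOL_REGISTRY: Dict[str, ToolFn] = {"lookup_document": lookup_document}


def dispatch_tool_call(registry: Dict[str, ToolFn], name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments and always return a JSON string."""
    fn = registry.get(name)
    if fn is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return fn(**args)
    except Exception as exc:
        # Report bad arguments or tool failures to the model instead of crashing the loop.
        return json.dumps({"error": str(exc)})
```

Inside the tool-call loop, the call would then become dispatch_tool_call(TOOL_REGISTRY, tool_call.function.name, tool_call.function.arguments), with each tool's definition added to the tools list passed to the first LLM call.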

Conclusion

Connecting a real tool-calling agent to the Microsoft Foundry Citadel Platform on Azure requires three components: the standard OpenAI SDK configured to point to the APIM gateway, a function tool with a JSON schema definition, and an explicit tool-call-handling loop. Everything else (governance, content safety, token tracking, and cost attribution) is handled by the Citadel hub automatically.

The path to get here involved navigating several SDK beta rough edges and discovering that the AI Foundry Agent Service bypasses APIM in its current preview form. These are expected friction points with a platform in active development. The governance architecture underneath is sound, the APIM policies work, and the Application Insights telemetry confirms it.

Two LLM calls. Both governed. Both visible. That is what the Citadel hub delivers.


Steef-Jan Wiggers
