Deploy an Agent with AgentRuntime
This guide shows how to deploy an AI agent and an MCP tool server with the Kagenti Operator, enroll them with AgentRuntime custom resources, and run an end-to-end query. The agent is driven by a model served from an AI InferenceService.
The scenario is a weather agent that answers questions like "What is the weather in New York?" by calling a weather MCP tool for live data and a large language model to compose the reply. The request flow is:
TOC
PrerequisitesFind your model endpoint and nameStep 1: Deploy the MCP tool serverStep 2: Deploy the agentStep 3: Check statusStep 4: Send a query (end-to-end test)Updating and deleting an AgentRuntimeClean upTroubleshootingPrerequisites
-
Kagenti Operator is installed and its
Kagentioperand is reconciled — see Installation. -
kubectlaccess to the target cluster. -
A demo namespace. This guide uses
team1: -
An InferenceService that serves an OpenAI-compatible chat API, reachable in-cluster. This guide uses an InferenceService named
qwen36-27b-gguf. Any chat model works; agent quality improves with a tool-calling-capable model.
Find your model endpoint and name
The agent connects to the model with three environment variables — LLM_API_BASE, LLM_API_KEY, and LLM_MODEL. Resolve them from your InferenceService:
For the qwen36-27b-gguf InferenceService deployed in namespace zgsu-ns1, that resolves to:
Substitute your own InferenceService name, namespace, and model id throughout this guide. If the model is a reasoning model, the first tokens of a response are reasoning content — keep the agent's token budget unconstrained (as below) so the final answer is not truncated.
Step 1: Deploy the MCP tool server
The MCP (Model Context Protocol) server provides the get_weather tool the agent calls. Deploy it, expose it with a Service, and enroll it as a tool-type AgentRuntime.
- The MCP server image, mirrored to
docker.io/alaudadockerhub. On an air-gapped cluster use the copy relocated into your platform registry. - The Service the agent will reach at
http://weather-tool-mcp.team1.svc.cluster.local:8000/mcp. The operator does not create this Service for you, so you define it explicitly and select the tool's pods. spec.type: toolenrolls the workload as an MCP tool; the operator applies thekagenti.io/type: toollabel.
Wait for the tool to be ready:
Step 2: Deploy the agent
The agent only needs a protocol.kagenti.io/a2a label on its Deployment — the controller applies kagenti.io/type, computes a config hash, and (when the identity stack is enabled) triggers sidecar injection. The protocol label also tells the AgentCard sync controller which protocol the agent speaks, enabling automatic discovery.
protocol.kagenti.io/a2a: ""marks the Deployment as an A2A agent. It enables automaticAgentCardcreation; aValidatingAdmissionPolicyforbids settingkagenti.io/typedirectly, so enrollment must go through theAgentRuntime.- The agent image, mirrored to
docker.io/alaudadockerhub. MCP_URL— the MCP tool endpoint from Step 1 (the Service name with the/mcppath).LLM_API_BASE— the OpenAI-compatible base URL of your InferenceService predictor (.../v1).LLM_MODEL— the model id served by the InferenceService.
The agent's Service is named after the Deployment (weather-agent); the AgentRuntime controller resolves it to fetch the Agent Card for discovery.
When the AgentRuntime is created, the controller will:
- Resolve
targetRefand verify the Deployment exists. - Apply
kagenti.io/type: agentandapp.kubernetes.io/managed-by: kagenti-operatorlabels. - Compute a config hash and set it as a
kagenti.io/config-hashannotation on the pod template, triggering a rolling update.
Step 3: Check status
SYNCED=True with the AGENT column populated (here Weather Assistant) confirms the sync controller fetched the agent's A2A card — dynamic discovery is working.
Step 4: Send a query (end-to-end test)
Send an A2A message/send request to the agent from a temporary in-cluster pod, using the agent's internal Service DNS name:
The agent calls the model, the model invokes the weather MCP tool, and the agent returns a completed A2A task:
The history array in the full response shows the tool-calling flow (assistant → tools → assistant), including the ToolMessage returned by the weather MCP server.
A reasoning model spends its first tokens "thinking" before the final answer, so the first request may take noticeably longer than a non-reasoning model. Subsequent requests are faster.
You can watch the interaction in the logs:
Updating and deleting an AgentRuntime
Platform configuration changes (cluster or namespace ConfigMaps) trigger a rolling update so pods pick up new settings. Editing the AgentRuntime spec itself does not force a restart — new values are applied at pod creation time.
Deleting the AgentRuntime performs a graceful cleanup: the controller removes the kagenti.io/type label and the kagenti.io/config-hash annotation (triggering a rolling update so any injected pods are replaced) and the app.kubernetes.io/managed-by label.