Important
SQL Model Context Protocol (MCP) Server is available in Data API builder version 1.7. For the latest capabilities and bug fixes, use the 2.0 preview release.
SQL Model Context Protocol (MCP) Server works with any MCP-compatible client, not just cloud-hosted AI services. If your environment restricts cloud large language model (LLM) access—common in healthcare, defense, finance, energy, and maritime industries—you can connect a local model served through Ollama or similar tools. This guide covers setup, field metadata configuration, and prompt patterns that make small local models reliable.
Prerequisites
- Data API builder CLI installed and configured with at least one entity. Install the CLI.
- Ollama with a model that supports tool calling (for example, `qwen3:8b` or `llama3.1:8b`).
- Python 3.10+ with the `mcp` and `ollama` packages.
- A running SQL Server instance with data.
Step 1: Configure field metadata
Field metadata is the most important configuration step for local model accuracy. Without field names and descriptions, agents see only entity names and guess column names incorrectly.
Warning
Skipping this step produces an MCP server that technically works but is functionally unusable by any model that reads tool responses. The model has no information about your columns.
Add your entity with a description, then add field descriptions that include valid values for constrained columns:
dab add ServerInventory \
--source dbo.ServerInventory \
--permissions "anonymous:read" \
--description "SQL Server instance inventory with version, environment, and sizing data"
dab update ServerInventory \
--fields.name InstanceName --fields.primary-key true \
--fields.description "SQL Server instance name (e.g., YOURSERVER01)"
dab update ServerInventory \
--fields.name Environment \
--fields.description "Deployment environment. Valid values: Prod, Dev, Test, UAT"
For the complete CLI reference and best practices—including constrained values, parameter descriptions, and scripting patterns—see Add descriptions to entities.
Note
The dab update CLI treats commas as argument separators. If your description contains commas, edit dab-config.json directly instead.
Step 2: Start SQL MCP Server
dab start
SQL MCP Server listens on http://localhost:5000/mcp using streamable HTTP transport by default. Any client that implements the MCP protocol can connect to this endpoint.
Step 3: Connect your local model
Build an MCP client that connects your Ollama model to SQL MCP Server. The following Python example uses the MCP Python SDK and the ollama package.
Install dependencies
pip install mcp ollama
Minimal Python harness
import asyncio
import json

import ollama
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

MCP_URL = "http://localhost:5000/mcp"
MODEL = "qwen3:8b"


async def get_schema(session: ClientSession) -> str:
    """Call describe_entities and format results for the system prompt."""
    result = await session.call_tool("describe_entities", arguments={})
    entities = json.loads(result.content[0].text)
    lines = []
    for entity in entities.get("entities", []):
        fields = ", ".join(
            f"{f['name']} ({f.get('description', 'no description')})"
            for f in entity.get("fields", [])
        )
        lines.append(f"- {entity['name']}: {entity.get('description', '')}")
        if fields:
            lines.append(f"  Fields: {fields}")
    return "\n".join(lines)


async def run(user_question: str):
    async with streamablehttp_client(MCP_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Preinject schema into the system prompt
            schema_text = await get_schema(session)
            system_prompt = f"""You query a SQL database through MCP tools.

Available entities:
{schema_text}

Rules:
- Use the exact field names shown above.
- Answer count questions with the count only.
- Do not produce summaries unless asked.
- Do not invent example data. Only return data from tool responses.
- If no results, say "No results found" and stop.
"""

            # Get available tools for Ollama
            tools_result = await session.list_tools()
            ollama_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": t.name,
                        "description": t.description or "",
                        "parameters": t.inputSchema,
                    },
                }
                for t in tools_result.tools
            ]

            messages = [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_question},
            ]

            # Chat loop: let the model call tools until it produces a final answer
            while True:
                response = ollama.chat(
                    model=MODEL, messages=messages, tools=ollama_tools
                )
                msg = response["message"]
                messages.append(msg)

                if not msg.get("tool_calls"):
                    print(msg["content"])
                    break

                for tc in msg["tool_calls"]:
                    result = await session.call_tool(
                        tc["function"]["name"],
                        arguments=tc["function"]["arguments"],
                    )
                    messages.append(
                        {
                            "role": "tool",
                            "content": result.content[0].text,
                        }
                    )


asyncio.run(run("How many SQL 2019 servers are in production?"))
This harness handles the full cycle: schema preinjection, tool discovery, multi-turn tool calling, and final answer extraction. Adjust MODEL and MCP_URL for your environment.
Preinject schema at startup
Small local models (under 14B parameters) produce more reliable tool calls when schema metadata is in the system prompt before the conversation begins. Instead of relying on the model to call describe_entities on its own during conversation, call it at harness startup and inject the result.
Why preinjection matters
| Approach | Behavior with small models |
|---|---|
| Dynamic discovery | Model must decide to call describe_entities first, then interpret results, then call the right tool with correct field names. Multiple points of failure. |
| Preinjection | Model sees entity names, field names, and descriptions immediately. Correct tool calls on the first attempt. |
The harness example in the previous section demonstrates this pattern. The get_schema() function calls describe_entities once at startup and formats the result into the system prompt.
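As a concrete illustration, here's the shape of the text that preinjection puts into the system prompt. The entity metadata below is made up for the example; the formatting loop mirrors the `get_schema()` logic from the harness:

```python
# Made-up describe_entities output (illustrative values only).
entities = {
    "entities": [
        {
            "name": "ServerInventory",
            "description": "SQL Server instance inventory",
            "fields": [
                {"name": "InstanceName", "description": "SQL Server instance name"},
                {"name": "Environment", "description": "Valid values: Prod, Dev, Test, UAT"},
            ],
        }
    ]
}

# Format entity names, field names, and descriptions into prompt lines.
lines = []
for entity in entities["entities"]:
    fields = ", ".join(
        f"{f['name']} ({f.get('description', 'no description')})"
        for f in entity.get("fields", [])
    )
    lines.append(f"- {entity['name']}: {entity.get('description', '')}")
    if fields:
        lines.append(f"  Fields: {fields}")

schema_text = "\n".join(lines)
print(schema_text)
```

With this text in the system prompt, the model already knows `InstanceName` and `Environment` exist, and which `Environment` values are valid, before it makes its first tool call.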
Tip
Larger cloud models (GPT-4o, Claude) typically discover schema during conversation without preinjection. This pattern is most valuable for models under 14B parameters.
Constrain model responses
A model can make a correct tool call, retrieve the right data, and still produce a wrong answer. For example, a model asked "how many production servers?" might retrieve 16 rows correctly, then respond with a 40-line executive summary containing hallucinated examples instead of the number 16.
Add explicit negative rules to your system prompt:
Rules:
- Answer count questions with the count only.
- Do not produce summaries unless the user asks for one.
- Do not invent example data. Only return data from tool responses.
- If a tool returns no results, say "No results found" and stop.
Tool-calling fidelity and answer discipline are different problems. DAB ensures accurate data retrieval through the tool layer. Your prompt harness controls how the model presents results.
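Answer discipline can also be checked outside the prompt. The following sketch is a hypothetical post-check (not part of DAB) that verifies a count answer against the rows the tool actually returned. It assumes the tool payload wraps rows in a top-level `"value"` array; adjust the parsing to match what your MCP tools actually emit:

```python
import json
import re

def count_answer_is_grounded(answer: str, tool_response: str) -> bool:
    """Return True if the model's answer mentions the actual row count.

    Assumption: the tool payload wraps rows in a top-level "value" array;
    verify this against your own tool responses before relying on it.
    """
    rows = json.loads(tool_response).get("value", [])
    return str(len(rows)) in re.findall(r"\d+", answer)

# 16 rows retrieved: "16" is grounded, a hallucinated "40" is not.
payload = json.dumps({"value": [{"id": i} for i in range(16)]})
print(count_answer_is_grounded("16", payload))                       # True
print(count_answer_is_grounded("About 40 servers total.", payload))  # False
```

A failed check is a signal to retry the question or fall back to reporting the row count directly, rather than trusting the model's summary.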
Considerations
| Topic | Details |
|---|---|
| Hardware | Tool calling works on modest hardware. An 8B-parameter model on a consumer Nvidia GPU (8 GB video RAM) produces useful results. Expect latency in the tens of seconds per question, which suits batch workloads. |
| Batch vs. interactive | Small models are well-suited for batch processing (performance reports, inventory queries) where latency tolerance is higher. |
| Tool availability | aggregate_records is available in version 2.0 preview and later only. On version 1.7.x, count and aggregation queries force the model to read all matching rows. See tool availability by version. |
| Transport | Local models connect via streamable HTTP to /mcp. The standard input/output (stdio) transport is an alternative for single-process setups. |
| Authentication | For local development, use anonymous permissions. For production, configure authentication appropriate to your environment. |