Important
- Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
- Features, approaches, and processes can change or have limited capabilities before General Availability (GA).
Foundry Local integrates with various inferencing SDKs, such as OpenAI, Azure OpenAI, and LangChain. This guide shows you how to connect your applications to locally running AI models using popular SDKs.
Prerequisites
- Foundry Local installed. See the Get started with Foundry Local article for installation instructions.
Install pip packages
Install the following Python packages:
pip install openai
pip install foundry-local-sdk
Tip
We recommend using a virtual environment to avoid package conflicts. You can create a virtual environment using either venv or conda.
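For example, you can create and activate a venv environment before installing the packages:
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate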
Use OpenAI SDK with Foundry Local
The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.
Copy and paste the following code into a Python file named app.py:
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
base_url=manager.endpoint,
api_key=manager.api_key # API key is not required for local usage
)
# Set the model to use and generate a response
response = client.chat.completions.create(
model=manager.get_model_info(alias).id,
messages=[{"role": "user", "content": "What is the golden ratio?"}]
)
print(response.choices[0].message.content)
Run the code using the following command:
python app.py
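The local endpoint is OpenAI-compatible, so you can pass standard chat completion parameters such as temperature or max_tokens. Support for individual parameters depends on the model and local runtime, so treat the following as a sketch:
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    temperature=0.2,  # lower values make the output more deterministic
    max_tokens=256    # limit the length of the generated reply
)
print(response.choices[0].message.content)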
Streaming Responses
If you want to receive a streaming response, you can modify the code as follows:
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
base_url=manager.endpoint,
api_key=manager.api_key # API key is not required for local usage
)
# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
model=manager.get_model_info(alias).id,
messages=[{"role": "user", "content": "What is the golden ratio?"}],
stream=True
)
# Print the streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
You can run the code using the same command as before:
python app.py
Use requests with Foundry Local
If you prefer to make HTTP requests directly, you can use the requests library as follows:
# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
url = manager.endpoint + "/chat/completions"
payload = {
"model": manager.get_model_info(alias).id,
"messages": [
{"role": "user", "content": "What is the golden ratio?"}
]
}
headers = {
"Content-Type": "application/json"
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()["choices"][0]["message"]["content"])
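requests doesn't raise an error for HTTP failure responses by default, so you might want to check the status code before parsing the body. A minimal sketch:
response = requests.post(url, headers=headers, data=json.dumps(payload))
response.raise_for_status()  # raises an HTTPError for 4xx/5xx responses
data = response.json()
print(data["choices"][0]["message"]["content"])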
Install Node.js packages
You need to install the following Node.js packages:
npm install openai
npm install foundry-local-sdk
The Foundry Local SDK allows you to manage the Foundry Local service and models.
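The following examples use ES module import syntax and top-level await. If you run them with Node.js, make sure your project is configured for ES modules, for example by adding the following to your package.json (or by saving the files with an .mjs extension):
{
  "type": "module"
}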
Use OpenAI SDK with Foundry Local
The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.
Copy and paste the following code into a JavaScript file named app.js:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
const openai = new OpenAI({
baseURL: foundryLocalManager.endpoint,
apiKey: foundryLocalManager.apiKey,
});
async function generateText() {
const response = await openai.chat.completions.create({
model: modelInfo.id,
messages: [
{
role: "user",
content: "What is the golden ratio?",
},
],
});
console.log(response.choices[0].message.content);
}
generateText();
Run the code using the following command:
node app.js
Streaming Responses
If you want to receive streaming responses, you can modify the code as follows:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
const openai = new OpenAI({
baseURL: foundryLocalManager.endpoint,
apiKey: foundryLocalManager.apiKey,
});
async function streamCompletion() {
const stream = await openai.chat.completions.create({
model: modelInfo.id,
messages: [{ role: "user", content: "What is the golden ratio?" }],
stream: true,
});
for await (const chunk of stream) {
if (chunk.choices[0]?.delta?.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
}
}
streamCompletion();
Run the code using the following command:
node app.js
Use Fetch API with Foundry Local
If you prefer to use an HTTP client like fetch, you can do so as follows. The examples assume a runtime with a built-in fetch implementation, such as Node.js 18 or later:
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
async function queryModel() {
const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
model: modelInfo.id,
messages: [
{ role: "user", content: "What is the golden ratio?" },
],
}),
});
const data = await response.json();
console.log(data.choices[0].message.content);
}
queryModel();
Streaming Responses
If you want to receive streaming responses using the Fetch API, you can modify the code as follows:
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
async function streamWithFetch() {
const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
Accept: "text/event-stream",
},
body: JSON.stringify({
model: modelInfo.id,
messages: [{ role: "user", content: "What is the golden ratio?" }],
stream: true,
}),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split("\n").filter((line) => line.trim() !== "");
for (const line of lines) {
if (line.startsWith("data: ")) {
const data = line.substring(6);
if (data === "[DONE]") continue;
try {
const json = JSON.parse(data);
const content = json.choices[0]?.delta?.content || "";
if (content) {
// Print to console without line breaks, similar to process.stdout.write
process.stdout.write(content);
}
} catch (e) {
console.error("Error parsing JSON:", e);
}
}
}
}
}
// Call the function to start streaming
streamWithFetch();
Create project
Create a new C# project and navigate into it:
dotnet new console -n hello-foundry-local
cd hello-foundry-local
Install NuGet packages
Install the following NuGet packages into your project folder:
dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0
dotnet add package OpenAI --version 2.2.0-beta.4
Use OpenAI SDK with Foundry Local
The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.
Copy and paste the following code into a C# file named Program.cs:
using Microsoft.AI.Foundry.Local;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;
var alias = "phi-3.5-mini";
var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias);
var model = await manager.GetModelInfoAsync(aliasOrModelId: alias);
ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey);
OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions
{
Endpoint = manager.Endpoint
});
var chatClient = client.GetChatClient(model?.ModelId);
var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue?");
Console.Write($"[ASSISTANT]: ");
foreach (var completionUpdate in completionUpdates)
{
if (completionUpdate.ContentUpdate.Count > 0)
{
Console.Write(completionUpdate.ContentUpdate[0].Text);
}
}
Run the code using the following command:
dotnet run
Create project
Create a new Rust project and navigate into it:
cargo new hello-foundry-local
cd hello-foundry-local
Install crates
Install the following Rust crates using Cargo:
cargo add foundry-local anyhow env_logger serde_json
cargo add reqwest --features json
cargo add tokio --features full
Update the main.rs file
The following example demonstrates how to run inference by sending a request to the Foundry Local service. The code initializes the Foundry Local service, loads a model, and generates a response using the reqwest library.
Copy and paste the following code into the Rust file named main.rs:
use foundry_local::FoundryLocalManager;
use anyhow::Result;
#[tokio::main]
async fn main() -> Result<()> {
// Create a FoundryLocalManager instance with default options
let mut manager = FoundryLocalManager::builder()
.alias_or_model_id("qwen2.5-0.5b") // Specify the model to use
.bootstrap(true) // Start the service if not running
.build()
.await?;
// Use the OpenAI compatible API to interact with the model
let client = reqwest::Client::new();
let endpoint = manager.endpoint()?;
let response = client.post(format!("{}/chat/completions", endpoint))
.header("Content-Type", "application/json")
.header("Authorization", format!("Bearer {}", manager.api_key()))
.json(&serde_json::json!({
"model": manager.get_model_info("qwen2.5-0.5b", true).await?.id,
"messages": [{"role": "user", "content": "What is the golden ratio?"}],
}))
.send()
.await?;
let result = response.json::<serde_json::Value>().await?;
println!("{}", result["choices"][0]["message"]["content"]);
Ok(())
}
Run the code using the following command:
cargo run