Integrate inference SDKs with Foundry Local

Important

  • Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
  • Features, approaches, and processes might change or have limited capabilities before general availability (GA).

Foundry Local integrates with various inference SDKs, such as OpenAI, Azure OpenAI, LangChain, and more. This guide shows you how to connect your applications to locally running AI models using popular SDKs.

Prerequisites

Install pip packages

Install the following Python packages:

pip install openai
pip install foundry-local-sdk

Tip

We recommend using a virtual environment to avoid package conflicts. You can create a virtual environment using venv or conda.
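For example, a minimal venv setup on macOS or Linux (on Windows, activate with .venv\Scripts\activate instead):

python -m venv .venv
source .venv/bin/activate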

Use the OpenAI SDK with Foundry Local

The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.

Copy and paste the following code into a Python file named app.py:

import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device. 
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)
# Set the model to use and generate a response
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}]
)
print(response.choices[0].message.content)

Run the code using the following command:

python app.py

Streaming the response

If you want to receive a streaming response, you can modify the code as follows:

import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device.
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry 
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

# The remaining code uses the OpenAI Python SDK to interact with the local model.

# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)

# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=True
)

# Print the streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

You can run the code using the same command as before:

python app.py

Use requests with Foundry Local
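If you prefer to call the Foundry Local REST endpoint directly instead of using the OpenAI client, you can use the requests library, as shown in the following example: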

# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device. 
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

url = manager.endpoint + "/chat/completions"

payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [
        {"role": "user", "content": "What is the golden ratio?"}
    ]
}

headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()["choices"][0]["message"]["content"])
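You can run the code using the same command as before:

python app.py

The introduction also mentions LangChain. As a minimal sketch, assuming you have installed the langchain-openai package (pip install langchain-openai), you can point LangChain's OpenAI-compatible chat model at the same local service:

from langchain_openai import ChatOpenAI
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"

# Start the Foundry Local service if needed and load the model
manager = FoundryLocalManager(alias)

# Point LangChain's OpenAI-compatible chat model at the local service.
# The API key is not validated locally but is required by the client.
llm = ChatOpenAI(
    model=manager.get_model_info(alias).id,
    base_url=manager.endpoint,
    api_key=manager.api_key,
)

response = llm.invoke("What is the golden ratio?")
print(response.content)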

Install Node.js packages

You need to install the following Node.js packages:

npm install openai
npm install foundry-local-sdk

The Foundry Local SDK lets you manage the Foundry Local service and models.

Use the OpenAI SDK with Foundry Local

The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.

Copy and paste the following code into a JavaScript file named app.js:

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function generateText() {
  const response = await openai.chat.completions.create({
    model: modelInfo.id,
    messages: [
      {
        role: "user",
        content: "What is the golden ratio?",
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

generateText();

Run the code using the following command:

node app.js

Streaming responses

If you want to receive streaming responses, you can modify the code as follows:

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function streamCompletion() {
    const stream = await openai.chat.completions.create({
      model: modelInfo.id,
      messages: [{ role: "user", content: "What is the golden ratio?" }],
      stream: true,
    });
  
    for await (const chunk of stream) {
      if (chunk.choices[0]?.delta?.content) {
        process.stdout.write(chunk.choices[0].delta.content);
      }
    }
}
  
streamCompletion();

Run the code using the following command:

node app.js

Use the Fetch API with Foundry Local

If you prefer to use an HTTP client such as fetch, you can do so as follows:

import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

async function queryModel() {
    const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            model: modelInfo.id,
            messages: [
                { role: "user", content: "What is the golden ratio?" },
            ],
        }),
    });

    const data = await response.json();
    console.log(data.choices[0].message.content);
}

queryModel();
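Run the code using the same command as before:

node app.js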

Streaming responses

If you want to receive streaming responses using the Fetch API, you can modify the code as follows:

import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

async function streamWithFetch() {
    const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            Accept: "text/event-stream",
        },
        body: JSON.stringify({
            model: modelInfo.id,
            messages: [{ role: "user", content: "What is the golden ratio?" }],
            stream: true,
        }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split("\n").filter((line) => line.trim() !== "");

        for (const line of lines) {
            if (line.startsWith("data: ")) {
                const data = line.substring(6);
                if (data === "[DONE]") continue;

                try {
                    const json = JSON.parse(data);
                    const content = json.choices[0]?.delta?.content || "";
                    if (content) {
                        // Print to console without line breaks, similar to process.stdout.write
                        process.stdout.write(content);
                    }
                } catch (e) {
                    console.error("Error parsing JSON:", e);
                }
            }
        }
    }
}

// Call the function to start streaming
streamWithFetch();
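Run the code using the same command as before:

node app.js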

Create a project

Create a new C# project and navigate into it:

dotnet new console -n hello-foundry-local
cd hello-foundry-local

Install NuGet packages

Install the following NuGet packages in your project folder:

dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0
dotnet add package OpenAI --version 2.2.0-beta.4

Use the OpenAI SDK with Foundry Local

The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.

Copy and paste the following code into a C# file named Program.cs:

using Microsoft.AI.Foundry.Local;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var alias = "phi-3.5-mini";

var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias);

var model = await manager.GetModelInfoAsync(aliasOrModelId: alias);
ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey);
OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions
{
    Endpoint = manager.Endpoint
});

var chatClient = client.GetChatClient(model?.ModelId);

var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue?");

Console.Write("[ASSISTANT]: ");
foreach (var completionUpdate in completionUpdates)
{
    if (completionUpdate.ContentUpdate.Count > 0)
    {
        Console.Write(completionUpdate.ContentUpdate[0].Text);
    }
}

Run the code using the following command:

dotnet run
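
The example above streams the response as it's generated. If you'd rather receive the full response in a single call, a minimal non-streaming sketch (using the same packages and setup as above; CompleteChat is the OpenAI .NET SDK's non-streaming counterpart of CompleteChatStreaming) might look like this:

using Microsoft.AI.Foundry.Local;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var alias = "phi-3.5-mini";

// Start the Foundry Local service if needed and load the model
var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias);
var model = await manager.GetModelInfoAsync(aliasOrModelId: alias);

// Configure the OpenAI client to use the local Foundry service
var client = new OpenAIClient(new ApiKeyCredential(manager.ApiKey), new OpenAIClientOptions
{
    Endpoint = manager.Endpoint
});
var chatClient = client.GetChatClient(model?.ModelId);

// CompleteChat returns the full response in one call instead of streaming
ChatCompletion completion = chatClient.CompleteChat("What is the golden ratio?");
Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");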

Create a project

Create a new Rust project and navigate into it:

cargo new hello-foundry-local
cd hello-foundry-local

Install crates

Install the following Rust crates using Cargo:

cargo add foundry-local anyhow env_logger serde_json
cargo add reqwest --features json
cargo add tokio --features full

Update the main.rs file

The following example shows how to run inference by sending a request to the Foundry Local service. The code initializes the Foundry Local service, loads a model, and generates a response using the reqwest library.

Copy and paste the following code into the Rust file named main.rs:

use foundry_local::FoundryLocalManager;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Create a FoundryLocalManager instance with default options
    let mut manager = FoundryLocalManager::builder()
        .alias_or_model_id("qwen2.5-0.5b") // Specify the model to use   
        .bootstrap(true) // Start the service if not running
        .build()
        .await?;
    
    // Use the OpenAI compatible API to interact with the model
    let client = reqwest::Client::new();
    let endpoint = manager.endpoint()?;
    let response = client.post(format!("{}/chat/completions", endpoint))
        .header("Content-Type", "application/json")
        .header("Authorization", format!("Bearer {}", manager.api_key()))
        .json(&serde_json::json!({
            "model": manager.get_model_info("qwen2.5-0.5b", true).await?.id,
            "messages": [{"role": "user", "content": "What is the golden ratio?"}],
        }))
        .send()
        .await?;

    let result = response.json::<serde_json::Value>().await?;
    println!("{}", result["choices"][0]["message"]["content"]);
    
    Ok(())
}

Eseguire il codice usando il comando seguente:

cargo run

Next steps