Foundry Local SDK reference

The Foundry Local SDK lets you ship AI features that run local AI models through a simple, intuitive API. The SDK abstracts away the complexity of managing AI models and provides a seamless experience for integrating local AI capabilities into your applications. This reference documents the SDK implementations for C#, JavaScript, Python, and Rust.

The SDK doesn't require the Foundry Local CLI to be installed on the end user's machine, so you can ship your applications without extra setup steps for your users: your application is self-contained. Additional benefits of the Foundry Local SDK include:

  • Hardware detection and optimization: Automatic capability assessment for GPU, NPU, and CPU.
  • Execution provider management (Windows): Automatic download and registration of appropriate ONNX Runtime execution providers (CUDA, Vitis, QNN, OpenVINO, TensorRT) based on device capabilities.
  • Metal support via WebGPU (macOS): Native support for running models on Apple Silicon with optimized performance.
  • Model acquisition: Seamless download from Foundry Model Catalog with versioning, updates, and automatically hardware-optimized model selection with fallback support.
  • Efficient runtime: Adds approximately 20 MB to app size, runs on devices from mobile phones to desktops.
  • OpenAI API compatibility: Easy integration with OpenAI models and tools.
  • Optional REST server: Run Foundry Local as a local service accessible by other applications.

C# SDK Reference

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime — it provides the same API surface area with a wider breadth of hardware acceleration.

dotnet add package Microsoft.AI.Foundry.Local.WinML
dotnet add package OpenAI

The C# samples in the GitHub repository are preconfigured projects. If you're building from scratch, use the project configuration in the following section to set up your C# project for Foundry Local.

Project configuration

The sample repositories include a .csproj file that handles platform detection automatically. If you're building a project from scratch, use this configuration as a reference:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <!-- Windows: target Windows SDK for WinML hardware acceleration -->
  <PropertyGroup Condition="$([MSBuild]::IsOSPlatform('Windows'))">
    <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
    <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
    <Platforms>ARM64;x64</Platforms>
    <WindowsPackageType>None</WindowsPackageType>
    <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
  </PropertyGroup>

  <!-- Non-Windows: standard .NET -->
  <PropertyGroup Condition="!$([MSBuild]::IsOSPlatform('Windows'))">
    <TargetFramework>net9.0</TargetFramework>
  </PropertyGroup>

  <PropertyGroup Condition="'$(RuntimeIdentifier)'==''">
    <RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
  </PropertyGroup>

  <!-- Windows: WinML for hardware acceleration -->
  <ItemGroup Condition="$([MSBuild]::IsOSPlatform('Windows'))">
    <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" />
  </ItemGroup>

  <!-- Non-Windows: standard SDK -->
  <ItemGroup Condition="!$([MSBuild]::IsOSPlatform('Windows'))">
    <PackageReference Include="Microsoft.AI.Foundry.Local" />
  </ItemGroup>

  <!-- Linux GPU support -->
  <ItemGroup Condition="'$(RuntimeIdentifier)' == 'linux-x64'">
    <PackageReference Include="Microsoft.ML.OnnxRuntime.Gpu" />
    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
  </ItemGroup>

  <!-- Shared utilities -->
  <ItemGroup>
    <Compile Include="../Shared/*.cs" />
  </ItemGroup>

</Project>

The key project settings:

  • TargetFramework: On Windows, targets net9.0-windows10.0.26100 for WinML hardware acceleration. On other platforms, targets net9.0.
  • WindowsAppSDKSelfContained: Set to false to use the system-installed Windows App SDK rather than bundling it.
  • WindowsPackageType: Set to None to build as an unpackaged desktop app (no MSIX packaging).
  • EnableCoreMrtTooling: Set to false to disable MRT Core resource tooling, which isn't needed for console apps.
  • RuntimeIdentifier: Defaults to the current SDK's runtime identifier, ensuring the correct platform binaries are selected.
  • Microsoft.AI.Foundry.Local.WinML: Windows-only package that uses WinML for hardware acceleration and automatic execution provider management.
  • Microsoft.AI.Foundry.Local: Cross-platform package for macOS, Linux, and Windows without WinML.
  • Microsoft.ML.OnnxRuntime.Gpu / Microsoft.ML.OnnxRuntimeGenAI.Cuda: Linux GPU support packages for CUDA-enabled hardware.

Quickstart

Use this snippet to verify that the SDK can initialize and access the local model catalog.

using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging;
using System.Linq;

var config = new Configuration
{
  AppName = "app-name",
  LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information,
};

using var loggerFactory = LoggerFactory.Create(builder =>
{
  builder.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Information);
});
var logger = loggerFactory.CreateLogger<Program>();

await FoundryLocalManager.CreateAsync(config, logger);
var manager = FoundryLocalManager.Instance;

var catalog = await manager.GetCatalogAsync();
var models = await catalog.ListModelsAsync();

Console.WriteLine($"Models available: {models.Count()}");

This example prints the number of models available for your hardware.

API reference

Native Audio Transcription API

The C# SDK includes a native audio client for transcribing audio files on-device using Whisper models. This runs inference in-process without needing the REST web server.

Get an audio client

After loading a Whisper model, get an audio client:

var audioClient = await model.GetAudioClientAsync();

Audio transcription methods

  • TranscribeAudioStreamingAsync(string audioFilePath, CancellationToken ct): Returns IAsyncEnumerable<TranscriptionChunk>, streaming transcription results chunk by chunk. Each chunk has a Text property.

AudioClient settings

  • Language (string): ISO 639-1 language code (for example, "en"). Improves accuracy.
  • Temperature (float): Sampling temperature (0.0–1.0). Lower values are more deterministic.

Example

var audioClient = await model.GetAudioClientAsync();
audioClient.Settings.Language = "en";
audioClient.Settings.Temperature = 0.0f;

await foreach (var chunk in audioClient.TranscribeAudioStreamingAsync(
    "recording.mp3", CancellationToken.None))
{
    Console.Write(chunk.Text);
}

JavaScript SDK Reference

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime — it provides the same API surface area with a wider breadth of hardware acceleration.

npm install foundry-local-sdk-winml openai

Quickstart

Use this snippet to verify that the SDK can initialize and access the local model catalog.

import { FoundryLocalManager } from 'foundry-local-sdk';

console.log('Initializing Foundry Local SDK...');

const manager = FoundryLocalManager.create({
    appName: 'foundry_local_samples',
    logLevel: 'info'
});
console.log('✓ SDK initialized successfully');

// Explore available models
console.log('\nFetching available models...');
const catalog = manager.catalog;
const models = await catalog.getModels();

console.log(`Found ${models.length} models:`);
for (const model of models) {
    console.log(`  - ${model.alias}`);
}

This example outputs the list of available models for your hardware.

Python SDK Reference

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime — it provides the same API surface area with a wider breadth of hardware acceleration.

pip install foundry-local-sdk-winml openai

Quickstart

Use this snippet to verify that the SDK can initialize and access the local model catalog.

from foundry_local_sdk import Configuration, FoundryLocalManager

config = Configuration(app_name="app-name")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

models = manager.catalog.list_models()
print(f"Models available: {len(models)}")

This example prints the number of models available for your hardware.

Configuration

The Configuration class allows you to customize the SDK behavior:

from foundry_local_sdk import Configuration

config = Configuration(
    app_name="app-name",
    log_level="info",
    model_cache_dir="./foundry_local_data/model_cache",
    web={"urls": "http://127.0.0.1:55588"},
)

  • app_name (str): Name of your application.
  • log_level (str): Logging level (for example, "info", "debug").
  • model_cache_dir (str): Directory for cached models.
  • web (dict): Web service configuration with a urls key.

Core API

  • FoundryLocalManager.initialize(config): Initialize the singleton manager with a Configuration.
  • FoundryLocalManager.instance: Access the initialized manager instance.
  • manager.catalog.list_models(): List all available models in the catalog.
  • manager.catalog.get_model(alias): Get a model by alias.
  • manager.catalog.get_cached_models(): List models in the local cache.
  • manager.catalog.get_loaded_models(): List models currently loaded.
  • model.download(progress_callback): Download the model (skips if cached).
  • model.load(): Load the model for inference.
  • model.unload(): Unload the model.
  • model.is_cached: Check whether the model is cached locally.
  • model.is_loaded: Check whether the model is loaded.
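To make the lifecycle concrete, here is a minimal, runnable sketch of the download, load, and unload flow. _StubModel and the alias "model-alias" are hypothetical stand-ins that only mirror the documented surface; in a real application the model object comes from manager.catalog.get_model(alias).

```python
# Hypothetical stand-in mirroring the documented model surface
# (download, load, unload, is_cached, is_loaded); the real object
# comes from manager.catalog.get_model(alias).
class _StubModel:
    def __init__(self, alias: str):
        self.alias = alias
        self.is_cached = False
        self.is_loaded = False

    def download(self, progress_callback=None):
        # The real SDK skips the download when the model is already cached.
        if self.is_cached:
            return
        for percent in (0, 50, 100):
            if progress_callback:
                progress_callback(percent)
        self.is_cached = True

    def load(self):
        self.is_loaded = True

    def unload(self):
        self.is_loaded = False


model = _StubModel("model-alias")
model.download(progress_callback=lambda p: print(f"download: {p}%"))
model.load()
print(model.is_cached, model.is_loaded)  # True True
model.unload()
```

The same order applies with the real SDK: download before load, and unload when the model is no longer needed.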

Native Chat Completions API

After loading a model, get a chat client:

client = model.get_chat_client()

  • client.complete_chat(messages): Generate a complete chat response.
  • client.complete_streaming_chat(messages): Stream chat response chunks.
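The streaming method yields the response incrementally. The sketch below shows the consumption pattern using _stub_streaming_chat, a hypothetical generator standing in for client.complete_streaming_chat(messages); the exact chunk shape and message format are assumptions (the C# equivalent exposes a Text property), so plain strings and role/content dicts are used here.

```python
# Hypothetical generator standing in for client.complete_streaming_chat(messages);
# it yields the reply in pieces, as a streaming chat client would.
def _stub_streaming_chat(messages):
    for piece in ("Hello", ", ", "world", "!"):
        yield piece


messages = [{"role": "user", "content": "Say hello"}]  # assumed message shape

# Accumulate the streamed pieces into the full reply.
reply = "".join(_stub_streaming_chat(messages))
print(reply)  # Hello, world!
```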

Native Audio Transcription API

After loading a Whisper model, get an audio client:

audio_client = model.get_audio_client()

  • audio_client.transcribe(file_path): Transcribe an audio file. Returns an object with a text property.
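As a shape reference, the sketch below mimics the transcribe call with a hypothetical _StubAudioClient; the real client comes from model.get_audio_client() and runs Whisper inference on-device, so the transcript string here is a placeholder.

```python
from types import SimpleNamespace


# Hypothetical stand-in for the audio client returned by
# model.get_audio_client(); transcribe returns an object with a
# `text` attribute, as documented above.
class _StubAudioClient:
    def transcribe(self, file_path: str):
        return SimpleNamespace(text=f"[transcript of {file_path}]")


audio_client = _StubAudioClient()
result = audio_client.transcribe("recording.mp3")
print(result.text)  # [transcript of recording.mp3]
```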

Rust SDK Reference

Install packages

If you're developing or shipping on Windows, select the Windows tab. The Windows package integrates with the Windows ML runtime — it provides the same API surface area with a wider breadth of hardware acceleration.

cargo add foundry-local-sdk --features winml
cargo add tokio --features full
cargo add tokio-stream anyhow

Quickstart

Use this snippet to verify that the SDK can initialize and access the local model catalog.

use foundry_local_sdk::{FoundryLocalConfig, FoundryLocalManager};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let manager = FoundryLocalManager::create(FoundryLocalConfig::new("app-name"))?;

    let models = manager.catalog().get_models().await?;
    println!("Models available: {}", models.len());

    Ok(())
}

This example prints the number of models available for your hardware.

Configuration

The FoundryLocalConfig struct allows you to customize the SDK behavior:

use foundry_local_sdk::FoundryLocalConfig;

let config = FoundryLocalConfig::new("app-name")
    .with_log_level("info")
    .with_model_cache_dir("./foundry_local_data/model_cache")
    .with_web_urls("http://127.0.0.1:55588");

Core API

  • FoundryLocalManager::create(config): Create a new manager from a FoundryLocalConfig.
  • manager.catalog().get_models().await: List all available models.
  • manager.catalog().get_model(alias).await: Get a model by alias.
  • manager.catalog().get_cached_models().await: List models in the local cache.
  • manager.catalog().get_loaded_models().await: List models currently loaded.
  • model.download(callback).await: Download the model (skips if cached).
  • model.load().await: Load the model for inference.
  • model.unload().await: Unload the model.

Native Chat Completions API

After loading a model, create a chat client with optional settings:

let client = model.create_chat_client()
    .temperature(0.7)
    .max_tokens(256);

  • client.complete_chat(&messages, tools).await: Generate a complete chat response.
  • client.complete_streaming_chat(&messages, tools).await: Stream chat response chunks.

Message types: ChatCompletionRequestSystemMessage, ChatCompletionRequestUserMessage, ChatCompletionRequestMessage.

Native Audio Transcription API

After loading a Whisper model, create an audio client:

let audio_client = model.create_audio_client();

  • audio_client.transcribe(file_path).await: Transcribe an audio file. Returns an object with a text field.
