Important
- Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
- Features, approaches, and processes can change or have limited capabilities before General Availability (GA).
The Foundry Local SDK simplifies AI model management in local environments by providing control plane operations separate from data plane inference code. This reference documents SDK implementations for Python, JavaScript, C#, and Rust.
Python SDK Reference
Installation
Install the Python package:
pip install foundry-local-sdk
FoundryLocalManager Class
The FoundryLocalManager class provides methods to manage models, cache, and the Foundry Local service.
Initialization
from foundry_local import FoundryLocalManager
# Initialize and optionally bootstrap with a model
manager = FoundryLocalManager(alias_or_model_id=None, bootstrap=True)
- `alias_or_model_id`: (optional) Alias or model ID to download and load at startup.
- `bootstrap`: (default `True`) If `True`, starts the service if it's not running and loads the model if one is provided.
A note on aliases
Many methods in this reference have an alias_or_model_id parameter in their signature. You can pass either an alias or a model ID as the value. Using an alias will:
- Select the best model for the available hardware. For example, if an NVIDIA CUDA GPU is available, Foundry Local selects the CUDA model. If a supported NPU is available, Foundry Local selects the NPU model.
- Allow you to use a shorter name without needing to remember the model ID.
Tip
We recommend passing an alias to the alias_or_model_id parameter because, when you deploy your application, Foundry Local acquires the best model for the end user's machine at runtime.
Note
If you have an Intel NPU on Windows, ensure you have installed the Intel NPU driver for optimal NPU acceleration.
Service Management
| Method | Signature | Description |
|---|---|---|
| `is_service_running()` | `() -> bool` | Checks if the Foundry Local service is running. |
| `start_service()` | `() -> None` | Starts the Foundry Local service. |
| `service_uri` | `@property -> str` | Returns the service URI. |
| `endpoint` | `@property -> str` | Returns the service endpoint. |
| `api_key` | `@property -> str` | Returns the API key (from env or default). |
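For example, a minimal sketch that starts the service only when it isn't already running and then reads the connection properties (assuming the package is installed as shown above):
from foundry_local import FoundryLocalManager
# Create a manager without bootstrapping, so the service isn't started automatically.
manager = FoundryLocalManager(bootstrap=False)
# Start the service only if it isn't already running.
if not manager.is_service_running():
    manager.start_service()
print(f"Service URI: {manager.service_uri}")
print(f"Endpoint: {manager.endpoint}")
print(f"API key: {manager.api_key}")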
Catalog Management
| Method | Signature | Description |
|---|---|---|
| `list_catalog_models()` | `() -> list[FoundryModelInfo]` | Lists all available models in the catalog. |
| `refresh_catalog()` | `() -> None` | Refreshes the model catalog. |
| `get_model_info()` | `(alias_or_model_id: str, raise_on_not_found=False) -> FoundryModelInfo or None` | Gets model info by alias or ID. |
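For example, a short sketch that refreshes the catalog and looks up a model by alias. Because raise_on_not_found defaults to False, the lookup returns None when the alias isn't in the catalog:
from foundry_local import FoundryLocalManager
manager = FoundryLocalManager()
# Refresh the catalog, then resolve an alias to its catalog entry.
manager.refresh_catalog()
info = manager.get_model_info("qwen2.5-0.5b")
if info is None:
    print("Model not found in the catalog")
else:
    print(f"{info.alias} -> {info.id}")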
Cache Management
| Method | Signature | Description |
|---|---|---|
| `get_cache_location()` | `() -> str` | Returns the model cache directory path. |
| `list_cached_models()` | `() -> list[FoundryModelInfo]` | Lists models downloaded to the local cache. |
Model Management
| Method | Signature | Description |
|---|---|---|
| `download_model()` | `(alias_or_model_id: str, token: str = None, force: bool = False) -> FoundryModelInfo` | Downloads a model to the local cache. |
| `load_model()` | `(alias_or_model_id: str, ttl: int = 600) -> FoundryModelInfo` | Loads a model into the inference server. |
| `unload_model()` | `(alias_or_model_id: str, force: bool = False) -> None` | Unloads a model from the inference server. |
| `list_loaded_models()` | `() -> list[FoundryModelInfo]` | Lists all models currently loaded in the service. |
FoundryModelInfo
The methods list_catalog_models(), list_cached_models(), and list_loaded_models() return a list of FoundryModelInfo objects. You can use the information contained in these objects to further refine the list, or get the information for a specific model directly by calling the get_model_info(alias_or_model_id) method.
These objects contain the following fields:
| Field | Type | Description |
|---|---|---|
| `alias` | `str` | Alias of the model |
| `id` | `str` | Unique identifier of the model |
| `version` | `str` | Version of the model |
| `execution_provider` | `str` | The accelerator (execution provider) used to run the model. |
| `device_type` | `DeviceType` | Device type of the model: CPU, GPU, NPU |
| `uri` | `str` | URI of the model |
| `file_size_mb` | `int` | Size of the model on disk in MB |
| `supports_tool_calling` | `bool` | Whether the model supports tool calling |
| `prompt_template` | `dict \| None` | Prompt template for the model |
| `provider` | `str` | Provider of the model (i.e., where the model is published) |
| `publisher` | `str` | Publisher of the model (i.e., who published the model) |
| `license` | `str` | The name of the license of the model |
| `task` | `str` | Task of the model. One of `chat-completions`, `automatic-speech-recognition` |
| `ep_override` | `str \| None` | Override for the execution provider, if different from the model's default |
Execution Providers
One of:
- `CPUExecutionProvider` - CPU-based execution
- `CUDAExecutionProvider` - NVIDIA CUDA GPU execution
- `WebGpuExecutionProvider` - WebGPU execution
- `QNNExecutionProvider` - Qualcomm Neural Network execution (NPU)
- `OpenVINOExecutionProvider` - Intel OpenVINO execution
- `NvTensorRTRTXExecutionProvider` - NVIDIA TensorRT execution
- `VitisAIExecutionProvider` - AMD Vitis AI execution
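As a sketch of how these fields can refine the catalog list, the following filters for models that target the Qualcomm NPU execution provider and support tool calling. The exact execution provider strings are assumed to match the values listed above; adjust the comparison if your catalog reports them differently:
from foundry_local import FoundryLocalManager
manager = FoundryLocalManager()
catalog = manager.list_catalog_models()
# Keep only NPU (QNN) models that support tool calling.
npu_tool_models = [
    m for m in catalog
    if m.execution_provider == "QNNExecutionProvider" and m.supports_tool_calling
]
for m in npu_tool_models:
    print(f"{m.alias} ({m.id}) - {m.file_size_mb} MB")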
Example Usage
The following code demonstrates how to use the FoundryLocalManager class to manage models and interact with the Foundry Local service.
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be selected
# for your end-user's device.
alias = "qwen2.5-0.5b"
# Create a FoundryLocalManager instance. This starts the Foundry Local service if it's not already running.
manager = FoundryLocalManager()
# List available models in the catalog
catalog = manager.list_catalog_models()
print(f"Available models in the catalog: {catalog}")
# Download and load a model
model_info = manager.download_model(alias)
model_info = manager.load_model(alias)
print(f"Model info: {model_info}")
# List models in cache
local_models = manager.list_cached_models()
print(f"Models in cache: {local_models}")
# List loaded models
loaded = manager.list_loaded_models()
print(f"Models running in the service: {loaded}")
# Unload a model
manager.unload_model(alias)
Integrate with OpenAI SDK
Install the OpenAI package:
pip install openai
The following code demonstrates how to integrate the FoundryLocalManager with the OpenAI SDK to interact with a local model.
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "qwen2.5-0.5b"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
base_url=manager.endpoint,
api_key=manager.api_key # API key is not required for local usage
)
# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
model=manager.get_model_info(alias).id,
messages=[{"role": "user", "content": "Why is the sky blue?"}],
stream=True
)
# Print the streaming response
for chunk in stream:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end="", flush=True)
JavaScript SDK Reference
Installation
Install the package from npm:
npm install foundry-local-sdk
FoundryLocalManager Class
The FoundryLocalManager class lets you manage models, control the cache, and interact with the Foundry Local service in both browser and Node.js environments.
Initialization
import { FoundryLocalManager } from "foundry-local-sdk";
const foundryLocalManager = new FoundryLocalManager();
Available options:
- `serviceUrl`: Base URL of the Foundry Local service.
- `fetch`: (optional) Custom fetch implementation for environments like Node.js.
A note on aliases
Many methods in this reference have an aliasOrModelId parameter in their signature. You can pass either an alias or a model ID as the value. Using an alias will:
- Select the best model for the available hardware. For example, if an NVIDIA CUDA GPU is available, Foundry Local selects the CUDA model. If a supported NPU is available, Foundry Local selects the NPU model.
- Allow you to use a shorter name without needing to remember the model ID.
Tip
We recommend passing an alias to the aliasOrModelId parameter because, when you deploy your application, Foundry Local acquires the best model for the end user's machine at runtime.
Note
If you have an Intel NPU on Windows, ensure you have installed the Intel NPU driver for optimal NPU acceleration.
Service Management
| Method | Signature | Description |
|---|---|---|
| `init()` | `(aliasOrModelId?: string) => Promise<void>` | Initializes the SDK and optionally loads a model. |
| `isServiceRunning()` | `() => Promise<boolean>` | Checks if the Foundry Local service is running. |
| `startService()` | `() => Promise<void>` | Starts the Foundry Local service. |
| `serviceUrl` | `string` | The base URL of the Foundry Local service. |
| `endpoint` | `string` | The API endpoint (`serviceUrl` + `/v1`). |
| `apiKey` | `string` | The API key (none). |
Catalog Management
| Method | Signature | Description |
|---|---|---|
| `listCatalogModels()` | `() => Promise<FoundryModelInfo[]>` | Lists all available models in the catalog. |
| `refreshCatalog()` | `() => Promise<void>` | Refreshes the model catalog. |
| `getModelInfo()` | `(aliasOrModelId: string, throwOnNotFound = false) => Promise<FoundryModelInfo \| null>` | Gets model info by alias or ID. |
Cache Management
| Method | Signature | Description |
|---|---|---|
| `getCacheLocation()` | `() => Promise<string>` | Returns the model cache directory path. |
| `listCachedModels()` | `() => Promise<FoundryModelInfo[]>` | Lists models downloaded to the local cache. |
Model Management
| Method | Signature | Description |
|---|---|---|
| `downloadModel()` | `(aliasOrModelId: string, token?: string, force = false, onProgress?) => Promise<FoundryModelInfo>` | Downloads a model to the local cache. |
| `loadModel()` | `(aliasOrModelId: string, ttl = 600) => Promise<FoundryModelInfo>` | Loads a model into the inference server. |
| `unloadModel()` | `(aliasOrModelId: string, force = false) => Promise<void>` | Unloads a model from the inference server. |
| `listLoadedModels()` | `() => Promise<FoundryModelInfo[]>` | Lists all models currently loaded in the service. |
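For example, a sketch that reports download progress through the optional onProgress callback and then loads the model with a 10-minute TTL. The value passed to the callback is assumed to be a simple progress indicator; treat its exact shape as an assumption if your SDK version differs:
import { FoundryLocalManager } from "foundry-local-sdk";
const alias = "qwen2.5-0.5b";
const manager = new FoundryLocalManager();
await manager.init();
// Download with a progress callback (no token, force = false).
await manager.downloadModel(alias, undefined, false, (progress) => {
  console.log(`Download progress: ${progress}`);
});
// Load with a 10-minute time-to-live.
const modelInfo = await manager.loadModel(alias, 600);
console.log(`Loaded ${modelInfo.id}`);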
Example Usage
The following code demonstrates how to use the FoundryLocalManager class to manage models and interact with the Foundry Local service.
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";
const manager = new FoundryLocalManager();
// Initialize the SDK and optionally load a model
const modelInfo = await manager.init(alias);
console.log("Model Info:", modelInfo);
// Check if the service is running
const isRunning = await manager.isServiceRunning();
console.log(`Service running: ${isRunning}`);
// List available models in the catalog
const catalog = await manager.listCatalogModels();
// Download and load a model
await manager.downloadModel(alias);
await manager.loadModel(alias);
// List models in cache
const localModels = await manager.listCachedModels();
// List loaded models
const loaded = await manager.listLoadedModels();
// Unload a model
await manager.unloadModel(alias);
Integration with OpenAI Client
Install the OpenAI package:
npm install openai
The following code demonstrates how to integrate the FoundryLocalManager with the OpenAI client to interact with a local model.
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "qwen2.5-0.5b";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager();
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias);
console.log("Model Info:", modelInfo);
const openai = new OpenAI({
baseURL: foundryLocalManager.endpoint,
apiKey: foundryLocalManager.apiKey,
});
async function streamCompletion() {
const stream = await openai.chat.completions.create({
model: modelInfo.id,
messages: [{ role: "user", content: "What is the golden ratio?" }],
stream: true,
});
for await (const chunk of stream) {
if (chunk.choices[0]?.delta?.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
}
}
streamCompletion();
Browser Usage
The SDK includes a browser-compatible version where you must specify the service URL manually:
import { FoundryLocalManager } from "foundry-local-sdk/browser";
// Specify the service URL
// Run the Foundry Local service using the CLI: `foundry service start`
// and use the URL from the CLI output
const endpoint = "ENDPOINT";
const manager = new FoundryLocalManager({ serviceUrl: endpoint });
// Note: The `init`, `isServiceRunning`, and `startService` methods
// are not available in the browser version
Note
The browser version doesn't support the init, isServiceRunning, and startService methods. You must ensure that the Foundry Local service is running before using the SDK in a browser environment. You can start the service using the Foundry Local CLI: foundry service start. You can glean the service URL from the CLI output.
Example Usage
import { FoundryLocalManager } from "foundry-local-sdk/browser";
// Specify the service URL
// Run the Foundry Local service using the CLI: `foundry service start`
// and use the URL from the CLI output
const endpoint = "ENDPOINT";
const manager = new FoundryLocalManager({ serviceUrl: endpoint });
const alias = "qwen2.5-0.5b";
// Get all available models
const catalog = await manager.listCatalogModels();
console.log("Available models in catalog:", catalog);
// Download and load a specific model
await manager.downloadModel(alias);
await manager.loadModel(alias);
// View models in your local cache
const localModels = await manager.listCachedModels();
console.log("Cached models:", localModels);
// Check which models are currently loaded
const loaded = await manager.listLoadedModels();
console.log("Loaded models in inference service:", loaded);
// Unload a model when finished
await manager.unloadModel(alias);
C# SDK Reference
Redesign
To improve your ability to ship applications that use on-device AI, the architecture of the C# SDK changed substantially in version 0.8.0 and later. In this section, we outline the key changes to help you migrate your applications to the latest version of the SDK.
Note
SDK version 0.8.0 and later includes breaking API changes from previous versions.
Architecture changes
The following diagram shows how the previous architecture - for versions earlier than 0.8.0 - relied heavily on using a REST webserver to manage models and inference like chat completions:
The SDK would use a Remote Procedure Call (RPC) to find the Foundry Local CLI executable on the machine, start the webserver, and then communicate with it over HTTP. This architecture had several limitations, including:
- Complexity in managing the webserver lifecycle.
- Challenging deployment: End users needed to have the Foundry Local CLI installed on their machines in addition to your application.
- Version management of the CLI and SDK could lead to compatibility issues.
To address these issues, the redesigned architecture in version 0.8.0 and later uses a more streamlined approach. The new architecture is as follows:
In this new architecture:
- Your application is self-contained. It doesn't require the Foundry Local CLI to be installed separately on the end user's machine, which makes it easier for you to deploy applications.
- The REST web server is optional. You can still use the web server if you want to integrate with other tools that communicate over HTTP. Read Use chat completions via REST server with Foundry Local for details on how to use this feature.
- The SDK has native support for chat completions and audio transcriptions, allowing you to build conversational AI applications with fewer dependencies. Read Use Foundry Local native chat completions API for details on how to use this feature.
- On Windows devices, you can use a Windows ML build that handles hardware acceleration for models on the device by pulling in the right runtime and drivers.
API changes
Version 0.8.0 and later provides a more object-oriented and composable API. The main entry point continues to be the FoundryLocalManager class, but instead of being a flat set of methods that operate via static calls to a stateless HTTP API, the SDK now exposes methods on the FoundryLocalManager instance that maintain state about the service and models.
| Primitive | Versions < 0.8.0 | Versions >= 0.8.0 |
|---|---|---|
| Configuration | N/A | `config = Configuration(...)` |
| Get Manager | `mgr = FoundryLocalManager();` | `await FoundryLocalManager.CreateAsync(config, logger);`<br>`var mgr = FoundryLocalManager.Instance;` |
| Get Catalog | N/A | `catalog = await mgr.GetCatalogAsync();` |
| List Models | `mgr.ListCatalogModelsAsync();` | `catalog.ListModelsAsync();` |
| Get Model | `mgr.GetModelInfoAsync("aliasOrModelId");` | `catalog.GetModelAsync(alias: "alias");` |
| Get Variant | N/A | `model.SelectedVariant;` |
| Set Variant | N/A | `model.SelectVariant();` |
| Download a model | `mgr.DownloadModelAsync("aliasOrModelId");` | `model.DownloadAsync()` |
| Load a model | `mgr.LoadModelAsync("aliasOrModelId");` | `model.LoadAsync()` |
| Unload a model | `mgr.UnloadModelAsync("aliasOrModelId");` | `model.UnloadAsync()` |
| List loaded models | `mgr.ListLoadedModelsAsync();` | `catalog.GetLoadedModelsAsync();` |
| Get model path | N/A | `model.GetPathAsync()` |
| Start service | `mgr.StartServiceAsync();` | `mgr.StartWebServerAsync();` |
| Stop service | `mgr.StopServiceAsync();` | `mgr.StopWebServerAsync();` |
| Cache location | `mgr.GetCacheLocationAsync();` | `config.ModelCacheDir` |
| List cached models | `mgr.ListCachedModelsAsync();` | `catalog.GetCachedModelsAsync();` |
The new API gives you more control over the web server, logging, cache location, and model variant selection. For example, the Configuration class lets you set the application name, logging level, web server URLs, and directories for application data, the model cache, and logs:
var config = new Configuration
{
AppName = "app-name",
LogLevel = Microsoft.AI.Foundry.Local.LogLevel.Information,
Web = new Configuration.WebService
{
Urls = "http://127.0.0.1:55588"
},
AppDataDir = "./foundry_local_data",
ModelCacheDir = "{AppDataDir}/model_cache",
LogsDir = "{AppDataDir}/logs"
};
In the previous version of the Foundry Local C# SDK, you couldn't configure these settings directly through the SDK, which limited your ability to customize the behavior of the service.
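To illustrate how these pieces fit together, here's a minimal sketch of the 0.8.0+ flow composed from the primitives in the migration table above. It assumes the config object defined earlier and a logger of whatever logging type CreateAsync expects; treat it as a sketch rather than the definitive API surface:
using Microsoft.AI.Foundry.Local;
// `config` is the Configuration instance shown above; `logger` stands in for
// your application's logger (its exact type is an assumption here).
await FoundryLocalManager.CreateAsync(config, logger);
var mgr = FoundryLocalManager.Instance;
// Browse the catalog and resolve a model by alias.
var catalog = await mgr.GetCatalogAsync();
var model = await catalog.GetModelAsync(alias: "qwen2.5-0.5b");
// Download (if not already cached), load, and list loaded models.
await model.DownloadAsync();
await model.LoadAsync();
var loaded = await catalog.GetLoadedModelsAsync();
// Unload when finished.
await model.UnloadAsync();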
Project setup guide
There are two NuGet packages for the Foundry Local SDK - a WinML and a cross-platform package - that have the same API surface but are optimized for different platforms:
- Windows: Uses the Microsoft.AI.Foundry.Local.WinML package that's specific to Windows applications, which uses the Windows Machine Learning (WinML) framework to deliver optimal performance and user experience on Windows devices.
- Cross-Platform: Uses the Microsoft.AI.Foundry.Local package that can be used for cross-platform applications (Windows, Linux, macOS).
Depending on your target platform, follow these instructions to create a new C# application and add the necessary dependencies:
Use Foundry Local in your C# project by following these Windows-specific or Cross-Platform (macOS/Linux/Windows) instructions:
- Create a new C# project and navigate into it:
  dotnet new console -n app-name
  cd app-name
- Open and edit the app-name.csproj file to:
  <Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
      <OutputType>Exe</OutputType>
      <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
      <RootNamespace>app-name</RootNamespace>
      <ImplicitUsings>enable</ImplicitUsings>
      <Nullable>enable</Nullable>
      <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
      <WindowsPackageType>None</WindowsPackageType>
      <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
    </PropertyGroup>
    <ItemGroup>
      <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" Version="0.8.2.1" />
      <PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.10" />
      <PackageReference Include="OpenAI" Version="2.5.0" />
    </ItemGroup>
  </Project>
- Create a nuget.config file in the project root with the following content so that the packages restore correctly:
  <?xml version="1.0" encoding="utf-8"?>
  <configuration>
    <packageSources>
      <clear />
      <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
      <add key="ORT" value="https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT/nuget/v3/index.json" />
    </packageSources>
    <packageSourceMapping>
      <packageSource key="nuget.org">
        <package pattern="*" />
      </packageSource>
      <packageSource key="ORT">
        <package pattern="*Foundry*" />
      </packageSource>
    </packageSourceMapping>
  </configuration>
Reduce application package size
The Foundry Local SDK pulls in the Microsoft.ML.OnnxRuntime.Foundry NuGet package as a dependency. The Microsoft.ML.OnnxRuntime.Foundry package provides the inference runtime bundle, which is the set of libraries required to efficiently run inference on specific vendor hardware devices. The inference runtime bundle includes the following components:
- ONNX Runtime library: The core inference engine (onnxruntime.dll).
- ONNX Runtime Execution Provider (EP) library: A hardware-specific backend in ONNX Runtime that optimizes and executes parts of a machine learning model on a hardware accelerator. For example:
  - CUDA EP: onnxruntime_providers_cuda.dll
  - QNN EP: onnxruntime_providers_qnn.dll
- Independent Hardware Vendor (IHV) libraries. For example:
  - WebGPU: DirectX dependencies (dxcompiler.dll, dxil.dll)
  - QNN: Qualcomm QNN dependencies (QnnSystem.dll, etc.)
The following table summarizes which EP and IHV libraries are bundled with your application and which ones WinML downloads and installs at runtime:
(Table image: per-platform and per-architecture breakdown of bundled versus runtime-installed EP and IHV libraries.)
On every platform and architecture, the CPU EP is required. The WebGPU EP and IHV libraries are small (for example, WebGPU adds only ~7 MB to your application package) and are required on Windows and macOS. However, the CUDA and QNN EPs are large (for example, CUDA adds ~1 GB to your application package), so we recommend excluding these EPs from your application package. WinML downloads and installs CUDA and QNN at runtime if the end user has compatible hardware.
Note
We're working on removing the CUDA and QNN EPs from the Microsoft.ML.OnnxRuntime.Foundry package in future releases so that you don't need to include an ExcludeExtraLibs.props file to remove them from your application package.
To reduce the size of your application package, you can create an ExcludeExtraLibs.props file in your project directory with the following content, which excludes the CUDA and QNN EP and IHV libraries when you publish your application:
<Project>
<!-- we want to ensure we're using the onnxruntime libraries from Foundry Local Core so
we delete the WindowsAppSdk versions once they're unzipped. -->
<Target Name="ExcludeOnnxRuntimeLibs" AfterTargets="ExtractMicrosoftWindowsAppSDKMsixFiles">
<Delete Files="$(MicrosoftWindowsAppSDKMsixContent)\onnxruntime.dll"/>
<Delete Files="$(MicrosoftWindowsAppSDKMsixContent)\onnxruntime_providers_shared.dll"/>
<Message Importance="Normal" Text="Deleted onnxruntime libraries from $(MicrosoftWindowsAppSDKMsixContent)." />
</Target>
<!-- Remove CUDA EP and IHV libraries on Windows x64 -->
<Target Name="ExcludeCudaLibs" Condition="'$(RuntimeIdentifier)'=='win-x64'" AfterTargets="ResolvePackageAssets">
<ItemGroup>
<!-- match onnxruntime*cuda.* (we're matching %(Filename) which excludes the extension) -->
<NativeCopyLocalItems Remove="@(NativeCopyLocalItems)"
Condition="$([System.Text.RegularExpressions.Regex]::IsMatch('%(Filename)',
'^onnxruntime.*cuda.*', RegexOptions.IgnoreCase))" />
</ItemGroup>
<Message Importance="Normal" Text="Excluded onnxruntime CUDA libraries from package." />
</Target>
<!-- Remove QNN EP and IHV libraries on Windows arm64 -->
<Target Name="ExcludeQnnLibs" Condition="'$(RuntimeIdentifier)'=='win-arm64'" AfterTargets="ResolvePackageAssets">
<ItemGroup>
<NativeCopyLocalItems Remove="@(NativeCopyLocalItems)"
Condition="$([System.Text.RegularExpressions.Regex]::IsMatch('%(Filename)%(Extension)',
'^QNN.*\.dll', RegexOptions.IgnoreCase))" />
<NativeCopyLocalItems Remove="@(NativeCopyLocalItems)"
Condition="$([System.Text.RegularExpressions.Regex]::IsMatch('%(Filename)',
'^libQNNhtp.*', RegexOptions.IgnoreCase))" />
<NativeCopyLocalItems Remove="@(NativeCopyLocalItems)"
Condition="'%(FileName)%(Extension)' == 'onnxruntime_providers_qnn.dll'" />
</ItemGroup>
<Message Importance="Normal" Text="Excluded onnxruntime QNN libraries from package." />
</Target>
<!-- need to manually copy on linux-x64 due to the nuget packages not having the correct props file setup -->
<ItemGroup Condition="'$(RuntimeIdentifier)' == 'linux-x64'">
<!-- 'Update' as the Core package will add these dependencies, but we want to be explicit about the version -->
<PackageReference Update="Microsoft.ML.OnnxRuntime.Gpu" />
<PackageReference Update="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
<OrtNativeLibs Include="$(NuGetPackageRoot)microsoft.ml.onnxruntime.gpu.linux/$(OnnxRuntimeVersion)/runtimes/$(RuntimeIdentifier)/native/*" />
<OrtGenAINativeLibs Include="$(NuGetPackageRoot)microsoft.ml.onnxruntimegenai.cuda/$(OnnxRuntimeGenAIVersion)/runtimes/$(RuntimeIdentifier)/native/*" />
</ItemGroup>
<Target Name="CopyOrtNativeLibs" AfterTargets="Build" Condition=" '$(RuntimeIdentifier)' == 'linux-x64'">
<Copy SourceFiles="@(OrtNativeLibs)" DestinationFolder="$(OutputPath)"></Copy>
<Copy SourceFiles="@(OrtGenAINativeLibs)" DestinationFolder="$(OutputPath)"></Copy>
</Target>
</Project>
In your project file (.csproj), add the following line to import the ExcludeExtraLibs.props file:
<!-- other project file content -->
<Import Project="ExcludeExtraLibs.props" />
Linux: CUDA dependencies
The CUDA EP is pulled into your Linux application via Microsoft.ML.OnnxRuntime.Foundry, but we don't include the IHV libraries. If you want end users with CUDA-enabled devices to benefit from higher performance, you need to add the following CUDA IHV libraries to your application:
- CUBLAS v12.8.4 (download from NVIDIA Developer)
- cublas64_12.dll
- cublasLt64_12.dll
- CUDA RT v12.8.90 (download from NVIDIA Developer)
- cudart64_12.dll
- CUDNN v9.8.0 (download from NVIDIA Developer)
- cudnn_graph64_9.dll
- cudnn_ops64_9.dll
- cudnn64_9.dll
- CUDA FFT v11.3.3.83 (download from NVIDIA Developer)
- cufft64_11.dll
Warning
Adding the CUDA EP and IHV libraries to your application increases the size of your application package by about 1 GB.
Samples
- For sample applications that demonstrate how to use the Foundry Local C# SDK, see the Foundry Local C# SDK Samples GitHub repository.
API reference
- For more details on the Foundry Local C# SDK read Foundry Local C# SDK API Reference.
Rust SDK reference
The Rust SDK for Foundry Local provides a way to manage models, control the cache, and interact with the Foundry Local service.
Installation
To use the Foundry Local Rust SDK, add the following to your Cargo.toml:
[dependencies]
foundry-local-sdk = "0.1"
Alternatively, you can add the Foundry Local crate using cargo:
cargo add foundry-local
FoundryLocalManager
Manager for Foundry Local SDK operations.
Fields
- `service_uri: Option<String>` — URI of the Foundry service.
- `client: Option<HttpClient>` — HTTP client for API requests.
- `catalog_list: Option<Vec<FoundryModelInfo>>` — Cached list of catalog models.
- `catalog_dict: Option<HashMap<String, FoundryModelInfo>>` — Cached dictionary of catalog models.
- `timeout: Option<u64>` — Optional HTTP client timeout.
Methods
- `pub fn builder() -> FoundryLocalManagerBuilder` — Create a new builder for `FoundryLocalManager`.
- `pub fn service_uri(&self) -> Result<&str>` — Get the service URI. Returns: URI of the Foundry service.
- `fn client(&self) -> Result<&HttpClient>` — Get the HTTP client instance. Returns: HTTP client.
- `pub fn endpoint(&self) -> Result<String>` — Get the endpoint for the service. Returns: Endpoint URL.
- `pub fn api_key(&self) -> String` — Get the API key for authentication. Returns: API key.
- `pub fn is_service_running(&mut self) -> bool` — Check if the service is running and set the service URI if found. Returns: `true` if running, `false` otherwise.
- `pub fn start_service(&mut self) -> Result<()>` — Start the Foundry Local service.
- `pub async fn list_catalog_models(&mut self) -> Result<&Vec<FoundryModelInfo>>` — Get a list of available models in the catalog.
- `pub fn refresh_catalog(&mut self)` — Refresh the catalog cache.
- `pub async fn get_model_info(&mut self, alias_or_model_id: &str, raise_on_not_found: bool) -> Result<FoundryModelInfo>` — Get model information by alias or ID. Arguments: `alias_or_model_id`: Alias or model ID. `raise_on_not_found`: If true, error if not found.
- `pub async fn get_cache_location(&self) -> Result<String>` — Get the cache location as a string.
- `pub async fn list_cached_models(&mut self) -> Result<Vec<FoundryModelInfo>>` — List cached models.
- `pub async fn download_model(&mut self, alias_or_model_id: &str, token: Option<&str>, force: bool) -> Result<FoundryModelInfo>` — Download a model. Arguments: `alias_or_model_id`: Alias or model ID. `token`: Optional authentication token. `force`: Force re-download if already cached.
- `pub async fn load_model(&mut self, alias_or_model_id: &str, ttl: Option<i32>) -> Result<FoundryModelInfo>` — Load a model for inference. Arguments: `alias_or_model_id`: Alias or model ID. `ttl`: Optional time-to-live in seconds.
- `pub async fn unload_model(&mut self, alias_or_model_id: &str, force: bool) -> Result<()>` — Unload a model. Arguments: `alias_or_model_id`: Alias or model ID. `force`: Force unload even if in use.
- `pub async fn list_loaded_models(&mut self) -> Result<Vec<FoundryModelInfo>>` — List loaded models.
FoundryLocalManagerBuilder
Builder for creating a FoundryLocalManager instance.
Fields
- `alias_or_model_id: Option<String>` — Alias or model ID to download and load.
- `bootstrap: bool` — Whether to start the service if not running.
- `timeout_secs: Option<u64>` — HTTP client timeout in seconds.
Methods
- `pub fn new() -> Self` — Create a new builder instance.
- `pub fn alias_or_model_id(mut self, alias_or_model_id: impl Into<String>) -> Self` — Set the alias or model ID to download and load.
- `pub fn bootstrap(mut self, bootstrap: bool) -> Self` — Set whether to start the service if not running.
- `pub fn timeout_secs(mut self, timeout_secs: u64) -> Self` — Set the HTTP client timeout in seconds.
- `pub async fn build(self) -> Result<FoundryLocalManager>` — Build the `FoundryLocalManager` instance.
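As a usage sketch (assuming a tokio async runtime, the anyhow crate for error handling, and that the crate is imported as foundry_local), the builder can bootstrap the service and load a model, after which the manager methods above are available:
use foundry_local::FoundryLocalManager;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Bootstrap the service and load a model by alias.
    let mut manager = FoundryLocalManager::builder()
        .alias_or_model_id("qwen2.5-0.5b")
        .bootstrap(true)
        .build()
        .await?;

    // The endpoint of the local service.
    println!("Endpoint: {}", manager.endpoint()?);

    // Inspect the catalog.
    let catalog = manager.list_catalog_models().await?;
    println!("{} models in the catalog", catalog.len());

    // Unload the model when finished.
    manager.unload_model("qwen2.5-0.5b", false).await?;
    Ok(())
}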
FoundryModelInfo
Represents information about a model.
Fields
- `alias: String` — The model alias.
- `id: String` — The model ID.
- `version: String` — The model version.
- `runtime: ExecutionProvider` — The execution provider (CPU, CUDA, etc.).
- `uri: String` — The model URI.
- `file_size_mb: i32` — Model file size in MB.
- `prompt_template: serde_json::Value` — Prompt template for the model.
- `provider: String` — Provider name.
- `publisher: String` — Publisher name.
- `license: String` — License type.
- `task: String` — Model task (e.g., text-generation).
Methods
- `from_list_response(response: &FoundryListResponseModel) -> Self` — Creates a `FoundryModelInfo` from a catalog response.
- `to_download_body(&self) -> serde_json::Value` — Converts the model info to a JSON body for download requests.
ExecutionProvider
Enum for supported execution providers.
- `CPU`
- `WebGPU`
- `CUDA`
- `QNN`
Methods
- `get_alias(&self) -> String` — Returns a string alias for the execution provider.
ModelRuntime
Describes the runtime environment for a model.
- `device_type: DeviceType`
- `execution_provider: ExecutionProvider`