What is the azure_local_ai extension for Azure Database for PostgreSQL - Flexible Server (Preview)

The azure_local_ai extension for Azure Database for PostgreSQL flexible server lets you use registered, pretrained, open-source models deployed locally on your Azure Database for PostgreSQL server. These models create text embeddings that can provide context to your Retrieval Augmented Generation (RAG) pattern as you build rich generative AI applications. Because the database calls locally deployed models to create vector embeddings from text data, you don't need to make remote API calls to AI embedding models hosted outside of the PostgreSQL boundary, which simplifies development and reduces latency. In this release, the extension deploys a single model, multilingual-e5-small, to your Azure Database for PostgreSQL Flexible Server instance. Other third-party open-source models might become available for installation on an ongoing basis.

Local embeddings help customers:

  • Reduce latency of embedding creation.

  • Use embedding models at a predictable cost.

  • Keep data within their database, eliminating the need to transmit data to a remote endpoint.

During this public preview, the azure_local_ai extension is available in these Azure regions:

 

  • Australia East

  • East USA

  • France Central

  • Japan East

  • UK South

  • West Europe

  • West USA

This preview feature is also available only for newly deployed Azure Database for PostgreSQL Flexible Server instances.

Important

The azure_local_ai extension is currently in preview. Open-source AI models that Microsoft makes available for installation through the azure_local_ai extension are deemed Non-Microsoft Products under the Microsoft Product Terms. The customer's use of open-source AI models is governed by the separate license terms provided in the product documentation associated with such models made available through the azure_local_ai extension. See Supplemental Terms of Use: Limited Access AI Services (Previews).

Enable the azure_local_ai extension (preview)

Before you can enable azure_local_ai on your Azure Database for PostgreSQL flexible server instance, you need to add it to your allowlist as described in how to use PostgreSQL extensions. To check that it was added correctly, run the SQL statement SHOW azure.extensions;

Important

Hosting language models in the database requires a large memory footprint. To support this requirement, azure_local_ai is only supported on memory-optimized Azure VM SKUs with a minimum of 4 vCores. Currently, if your VM doesn't meet these minimum requirements, the azure_local_ai extension doesn't appear in the list of available extensions in Server parameters.

Select Server parameters from the Settings section of the resource menu on the Azure Database for PostgreSQL Flexible Server page in the Azure portal.

Screenshot of PostgreSQL server parameters page.

Search for "extensions" or "azure.extensions".

Screenshot of Extensions available to allowlist for Azure Database for PostgreSQL - Flexible Server.

Select AZURE_LOCAL_AI from the extensions list.

Screenshot of Extensions allowlist for Azure Local AI extension.

Select Save to apply the changes and begin the allowlist deployment process.

Screenshot of Extensions saved to allowlist for Azure Local AI extension.

You can monitor this deployment via the bell icon at the top of the Azure portal.

Screenshot of Extensions allowlist deployment status for Azure Local AI extension.

Once the allowlist deployment is completed, you can continue with the installation process.

Note

Enabling the azure_local_ai preview deploys the multilingual-e5-small model to your Azure Database for PostgreSQL Flexible Server instance. The linked documentation provides licensing terms from the e5 team. Additional third-party open-source models might become available for installation on an ongoing basis.

Now you can install the extension by connecting to your target database and running the CREATE EXTENSION command. You need to repeat the command separately for every database in which you want the extension to be available.

List the extensions that were allowlisted for the server on the Server parameters page in the Azure portal.

SHOW azure.extensions;

Create the extension within the database.

CREATE EXTENSION azure_local_ai;

Installing the extension azure_local_ai creates the following schema:

  • azure_local_ai: principal schema in which the extension creates tables, functions, and any other SQL-related object it requires to implement and expose its functionality.
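To confirm that the installation succeeded, one option is to query the standard PostgreSQL catalogs. The following is a minimal sketch that relies only on the built-in pg_extension catalog and the information_schema.routines view; neither is specific to azure_local_ai, and the second query shows only the routines your current role is allowed to see.

```sql
-- Confirm that the extension is installed in the current database
-- and check which version was installed.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'azure_local_ai';

-- List the routines that exist in the extension's schema.
SELECT routine_name, data_type
FROM information_schema.routines
WHERE routine_schema = 'azure_local_ai'
ORDER BY routine_name;
```

If the first query returns no rows, the extension hasn't been created in the database you're connected to.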

Important

You should also enable the vector extension, because it's required to store text embeddings in your PostgreSQL database.
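A minimal sketch of enabling the vector extension follows. It assumes that vector has also been added to the azure.extensions allowlist, and it uses the pgvector vector_dims function to check the size of an embedding generated with the multilingual-e5-small:v1 model URI.

```sql
-- Assumes vector is allowlisted in azure.extensions, like azure_local_ai.
CREATE EXTENSION IF NOT EXISTS vector;

-- Cast a generated embedding (real[]) to the pgvector type and report its
-- number of dimensions. Use the result to size vector(n) columns.
SELECT vector_dims(
    azure_local_ai.create_embeddings(
        'multilingual-e5-small:v1',
        'query: dimension check'
    )::vector
) AS embedding_dimensions;
```

The multilingual-e5-small model card describes 384-dimensional embeddings, so a vector(384) column is the expected fit. The query lets you confirm that on your own server before you create tables.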

Functions provided by the azure_local_ai extension

The azure_local_ai extension provides functions for creating vector embeddings from text data, reading configuration settings, and retrieving model metadata. Because these functions run against locally deployed models, they simplify the development of generative AI applications and reduce latency by eliminating remote API calls to AI embedding models hosted outside of the PostgreSQL boundary.

 Schema         | Name              | Result data type        | Argument data types
----------------+-------------------+-------------------------+--------------------------------------------------------------------------------------------------
 azure_local_ai | create_embeddings | TABLE(embedding real[]) | model_uri text, inputs text[], batch_size bigint DEFAULT 128, timeout_ms integer DEFAULT 3600000
 azure_local_ai | create_embeddings | real[]                  | model_uri text, input text, timeout_ms integer DEFAULT 3600000
 azure_local_ai | get_setting       | jsonb                   | keys text[] DEFAULT ARRAY[]::text[], timeout_ms integer DEFAULT 3600000
 azure_local_ai | get_setting       | text                    | key text, timeout_ms integer DEFAULT 3600000
 azure_local_ai | model_metadata    | jsonb                   | model_uri text

You can display these functions with the following psql meta-command:

\df azure_local_ai.*
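The function list above includes model_metadata, which takes a model_uri and returns jsonb. As a sketch, assuming the multilingual-e5-small:v1 model URI used elsewhere in this article, a call could look like the following. The structure of the returned JSON isn't described here, so the built-in jsonb_pretty function is used only to make the result easier to read.

```sql
-- Retrieve metadata for the locally deployed model as formatted JSON.
SELECT jsonb_pretty(
    azure_local_ai.model_metadata('multilingual-e5-small:v1')
) AS metadata;
```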

azure_local_ai.create_embeddings

The azure_local_ai extension lets you create and update embeddings in both scalar and batch form by invoking the locally deployed embedding model.

azure_local_ai.create_embeddings(model_uri text, input text, timeout_ms integer DEFAULT 3600000);
azure_local_ai.create_embeddings(model_uri text, inputs text[], batch_size bigint DEFAULT 128, timeout_ms integer DEFAULT 3600000);

Arguments

model_uri

text: name of the text embedding model invoked to create the embedding.

input

text or text[]: single text or array of texts, depending on the overload of the function used, for which embeddings are created.

batch_size

bigint DEFAULT 128: number of records to process at a time (only available for the overload of the function in which the input parameter is of type text[]).

timeout_ms

integer DEFAULT 3600000: timeout in milliseconds after which the operation is stopped.
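Because batch_size and timeout_ms have defaults, they can be omitted or overridden individually. The sketch below uses PostgreSQL named notation (name => value) to override them. The specific values and input strings are arbitrary and chosen only for illustration.

```sql
-- Batch overload: process two inputs per batch and stop after 60 seconds.
SELECT azure_local_ai.create_embeddings(
    'multilingual-e5-small:v1',
    array['passage: first document', 'passage: second document'],
    batch_size => 2,
    timeout_ms => 60000
);

-- Scalar overload: only timeout_ms applies, because batch_size
-- exists only on the text[] overload.
SELECT azure_local_ai.create_embeddings(
    'multilingual-e5-small:v1',
    'query: a single input',
    timeout_ms => 60000
);
```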

Simple examples of creating embeddings:

-- General form (scalar): azure_local_ai.create_embeddings(model_uri, input)
-- Returns a single embedding as real[].
SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', 'query: Vector databases are awesome');
-- General form (batch): azure_local_ai.create_embeddings(model_uri, array[input, input, ...])
-- Returns one embedding row per input.
SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', array['Hello', 'World']);
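To show how these calls fit into a RAG-style workflow, here is a minimal sketch that stores embeddings and runs a similarity search. It assumes that the vector extension is installed. The documents table and its columns are hypothetical names chosen for this example, and the passage: and query: prefixes follow the convention described in the E5 model card rather than anything required by azure_local_ai.

```sql
-- Hypothetical table. vector(384) assumes the 384-dimensional embeddings
-- that the multilingual-e5-small model card describes.
CREATE TABLE documents (
    doc_id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    doc_contents text NOT NULL,
    doc_vector   vector(384)
);

INSERT INTO documents (doc_contents) VALUES
    ('PostgreSQL is an open-source relational database.'),
    ('Embeddings map text to vectors of numbers.'),
    ('Paris is the capital of France.');

-- Generate an embedding for each row that doesn't have one yet.
-- The scalar overload returns real[], which is cast to the pgvector type.
UPDATE documents
SET doc_vector = azure_local_ai.create_embeddings(
        'multilingual-e5-small:v1',
        'passage: ' || doc_contents
    )::vector
WHERE doc_vector IS NULL;

-- Return the rows closest to the query text by cosine distance (<=>).
SELECT doc_id, doc_contents
FROM documents
ORDER BY doc_vector <=> azure_local_ai.create_embeddings(
        'multilingual-e5-small:v1',
        'query: What is a vector embedding?'
    )::vector
LIMIT 2;
```

Both the stored rows and the search text are embedded inside the database, so no text leaves the server at either step.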

Check the azure_local_ai extension version

SELECT * FROM pg_available_extensions
WHERE name = 'azure_local_ai';

ONNX Runtime Configuration

azure_local_ai.get_setting

Used to obtain current values of configuration options.

azure_local_ai.get_setting(key text, timeout_ms integer DEFAULT 3600000);

azure_local_ai lets you review the configuration parameters of the ONNX Runtime thread pool within the ONNX Runtime Service. Changing these parameters isn't supported at this time. See ONNX Runtime performance tuning.

Arguments

key

Valid values for the key are:

  • intra_op_parallelism: Sets the total number of threads that the ONNX Runtime thread pool uses to parallelize a single operator. By default, this value is set as high as possible (all available CPUs), because more intra-op threads generally improve overall throughput.
  • inter_op_parallelism: Sets the total number of threads that the ONNX Runtime thread pool uses to compute multiple operators in parallel. By default, it's set to the minimum of 1 thread. Increasing it often hurts performance because of frequent context switches between threads.
  • spin_control: Turns the ONNX Runtime thread pool's spinning for requests on or off. When disabled, the thread pool uses less CPU but adds latency. By default, it's set to true (enabled).

Return type

TEXT representing the current value of the selected setting.
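As a sketch of both get_setting overloads in the function list, the first query below reads one setting as text, and the second passes an array of keys to the overload that returns jsonb. The exact shape of the jsonb result isn't described in this article, so no sample output is shown.

```sql
-- Read a single setting; returns text.
SELECT azure_local_ai.get_setting('intra_op_parallelism');

-- Read several settings at once; returns jsonb.
SELECT azure_local_ai.get_setting(
    array['intra_op_parallelism', 'inter_op_parallelism', 'spin_control']
);
```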