Customize AI functions with PySpark

AI functions are designed to work out of the box, with the underlying model and settings configured by default. Users who want more flexible configurations, however, can customize their solutions with a few extra lines of code.


Configurations

If you're working with AI functions in PySpark, you can use the OpenAIDefaults class to configure the underlying AI model used by all functions. Settings that can only be applied per individual function call are called out in the following table.

| Parameter | Description | Default |
| --- | --- | --- |
| concurrency | An int that designates the maximum number of rows to process in parallel with asynchronous requests to the model. Higher values speed up processing time (if your capacity can accommodate it). It can be set as high as 1,000. This value must be set per individual AI function call. In Spark, the concurrency value applies to each worker. | 50 |
| deployment_name | A string value that designates the name of the underlying model. You can choose from models supported by Fabric. This value can also be set to a custom model deployment in Azure OpenAI or Azure AI Foundry. In the Azure portal, this value appears under Resource Management > Model Deployments. In the Azure AI Foundry portal, the value appears on the Deployments page. | gpt-4.1-mini |
| embedding_deployment_name | A string value that designates the name of the embedding model deployment that powers AI functions. | text-embedding-ada-002 |
| reasoning_effort | Part of OpenAIDefaults. Controls how many reasoning tokens gpt-5 series models use. Can be set to None or a string value of "minimal", "low", "medium", or "high". | None |
| subscription_key | An API key used for authentication with your large language model (LLM) resource. In the Azure portal, this value appears in the Keys and Endpoint section. | N/A |
| temperature | A numeric value between 0.0 and 1.0. Higher temperatures increase the randomness or creativity of the underlying model's outputs. | 0.0 |
| top_p | Part of OpenAIDefaults. A float between 0 and 1. A lower value (for example, 0.1) restricts the model to the most probable tokens, making the output more deterministic. A higher value (for example, 0.9) allows more diverse and creative outputs by including a broader range of tokens. | None |
| URL | A URL that designates the endpoint of your LLM resource. In the Azure portal, this value appears in the Keys and Endpoint section. For example: https://your-openai-endpoint.openai.azure.com/. | N/A |
| verbosity | Part of OpenAIDefaults. Controls the output length for gpt-5 series models. Can be set to None or a string value of "low", "medium", or "high". | None |
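
The global parameters in this table map to matching setter methods on OpenAIDefaults, as the later samples in this article show. As a minimal sketch (the 0.2 value is purely illustrative), the following raises the temperature for all subsequent AI function calls:

from synapse.ml.services.openai import OpenAIDefaults
default_conf = OpenAIDefaults()

# Illustrative value: slightly more varied output than the 0.0 default.
default_conf.set_temperature(0.2)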

The following code sample shows how to configure concurrency for an individual function call.

df = spark.createDataFrame([
        ("There are an error here.",),
        ("She and me go weigh back. We used to hang out every weeks.",),
        ("The big picture are right, but you're details is all wrong.",),
    ], ["text"])

results = df.ai.fix_grammar(input_col="text", output_col="corrections", concurrency=200)
display(results)
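
Because the concurrency value applies per worker in Spark, the effective number of parallel requests is roughly the per-call value multiplied by the number of workers, so size it with your total capacity in mind.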

The following code sample shows how to configure gpt-5 and other reasoning models for all functions.

from synapse.ml.services.openai import OpenAIDefaults
default_conf = OpenAIDefaults()

default_conf.set_deployment_name("gpt-5")
default_conf.set_temperature(1)  # gpt-5 accepts only the default temperature value
default_conf.set_top_p(1)  # gpt-5 accepts only the default top_p value
default_conf.set_verbosity("low")
default_conf.set_reasoning_effort("low")

You can retrieve and print each of the OpenAIDefaults parameters with the following code sample:

print(default_conf.get_deployment_name())
print(default_conf.get_subscription_key())
print(default_conf.get_URL())
print(default_conf.get_temperature())

You can also reset the parameters as easily as you modified them. The following code sample resets the AI functions library so that it uses the default Fabric LLM endpoint:

default_conf.reset_deployment_name()
default_conf.reset_subscription_key()
default_conf.reset_URL()
default_conf.reset_temperature()

Custom models

Choose another supported large language model

Set the deployment_name to one of the models supported by Fabric.

  • Globally in the OpenAIDefaults() object:

    from synapse.ml.services.openai import OpenAIDefaults
    default_conf = OpenAIDefaults()
    default_conf.set_deployment_name("<model deployment name>")
    
  • Individually in each AI function call:

    results = df.ai.translate(
        to_lang="spanish",
        input_col="text",
        output_col="out",
        error_col="error_col",
        deploymentName="<model deployment name>",
    )
    

Choose another supported embedding model

Set the embedding_deployment_name to one of the models supported by Fabric when you use the ai.embed or ai.similarity functions. (A similarity sketch follows this list.)

  • Globally in the OpenAIDefaults() object:

    from synapse.ml.services.openai import OpenAIDefaults
    default_conf = OpenAIDefaults()
    default_conf.set_embedding_deployment_name("<embedding deployment name>")
    
  • Individually in each AI function call:

    results = df.ai.embed(
        input_col="english",
        output_col="out",
        deploymentName="<embedding deployment name>",
    )
    

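The same embedding deployment also backs ai.similarity. The following is a hedged sketch; the DataFrame contents and column names are hypothetical, and the parameters mirror the ai.embed pattern:

# Hypothetical word pairs: compare embeddings of the two columns.
df = spark.createDataFrame([
        ("cat", "kitten"),
        ("sea", "mountain"),
    ], ["word_a", "word_b"])

results = df.ai.similarity(input_col="word_a", other_col="word_b", output_col="similarity")
display(results)
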
Configure a custom model endpoint

By default, AI functions use the Fabric LLM endpoint API for unified billing and easy setup. You can instead point AI functions at your own model endpoint by supplying its URL and API key through OpenAIDefaults. The following code sample uses placeholder values to show you how to override the built-in Fabric AI endpoint with your own Microsoft AI Foundry (formerly Azure OpenAI) resource's model deployments:

from synapse.ml.services.openai import OpenAIDefaults
default_conf = OpenAIDefaults()

default_conf.set_URL("https://<ai-foundry-resource>.openai.azure.com/")
default_conf.set_subscription_key("<API_KEY>")
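
As a hedged follow-up sketch, you typically also set deployment_name to a deployment that exists on that resource before calling an AI function. The deployment name and DataFrame here are placeholders:

# "<custom deployment name>" is a placeholder for a deployment on your resource.
default_conf.set_deployment_name("<custom deployment name>")

results = df.ai.summarize(input_col="text", output_col="summary")
display(results)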

The following code sample uses placeholder values to show you how to override the built-in Fabric AI endpoint with a custom Microsoft AI Foundry resource, so that you can use models beyond OpenAI:

Important

  • Support for Microsoft AI Foundry models is limited to models that support the Chat Completions API and accept the response_format parameter with a JSON schema.
  • Output may vary depending on the behavior of the selected AI model, so explore the capabilities of other models with appropriate caution.
  • The embedding-based AI functions ai.embed and ai.similarity aren't supported when using an AI Foundry resource.

import synapse.ml.spark.aifunc.DataFrameExtensions
from synapse.ml.services.openai import OpenAIDefaults

default_conf = OpenAIDefaults()
default_conf.set_URL("https://<ai-foundry-resource>.services.ai.azure.com")  # Use your AI Foundry Endpoint
default_conf.set_subscription_key("<API_KEY>")
default_conf.set_deployment_name("grok-4-fast-non-reasoning")
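
With the Foundry endpoint configured, chat-based AI functions run against the selected model. The following is a minimal usage sketch with hypothetical data; output depends on the chosen model's behavior:

# Hypothetical input: the grammar fix now runs against the Foundry-hosted model.
df = spark.createDataFrame([
        ("This product are truly amazing!",),
    ], ["text"])

results = df.ai.fix_grammar(input_col="text", output_col="corrections")
display(results)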