Use ai.embed with PySpark

The ai.embed function converts text into vector embeddings that represent meaning. Use embeddings to search, group, and compare content by meaning instead of exact wording.

Note

This article covers ai.embed with PySpark. For pandas, see Use ai.embed with pandas.
For all AI Functions and prerequisites, see AI Functions overview.
To change the default configuration, see Change default configuration for AI Functions with PySpark.

Overview

The ai.embed function is available for Spark DataFrames. You must specify the name of an existing input column as a parameter.

The function returns a new DataFrame with an output column that holds the embedding for each row of input text.

Syntax

df.ai.embed(input_col="col1", output_col="embed")

Parameters

Name	Description
`input_col` Required	A string that contains the name of an existing column with input text values to use for computing embeddings.
`output_col` Optional	A string that contains the name of a new column to store calculated embeddings for each input text row. If you don't set this parameter, a default name is generated for the output column.
`error_col` Optional	A string that contains the name of a new column that stores any OpenAI errors that result from processing each input text row. If you don't set this parameter, a default name is generated for the error column. If an input row has no errors, this column has a `null` value.
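
As a minimal sketch of how the optional parameters fit together, the helper below names both new columns explicitly and then keeps only the rows whose error column is non-null. The helper name `embed_and_find_errors` and the column names `embed` and `embed_error` are illustrative assumptions, not part of AI Functions. The sketch assumes `df` is a Spark DataFrame in an environment where the `ai` accessor is available.

```python
def embed_and_find_errors(df, text_col):
    """Hypothetical helper: embed text_col and return (embedded, failed) DataFrames."""
    # Name both new columns explicitly instead of relying on generated defaults.
    embedded = df.ai.embed(
        input_col=text_col,
        output_col="embed",        # illustrative column name
        error_col="embed_error",   # illustrative column name
    )
    # Rows that processed cleanly have null in the error column,
    # so a non-null value marks a row that needs attention.
    failed = embedded.filter("embed_error IS NOT NULL")
    return embedded, failed
```

Naming the error column yourself makes it easier to filter on later, because you don't have to look up the generated default name.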

Returns

The function returns a Spark DataFrame with a new column that contains generated embeddings for each input row. Embeddings are pyspark.ml.linalg.DenseVector values. Vector size depends on the embedding model dimensions, which are configurable in AI Functions.
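
Comparing content by meaning usually comes down to comparing vectors. The sketch below computes cosine similarity, a common (but not the only) choice, on plain Python lists. It assumes each embedding has already been converted from a DenseVector with `toArray().tolist()`. The function name and the toy three-dimensional vectors are illustrative only; real embeddings have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (made-up values, not model output).
duvet = [0.9, 0.1, 0.0]
blanket = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

# The two bedding-like vectors point in a more similar direction than duvet and car.
print(cosine_similarity(duvet, blanket) > cosine_similarity(duvet, car))  # prints True
```

A higher score means the two texts are closer in meaning under the embedding model, which is the basis for ranking search results or grouping similar rows.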

Example

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",), 
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",), 
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",) 
    ], ["descriptions"])

embed = df.ai.embed(input_col="descriptions", output_col="embed")
display(embed)

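Output: a Spark DataFrame that keeps the descriptions column and adds an embed column with one embedding vector per row.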

Related content

Use ai.embed with pandas.
Learn more about AI Functions.
Change default configuration for AI Functions with PySpark.
Understand billing for AI Functions.

Last updated on 2026-06-15