Extracts structured data from a document column using a large language model (LLM).
For the corresponding Databricks SQL function, see ai_extract function.
Syntax
from pyspark.sql import functions as dbf
dbf.ai_extract(col=<col>, schema=<schema>, options=<options>)
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or `str` | A column containing the document content to extract from. |
| `schema` | `dict` or `list` | A Python dict (field name to `{"type": ..., "description": ...}`) or a list of field-name strings. Serialized to a JSON literal automatically. |
| `options` | `dict`, optional | A dictionary of options to control extraction behavior. |
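The two accepted `schema` shapes can be sketched in plain Python. This is only an illustration: it assumes the automatic serialization behaves roughly like `json.dumps`, which this page does not confirm, and the field names and descriptions are hypothetical.

```python
import json

# Dict form: field name -> {"type": ..., "description": ...}
# (field name and description are hypothetical examples)
schema_dict = {
    "name": {"type": "string", "description": "Full name of the person"},
}

# List form: field names only, with no per-field type or description
schema_list = ["name", "age"]

# Assumption: json.dumps stands in for the automatic serialization step;
# the exact JSON literal that ai_extract produces may differ.
schema_dict_json = json.dumps(schema_dict)
schema_list_json = json.dumps(schema_list)

print(schema_dict_json)
print(schema_list_json)
```

The dict form is the one to reach for when a field needs a description to guide extraction; the list form is shorter when the field names are self-explanatory.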
Returns
pyspark.sql.Column: A new column of VariantType containing the extracted fields.
Examples
# Dict schema: each field has a type and a description
df.select(dbf.ai_extract("text", {"name": {"type": "string", "description": "Name"}}))
# List schema: field names only
df.select(dbf.ai_extract("text", ["name", "age"]))