Extract entities with the ai.extract function
The ai.extract function uses generative AI to scan input text and extract specific types of information designated by labels you choose (for example, locations or names), all with a single line of code.
AI functions turbocharge data engineering by putting the power of Fabric's built-in large language models into your hands. To learn more, visit this overview article.
Important
This feature is in preview, for use in the Fabric 1.3 runtime and higher.
- Review the prerequisites in this overview article, including the library installations that are temporarily required to use AI functions.
- By default, AI functions are currently powered by the gpt-3.5-turbo (0125) model. To learn more about billing and consumption rates, visit this article.
- Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.
- During the initial rollout of AI functions, users are temporarily limited to 1,000 requests per minute with Fabric's built-in AI endpoint.
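As a rough planning aid for the request limit above, the following sketch estimates a lower bound on how long a job would take. It assumes that each input row results in one request, which this article doesn't state; the function name `min_minutes` and the `requests_per_row` parameter are hypothetical and aren't part of the AI functions API.

```python
import math

REQUESTS_PER_MINUTE = 1000  # limit stated above for Fabric's built-in AI endpoint


def min_minutes(num_rows: int, requests_per_row: int = 1) -> int:
    """Return a lower bound, in whole minutes, for processing num_rows rows.

    requests_per_row=1 is an assumption, not documented behavior.
    """
    total_requests = num_rows * requests_per_row
    return math.ceil(total_requests / REQUESTS_PER_MINUTE)


# 2,500 rows at one request each need at least 3 minutes under the limit.
print(min_minutes(2500))
```

The estimate ignores model latency and retries, so actual run time can be longer.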
Use ai.extract with pandas
The ai.extract function extends the pandas Series class. Call the function on a pandas DataFrame text column to extract custom entity types from each row of input.
Unlike other AI functions, ai.extract returns a pandas DataFrame, instead of a Series, with a separate column for each specified entity type that contains extracted values for each input row.
Syntax
df_entities = df["text"].ai.extract("entity1", "entity2", "entity3")
Parameters
Name | Description |
---|---|
labels (Required) | One or more strings representing the set of entity types to be extracted from the input text values. |
Returns
The function returns a pandas DataFrame with a column for each specified entity type. The column or columns contain the entities extracted for each row of input text. If the function identifies more than one match for a given entity, it returns only one of those matches. If no match is found, the result is null.
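Because the result is a separate DataFrame that can contain nulls, a common follow-up step is to attach it to the source rows and handle the missing values. The following is a minimal sketch that uses only pandas. The `df_entities` DataFrame is a hand-built stand-in for what ai.extract might return, its values are illustrative rather than real model output, and the sketch assumes the returned DataFrame shares its index with the input.

```python
import pandas as pd

# Source text, mirroring the shape of the input used with ai.extract.
df = pd.DataFrame(
    ["MJ Lee lives in Tucson, AZ.", "The meeting was moved to Friday."],
    columns=["descriptions"],
)

# Hand-built stand-in for the DataFrame returned for labels "name" and "city".
# These values are illustrative only, not real model output.
df_entities = pd.DataFrame(
    {"name": ["MJ Lee", None], "city": ["Tucson", None]},
    index=df.index,  # assumption: the result is aligned with the input index
)

# Attach the entity columns to the source rows.
combined = pd.concat([df, df_entities], axis=1)

# Rows where no entity was found for any label.
no_match = combined[df_entities.isna().all(axis=1)]

# Replace nulls with a placeholder, for example before reporting.
filled = combined.fillna({"name": "unknown", "city": "unknown"})
```

Keeping `df_entities` separate until this point makes it easy to check how many rows produced no match before the nulls are overwritten.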
Example
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/
import pandas as pd  # the AI functions imports from the prerequisites are also assumed

df = pd.DataFrame([
    "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
    "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
], columns=["descriptions"])

# Extract one column per label from each row of text.
df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)
Use ai.extract with PySpark
The ai.extract function is also available for Spark DataFrames. The name of an existing input column must be specified as a parameter, along with a list of entity types to extract from each row of text.
The function returns a new DataFrame, with a separate column for each specified entity type that contains extracted values for each input row.
Syntax
df.ai.extract(labels=["entity1", "entity2", "entity3"], input_col="text")
Parameters
Name | Description |
---|---|
labels (Required) | An array of strings that represents the set of entity types to be extracted from the text values in the input column. |
input_col (Required) | A string that contains the name of an existing column with input text values to be scanned for the custom entities. |
error_col (Optional) | A string that contains the name of a new column to store any OpenAI errors that result from processing each input text row. If this parameter isn't set, a default name is generated for the error column. If an input row has no errors, the value in this column is null. |
Returns
The function returns a Spark DataFrame with a new column for each specified entity type. The column or columns contain the entities extracted for each row of input text. If the function identifies more than one match for a given entity, it returns only one of those matches. If no match is found, the result is null.
Example
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/
# The spark session is assumed to be available, as it is in a notebook.
df = spark.createDataFrame([
    ("MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",),
    ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
], ["descriptions"])

# Extract one column per label from the text in the input column.
df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)
Related content
- Calculate similarity with ai.similarity.
- Categorize text with ai.classify.
- Detect sentiment with ai.analyze_sentiment.
- Fix grammar with ai.fix_grammar.
- Summarize text with ai.summarize.
- Translate text with ai.translate.
- Answer custom user prompts with ai.generate_response.
- Learn more about the full set of AI functions here.
- Learn how to customize the configuration of AI functions here.
- Did we miss a feature you need? Suggest it on the Fabric Ideas forum.