This page introduces AI/BI Genie, an Azure Databricks feature that allows business teams to interact with their data using natural language. It uses generative AI tailored to your organization's terminology and data, with the ability to monitor and refine its performance through user feedback.
Overview
Domain experts, such as data analysts, configure Genie spaces with datasets, sample queries, and text guidelines to help Genie translate business questions into analytical queries. After setup, business users can ask questions and generate visualizations to understand operational data. You can continuously update Genie's semantic knowledge as your data changes and users pose new questions. For additional information about Databricks AI-powered features, see Databricks AI-powered features.
AI/BI Genie selects relevant names and descriptions from annotated tables and columns to convert natural language questions to an equivalent SQL query. Then, it responds with the generated query and results table, if possible. If Genie can't generate an answer, it can ask follow-up questions to clarify before providing a response.
Example use cases
You can create different Genie spaces to serve various non-technical audiences. The following scenarios describe two possible use cases.
Example 1: Visualize opportunity status
A sales manager wants to get the current status of open and closed opportunities by stage in their sales pipeline. They can interact with the Genie space using natural language and automatically generate a visualization.
The following gif shows this interaction:
Example 2: Tracking logistics
A logistics company wants to use Genie spaces to help business users from different departments track operational and financial details. They set up a Genie space for their shipment facility managers to track shipments and another for their financial executives to understand their financial health.
What data should I use?
A Genie space is based on data registered to Unity Catalog, including managed tables, external tables, foreign tables, views, metric views, and materialized views. AI/BI Genie uses the metadata attached to Unity Catalog objects, as well as an author-curated space-level knowledge store, to generate responses. Well-annotated datasets, paired with specific instructions that you provide, are key to creating a positive experience for end users.
File uploads
Important
This feature is in Public Preview.
File uploads allow users to blend their local CSV and Excel files with Unity Catalog data to answer questions. To enable file uploads, contact your Databricks account team. For more information, see Upload a file.
How Genie works
Genie uses a compound AI system to interpret business questions and generate answers. Instead of using a single large language model, compound AI systems process tasks in AI applications by combining multiple interacting components. Compound AI systems are an increasingly common design pattern for AI applications because of their performance and flexibility. For more information, see The Shift from Models to Compound AI Systems.
What is Genie's knowledge store?
Genie authors can add company- and space-specific metadata directly to data assets in a Genie space. This includes table and column metadata descriptions, column-level synonyms, sampled values, and value dictionaries, which Genie consults when generating answers. A detailed metadata layer helps Genie retrieve the correct information and produce more accurate results.
How does Genie generate a response?
When a user submits a question, Genie parses the request, identifies relevant data sources, and determines how to respond to the prompt. Details that authors provide, combined with Unity Catalog metadata, allow Genie to infer both business and technical logic. Genie intelligently filters example SQL queries, table and column metadata, and chat history to select the most relevant information for answering the request.
Genie uses the following components to generate responses:
- Unity Catalog table metadata: Includes table names, descriptions, and defined primary key (PK) and foreign key (FK) relationships. Genie uses this data as it parses the request and converts the natural language prompt to SQL.
- Column names and descriptions: Genie intelligently filters for relevant column names and descriptions to include.
- Knowledge store context: Authors can locally edit table metadata for assets used in a Genie space. This helps Genie generate more accurate responses and doesn't alter existing Unity Catalog metadata. See Set up and manage an AI/BI Genie space.
- Example SQL queries: Genie intelligently selects relevant SQL examples from SQL Queries.
- SQL functions: All SQL functions that have been added in the space.
- Instructions: The plain-text notes provided as General instructions are included as context.
- Prompt and responses history: Prompts and responses from the current chat are included as context. If necessary, because of set token limits, the oldest parts of the chat record are excluded.
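The last item in the list above, dropping the oldest parts of the chat when a token limit is reached, can be illustrated with a minimal sketch. This is not Genie's implementation: the function name `trim_history`, the whitespace-based token count, and the budget value are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, List


def trim_history(
    turns: List[str],
    max_tokens: int,
    # Assumption: tokens are approximated by whitespace-separated words.
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[str]:
    """Keep the newest turns whose combined token count fits within max_tokens."""
    kept = deque()
    used = 0
    # Walk from newest to oldest so that the oldest turns are the ones excluded.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.appendleft(turn)
        used += cost
    return list(kept)


# Hypothetical chat record, oldest turn first.
history = [
    "Show open opportunities by stage",
    "Here are open opportunities by stage",
    "Now only for Q3",
]
trimmed = trim_history(history, max_tokens=10)
print(trimmed)
```

With a budget of 10 whitespace tokens, the two most recent turns fit and the first prompt is excluded. A real system would use the model's own tokenizer rather than word counts.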
Note
Some table details, such as the owner and table size, are not included by default. To access this information, use views from the information schema available for all Unity Catalog catalogs. Default views might include unnecessary details, so creating a custom view on top of that can help focus on the specific information you need. For more information about what's available in the information schema, see Information schema.
In many cases, Genie generates a SQL query that runs on the space's SQL warehouse. Generated queries are always read-only. Retries are handled automatically, and the SQL warehouse handles concurrency and scale. The result set is presented as part of the response.
Set up a Genie space
You can create a Genie space if you have:
- The Databricks SQL entitlement.
- At least CAN USE permission on a pro or serverless SQL warehouse.
- At least SELECT privileges on one or more Unity Catalog data objects.
See Set up and manage an AI/BI Genie space.
Interact with a Genie space
Business teams are the end users for a Genie space. To use a Genie space, business users must have:
- The consumer access or Databricks SQL entitlement.
- At least CAN USE permission on the default warehouse designated for the Genie space.
- At least SELECT privileges on all of the Unity Catalog data objects used in the space.
Business users can help curate a space by testing it and providing feedback during development. To learn more about how business users can start working with a Genie space, see Use a Genie space to explore business data.
Trusted assets
Trusted assets convey an extra layer of assurance in the accuracy of a result to a space user. When the exact text of a parameterized example query or SQL function is used to generate a response, Genie marks the response as Trusted. See Use trusted assets in AI/BI Genie spaces to learn more about trusted assets. See Use parameters in SQL queries to learn more about working with parameterized queries.
Evaluate responses with benchmarks
Benchmarks allow you to scale up testing and evaluation of individual responses in a Genie space. Unlike instructions, benchmarks are meant to evaluate, not inform, your Genie space. Genie does not use benchmark questions or their example SQL to build its context.
Using benchmarks, you can run a collection of test questions and use the responses to measure Genie's accuracy. Optionally, you can include a SQL statement that returns the expected results. When the benchmark question runs, Genie's response is compared to the results provided by the SQL statement and scored for accuracy. The question is marked for review if no SQL answer has been provided.
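The scoring flow described above can be sketched as follows. The source does not say how Genie compares result sets, so the order-insensitive row comparison, the outcome labels, and the function names `score_benchmark` and `accuracy` are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence

Row = Sequence[object]


def score_benchmark(
    genie_rows: Iterable[Row],
    expected_rows: Optional[Iterable[Row]] = None,
) -> str:
    """Classify one benchmark question as correct, incorrect, or needs_review."""
    if expected_rows is None:
        # No SQL answer was provided, so a person has to review the response.
        return "needs_review"
    # Assumption: two results match when they contain the same rows,
    # regardless of row order.
    got = Counter(tuple(row) for row in genie_rows)
    want = Counter(tuple(row) for row in expected_rows)
    return "correct" if got == want else "incorrect"


def accuracy(outcomes: List[str]) -> Optional[float]:
    """Share of automatically scored questions that were correct."""
    scored = [o for o in outcomes if o != "needs_review"]
    if not scored:
        return None
    return sum(o == "correct" for o in scored) / len(scored)


# Hypothetical benchmark run with three questions.
outcomes = [
    score_benchmark([("Open", 12), ("Closed", 30)], [("Closed", 30), ("Open", 12)]),
    score_benchmark([("Open", 12)], [("Open", 15)]),
    score_benchmark([("Open", 12)]),  # no expected SQL result supplied
]
print(outcomes, accuracy(outcomes))
```

Questions without an expected result are left out of the accuracy figure here, which mirrors the idea that they are marked for review rather than scored.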
See Use benchmarks in a Genie space.
Privacy and security
Q: What model does Genie use?
Genie is a Databricks AI-powered feature. It uses a compound AI system that combines the use of AI models, retrieval, ranking, and personalization systems to understand your organization's data and usage patterns. To learn more, see Databricks AI-powered features.
Q: What data is being sent to the model?
Genie uses your prompt, relevant table metadata and values, errors, as well as input code or queries when generating a response.
To generate responses, Genie uses the following:
- The natural language prompt submitted by the user
- Table names and descriptions
- Column titles, descriptions, and sample values
- General instructions
- Example SQL queries
- SQL functions
Q: Does Azure OpenAI store my data?
No. When using Azure OpenAI models through Databricks, Microsoft does not store prompts or responses for any period of time at any level, not even in network logs. This includes data that would normally be used for abuse monitoring. Databricks has opted out of Azure OpenAI's abuse monitoring and human review, so Microsoft does not retain or inspect any data sent by Genie. For more information, see Microsoft's documentation.
Q: Where are Genie responses stored?
Genie responses are stored in the Azure Databricks control plane.
Q: Is row-level filtering supported in a Genie space?
Yes, privileges granted in Unity Catalog control which users can access specific data objects. If row filters or column masks are applied to a data object, they control which values are returned in the result set. See Row filters and column masks.
Q: How is my traffic routed through Geos?
Genie is deployed in the US, EU, AUS, and India.
Traffic routing depends on your region and whether cross-Geo processing is enabled, that is, whether the Enforce data processing within workspace Geography for Designated Services setting is disabled:
- EU: Traffic always routes through the EU, regardless of cross-Geo processing.
- US: Traffic always routes through the US, regardless of cross-Geo processing.
- India and AUS:
  - If cross-Geo processing is disabled: Traffic is guaranteed to stay within your region.
  - If cross-Geo processing is enabled: Traffic always routes through the US.
- All other regions:
  - If cross-Geo processing is disabled: Genie will not work.
  - If cross-Geo processing is enabled: Traffic routes through the US.
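The routing rules above amount to a small decision table. The following sketch encodes them as a function. The name `genie_processing_geo` and the Geo labels ("EU", "US", "INDIA", "AUS") are hypothetical and are not part of any Databricks API.

```python
from typing import Optional


def genie_processing_geo(workspace_geo: str, cross_geo_enabled: bool) -> Optional[str]:
    """Return the Geo where Genie traffic is processed, or None if Genie is unavailable."""
    geo = workspace_geo.upper()
    if geo in ("EU", "US"):
        # EU and US traffic stays in its own Geo regardless of the setting.
        return geo
    if geo in ("INDIA", "AUS"):
        # Stays in region unless cross-Geo processing is enabled.
        return "US" if cross_geo_enabled else geo
    # All other regions: Genie works only with cross-Geo processing enabled.
    return "US" if cross_geo_enabled else None


# Hypothetical checks for a few workspace locations.
for geo, enabled in [("EU", True), ("AUS", False), ("AUS", True), ("Japan", False)]:
    print(geo, enabled, genie_processing_geo(geo, enabled))
```

A return value of `None` corresponds to the case where Genie will not work because cross-Geo processing is disabled in a region without a local deployment.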