Edit Q&A linguistic schema and add phrasings in Power BI Desktop

It's powerful to use common phrases and natural language to ask questions of your data. It's even more powerful when your data answers. When you ask Power BI Q&A a question, it makes a best effort to answer correctly. You can edit the linguistic schema to improve the Q&A answers for even better interactions.

It all starts with your enterprise data. The better the data model, the easier it will be for users to get quality answers. One way to improve the model is to add a linguistic schema that defines and categorizes terminology and relationships between table and column names in your dataset. Power BI Desktop is where you manage your linguistic schemas.

There are two sides to Q&A. The first side is the preparation, or modeling. The second side is asking questions and exploring the data, or consuming. In some companies, employees known as data modelers or IT admins might be the ones to assemble the datasets, create the data models, and publish the datasets to Power BI. A different set of employees would be the ones to "consume" the data online. In other companies, these roles might be combined.

This article is for the data modelers, the people who optimize datasets to provide the best possible Q&A results.

What's a linguistic schema?

A linguistic schema describes terms and phrases that Q&A should understand for objects within a dataset, including parts of speech, synonyms, and phrasings. When you import or connect to a dataset, Power BI creates a linguistic schema based on the structure of the dataset. When you ask Q&A a question, it looks for matches and relationships in the data to figure out the intention of your question. For example, it looks for nouns, verbs, adjectives, phrasings, and other elements. And it looks for relationships, like which columns are objects of a verb.

You're probably familiar with parts of speech, but phrasings might be a new term. A phrasing is how you talk about (or phrase) the relationships between things. For example, to describe the relationship between customers and products, you might say “customers buy products”. Or to describe the relationship between customers and ages, you might say “ages indicate how old customers are”. Or to describe the relationship between customers and phone numbers, you might say “customers have phone numbers”.

These phrasings come in many shapes and sizes. Some correspond directly with relationships in the data model. Some relate columns with their containing tables. Others relate multiple tables and columns together in complex relationships. In all cases, they describe how things are related by using everyday terms.

Linguistic schemas are saved in a .yaml format. This format is related to the popular JSON format but provides a more flexible and easier-to-read syntax. Linguistic schemas can be edited, exported, and imported into Power BI Desktop.
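
As a rough orientation, the sketch below shows how an exported file might be laid out. It is illustrative rather than a complete or authoritative schema. The Entities and Relationships sections and the Terms lists reflect the typical shape of an export, but the table, column, and synonym names (Customers, Name, client) are hypothetical, and other top-level keys that a real export contains are omitted. Compare it with a file exported from your own model.

```yaml
# Illustrative layout only; names are hypothetical and some keys are omitted.
Entities:
  customer:                      # an entity bound to a table
    Definition:
      Binding: {Table: Customers}
    Terms:                       # words Q&A accepts for this entity
    - customer
    - client                     # a synonym added by the modeler
  customer.name:                 # an entity bound to a column
    Definition:
      Binding: {Table: Customers, Column: Name}
    Terms:
    - name
Relationships:
  customer_has_name:             # phrasings are listed under relationships
    Binding: {Table: Customers}
    Phrasings:
    - Attribute: {Subject: customer, Object: customer.name}
```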

Prerequisites

Set up an editor for .yaml files

We recommend using Visual Studio Code to edit linguistic schema .yaml files. Visual Studio Code includes out-of-the-box support for .yaml files and can be extended to specifically validate the Power BI linguistic schema format.

  1. Install Visual Studio Code.

  2. Right-click the .yaml file in the sample linguistic schema that you saved earlier: QnALinguisticSchema.zip.

  3. Select Open with > Choose another app.

  4. Select Visual Studio Code and then choose Always.

    Screenshot of the dialog that shows a list of programs to open the file. Visual Studio Code and Always are highlighted.

In Visual Studio Code, install the YAML Support by Red Hat extension.

  1. Select the Extensions tab (last one in the left menu) or press Ctrl+Shift+X.
    Screenshot of the left menu in Visual Studio Code with the Extensions icon highlighted.
  2. Search for "yaml" and select YAML Support by Red Hat in the list.
  3. Select Install > Reload.

Work with linguistic schemas

There are two ways to work with linguistic schemas. One way is to edit, import, and export the .yaml file from the ribbon in Power BI Desktop. That approach is covered in the Power BI Q&A Tooling experience article, and it doesn't require you to open the .yaml file to improve Q&A.

The other way to edit a linguistic schema is to export and edit the .yaml file directly. When you edit a linguistic schema .yaml file, you tag columns in the table as different grammatical elements and define words that a colleague might use to phrase a question. For instance, you state the columns that are the subject and the object of the verb. You add alternative words that colleagues can use to refer to tables, columns, and measures in your model.

Screenshot of a sample linguistic schema .yaml file.

Before you can edit a linguistic schema, you must open (export) it from Power BI Desktop. Saving the .yaml file back to the same location is considered importing. But you can also import other .yaml files instead. For instance, if you have a similar dataset and you've already put in work adding parts of speech, identifying relationships, creating phrasings, and creating synonyms, you can use that .yaml file in a different Power BI Desktop file.

Q&A uses all this information, together with any enhancements that you make, to provide better answers, autocompletion, and summaries of the questions.

Edit a linguistic schema

When you first export your linguistic schema from Power BI Desktop, most or all of the content in the file is automatically generated by the Q&A engine. These generated entities, words (synonyms), relationships, and phrasings are designated with a State: Generated tag. They're included in the file mostly for informational purposes but can be a useful starting point for your own changes.

Note

The sample .yaml file included with this tutorial doesn't contain State: Generated or State: Deleted tags because it was prepared specifically for this tutorial. To see these tags, open an unedited .pbix file in Model view and export the linguistic schema.

Screenshot of a sample .yaml file showing State: Generated.

When you import your linguistic schema file back into Power BI Desktop, anything that's marked State: Generated is ignored and later regenerated. Thus, if you’d like to change some generated content, remove the corresponding State: Generated tag. Similarly, if you want to remove some generated content, change the State: Generated tag to State: Deleted so that it isn't regenerated when you import your linguistic schema file.
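
The following fragment sketches the three cases described above. It reuses the relationship format shown later in this article. The placement of the State tag directly under each relationship, and the second and third relationship names, are assumptions for illustration, so check them against your own exported file.

```yaml
# Case 1: left untouched. Ignored on import and regenerated by Q&A.
product_has_category:
  Binding: {Table: Products}
  State: Generated
  Phrasings:
  - Attribute: {Subject: product, Object: product.category}

# Case 2: edited. The State: Generated tag was removed so the change is kept.
product_has_description:
  Binding: {Table: Products}
  Phrasings:
  - Attribute: {Subject: product, Object: product.description}

# Case 3: unwanted. Marked State: Deleted so it isn't regenerated on import.
product_has_internal_code:
  Binding: {Table: Products}
  State: Deleted
  Phrasings:
  - Attribute: {Subject: product, Object: product.internal_code}
```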

Export, then import a .yaml file

  1. In Power BI Desktop, open the dataset in Model view.

  2. On the Modeling tab, select Linguistic schema > Export linguistic schema.

  3. Save it. The file name ends with .lsdl.yaml.

  4. Open it in Visual Studio Code or another editor, make your changes, and save the file.

  5. In Model view in Power BI Desktop, on the Modeling tab, select Linguistic schema > Import.

  6. Go to the location where you saved the edited .yaml file and select it. A Success message lets you know that the linguistic schema .yaml file was successfully imported.

    Screenshot of the successful import dialog.

Phrasings in the linguistic schema

A phrasing is how you talk about (or phrase) the relationships between things. For example, to describe the relationship between customers and products, you might say “customers buy products”.

Where do phrasings come from?

Power BI adds many simple phrasings to the linguistic schema automatically based on the structure of the model and guesses based on the column names. For example:

  • Most columns relate to their containing table with a simple phrasing, like “products have descriptions”.
  • Model relationships result in default phrasings for both directions of the relationship, like “orders have products” and “products have orders”.
  • Based on their column names, some model relationships can have more complex default phrasings, like “orders are shipped to cities”.

However, your users sometimes talk about things in ways that Q&A can’t guess. Therefore, you might want to add your own phrasings manually.

Why add phrasings?

The first reason for adding a phrasing is to define a new term. For example, if you want to be able to ask “list the oldest customers”, you must first teach Q&A what you mean by “old”. You would do so by adding a phrasing like “ages indicate how old customers are”.
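
A phrasing like “ages indicate how old customers are” could be sketched as a measurement adjective phrasing, using the same format as the river example later in this article. The Customers table, the customer and customer.age entities, and the antonym are hypothetical names chosen for illustration.

```yaml
# Hypothetical sketch: teaches Q&A that "old" is measured by customer age.
customer_has_age:
  Binding: {Table: Customers}
  Phrasings:
  - Adjective:
      Subject: customer
      Adjectives: [old]
      Antonyms: [young]
      Measurement: customer.age
```

With a phrasing along these lines, a question such as “list the oldest customers” has a column to rank by.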

The second reason for adding a phrasing is to resolve ambiguity. Basic keyword search only goes so far when words have more than one meaning. For example, “flights to Chicago” isn't the same as “flights from Chicago”. But Q&A won’t know which one you mean unless you add the phrasings “flights are from departure cities” and “flights are to arrival cities”. Similarly, Q&A will only understand the distinction between “cars that John sold to Mary” and “cars that John bought from Mary” after you add the phrasings “customers buy cars from employees” and “employees sell customers cars.”
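
The two flight phrasings might look like the following pair of preposition phrasings, modeled on the preposition example later in this article. The Flights table and the flight, flight.departure_city, and flight.arrival_city entities are hypothetical.

```yaml
# Hypothetical sketch: "flights are from departure cities"
flights_are_from_departure_cities:
  Binding: {Table: Flights}
  Phrasings:
  - Preposition:
      Subject: flight
      Prepositions: [from]
      Object: flight.departure_city

# Hypothetical sketch: "flights are to arrival cities"
flights_are_to_arrival_cities:
  Binding: {Table: Flights}
  Phrasings:
  - Preposition:
      Subject: flight
      Prepositions: [to]
      Object: flight.arrival_city
```

Keeping the two prepositions in separate relationships, each tied to its own column, is what lets Q&A tell “to Chicago” apart from “from Chicago”.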

The final reason for adding a phrasing is to improve restatements. Rather than Q&A echoing back to you “Show the customers and their products”, it would be clearer if it said “Show the customers and the products they bought” or “Show the customers and the products they reviewed”, depending on how it understood the question. Adding custom phrasings allows restatements to be more explicit and unambiguous.
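
As a sketch of how the two restatements could be told apart, the fragment below defines one verb phrasing per relationship, following the verb example later in this article. The Orders and Reviews tables and the customer and product entities are hypothetical names for illustration.

```yaml
# Hypothetical sketch: supports "the products they bought"
customers_buy_products:
  Binding: {Table: Orders}
  Phrasings:
  - Verb:
      Subject: customer
      Verbs: [buy]
      Object: product

# Hypothetical sketch: supports "the products they reviewed"
customers_review_products:
  Binding: {Table: Reviews}
  Phrasings:
  - Verb:
      Subject: customer
      Verbs: [review]
      Object: product
```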

Kinds of phrasings

To understand the different types of phrasings, you’re first going to need to remember a couple of basic grammar terms:

  • A noun is a person, place, or thing. Examples: car, teenager, Marty, flux capacitor
  • A verb is an action or state of being. Examples: hatch, burst, devour, eject
  • An adjective is a descriptive word that modifies a noun. Examples: powerful, magical, golden, stolen
  • A preposition is a word used before a noun to relate it to a previous noun, verb, or adjective. Examples: of, for, near, from
  • An attribute is a quality or feature of something.
  • A name is a word or set of words by which a person, animal, place, or thing is known or referred to.

Attribute phrasings

Attribute phrasings are the workhorse of Q&A. They're used when one thing is acting as an attribute of another thing. They’re simple, straightforward, and perform most of the heavy lifting when you haven't defined a subtler, more detailed phrasing. Attribute phrasings are described using the basic verb “have” (“products have categories” and “host countries/regions have host cities”). They also automatically allow questions with the prepositions “of” and “for” (“categories of products” or “orders for products”) and with the possessive (“John’s orders”). Attribute phrasings are used in these kinds of questions:

  • Which customers have orders?
  • List host cities by country/region ascending
  • Show orders that have chai
  • List customers with orders
  • What is the category of each product?
  • Count Robert King's orders

Power BI generates most of the attribute phrasings needed in your model based on table or column containment and model relationships. Typically, you don’t need to create them yourself. Here's an example of how an attribute phrasing looks inside the linguistic schema:

product_has_category:
  Binding: {Table: Products}
  Phrasings:
  - Attribute: {Subject: product, Object: product.category}

Name phrasings

Name phrasings are helpful if your data model has a table that contains named objects, such as athlete names or customer names. For example, a “product names are names of products” phrasing is essential for being able to use product names in questions. Name phrasing also enables “named” as a verb (for example, “List customers named John Smith”). However, it's most important when used in combination with other phrasings. It allows a name value to be used to refer to a particular table row. For example, in “Customers that bought chai”, Q&A can tell the value “chai” refers to the whole row of the product table rather than just a value in the product name column. Name phrasings are used in these kinds of questions:

  • Which employees are named Robert King
  • Who is named Ernst Handel
  • Fernand De Montigny's sports
  • Count of athletes named Mary
  • What did Robert King buy?

Assuming you used a sensible naming convention for name columns in your model (for example, “Name” or “ProductName” rather than “PrdNm”), Power BI generates most of the name phrasings needed in your model automatically. You usually don’t need to create them yourself.

Here's an example of how a name phrasing looks inside of the linguistic schema:

employee_has_name:
  Binding: {Table: Employees}
  Phrasings:
  - Name:
      Subject: employee
      Name: employee.name

Adjective phrasings

Adjective phrasings define new adjectives used to describe things in your model. For example, a “happy customers are customers where rating > 6” phrasing is needed to ask questions like “list the happy customers in Des Moines.” There are several forms of adjective phrasings to use in different situations.

Simple adjective phrasings define a new adjective based on a condition, such as “discontinued products are products where status = D.” Simple adjective phrasings are used in these kinds of questions:

  • Which products are discontinued?
  • List the discontinued products
  • List the gold medalists
  • Products that are backordered

Here's an example of how a simple adjective phrasing looks inside of the linguistic schema:

product_is_discontinued:
  Binding: {Table: Products}
  Conditions:
  - Target: product.discontinued
    Operator: Equals
    Value: true
  Phrasings:
  - Adjective:
      Subject: product
      Adjectives: [discontinued]
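
The “happy customers” phrasing mentioned at the start of this section could be sketched the same way, with a numeric comparison in place of the equality test. The GreaterThan operator name, the Customers table, and the customer.rating entity are assumptions for illustration. Only the Equals operator appears in this article's examples, so confirm the operator name against the schema validation in your editor.

```yaml
# Hypothetical sketch: "happy customers are customers where rating > 6"
customer_is_happy:
  Binding: {Table: Customers}
  Conditions:
  - Target: customer.rating
    Operator: GreaterThan   # assumed operator name; verify before use
    Value: 6
  Phrasings:
  - Adjective:
      Subject: customer
      Adjectives: [happy]
```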

Measurement adjective phrasings define a new adjective based on a numeric value that indicates the extent to which the adjective applies, such as “lengths indicate how long rivers are” and “small countries/regions have small land areas.” Measurement adjective phrasings are used in these kinds of questions:

  • List the long rivers
  • Which rivers are the longest?
  • List the smallest countries/regions that won gold in basketball
  • How long is the Rio Grande?

Here's an example of how a measurement adjective phrasing looks inside of the linguistic schema:

river_has_length:
  Binding: {Table: Rivers}
  Phrasings:
  - Adjective:
      Subject: river
      Adjectives: [long]
      Antonyms: [short]
      Measurement: river.length

Dynamic adjective phrasings define a set of new adjectives based on values in a column in the model, such as “colors describe products” and "events have event genders." Dynamic adjective phrasings are used in these kinds of questions:

  • List the red products
  • Which products are green?
  • Show skating events for females
  • Count issues that are active

Here's an example of how a dynamic adjective phrasing looks inside the linguistic schema:

product_has_color:
  Binding: {Table: Products}
  Phrasings:
  - DynamicAdjective:
      Subject: product
      Adjective: product.color

Noun phrasings

Noun phrasings define new nouns that describe subsets of things in your model. They often include some type of model-specific measurement or condition. For example, in a sports model you might want to add phrasings that distinguish champions from medalists, land sports from water sports, teams from individuals, or age categories of athletes (teens, adults, seniors). In a movie database, you might want to add a noun phrasing like “flops are movies where net profit < 0” so that you can ask questions like “count the flops by year.” There are two forms of noun phrasings to use in different situations.

Simple noun phrasings define a new noun based on a condition, such as “contractors are employees where full time = false” and “champions are athletes where count of medals > 5.” Simple noun phrasings are used in these kinds of questions:

  • Which employees are contractors?
  • Count the contractors in Portland
  • How many champions in 2016

Here's an example of how a simple noun phrasing looks inside of the linguistic schema:

employee_is_contractor:
  Binding: {Table: Employees}
  Conditions:
  - Target: employee.full_time
    Operator: Equals
    Value: false
  Phrasings:
  - Noun:
      Subject: employee
      Nouns: [contractor]

Dynamic noun phrasings define a set of new nouns based on values in a column in the model, such as “jobs define subsets of employees.” Dynamic noun phrasings are used in these kinds of questions:

  • List the cashiers in Chicago
  • Which employees are baristas?
  • List the referees in 1992

Here's an example of how a dynamic noun phrasing looks inside of the linguistic schema:

employee_has_job:
  Binding: {Table: Employees}
  Phrasings:
  - DynamicNoun:
      Subject: employee
      Noun: employee.job

Preposition phrasings

Preposition phrasings are used to describe how things in your model are related via prepositions. For example, a “cities are in countries/regions” phrasing improves understanding of questions like “count the cities in Washington.” Some preposition phrasings are created automatically when a column is recognized as a geographical entity. Preposition phrasings are used in these kinds of questions:

  • Count the customers in New York
  • List the books about linguistics
  • Which city is Robert King in?
  • How many books are by Steven Pinker?

Here's an example of how a preposition phrasing looks inside of the linguistic schema:

customers_are_in_cities:
  Binding: {Table: Customers}
  Phrasings:
  - Preposition:
      Subject: customer
      Prepositions: [in]
      Object: customer.city

Verb phrasings

Verb phrasings are used to describe how things in your model are related via verbs. For example, a “customers buy products” phrasing improves understanding of questions like “who bought cheese?” and “what did John buy?” Verb phrasings are the most flexible of all of the types of phrasings, often relating more than two things to each other, such as “employees sell customers products.” Verb phrasings are used in these kinds of questions:

  • Who sold what to whom?
  • Which employee sold chai to John?
  • How many customers were sold chai by Mary?
  • List the products that Mary sold to John.
  • Which discontinued products were sold to Chicago customers by Boston employees?

Verb phrasings can also contain prepositional phrases, thereby adding to their flexibility, such as “athletes win medals at competitions” or “customers are given refunds for products.” Verb phrasings with prepositional phrases are used in these kinds of questions:

  • How many athletes won a gold medal at the Visa Championships?
  • Which customers were given a refund for cheese?
  • At which competition did Danell Leyva win a bronze medal?

Some verb phrasings are created automatically when a column is recognized as containing both a verb and a preposition.

Here's an example of how a verb phrasing looks inside of the linguistic schema:

customers_buy_products_from_salespeople:
  Binding: {Table: Orders}
  Phrasings:
  - Verb:
      Subject: customer
      Verbs: [buy, purchase]
      Object: product
      PrepositionalPhrases:
      - Prepositions: [from]
        Object: salesperson

Relationships with multiple phrasings

Frequently, a single relationship can be described in more than one way. In this case, a single relationship can have more than one phrasing. It's common for the relationship between a table entity and a column entity to have both an attribute phrasing and another phrasing. For example, in the relationship between customer and customer name, you'll want both an attribute phrasing (for example, “customers have names”) and a name phrasing (for example, “customer names are the names of customers”), so you can ask both types of questions.

Here's an example of how a relationship with two phrasings looks inside of the linguistic schema:

customer_has_name:
  Binding: {Table: Customers}
  Phrasings:
  - Attribute: {Subject: customer, Object: customer.name}
  - Name:
      Subject: customer
      Name: customer.name

Another example would be adding the alternate phrasing “employees sell customers products” to the “customers buy products from employees” relationship. You don't need to add variations like “employees sell products to customers” or “products are sold to customers by employees” because the “by” and “to” variations of the subject and indirect object are inferred automatically by Q&A.
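
A relationship carrying both verb phrasings might be sketched as follows. The first phrasing follows the verb example earlier in this article. In the second, the IndirectObject key is an assumption about how the schema expresses “employees sell customers products”, and the Orders table and entity names are hypothetical, so validate the fragment in your editor before you import it.

```yaml
# Hypothetical sketch: one relationship, two verb phrasings.
customers_buy_products_from_employees:
  Binding: {Table: Orders}
  Phrasings:
  - Verb:                        # "customers buy products from employees"
      Subject: customer
      Verbs: [buy, purchase]
      Object: product
      PrepositionalPhrases:
      - Prepositions: [from]
        Object: employee
  - Verb:                        # "employees sell customers products"
      Subject: employee
      Verbs: [sell]
      IndirectObject: customer   # assumed key name; verify before use
      Object: product
```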

Considerations and troubleshooting

If you make a change to a .lsdl.yaml file that doesn't conform to the linguistic schema format, validation squiggles in the editor indicate the issue:

Screenshot of a sample .yaml file showing errors.

More questions? Ask the Power BI Community