Clean, transform, and load data

Completed

When you use Power Query to ingest data into Customer Insights - Data, you can use the data transformation capabilities available to shape your data as needed for the best experience in the application. After you select the Power Query connector, identify the data source to use, and connect to it, you can start shaping your data. Select the Transform data option at the bottom of the screen to open the Power Query designer.

Screenshot of the transform data button.

Important

The Power Query Editor ribbon contains many buttons you can use to select, view, and shape your data. To learn more about the available features and functions, see the query ribbon.

Identify column headers and names

The first step in shaping your initial data is to identify the column headers and names within the data, and then evaluate where they're located to ensure they are in the right place. When the data is brought in, Power Query Editor assumes that all data belongs in table rows. However, a data source might have a first row that contains column names. To correct this inaccuracy, you need to promote the first table row into column headers.

You can promote headers in two ways:

  • Select the Use First Row as Headers option on the home tab.

  • Select the drop-down button next to Column1, then select Use First Row as Headers.

Screenshot of the options to promote headers.

Rename columns

As your data comes in, you might find that one or more columns have the wrong name, contain a spelling error, or just aren't user-friendly. In those instances, you can rename columns as needed.  

Columns can be renamed in two ways:

  • Right-click the column header and select Rename.

  • Double-click the column header and overwrite the name with the correct name.

Remove columns

A key step in the data shaping process is to remove unnecessary columns. Removing columns at an early stage in the process rather than later is best, especially when you establish relationships between your tables.  

You can remove columns in two ways:

  • Select the columns that you want to remove.

    • On the Home tab, select Remove Columns.
  • Select the columns that you want to keep.

    • On the Home tab, select Remove Columns and then choose Remove Other Columns.

Screenshot of the options to remove columns.

Change the column data type

The types of data that are stored in your columns can affect different aspects of what you can do with the data in Customer Insights - Data. For example, if you're creating a segment that identifies customers who are within a specific age range, it's important to format the date of birth column as a date. If it isn't, the system can't use data expressions to check the age of the customer. This is also important when you're planning to perform calculations, such as identifying how much someone might spend.

It isn't uncommon for the data to initially have the wrong data type associated with it. For example, when you connect to a CSV/Text file, all the loaded columns are likely all assigned a text data type. If we don't ensure that each column has the right data type, we could run into issues down the road.

Consider the following example. The table represents an example of a transaction table that contains data, which represents purchases made on a company website. It contains the ID of the customer that made the purchase, and the purchase date and total price of the purchase.

Customer ID Purchase date Total price
1005 11/1/2023 259.95
1006 11/5/2023 350.00
1007 11/10/2023 425.99

To ensure that we can use the data from this table appropriately, we need to assign the correct data type to the appropriate fields.

In this example, we might set the data types for the fields as noted here:

  • Customer ID - Since this is a text-based value, we could leave this as a text field.

  • Purchase Date - This is likely used to represent time windows so we would want this field to be set to some type of date-related data type.

    • Date - Stores just the date that the transaction took place.

    • Date/Time - Stores both the date and time of the purchase.

    • Date/Time/Zone - Stores date, time, and time zone details.

      In our example, it can be set to date since we only need to store it as a date and no time-related data is necessary.

  • Total Price - This would likely be set as Currency to represent a dollar amount.

Data types can be modified in two ways.

The first way is to select the column that has the issue associated with it. On the Home ribbon, select the Data Type drop-down in the Transform tab, and then select the correct data type from the list.

Screenshot of the data type list.

The second way to modify data types is to select the data type icon next to the column header, and then select the correct data type from the list.

Screenshot of the data type icon with menu displayed.

One of the advantages of Power Query is that any changes that you make to the data are stored as what is referred to as applied steps. Each time you connect to the query, the changes you made during transformation are applied. This ensures your query is the same every time. Any other changes that you make to items, like column data types, are saved as a programmed step. If you ever find you need to remove an applied step, you can select the Delete option next to the step in the applied steps box.

Screenshot of the x to remove an applied step.

Rename a query

Initially when you create your query, it likely has a generic name. It's good practice to change uncommon or unhelpful query names to names that are more obvious, or that the user is more familiar with. For instance, if you're importing purchases from a Point-of-sale system, you might want to name the query something like POS Purchases.

Screenshot of the name field used to rename a query.

Once you shape the data you want, you can select the Next button to save the query and begin the ingesting process.