Effect of dataset changes while using deployment pipelines in Power BI

This week, I’ve presented at two conferences in Europe about Application Lifecycle Management for Power BI. During these sessions at DataMinds Connect and Techorama I’ve talked and demoed native Power BI deployment pipelines, as well as Azure DevOps integration.

One of the topics discussed during the session, is the effect of deployments on datasets by using native deployment pipelines in the Power BI service. Deployment Pipelines only deploy meta data from the data model, however specific changes might have an unwanted effect on the data in the dataset in the target stage.

In this blog post, I will further elaborate on several specific use cases and the effect on your dataset in the target stage.

This entire blog post will focus on Power BI Deployment Pipelines. If you are not familiar with Deployment Pipelines yet, I encourage you to read my earlier post on Why you should care about deployment pipelines first.

Purge dataset

Imagine you have a dataset in Development and you deploy this dataset to the Test stage. By doing so, you only move the meta data from the dataset to the next stage. The data itself will not be copied over different stages. In my opinion a good thing, as the dataset might bind to a different data source as well. Specific changes in the structure of your model might lead to issues while deploying.

The easiest example is a postal code. Imagine one of your columns contains the numbers and letters of a postal code, where you also formatted the column to be a text. This configuration is available on both development and test stage at this moment. Sample table shown below.

For various reasons like dataset optimization or GDPR compliancy purposes, you decide to cut off the letters and only keep the numbers to identify regions. With that, you also change the datatype from text to number like shown below.

During the deployment, the new meta data will be moved to the test stage as well, which makes the datatype change to number.

As only the new meta data is deployed, but the data remains unchanged, there will occur an error during deployment showing that the data in the target item does not match the new meta data and asks if you want to proceed. If you click yes, all data from the target item table will be dropped.

In case you’re using the Power BI REST API to automate your deployment, the approval to drop data from the target artifact will be controllable using the allowPurgeData property which you can specify in your API call body.

So far, understandable behavior. To get the data back in the dropped table, you have to perform a refresh of this table.

Adding or removing columns from a table

During the session at Techorama, I had someone in the audience asking what happens in columns to a table are deleted or added. At that point, I was not sure. So, I went for it and tried it!

Adding a column

In the first example, we have a supplier table in our dataset. This contains some descriptive information about our suppliers. I’ve published the dataset containing this table to the Development and Test stage and both are identical.

When we add more data to the table, for example the primary contact and payment days, which are two additional columns to the model.

This change will be directly visible in Development when I published the file to the workspace. When I move the change forward to the Test workspace, only the meta data is moved. So, the original data that was already in the test environment will still be shown in the report till the next refresh.

What happens as a result, is that the new columns will show in the model, because the meta data is already there. However, there is no data and therefore the columns will remain empty.

When it is a single visual displaying only the new data, like in above example, there is no big deal. However, when you’ve released both the dataset and the report to the Test stage, where already existing data is combined with one of the new columns in a single visual, this entire visual will not show any data.
To show the data in this visual, you have to select to show items with no data before the data will show again for the first four columns, but this is not always wanted as it changes the behavior for the entire visual. The new two columns will still remain empty till the next refresh succeeded.

Removing a column

Let’s continue with the tests and see what happens when we remove a column from the model. To test this, we will simply remove the WWISupplierID column from the dataset. To be sure the empty columns from the previous test have no effect on this test, I’ve left them out of the visualization and disabled the show items with no data again.

Our table in the Development stage now looks like this:

For this test, we first only deploy the dataset to the Test stage and do not yet deploy the report.

Originally, we had the WWISupplierID column in our visualization as well. As we only deployed the dataset at this point, we see that the visual in our report now breaks. The meta data of our dataset in Test is updated after deployment, which resulted in the fact that the column cannot be found.

To see if the data is actually removed, or that it is still available somewhere on the backend, I will redeploy the old model back now where the column WWISupplierID is available again. As we follow the process, I will publish it to the Development stage and move only the dataset to Test afterwards using the pipeline.

As soon as I’ve published the column WWISuuplierID back, the visual shows empty, similar behavior as we have seen with the first test case. With this, we can conclude that the data is actually dropped when we initially deployed the new meta data where the column was removed.

What did we learn?

Let’s wrap up the tests in short. Basically, when new columns are added to the model, this doesn’t harm anything directly. As soon as also the report is published which makes use of those newly added columns, there might be a side effect on some visuals where data from the still empty column and already existing columns are combined.

Removing columns will actually drop the data from the model directly. Visuals using this column will return an error message referring to the missing column. Especially when multiple users use your data model for self-service purposes, it is important to think about the potential side effects on their reports before you remove columns from the model.

If you consider removing columns but you’re not sure if those columns are used in the model, I strongly recommend reading the recent post from Chris Webb: Finding The Tables, Columns And Measures Used By A DAX Query In Power BI. In this post Chriss explains how you can use the output from Log Analytics integration to get an overview of the tables and columns used in your model to make a well-considered decision before you remove a column.

Whenever you removed the column and found out afterwards that the column was used in a report, the quick solution is to bring the column back in the model. As mentioned before, the data is gone but after a refresh everything works like before again.

One thought on “Effect of dataset changes while using deployment pipelines in Power BI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s