Another week, another challenge! Summer holidays is the perfect time to run into challenges that you normally would not encounter or simply do not know about because they are fixed before you know. This blog will elaborate on a case I had at a client, where we run into issues with the on-premises data gateway in Power BI and in particular related to building dataflows based on sources that connect via the gateway.
Let’s start simple, your gateway administrator helped you to spin up a gateway that allows you to connect to your data sources which are hosted on-premises, in a different cloud or maybe in Azure behind a VNET or with private endpoints. All valid cases to have a data gateway setup in your organization. In order to build your Power BI solution, you decide to setup some dataflows to pull the data from source into Power BI, so you can easily reuse those tables in different datasets. So far, so good! But as we spin up a dataflow in the Power BI service, there are a few challenges we might encounter while setting up.
In my scenario, I have setup a data gateway which connects to a SQL endpoint and configured my Demo User as user on the gateway. Meaning, all dataflows and datasets that will use this gateway connection, should be using these demo user credentials configured on the gateway for everything they refresh.
As soon as I start building a dataflow, I will enter the server address as well as the database name. Based on my permission on the gateway, you will directly see that the gateway is found and provides me the credentials. However, notice the Sign in button on the bottom!
A typical user would expect that he needs to sign in to the source and clicks this Sign in button. Especially as Power BI says “You are not signed in. Please sign in”. But he should actually NOT do this! On the right bottom of the screen, the Next button is already shown in yellow to start building the dataflow. Let’s have a look at two different scenarios below.
Building a dataflow using the gateway credentials
Let’s first build a dataflow without clicking the sign in button. By simply clicking next, we can start building our dataflow. In this example we will use the sales_model.City table and start transforming our data. We click Save & close on the bottom, give the dataflow a name and hit the refresh button.
Looking at the query trace of our source system, we can see the query is incoming using the credentials configured on the gateway, completely in line with our expectations.
When clicking the sign in button while building a dataflow
Another scenario, is that your user does actually click the Sign in button when the entered the data source address and database name. In my example the user will be prompted to sign in to the source using OAuth, but this can differ based on the source you use. If you use basic authentication, you will just see a username and password box to be filled in. In this example, I signed in with my personal user.
Now, I build a dataflow based on the sales_model.Customer table, intentionally a different table to easily identify it in the query trace later on. Again, I click transform and finally Save & close, give the dataflow a name and hit the refresh button again. Just like the previous scenario, I finally hit the refresh button.
Let’s have a look at the query trace again. We now see all incoming queries are based on my personal user, as that is the one which we signed-in with. We can identify this by the most right column in below picture, which shows us the submitter.
What is the problem?
So… what exactly is the problem here? Cause we have two dataflows and you can enter different credentials, so what? Well… what actually happened when you changed the credentials while you are building a dataflow, is that you updated the credentials on the gateway configuration. Cause every following refresh that runs over the same gateway connection, will now also use my personal credentials that I entered building the second dataflow. Meaning, my personal user is now used/abused for all following refreshes!
To prove this case, let me just refresh the very first dataflow again, based on sales_model.City data. In below query trace, we can see that the incoming query is now also using my personal user as submitter on the right-hand side.
I don’t think I have to explain in detail what could go wrong when all refreshes use my credentials. But for completeness, let me list below a two very likely challenges you might run into:
- I’m on vacation and my credentials expire. As a result the gateway connection will be down.
- Many users are using this gateway connection, but my user (which is now used on the gateway) does not have sufficient permissions to all tables. As a result some dataflows and/or datasets will fail to refresh.
How could this have happened?
Finally, this whole case comes down to permission management on the gateway. Cause this only happens if the user clicking the Sign in button while building a dataflow is also administrator on the gateway. On-premises data gateways have three different levels of permissions you can set as shown in below image.
So, please be very aware of whom you grant Owner permissions on the gateway and instruct them to never ever hit the Sign in button while building a dataflow!
As a final note and point of feedback, it would have been nice if the Power BI Service gave you a warning, like “Are you sure you want to update the credentials? Applying this step, will also update the gateway connection credentials.”. Now, it is one of the best hidden undocumented features if you ask me. Also, the message “You are not signed in. Please sign in” which a user is seeing while building a dataflow is completely misleading if you ask me.
Once again, make sure you manage the permissions on the gateway properly! Maybe even consider using a specific user as gateway owner, so the likelihood this happens to you is less.