In the Power BI service, you can easily look at refresh times for an individual dataset or dataflow. As a dataset or dataflow owner, these metrics matter to you for many reasons: you may bump into refresh time-outs or unexpected errors, for example. In short, there are plenty of good reasons to want more insight into your refresh metrics.
That said, it can be a pain to check these metrics every day. Power BI already offers a way to send automatic notifications in case of a refresh failure, but I would personally prefer to have insight into all my refresh metrics, whether refreshes fail or succeed.
In this blog I want to share how you can export all refresh metrics for your datasets and dataflows using a PowerShell script.
Why do we want these metrics?
For every dataset and dataflow owner, it is important to know how your artifacts are performing. They may run perfectly fine in the beginning but run into time-outs or unexpected errors over time. This can have many causes, such as increasing data volumes, changes to the data source, or failures in Power BI itself.
Recently I bumped into another use case, which made me decide to explore different ways to get these insights. At one of the clients I work for, we upgraded a Premium capacity to the new gen2. The upgrade gave us tons of benefits in terms of performance and load times. On the other hand, it became more difficult for us to manage the Premium capacity, as the capacity metrics are not available for Power BI Premium gen2 for as long as it remains in public preview (which it is at the moment of writing this blog).
You may think: there should be easier ways to get refresh metrics, right? Well, yes! But for the specific use case described above, this requires admin permissions on a Premium capacity. In the Premium capacity admin settings within the Power BI service, you will find a Refresh summary tab.
You might think this is exactly what we need, and that you can even export it with the button on top. I have to say, that was my first thought as well. Unfortunately, it did not fulfill my needs: at the moment, this refresh summary only includes datasets, not dataflows. As dataflows play an increasingly important role in Power BI solution architectures, it is important to have insight into their metrics as well. Let’s hope that the Refresh summary will be enriched with dataflow metrics in the future!
Other than that, since Premium capacity admin permissions are required to access the refresh summary, this might block you as well if you are not the owner of the capacity.
Use PowerShell to collect the refresh metrics
By now, it is no surprise that I am a fan of the Power BI REST API. I use it a lot, especially in combination with PowerShell. So that is what I did to extract the refresh history for all dataflows and datasets in my workspace. The good thing is, with sufficient workspace permissions you will be able to read the refresh history, which means you can leverage the script as well. First, below is a short summary of everything I used to read this information from the Power BI service.
- Power BI Cmdlets. I leverage the PowerShell reference for Power BI to execute tasks that relate to the Power BI service. Specifically, I use these cmdlets to authenticate to the service and to invoke Power BI REST API calls.
- Power BI REST API – Get Datasets In Group. I leverage this API to list all the datasets that exist in the specified workspace. After listing the datasets, the result is dumped to a json file that summarizes the metadata of the datasets, including id, name and configuration details.
- Power BI REST API – Get Refresh History In Group. This API is used to capture all the refresh details of the datasets listed earlier. The response can include multiple json entries per dataset, containing status, startTime, endTime, refreshType and requestId. All refresh entries per dataset are exported together as a separate json file.
- Power BI REST API – Get Dataflows. With this API I list all the dataflows in the specified workspace. Similar to the action above for datasets, the resulting list includes metadata about the dataflows and is exported as a separate json file to the defined output location.
- Power BI REST API – Get Dataflow Transactions. Transactions summarize the refresh history of the dataflows. The returned array is similar to the response for datasets. All refresh transactions are included in the final json file.
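Put together, the dataset half of the steps above can be sketched roughly as follows. This is a simplified sketch, not the full script: it assumes the MicrosoftPowerBIMgmt module is installed, and $WorkspaceId and $DefaultFilePath are placeholder values you would replace with your own.

```powershell
# Sketch: list datasets in a workspace and dump their refresh history to json.
Import-Module MicrosoftPowerBIMgmt

$WorkspaceId     = "00000000-0000-0000-0000-000000000000"  # placeholder workspace id
$DefaultFilePath = "C:\Temp\"                              # placeholder output folder
$PbiRestApi      = "https://api.powerbi.com/v1.0/myorg/"

# Authenticate to the Power BI service (interactive sign-in)
Connect-PowerBIServiceAccount

# 1. List all datasets in the workspace (Get Datasets In Group)
$GetDatasetsApiCall = $PbiRestApi + "groups/" + $WorkspaceId + "/datasets"
$AllDatasets = Invoke-PowerBIRestMethod -Method GET -Url $GetDatasetsApiCall | ConvertFrom-Json

# 2. For each dataset, capture the refresh history (Get Refresh History In Group)
foreach ($Dataset in $AllDatasets.value) {
    $GetRefreshHistoryApiCall = $PbiRestApi + "groups/" + $WorkspaceId +
        "/datasets/" + $Dataset.id + "/refreshes"
    $RefreshHistory = Invoke-PowerBIRestMethod -Method GET -Url $GetRefreshHistoryApiCall |
        ConvertFrom-Json
    # Each entry holds status, startTime, endTime, refreshType and requestId
    $RefreshHistory.value | ConvertTo-Json |
        Out-File ($DefaultFilePath + "RefreshHistory_" + $Dataset.id + ".json")
}
```

The dataflow half follows the same pattern, only with the dataflows and transactions endpoints instead.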
Running the script results in four json files. Each filename includes the dump date and time plus the workspace id. With that, you can easily collect multiple days of refresh history for the same workspace without overwriting earlier exports. The script reads the same maximum history that you can see in the Power BI service, so it might be interesting to run the script frequently for long-term analysis.
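As an illustration of that naming scheme, a timestamped, workspace-specific filename can be composed like this. Note that the exact prefix and date format here are assumptions for this sketch, not necessarily the literal pattern the script uses.

```powershell
# Compose a filename that embeds the dump date/time and the workspace id,
# so that repeated runs never overwrite an earlier export.
$WorkspaceId  = "00000000-0000-0000-0000-000000000000"   # placeholder id
$DumpDateTime = Get-Date -Format "yyyyMMdd_HHmmss"       # e.g. 20210801_093015
$FileName     = "DatasetRefreshHistory_" + $DumpDateTime + "_" + $WorkspaceId + ".json"
$FileName
```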
For now, I will leave it up to you to decide what to do with it. Just for inspiration, you can build a new Power BI report on top for runtime analysis, failures and much more!
To avoid pasting an almost 150-line script in the blog, I refer you to my Power BI Automation GitHub repository, where you can find the script PowerBIRefreshHistoryPerWorkspace.ps1. To understand what is happening in the script, I have added a bunch of comments in the code that explain each step.
<< THIS IS JUST A CODE SNIPPET, CHECK GITHUB FOR THE FULL CODE >>

```powershell
# List all dataflows in specified workspace
Write-Host "Collecting dataflow metadata..."
$GetDataflowsApiCall = $PbiRestApi + "groups/" + $WorkspaceId + "/dataflows"
$AllDataflows = Invoke-PowerBIRestMethod -Method GET -Url $GetDataflowsApiCall | ConvertFrom-Json
$ListAllDataflows = $AllDataflows.value

# Write dataflow metadata json
$DataflowMetadataOutputLocation = $DefaultFilePath + "DataflowMetadata.json"
$ListAllDataflows | ConvertTo-Json | Out-File $DataflowMetadataOutputLocation -ErrorAction Stop
Write-Host "Dataflow metadata saved on defined location" -ForegroundColor Green
```

<< THIS IS JUST A CODE SNIPPET, CHECK GITHUB FOR THE FULL CODE >>
Finally, a small note about the script and setup used. The script performs an API call for each dataflow and each dataset to get the refresh metrics. For workspaces with large numbers of artifacts, or when running the script for multiple workspaces in sequence, scalability can become an issue. Please keep the API rate limits in mind!
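One simple way to reduce the risk of hitting those limits is to add a short delay between consecutive calls. The sketch below shows the idea only; the delay value is an assumption you would tune yourself, and $ListAllDataflows stands in here for the real list of artifacts.

```powershell
# Throttle the per-artifact API calls by sleeping briefly between them.
$ListAllDataflows = 1..3      # placeholder for the real dataflow list
$DelaySeconds     = 1         # tune this to stay well under the API limits
$CallCount        = 0

foreach ($Dataflow in $ListAllDataflows) {
    # ... the Invoke-PowerBIRestMethod call for this dataflow would go here ...
    $CallCount++
    Start-Sleep -Seconds $DelaySeconds
}
```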