Dynamically expanding and appending tables with Power Query

It has been a month since my last blog post. So, it was about time for new content! Since last month I’m busy preparing for the Power Platform Summit at the end of March in Amsterdam, so not much time left. Find more about the summit on my events page. So far so good, now time for some blog content.

In this blog, I will tell you more about dynamically expanding and appending tables with Power Query. I was triggered by this topic by Matthew Roche on Twitter. Thanks for that! In this blog I’ll describe how you can get all column names of all tables dynamically before you append all tables. With this, your table will automatically add new columns when the structure of your file changes or extra columns are added to your source.
Matthew Roche_Twitter_Table.Expand1

After some debugging Matthew came up with the following conclusion:

The problem is that the column used for the song title was named inconsistently in some files produced by the web crawler. The rest of the columns were named consistently, but not this one…

The files whose columns matched the root query definition had their song data returned properly. The others had all song titles missing.

In this case, it wasn’t clear that there was another column in the dataset which included all missing data. By dynamically getting the column names before expanding, you can see the other column popping-up, which is not the result what you expected.

Our case

To get things a little bit more specific, let’s use the following case:
I’ve a SharePoint folder including multiple Excel files, which I want to combine, since they all have the same structure. In that case you don’t want to create separate queries for each file but want to append those files dynamically. To do this, you have to connect to the SharePoint folder and afterwards combine all files. By clicking the Combine button, Power Query will generate a function which runs over all these files, generate tables and appends those tables afterwards.

2019-03-12 20_59_56-Untitled - Power Query Editor

Power Query asks you to pick an example file for generating the Power Query function. Automatically this will be the first found file in the SharePoint folder. Looking at the code generated in the function, this will include all column names to expand hard-coded! But what in case one of our files include extra columns than what is available in our example file? These column names are not included in the generated code, since that is based on only the first file. Conclusion: They will not be expanded and will never show up!

Get your column names dynamically

2019-03-12 21_13_08-Agenda - Marc.Lelijveld@macaw.nl - OutlookIn my case, I have three files which are almost the same. They all have a Date, Category, Amount and Price column. But in the third file, I’ve added Brand as well. Since my query is based on the first file, this column will not show up. With generating the Function, Power Query automatically appended all tables in the folder according to the generated function. With that, it also generated the steps as shown in the screenshot.

Once we open the advanced editor, we see the following line of code generated. The #”Sample File” tells us that it is getting all column names from the Sample file to expand. In our case the Sample file, was the first file.

#”Expanded Table Column1″ = Table.ExpandTableColumn( #”Removed Other Columns1″, “Transform File from SharePoint Folder” , Table.ColumnNames( #”Transform File from SharePoint Folder”(#”Sample File”)))
To see the complete Power Query code, scroll down all the way to the bottom of this blog.

So, we want to dynamically expand our table based on all found columns in all tables. To make this happen, we jump a few steps back in our applied steps to the Removed Other Columns1 step. When we click the expand button and load all column names, we already see the Brand column popping-up. But still this will be a hard-coded result. We want to get all column names dynamically. To do that, we are going to add another step in between here.

  1. Click Add Column in the ribbon
  2. Click Custom Column, name this column “GetColumnNames”
  3. Add the following code:
    Table.ColumnNames([Transform File from SharePoint Folder])
  4. As a result, you will get another column with a list of column names per file.
    2019-03-13-12_11_56-demo-file-power-query-editor.png
  5. Now we want to get a unique list of all these column names. So, delete all other columns, so we only have GetColumnNames left.
  6. Expand the GetColumnNames column to new rows.
  7. Remove duplicates from this table. Now we see all five unique column names dynamically generated.

Since the Table.Expand function only accepts lists as an argument, we need to convert this table to a list. We’ll do this later on in the expand step.

Now, we’ve added a few steps in between. At the moment we have an Expression Error, since our Expand table Column1 step refers to the last added step before, which is Remove Duplicates. We want this step to use the result of our table before we added all steps in between. Power Query always first calls the result of the step before, so we can easily change this to the result of a few steps before. Referring to a previous step always starts with a hashtag. Now we only have two steps left.

  1. To refer our Expand Table Column1 to the result before getting the column names, we are going to change the #”Remove Duplicates” to #”Removed Other Columns1″. 
  2. Now our query still refers to the #”Sample File” to get the columns to expand. We’re going to change this in to a list of the column names which we generated in the steps before. To make this happen, we only have to replace Table.ColumnNames(#”Transform File from SharePoint Folder”(#”Sample File”))
    for #”Removed Duplicates”[GetColumnNames].

2019-03-13 12_18_55-Demo file - Power Query Editor.png

Now, we see the Brand column popping-up as well! It will remain empty for the files where we didn’t have this column. But at least it is available now!
To see the complete Power Query code, scroll down all the way to the bottom of this blog. 

By directly calling the column name, which is GetColumnNames in this example, after the step with the previously defined result, it automatically formats as a list. So #”Removed Duplicates” is the previous step we’re referring to.

Wrapping things up

I’m aware of the fact that the last step of Power Query which we replaced, does something similar based on the Sample File. But since our sample file can only be one file, which might not include all columns, this is a more sustainable solution. Even when there are new columns added in the future, they will automatically appear in your table.

Besides that, you could have done the same, in less Power Query steps. But I’ve done this in extra steps consciously. With that I keep everything organized and understandable in case I need to transfer the code to someone else.


Complete Power Query code

The complete Power Query code is shown below. For re-use, you only must be aware of a few things:

  1. Be aware that this query is based on a SharePoint source, so re-use for other sources may include some extra changes.  
  2. You have to replace the amber parts for your own URL to your SharePoint source and folder path to make this work.
  3. Also know that the name of the Power Query Function is hard-coded. If your function is named differently, you must change this as well. The function name is marked amber as well. 

It is easier to re-use this code, by first generating the function in Power Query and change the code afterwards. 

let
    Source = SharePoint.Files("[URL to SharePoint Source]", [ApiVersion = 15]),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Folder Path], "[Filter to get the right folder]")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Extension", "Content"}),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Removed Other Columns", each [Attributes]?[Hidden]? <> true),
    #"Filtered Hidden Files2" = Table.SelectRows(#"Filtered Hidden Files1", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files2", "Transform File from SharePoint Folder", each #"Transform File from SharePoint Folder"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from SharePoint Folder"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns1", "GetColumnNames", each Table.ColumnNames([Transform File from SharePoint Folder])),
    #"Removed Other Columns2" = Table.SelectColumns(#"Added Custom",{"GetColumnNames"}),
    #"Expanded GetColumnNames" = Table.ExpandListColumn(#"Removed Other Columns2", "GetColumnNames"),
    #"Removed Duplicates" = Table.Distinct(#"Expanded GetColumnNames"),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from SharePoint Folder", #"Removed Duplicates"[GetColumnNames])
in
    #"Expanded Table Column1"

 

5 thoughts on “Dynamically expanding and appending tables with Power Query

  1. Pingback: Dynamically expanding and appending tables with Power Query — Data – Marc – SutoCom Solutions

  2. Pingback: Pimp the dataflows connector in Power BI – Data – Marc

  3. aoifoc

    This was so useful and completely solved the problem I was having (my data source was an Azure Data Lake Storage but my problem was identical)! Thanks so much, learned something new today.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s