Dynamically expanding and appending tables with Power Query

It has been a month since my last blog post. So, it was about time for new content! Since last month I’m busy preparing for the Power Platform Summit at the end of March in Amsterdam, so not much time left. Find more about the summit on my events page. So far so good, now time for some blog content.

In this blog, I will tell you more about dynamically expanding and appending tables with Power Query. I was triggered by this topic by Matthew Roche on Twitter. Thanks for that! In this blog I’ll describe how you can get all column names of all tables dynamically before you append all tables. With this, your table will automatically add new columns when the structure of your file changes or extra columns are added to your source.
Matthew Roche_Twitter_Table.Expand1

After some debugging Matthew came up with the following conclusion:

The problem is that the column used for the song title was named inconsistently in some files produced by the web crawler. The rest of the columns were named consistently, but not this one…

The files whose columns matched the root query definition had their song data returned properly. The others had all song titles missing.

In this case, it wasn’t clear that there was another column in the dataset which included all missing data. By dynamically getting the column names before expanding, you can see the other column popping-up, which is not the result what you expected.

Our case

To get things a little bit more specific, let’s use the following case:
I’ve a SharePoint folder including multiple Excel files, which I want to combine, since they all have the same structure. In that case you don’t want to create separate queries for each file but want to append those files dynamically. To do this, you have to connect to the SharePoint folder and afterwards combine all files. By clicking the Combine button, Power Query will generate a function which runs over all these files, generate tables and appends those tables afterwards.

2019-03-12 20_59_56-Untitled - Power Query Editor

Power Query asks you to pick an example file for generating the Power Query function. Automatically this will be the first found file in the SharePoint folder. Looking at the code generated in the function, this will include all column names to expand hard-coded! But what in case one of our files include extra columns than what is available in our example file? These column names are not included in the generated code, since that is based on only the first file. Conclusion: They will not be expanded and will never show up!

Get your column names dynamically

2019-03-12 21_13_08-Agenda - Marc.Lelijveld@macaw.nl - OutlookIn my case, I have three files which are almost the same. They all have a Date, Category, Amount and Price column. But in the third file, I’ve added Brand as well. Since my query is based on the first file, this column will not show up. With generating the Function, Power Query automatically appended all tables in the folder according to the generated function. With that, it also generated the steps as shown in the screenshot.

Once we open the advanced editor, we see the following line of code generated. The #”Sample File” tells us that it is getting all column names from the Sample file to expand. In our case the Sample file, was the first file.

#”Expanded Table Column1″ = Table.ExpandTableColumn( #”Removed Other Columns1″, “Transform File from SharePoint Folder” , Table.ColumnNames( #”Transform File from SharePoint Folder”(#”Sample File”)))
To see the complete Power Query code, scroll down all the way to the bottom of this blog.

So, we want to dynamically expand our table based on all found columns in all tables. To make this happen, we jump a few steps back in our applied steps to the Removed Other Columns1 step. When we click the expand button and load all column names, we already see the Brand column popping-up. But still this will be a hard-coded result. We want to get all column names dynamically. To do that, we are going to add another step in between here.

  1. Click Add Column in the ribbon
  2. Click Custom Column, name this column “GetColumnNames”
  3. Add the following code:
    Table.ColumnNames([Transform File from SharePoint Folder])
  4. As a result, you will get another column with a list of column names per file.
    2019-03-13-12_11_56-demo-file-power-query-editor.png
  5. Now we want to get a unique list of all these column names. So, delete all other columns, so we only have GetColumnNames left.
  6. Expand the GetColumnNames column to new rows.
  7. Remove duplicates from this table. Now we see all five unique column names dynamically generated.

Since the Table.Expand function only accepts lists as an argument, we need to convert this table to a list. We’ll do this later on in the expand step.

Now, we’ve added a few steps in between. At the moment we have an Expression Error, since our Expand table Column1 step refers to the last added step before, which is Remove Duplicates. We want this step to use the result of our table before we added all steps in between. Power Query always first calls the result of the step before, so we can easily change this to the result of a few steps before. Referring to a previous step always starts with a hashtag. Now we only have two steps left.

  1. To refer our Expand Table Column1 to the result before getting the column names, we are going to change the #”Remove Duplicates” to #”Removed Other Columns1″. 
  2. Now our query still refers to the #”Sample File” to get the columns to expand. We’re going to change this in to a list of the column names which we generated in the steps before. To make this happen, we only have to replace Table.ColumnNames(#”Transform File from SharePoint Folder”(#”Sample File”))
    for #”Removed Duplicates”[GetColumnNames].

2019-03-13 12_18_55-Demo file - Power Query Editor.png

Now, we see the Brand column popping-up as well! It will remain empty for the files where we didn’t have this column. But at least it is available now!
To see the complete Power Query code, scroll down all the way to the bottom of this blog. 

By directly calling the column name, which is GetColumnNames in this example, after the step with the previously defined result, it automatically formats as a list. So #”Removed Duplicates” is the previous step we’re referring to.

Wrapping things up

I’m aware of the fact that the last step of Power Query which we replaced, does something similar based on the Sample File. But since our sample file can only be one file, which might not include all columns, this is a more sustainable solution. Even when there are new columns added in the future, they will automatically appear in your table.

Besides that, you could have done the same, in less Power Query steps. But I’ve done this in extra steps consciously. With that I keep everything organized and understandable in case I need to transfer the code to someone else.


Complete Power Query code

The complete Power Query code is shown below. For re-use, you only must be aware of a few things:

  1. Be aware that this query is based on a SharePoint source, so re-use for other sources may include some extra changes.  
  2. You have to replace the amber parts for your own URL to your SharePoint source and folder path to make this work.
  3. Also know that the name of the Power Query Function is hard-coded. If your function is named differently, you must change this as well. The function name is marked amber as well. 

It is easier to re-use this code, by first generating the function in Power Query and change the code afterwards. 

let
    Source = SharePoint.Files("[URL to SharePoint Source]", [ApiVersion = 15]),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Folder Path], "[Filter to get the right folder]")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Extension", "Content"}),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Removed Other Columns", each [Attributes]?[Hidden]? <> true),
    #"Filtered Hidden Files2" = Table.SelectRows(#"Filtered Hidden Files1", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files2", "Transform File from SharePoint Folder", each #"Transform File from SharePoint Folder"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from SharePoint Folder"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns1", "GetColumnNames", each Table.ColumnNames([Transform File from SharePoint Folder])),
    #"Removed Other Columns2" = Table.SelectColumns(#"Added Custom",{"GetColumnNames"}),
    #"Expanded GetColumnNames" = Table.ExpandListColumn(#"Removed Other Columns2", "GetColumnNames"),
    #"Removed Duplicates" = Table.Distinct(#"Expanded GetColumnNames"),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from SharePoint Folder", #"Removed Duplicates"[GetColumnNames])
in
    #"Expanded Table Column1"

 

18 thoughts on “Dynamically expanding and appending tables with Power Query

  1. Pingback: Dynamically expanding and appending tables with Power Query — Data – Marc – SutoCom Solutions

  2. Pingback: Pimp the dataflows connector in Power BI – Data – Marc

  3. aoifoc

    This was so useful and completely solved the problem I was having (my data source was an Azure Data Lake Storage but my problem was identical)! Thanks so much, learned something new today.

    Like

  4. Daniel Blair

    Would you be able to provide an example of a PQ that only is referencing one table or source file? I have a proposal template that reads data from our “Project Database”, but any time we add columns the query fails. Is there is a more reliable way to read the data from our centralized source to populate our proposal workbooks? I used to have the “Database” in google sheetz and linked via a web connection. Now that I am trying to have it all on our Sharepoint server, it has been nothing but a headache…

    Like

      1. Daniel Blair

        Thanks for the fast response. Sorry am speaking in layman’s terms… Here is the set up:
        1. Multi-tab Excel Workbook that acts like a “database” for Projects, Staff, Consultant and other data. The Project Database tab is constantly being updated + columns are being added or moved as the fields required grows.
        2. The “database” workbook is “linked” to our proposal template to push out all data needed for proposals (e.g. project histories, consultant contact info, etc.)
        3. The “database” used to reside in Google Sheets + linked via a web connection. Any changes made to the “database” was refreshed to the host (Proposal Template) without any errors. It was simple + worked beautifully.
        4. Now I have attempted to bring the “database” local by making it an Excel file on our server. The host workbook is now reading the “Database” from this Excel file. I did this thinking it was more secure + would be easier to connect two Excel Workbooks.
        5. Since making this change, I have received nothing but errors due to constant changes in the fields and it has been a disaster. Finding anything to clearly explain how to resolve this simple + unnecessary error is just as confusing since most examples are more complicated.

        Like

      2. Hi Daniel,
        The challenge is mainly in column names. The expand option I’m explaining in this blogpost is all about avoiding to have hard coded column names in your Power Query or wherever. If I understand your case correctly, you build a layer inbetween that is reading from Google Sheets and saving in a local Excel file. I assume that you have build logic in that setup, that references specific columns directly, and there you go…

        All based on assumptions and my interpretation of your comment. But maybe it gives you some direction in where to look.

        Like

      3. Daniel Blair

        Marc,
        Would you be interested in doing a brief Zoom call to discuss this in more detail? I would be happy to bring you in as a consultant to help me get this set up.

        Dan

        Like

  5. Gugan

    Hi Marc,
    Thanks for the article. Have applied all the steps.
    However, it keeps running long time at the last step,

    #”Expanded Table Column1″ = Table.ExpandTableColumn(#”Removed Other Columns1″, “Transform File from SharePoint Folder”, #”Removed Duplicates”[GetColumnNames]).
    after Remove duplicates step, finally it has 120 columns

    It continue to progress without any result. am i missing anything here.

    Thank you.

    Like

    1. Hi Gugan,

      To me, this looks like a very specific error with your dataset. I can’t tell if having 120 columns is correct or not.
      Basically, you generate a query that contains a unique list of all column names. This list is used later on in the expand step.

      Sorry, I need to have more details to help you with this. Otherwise it will be a bit trail and error.
      Marc

      Like

  6. Hi Marc,
    This is an amazing solution and exactly what I was looking for, thank you!
    The only problem I have that data reloading takes forever and if I merge this table with another table from Sharepoint I get a SharePoint connection error. Any ideas what is causing this?

    Like

    1. Hi Anna,

      The slowness of the query is mainly due to the fact that every table needs to be analyzed by the query before appending. So that slows down the process. The more tables to append, the slower it gets.

      With regards to SharePoint, I’m not sure. But my first guess would be throttling. SharePoint throttles the connection to avoid overcommitting SharePoint as a source.

      Hope this helps to understand what’s going on.
      Marc

      Like

  7. Noorul

    HI,
    I am a beginner in Power Query, once i click “split column” under the advance option, there’s no option to choose “rows”. My objective is to split multiple line in one cell to separate rows.

    Appreciate your help,

    Rgds,
    Noorul

    Like

  8. Hi Marc. Thanks for this blog. It’s incredibly useful. I’ve been playing around with it in a couple of scenarios and I can really see how it will help get around ‘hard coding’ issues that pop up for me frequently.
    Do you think it’s possible to carry out a similar operation but for merge rather than append? I’m going to have 100+ workbooks containing tables with ~4k rows that are identical bar the last column. I was hoping to create a master table that would add on the new columns against the common identifiers without appending and duplicating everything. I’ve spent the last few days googling like crazy and trawling youtube and I’m getting a depressing feeling that this is not possible?

    Like

    1. I think it would be possible, but performance would be crap… Merge is a heavy operation. And if I understand well, you want to do this with ~100 workbooks?
      I would advice to consider a decent data platform solution rather than trying to solve these volumes of data in Power BI directly.

      Like

      1. conchobbar

        Thanks for your speedy reply. Yes, I suspected it would a cripplingly slow as even the one above takes forever to load for me with sharepoint.
        I’ve seen a couple of things that I think I could adapt, but it may be better to find an alternative solution. Thanks for the advice!

        Like

Leave a comment