Share via

Prevent Merged Query from Re-downloading Data in Power Query

Anonymous
2020-04-10T04:16:19+00:00

Hello,

I'm having some trouble with queries in Power Query from Workbook1.

I have:

  1. Query A, downloads 1.4 mil. rows from a SQL server and loads to a pivot table
  2. Query B, pulls data from a table in Workbook1
  3. Merge-query AB joining the table and the SQL data

When I run Query AB, for some reason, it re-downloads the 1.4 mil. rows of data that were already downloaded and loaded to the workbook for Query A before it performs the join of the datasets.

I don't want it to do this. Why does it do this? Why wouldn't it just use the data that's already been downloaded?

How can I get this to work as I'd like? All help is appreciated.

Thanks.

Microsoft 365 and Office | Excel | For home | Windows

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments No comments

12 answers

Sort by: Most helpful
  1. Anonymous
    2020-04-26T05:35:57+00:00

    Lz, thanks for your reply and attention. Sorry for the delay. I am still in need of a solution to this inquiry.

    I'm sorry that you didn't find this question sufficiently clear. After re-reading, I do think that I included all necessary information. Please do read it again when you have a chance. I have added some additional information below, just in case.

    My application:

    I use Power Query to download/refresh 2 mil. rows of data from a SQL database and load them into a pivot table for one purpose on a monthly basis (the data from the source change only every month).

    However, I have users that will need to “merge” (join) smaller datasets with these 2 mil. rows between monthly downloads/refreshes for another purpose, but, when we use Power Query’s merge function to join the existing query of 2 mil. rows with a query of a small table in the same workbook, Power Query appears to re-download all of those 2 mil. rows of data again before/while performing the actual merge.

    I only want to merge data with the 2 mil. rows that were already downloaded and are saved in the workbook. I don’t want to re-download them every time I want to merge smaller datasets, as the data only change in the source on a monthly basis.

    How can we use Power Query's merge function to merge a smaller dataset with the 2 mil. rows that were already downloaded without refreshing/re-downloading those 2 mil. rows?

    Was this answer helpful?

    1 person found this answer helpful.
    0 comments No comments
  2. Anonymous
    2020-04-29T03:48:25+00:00

    Lz, Thanks very much for your informative reply. The Microsoft Doc you linked-to was also helpful. I was holding out hope for the clause "...and its load is disabled..." but it didn't last. 🙂

    Even though this is expected behavior and I understand its rationale, I do think that it would be awesome if there was an organic way to accomplish this kind of flow. After all, I don't imagine that my application can be very uncommon--big-datasets (larger than the worksheet size-limit) that change monthly needing to be merged with many small datasets several times each month.

    "The 1.4M of rows used by your Pivot Table do not represent a record set"

    This is somewhat puzzling. I was originally expecting an organic solution because I expected that Power Query could reference this loaded data somehow. Although I wasn't thinking that a pivot table could be referenced, the 1.4 million rows are still being loaded somewhere into the workbook. It would be cool to be able to reference that loaded data.

    Is there a way for Power Query to reference data currently in a data model?

    Was this answer helpful?

    0 comments No comments
  3. Lz365 38,201 Reputation points Volunteer Moderator
    2020-04-28T10:52:32+00:00

    Hi again

    I thought about your scenario and now better understand the purpose of your other previous question re. Joining/Merging a Table from an Excel workbook with i.e. a SQL table

    For other readers: a Join/Merge operation cannot take place on the SQL Server side and is consequently done on the client side, all records from the SQL table being necessarily "dowloaded" to make it happen. On the other end, that is not the case when Combining/Appending tables (this can take place on the server side)

    I have 2 options to do what you want: A single dataset download once a month, allowing interim Join/Merge operations with Excel table(s) without going back to SQL server

    With 1.4M of rows we exceed the max. #rows of an Excel spreadsheet. Consequently we need to split the SQL record set into 2. As I do not want to spend time documenting something that won't be used, please choose between the following options:

    #1 A single query to SQL that requires 2 refreshes per month (2 x 1.4M rows - can't be avoided)

    #2 2 queries to SQL. Each bringing a limited amount of records (i.e. 1M rows for the first; Remaining rows for the 2nd). Each query being refreshed once per month

    Just to make things clear: whatever the option is, we need to bring that 1.4M rows into 2 Tables in the Excel workbook. In other words we create a "local" database

    Let me know

    Was this answer helpful?

    0 comments No comments
  4. Lz365 38,201 Reputation points Volunteer Moderator
    2020-04-28T10:36:48+00:00

    Hi dskrobow

    This post to answer the following question only, although I'm quite sure you know it already

    When I run Query AB, for some reason, it re-downloads the 1.4 mil. rows of data that were already downloaded and loaded to the workbook for Query A before it performs the join of the datasets

    In a nutshell this is expected. The 1.4M of rows used by your Pivot Table do not represent a record set (i.e. a Table) in your workbook. In other words, a Pivot Table isn't a Table (a bit confusing I must admit) so you cannot use it as a data Source to build another query. If a flattened Pivot Table would generate a Table that could do the trick, but that's the case

    So, when you build/refresh Query AB you use the Connection established by Query A. And as expected that Connection brings you up to date information (hence what you call a re-download) - whatever the data Source is. Otherwise what would be the benefit of a Connection to a data Source???

    I hope this clarifies things

    EDIT: Just found out Referencing Power Query queries. It focuses on Power BI Desktop (Power Query is a sub-set of the Power BI suit). You can stop reading before para. Recommendations as the latter only apply to Power BI - as of today the concept of dataflow doesn't exist in Power Query

    Was this answer helpful?

    0 comments No comments
  5. Lz365 38,201 Reputation points Volunteer Moderator
    2020-04-13T07:54:02+00:00

    To anybody else willing to assist I would suggest you read Reference Table from Worksheet in SQL Script in Power Query / MS Query first

    Was this answer helpful?

    0 comments No comments