Hello,
Might anyone have an M coding suggestion to help me convert an Excel IF/SUMPRODUCT formula helper column into a New Column "formula" in Power Query?
I work with vendor billing data to create a scorecard.
The data is currently at ~40,000 rows with 35 columns. The set grows by ~500 rows/month. I store the vendor data in MS Access.
My all-Excel analysis/report solution for reporting will soon be unmanageable.
Linking the Access query to Power Query reduces my file size by 50 percent - before unnecessary fields are eliminated. This is a serious pay raise (of sorts).
Switching to a Access-to-PQ-to-Excel solution will be tremendous - provided I can clone my Excel IF/SUMPRODUCT formula in PQ for the helper column.
This helper column is key to using just one data Table for analysis/reporting. Anything more complex is too complex for clients to take over and manage.
My helper column assists pivot tables with job counts, filtering, and Calculated Field formulas. I no longer need a separate Distinct Count data set for the invoice top-sheet details.
The Excel helper column is:
=IF(SUMPRODUCT((A$2:A2=A2)*(B$2:B2=B2))>1,0,1). To the example, Col A = invoice #; Col B = job type.
Adding this formula to my PQ Query loaded to Excel really swells the file size and slows response time. It's not a sustainable solution.
So, I would like PQ to create my sole helper column.
Two files are enclosed:
- Excel file with an example data set and PQ query, and, a,
- Notepad++ file with the PQ Advanced Editor string for the Query.
I have reviewed the MS Power Query function library at length. Instinct encourages something with the List.* section of formulas may help.
But, I haven't hit on a modeling that makes sense to further explore/refine. Instinct also suggests a custom formula will be necessary. I have dabbled in editing M code, which I enjoy - but, I have not yet written a custom function.
I've researched this for days. My test results with my ever-growing M skills are to no avail.
I use this IF/SUMPRODUCT helper column method all the time for so many varied tasks. I would love to hit on a PQ solution.
I welcome all questions for anything here or in the enclosed files that is not clear.
Thank you in advance for your time to consider my post and any time you might spend thinking about,
With regards,
Libby
Carlsbad, California
Hi Libby,
There is no attachment, and we cannot help without seeing the data structure.
Can you try uploading again?
Interesting about no documents, Catalin! Thank you for your reply. I can't explain why the docs did not load. Attached is another copy of the Excel file.
PQ offers a simple solution for my helper column for one vendor. That vendor provides a column of line-item numbering for each item on each invoice. A new conditional format column gets the job done.
The other vendor? A much harder challenge for me right now. Finding a way to identify the duplicates in one column (e.g., invoice numbers) is a harder puzzle for me at the moment. Sense says find an incrementing means to ID duplicates, then use a conditional string to isolate the first instance and add 0's to the balance. We'll see.
As with anything in these programs, there are lots of workarounds.
Because this helper column is so useful to me, I added the H/C to the data and updated my Access table. Each month, I'll add that H/C to the vendor's report and append that data to the Access table. Job done.
But, to develop a PQ means to address the logic would be grand. I recognize that Excel think differs from PQ think. I am enjoying the deep dive into M and trying everything I can think of. I can't break a thing.
Thank you for your time considering this post.
--Libby Reiser
Still no attachments.
Note that there are 2 buttons for finalizing an upload:
-Add Files button, this will allow you to browse to select the file, and:
-Start Upload button, this will actually perform the upload, if you use only the Add Files button, no file will be added.
Thanks for the teaching, Catalin. I simply did not see the Start Upload button. Sorry. Thank you in advance for your time. --Libby
Hi Libby,
The sample file was very helpful to clarify the problem.
Looks like the solution is very simple: you have a column with Line Item numbers, which are in fact a count of Invoice No-Job Type. The first occurence of that Invoice No-Job Type combination will have 1 as the Line Item, we can use that.
Make sure that Line Item column is a numeric type (change the data type), then use this simple formula in a new added column:
=if [Line Item]=1 then 1 else 0
Thank you, Catalin.
Yes, your conclusion is what I determined, as well, for one of the vendors.
I'll add your PQ string to the pair of PQ solutions I enjoyed fashioning to get me started. (See Green tab)
Alas, this nifty easy solution works for only one vendor.
The second vendor does not include line-item numbering data in its report.
The second vendor's data is the prime reason for my post.
To that vendor's data, I apply the IF/SUMPRODUCT string in Excel to fashion a 'Distinct Count' type of list in a columnar fashion.
**Is there a PQ procedure that can replicate the result of the Excel IF/SUMPRODUCT string?
When clients take over the models (we are legal service providers, not data/finance folks), they will never understand how it works, but they'll simply know that it does work. More accurately, I will understand - they will have no huge desire to learn it!!
If you may have any time to further consider my original question, I will be indebted.
Thank you in advance. --Libby
Hi Libby,
The short answer is there isn't a PQ formula that can replicate your IF/SUMPRODUCT since PQ can't refer to rows above/below the current row.
However, I'm confident you can use a combination of PQ and probably PivotTables to get your desired end results, but I don't know what they are. For example, if you wanted to get the PivotTable in cells L25:O30 then this would be possible using the Group tool in PQ but you're saying you don't want to use this.
If you use the IF/SUMPRODUCT formula result for other calculations then those can probably be done another way too, but without knowing what they are I can't help.
Mynda
Thank you, Mynda, for your time to review the post and reply. Agreed.
There is no straightforward way to replicate my string in PQ.
During my own research, I've explored my understanding absolute and relative references in PQ. I found the linked article here to be a fabulous entree to the topic so I could build on the concepts.
Learning about drill-through's was interesting. Having the potential for another purpose to write strings that incorporate such structure is fascinating!
But, for my instant question, I'll use my workaround in Excel. If I hit on an easy multi-step solution via PQ, I will let you know.
Thanks, again, for your time to review and comment.
With regards,
Libby
Hi Libby,
As Mynda said, many excel formulas cannot be replicated in the same way.
You have to find a different way of thinking in Power Query, to get to the same result.
To replicate the formula in PQ, I used the following technique:
1. Add 2 index columns: one as a zero based index, and the second column as a 1 based index
2. Merge the query with itself (select the same query to merge), but select index 0 as the maching column for the first query, and index 1 for the second query.
3. Expand only 2 columns from the merged query: Invoice No and Job Type. You will now have 2 columns for Invoice No, and 2 columns for Job type, but there will be an offset of 1 row between them, because we used those 2 indexes with different bases.
4. Add a new column with a simple formula to compare Invoice & Tob Type columns against Invoice2 & Type 2:
=if [Invoice No]&[Job Type]<>[Invoice No.1]&[Job Type.1] then 1 else 0
This will detect the point when there is a change of Invoice & Type and it will return 1 only when these values are different from the row above.
You can now delete the additional columns used: index columns, the invoice and Type duplicates, we have the column we need.
Check the attached file for a functional example.
Thanks for sharing the link, Libby. It's an interesting idea but I think it should be avoided unless working with small sets of data because it's likely to result in performance issues on large data sets.
Better to start thinking like Power Query instead of trying to make Power Query work like Excel.
Hopefully Catalin's solution will help.
Mynda