Solution for dirty PDF data with variable number of columns?|Power Query|Excel Forum|My Online Training Hub

Solution for dirty PDF data with variable number of columns?
Simon Smith (Member)
December 5, 2021 - 7:38 am

Hi Mynda, Phil and fellow learners,

I'm extracting data from PDFs downloaded from https://www.morningstar.co.uk/uk/, a finance and investment website. I've hit a snag because the table I want sometimes has 6 columns and sometimes 5. In the attached PDFs, table 7 has 5 columns in BNKR-port2.pdf and 6 in BGFD-port2.pdf. The 'extra' column in the latter is Column1, which has no useful data. My custom function scrapes the data for "Stock Sector Weightings %", which is in Column2 and Column5 in the BGFD file, or Column1 and Column4 in the BNKR file, and it throws an error when the script cannot find a Column6.

What I am trying to do is add a step after the Source and Table007 steps that checks whether Column1 contains nothing but null and [image] values. If it does, delete Column1; otherwise do nothing and move on to the #"Changed Type" step.

Do you know of a way to do this using if or try? Or is there an easier path?

I hope that you are enjoying your summer, a bit cold & dark here - but your courses are a great distraction.

Kind regards,

Simon

PS - looking forward to seeing the new PQ course videos!

Catalin Bombea (Admin), Iasi, Romania
December 5, 2021 - 3:16 pm

Hi Simon,

Here is a solution (incomplete, just the relevant steps):

Table7Data = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6"}),
// Distinct non-null values found in Column1
Col1 = List.Distinct(List.RemoveNulls(Table7Data[Column1])),
// Remove Column1 only when "[image]" is its single distinct non-null value
Custom1 = if List.Count(Col1) = 1 and Col1{0} = "[image]" then Table.RemoveColumns(Table7Data, {"Column1"}) else Table7Data

In the Col1 step, Table7Data[Column1] refers to a single column of the table. A column taken from a table is a list, so you can apply list functions to it directly.
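To see this outside the PDF query, here is a minimal sketch you can paste into a blank query. The Sample table and its values are made up for illustration and only stand in for Table7Data.

```powerquery
let
    // Hypothetical sample table standing in for Table7Data
    Sample = #table(
        {"Column1", "Column2"},
        {
            {null, "Technology"},
            {"[image]", "Healthcare"},
            {null, "Financials"}
        }
    ),
    // Field access on a table returns that column as a list
    Column1Values = Sample[Column1],
    // Confirms the value really is a list
    IsList = Column1Values is list,
    // Any list function can now be applied, for example counting non-null items
    NonNullCount = List.NonNullCount(Column1Values)
in
    [Values = Column1Values, IsList = IsList, NonNullCount = NonNullCount]
```

The result is a record, so you can click into each field in the editor to inspect the list, the type check and the count.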

Applying List.RemoveNulls and then List.Distinct should reduce the list to just the "[image]" item. If the Col1 list matches this scenario, you can remove Column1 (that is what the Custom1 step above does).
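As a rough illustration of that reduction, the sketch below runs the same two functions over two made-up lists: one that looks like the junk column and one that holds real data. The list contents and the Reduce helper name are assumptions for the example, not taken from the PDFs.

```powerquery
let
    // Hypothetical column holding only nulls and "[image]"
    ImageOnly = {null, "[image]", null, "[image]"},
    // Hypothetical column holding real sector names
    RealData = {"Technology", null, "Healthcare", "Technology"},
    // Same reduction as the Col1 step: drop nulls, then keep distinct values
    Reduce = (values as list) as list => List.Distinct(List.RemoveNulls(values)),
    ImageOnlyReduced = Reduce(ImageOnly),   // {"[image]"}
    RealDataReduced = Reduce(RealData)      // {"Technology", "Healthcare"}
in
    [ImageOnly = ImageOnlyReduced, RealData = RealDataReduced]
```

Only the first list collapses to a single "[image]" item, which is the condition the removal step tests for.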

Col1{0} refers to the first item in the list. If nothing is left in the list after removing nulls, this part might generate an error, so you can guard against that with a try ... otherwise expression:

Custom1 = if List.Count(Col1)=1 and (try Col1{0}="[image]" otherwise false) then Table.RemoveColumns(Table7Data,{"Column1"}) else Table7Data
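If you need the same check in several queries, one possible way to package it is a small function. This is only a sketch: RemoveMarkerOnlyColumn, its parameters and the two sample tables are hypothetical names invented here. It uses optional item access (Values{0}?), which returns null rather than raising an error when the list is empty, as an alternative to try ... otherwise. As far as I know, M's and operator also skips its right-hand side when the left-hand side is false, so the count check already guards the index in most cases.

```powerquery
let
    // Hypothetical helper: drops columnName only if its sole non-null value is marker
    RemoveMarkerOnlyColumn = (source as table, columnName as text, marker as text) as table =>
        let
            Values = List.Distinct(List.RemoveNulls(Table.Column(source, columnName))),
            // Values{0}? yields null instead of an error when the list is empty
            IsMarkerOnly = List.Count(Values) = 1 and Values{0}? = marker
        in
            if IsMarkerOnly then Table.RemoveColumns(source, {columnName}) else source,

    // Made-up stand-ins for the 6-column and 5-column layouts
    WithJunkColumn = #table({"Column1", "Column2"}, {{"[image]", "Tech"}, {null, "Health"}}),
    WithoutJunkColumn = #table({"Column1", "Column2"}, {{"Tech", 12.5}, {"Health", 8.1}})
in
    [
        // Column1 is removed, leaving {"Column2"}
        Trimmed = Table.ColumnNames(RemoveMarkerOnlyColumn(WithJunkColumn, "Column1", "[image]")),
        // Table is returned as-is, leaving {"Column1", "Column2"}
        Unchanged = Table.ColumnNames(RemoveMarkerOnlyColumn(WithoutJunkColumn, "Column1", "[image]"))
    ]
```

In the real query you would call it as RemoveMarkerOnlyColumn(Table7Data, "Column1", "[image]") in place of the Custom1 step.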

Simon Smith (Member)
December 5, 2021 - 10:00 pm

Catalin,

That's really helpful and very well explained, so you have taught as well as fixed!

Many thanks,

Simon
