May 19, 2021
Hello everyone,
I've a weird issue. A .csv file that is large, but looks very normal when loaded into excel gets corrupted when power query picks it up into the 'transform data' stage. Column A is 'company name' from a database of suppliers to the government. Each row is a Cloud Hosting service that a supplier has put on the catalogue to make available to public sector customers to buy.
But when I load this to power query some (not all) of the data from columns EZ and beyond get displayed in Column A (and beyond) under the 'company name' - when they should be just in a single row of data going out to column GJ. I'll attach the .csv file that I generate from Scrapy taking the data off the government website. Copyright is fine for me to do this, it's all covered by the UK government Open Government Licence (OGL).
I have tried reformatting the .csv file to remove the alternate blank rows and also changed the asterisk (*) as a string separator, but it still makes this error.
I'd be very grateful if someone has a solution for this. The Hosting1.csv file attached only has the first 100 rows of data (and alternate blanks) the live files I'm using have over 20,000 rows of data. The image file attached shows what I'm seeing when I load the data, the errors are visible in rows 16,23,26 and 33.
Many thanks
Moderators
January 31, 2022
It's not a bug. PQ simply allows you to treat quoted line breaks in different ways. The default is to apply them, but in this case that's not what you want. Press the cog wheel next to the Source step and change the Line breaks setting to ignore them. Then the last bit of Column156 (mgmt_dvc) plus all columns thereafter will NOT be displayed in a new row starting Column1.
Answers Post
The following users say thank you to Riny van Eekelen for this useful post:
Philip Treacy1 Guest(s)