Forum

Best way to sort ra...
 
Notifications
Clear all

Best way to sort raw interval data

14 Posts
3 Users
0 Reactions
129 Views
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

I have been given 5 days of energy use data split in 5 min intervals and recorded across 6 channels.

I'm currently working through the excel courses but I have no idea what is the best way to process this data. Where do I start?

I need to be able to:

  • graph the Max current (column N) for each of the connection types.  Ideally a line graph with the 3 AC load channels and another line graph with the 3 PV site channels.
  • Because there is way too much data to graph at 5 min intervals I need to convert the 5 min data into 30 min data.  When doing this, some of the energy columns need the 5 min data values added for each 30 mins and some just need to find the max for the 30 min period.
  • It would nice to also then be able to analyse the data to see totals and max values for a whole day and know what day of the week that date is.   
  • run a formula or query that finds the max of a column in a day or a month and tell me at what time and date that value occurred.

Where do I start?  Do I have to learn advance formulas in Excel, power query or power bi?  I would like to learn how to process this data myself instead of relying on people processing it for me.

I happy for you to also point me to the right resources to learn how to do these things.

Thanks in advance!

 
Posted : 07/09/2022 8:40 pm
(@mynda)
Posts: 4762
Member Admin
 

Hi Anita,

I've moved this question to the Excel Expert course members forum so that it gets answered promptly. Can you please upload a sample of your data in an Excel file so I can give you some ideas.

Just 3 days worth across 2 channels will do. You can anonymize it if required.

Mynda

 
Posted : 08/09/2022 5:51 am
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Hi Mynda,

Thanks for replying to me!

I've attached the data.

The coloumns are now different to how I originally described them as I deleted the unnecessary ones.

  • There are 6 channels (column B).  3 for AC load net and 3 for PV site net.  The data needs to be sorted by each channel.
  • The 5 min data then needs to be converted to 30 min data.  In the past I have done this by first creating a 30 min interval column of the times.
  • Energy (Wh) needs to be converted to kWh by dividing the value by 1000. Then 6 x 5 min intervals of the kWh value gets summed to find the 30 min value for kWh.
  • Power (W)  - needs to be converted to kW by dividing the value by 1000.  Then we have to find the average of  6 x 5 mins intervals of the kW value to find the average kW over 30 mins.
  • Energy Reactive (VArs) - we need to convert this to kVA by dividing the value by 1000.  Then we have to find the average for 30 mins as we did for Power (W).
  • Avg Current (A) - We need to find the average of 6 x 5min intervals (as per Power) to find the average Current over 30 mins.
  • From here we need to:
    • graph the avg Current (A) per 30 mins for the 3 AC load net channels over 24 hours.  Eg. a line for each AC channel's avg value.
    • Find out the Max kVA value for each month and what date and time it occured. (Normally I have 12 months of data)
    • Know the day of the week for each date

There are various other things to extract from the data but knowing the easiest way to process the data so I can start running queries would be helpful.  To date I have just been getting one off formula help from another forum but I don't understand the formulas enough and sometimes they don't work when I try to apply it to a new set of data.  

Should I be using power query to process this data? I have signed up for your course on PQ but have not started it yet.

Thanks for your help.  

Regards,

Anita

 
Posted : 16/09/2022 1:53 am
(@mynda)
Posts: 4762
Member Admin
 

Hi Anita,

Answers to your bullet points below:

  • There are 6 channels (column B).  3 for AC load net and 3 for PV site net.  The data needs to be sorted by each channel.
    • The PivotTable will automatically sort and group the data for you.
  • The 5 min data then needs to be converted to 30 min data.  In the past I have done this by first creating a 30 min interval column of the times.
    • You still need to add a column for the intervals. This can be done in Power Query. See file attached - note the aggregation in the PivotTable can be set to sum, average, min, max etc.
  • Energy (Wh) needs to be converted to kWh by dividing the value by 1000. Then 6 x 5 min intervals of the kWh value gets summed to find the 30 min value for kWh.
    • This can be done in Power Query.
  • Power (W)  - needs to be converted to kW by dividing the value by 1000.  Then we have to find the average of  6 x 5 mins intervals of the kW value to find the average kW over 30 mins.
    • This can be done in Power Query.
  • Energy Reactive (VArs) - we need to convert this to kVA by dividing the value by 1000.  Then we have to find the average for 30 mins as we did for Power (W).
    • This can be done in Power Query.
  • Avg Current (A) - We need to find the average of 6 x 5min intervals (as per Power) to find the average Current over 30 mins.
    • This can be done in Power Query.
  • From here we need to:
    • graph the avg Current (A) per 30 mins for the 3 AC load net channels over 24 hours.  Eg. a line for each AC channel's avg value.
      • See file attached as an example. Obviously, you can change the value being plotted and the aggregation method to average.
    • Find out the Max kVA value for each month and what date and time it occurred. (Normally I have 12 months of data)
      • You can use a formula for this.
    • Know the day of the week for each date
      • PivotTables won't let you set the date format, but you can add another column to your dataset in Power Query that extracts the day name which can then be used in the PivotTable. See file attached.

      I hope that points you in the right direction. I recommend you complete the courses before you dive into this task so that you're equipped with all the knowledge you may need.

      Mynda

 
Posted : 16/09/2022 6:31 am
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Mynda,

Thank you so much for taking the time to step me through this.  I will definately focus on the power query course now that I know that's my starting point for working with this data.

Regards,

Anita

 
Posted : 17/09/2022 2:00 am
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Hi Mynda,

Just one thing I noticed.  The Day Name is not correct as they all say Saturday. 

1/9/22 is Thursday, 2/9 is Friday and 3/9 is Saturday.

How can I fix this?

Regards,

Anita

 
Posted : 17/09/2022 2:24 am
Riny van Eekelen
(@riny)
Posts: 1217
Member Moderator
 

Since you are on-line now and Mynda is not, please allow me to help you.

The applied step Inserted Day Name contains a reference to the [Int.] column. That should be [Timestamp].

 

The M-code reads:

= Table.AddColumn(#"Reordered Columns", "Day Name", each Date.DayOfWeekName([#"Int."]), type text)

Change that to;

= Table.AddColumn(#"Reordered Columns", "Day Name", each Date.DayOfWeekName([Timestamp]), type text)

 

Close and Load and refresh the pivot table and chart and all will work as expected.

 
Posted : 17/09/2022 3:48 am
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Thank you very much Riny van Eekelen for your assistance.

All is now working.

 
Posted : 22/09/2022 12:30 am
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Hi,

Ok so I've created a query which combines 12 files (1 for each month) and each file contains 5 min data for that month and I'm just not sure where to go on a couple of things.  

  • What's the best option for close and load? 
    • I would like to have a table / worksheet with all the data in it to reference; &
    • create pivot tables and charts from the data.  
    • I initially loaded it as a table and noticed I then couldn't change the load type in the query.  I then changed it from the Queries and Connections side pannel but it deleted the table from the worksheet.  So the only way to see the full data set was to double click on the query in the side pannel.  When I do this it removes all the Data Types from the headers.  It seems to do this every time I edit the query.  It also still only has the option to close and load with the other options greyed out.
  • Should I create separate queries for the different things i'm trying to get out of the data?  Eg. I have to analyse kilowatts, kilowatt hours and kVas and create different graphs for each.  Am I better to create a kilowatt query that filters out everything not related and then do the same for the others?
  • Do I create all my pivot tables in a separate workbook to where the data is and then reference that file?
  • I have used the sample file you sent me to guide me in setting up my columns and I was wondering if you could explain the formula in the Int. column you created.  There are times where i'll be converting 15 min to 30min data and other times it will be 5 min to 30min data so I just wanted to make sure I know how to adjust the formula when needed.

I am working my way through the course as well. I only have modules 6 & 7 left but we have a time pressing need to get this data interpreted asap.

Thanks for your help.

 
Posted : 24/09/2022 3:32 am
(@mynda)
Posts: 4762
Member Admin
 

Hi Anita,

Once you Close & Load the first time, the only place you can edit the load to settings is by right-clicking the query in the Queries and Connections pane > Load to, as shown in lesson 5.01.

If you're only going to build PivotTables from the source data, the most efficient way to load the data is directly to a PivotTable, which you can then copy to make more PivotTables. If you load the data to the worksheet, then the data is in your file twice, once in the Table and once in the Pivot Cache.

If you want to have the data in a separate table in the worksheet, then you need to change the background refresh settings so that the PivotTables refresh properly. See lesson 5.01.

If you only want to see the full data set from time to time, then you can build a PivotTable and double click on the Grand Total cell of the PivotTable and it will create a new sheet with all of the data for the field summed in the Grand Total.

I wouldn't create separate queries for the different energy measures as you'll be duplicating the timestamp, channel Name and Connection Type, resulting in a bigger file than necessary. You also won't be able to filter all PivotTables with a single Slicer unless you use Power Pivot.

Do not create your PivotTables in a separate workbook to your report. Data spread across workbooks is error prone.

The Int. column formula used in your file (shown below) rounds to the nearest 30 minuntes:

=DateTime.From(Number.IntegerDivide(48*Number.From([Timestamp]),1)/48)

If you want 5 minute intervals, use this:

=DateTime.From(Number.IntegerDivide(288*Number.From([Timestamp]),1)/288)

If you want 15 minute intervals, use this:

=DateTime.From(Number.IntegerDivide(96*Number.From([Timestamp]),1)/96)

i.e. there are 24 hours in a day. If you want the hours broken down into 2 halves i.e. 30 minute intervals, you multiply 24 * 2 = 48.

If you want the hour broken down into 5 minute intervals, then there are 12 of those in an hour, so it's 24 * 12 = 288.

If you want the hour broken down into 15 minute intervals, then there are 4 of those in an hour, so it's 24 * 4 = 96

Mynda

 
Posted : 24/09/2022 9:08 pm
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Thanks for clarifying this Mynda.

I've loaded my file as a pivot chart and I've been working on setting up the tables I need.

I've realised that although in some instances I need to see the AC1, AC2 and AC3 values seperated in charts, I also need to see them combined.  For example, the sum of all values for AC1, AC2 & AC3's "Energy (kWh)" per month.

Would I need to edit the query and add a new column called AC Total kWh?  If so, how would I make that new column sum together only the Energy (kwh) values that relate to AC1, AC2 & AC3?  I would need to run the same formula but for PV1, 2 &3 as well.

Thanks again.

AnitaColumns-to-sum-data.PNG

 
Posted : 26/09/2022 12:46 am
(@mynda)
Posts: 4762
Member Admin
 

Hi Anita,

If you want a chart that includes AC1, AC2 and AC3, you simply select them in the PivotTable filter for the Channel Name. i.e. put the channel name in the Filters area of the Pivot Chart. Likewise, if you want PV1 etc.

Mynda

 
Posted : 26/09/2022 5:43 am
(@apay77)
Posts: 15
Eminent Member
Topic starter
 

Hi Mynda,

Yes I did that however I wanted to see AC1,AC2 and AC3 sumed as one value per month instead of 3 seperate lines on the graph.  I found an option to Group the columns in the pivot table which did this for me.  Is that an acceptable way to do it?

Another question for you (sorry)...

If I wanted to find the max value of a column for a particular month, is there a way to create a table that not only finds the monthly max value but also tells you the date and time it occured?  Would this be done with a Vlookup added to the Pivot table?

 
Posted : 27/09/2022 1:42 am
(@mynda)
Posts: 4762
Member Admin
 

No need to create a group. Simply put the channel name in the filters area and use the drop down on the PivotTable to check the boxes for the items you want summed. 

You would have to find the MAX value and then do an INDEX & MATCH to find the corresponding date/time...assuming the max was unique. Note: in future please start a new thread for different questions. 

 
Posted : 27/09/2022 5:19 am
Share: