Loop through exampleset and identify a public holiday and then set all 24 hours to 1 in variable

lanemlanem Member, University ProfessorPosts:29Maven
edited August 2019 inHelp
Hi
I am endeavouring to solve this problem with preprocessing of a data set to create a new variable that flags all 24 hours of a day as a public holiday based on identifying a particular day flagged as a public holiday but in current data set this is only for first hour of the day but technically a public holiday should be flagged for all 24 hours of a day - this is in a data set measuring impact of a number of variables on traffic volumes. Can I use a loop operator and macro to do this??? Any help or advice would be most welcome
Regards Michael

Best Answers

  • lanemlanem Member, University ProfessorPosts:29Maven
    Solution Accepted
    Hi Balázs
    I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael
  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified ExpertPosts:949Unicorn
    Solution Accepted
    Hi,

    你发送过滤列表的“左”put and the unfiltered one to the "right" input. But you're doing left outer join.
    For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.

    Check out this short video on Academy:
    https://academy.www.kenlockard.com/learn/video/joining-and-cleansing-intro

    Regards,
    Balázs
    lanem

Answers

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified ExpertPosts:949Unicorn
    Hi,

    how are your data structured? Do you have a list of public holidays (dates) and the timestamps (on hour level)?

    You could try something like this in Generate Attributes to create an additional attribute with just the date part of your timestamp:
    date_add(timestamp_attribute,  - date_get(timestamp_attribute, DATE_UNIT_HOUR),  DATE_UNIT_HOUR)

    Best regards,

    Balázs
    varunm1 Tghadially
  • lanemlanem Member, University ProfessorPosts:29Maven
    Hi Balazs
    Many thanks the data set has a date-time variable so yes can extract hour of day for days that are flagged as public holidays - holiday variable is either "None" or the name of the public holiday such as "Christmas Day" so it is a polynominal variable in RapidMiner

    However the problem I have if that data set only flags the first hour of each public holiday so the other 23 hours for that holiday are incorrectly flagged as "None" so using the example above the first hour of Christmas day 25/12/2012 is flagged as "Christmas Day" the problem is that other 23 hours are not currently flagged as "Christmas Day" but rather "None".

    I am thinking of using an if statement to identify a public holiday such as != "None" and use a macro and loop to set the other 23 hours to "Christmas Day" and run through the entire exampleset for each holiday could work, see snapshot of data set below

    没有一个 275.91 0 0 64 Clouds broken clouds 10/11/2013 23:00 925
    Veterans Day 275.44 0 0 64 Clouds broken clouds 11/11/2013 0:00 514
    没有一个 275.55 0 0 64 Rain light rain 11/11/2013 1:00 286
    没有一个 274.91 0 0 64 Clouds broken clouds 11/11/2013 2:00 215
    没有一个 273.9 0 0 64 light snow 11/11/2013 3:00 305
    没有一个 271.36 0 0 64 Clouds broken clouds 11/11/2013 4:00 783
    没有一个 270.48 0 0 64 Clouds broken clouds 11/11/2013 5:00 2576
    没有一个 269.56 0 0 64 Clouds broken clouds 11/11/2013 6:00 5669
    没有一个 268.54 0 0 64 Clouds broken clouds 11/11/2013 7:00 6170
    没有一个 267.41 0 0 64 Clouds broken clouds 11/11/2013 8:00 5510
    没有一个 266.85 0 0 64 Clouds broken clouds 11/11/2013 9:00 4653
    没有一个 267.11 0 0 64 Clouds broken clouds 11/11/2013 10:00 4167
    没有一个 268.15 0 0 40 Clouds scattered clouds 11/11/2013 11:00 4200
    没有一个 269.27 0 0 64 Clouds broken clouds 11/11/2013 13:00 4678
    没有一个 269.71 0 0 64 Clouds broken clouds 11/11/2013 14:00 4971
    没有一个 269.2 0 0 64 Clouds broken clouds 11/11/2013 15:00 5101
    没有一个 268.68 0 0 64 Clouds broken clouds 11/11/2013 16:00 5476
    没有一个 267.63 0 0 64 Clouds broken clouds 11/11/2013 17:00 5346
    没有一个 266.68 0 0 64 Clouds broken clouds 11/11/2013 18:00 3982
    没有一个 265.77 0 0 64 Clouds broken clouds 11/11/2013 19:00 2679
    没有一个 265.14 0 0 64 Clouds broken clouds 11/11/2013 20:00 2301
    没有一个 264.93 0 0 64 Clouds broken clouds 11/11/2013 21:00 1991
    没有一个 264.44 0 0 64 Clouds broken clouds 11/11/2013 22:00 1369

    Regards Michael
  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified ExpertPosts:949Unicorn
    Hi,

    if you have the large table (the one with the timestamps) and create the date attribute, it will have 24 entries for each day, with the date.

    Then you have a second table, I assume, with the holiday dates.

    In this case you can simply join the two tables using the Join operator, selecting the date in both tables as the join attribute. This will assign the holiday to the date/time entries. Use "left join" to keep all entries of the timestamp table, then fill missing values in the resulting table with "None".

    Regards,

    Balázs
    Tghadially lanem
  • lanemlanem Member, University ProfessorPosts:29Maven
    Hi Balazs many thanks this sorted out my problem although I still needed to remove duplicate attributes and replace ? with None after watching the intro video on cleansing data I also used the trim operator and remove duplicates operator really appreciate your help with this regards Michael
    Tghadially
Sign InorRegisterto comment.