Loop through exampleset and identify a public holiday and then set all 24 hours to 1 in variable
Hi
I am trying to preprocess a data set to create a new variable that flags all 24 hours of a day as a public holiday. In the current data set a public holiday is flagged only for the first hour of the day, but it should be flagged for all 24 hours of that day. The data set measures the impact of a number of variables on traffic volumes. Can I use a loop operator and a macro to do this? Any help or advice would be most welcome.
Regards Michael
Best Answers
lanem (Member, University Professor; Posts: 29; Maven)
Hi Balázs
I am using Date to Nominal to reformat date_time to the date format MM/dd/yyyy, and then the Filter Examples operator to get the dates that are flagged as public holidays. Then, as you suggest, I am using the Join operator to do a left join on date_time with the full data set, whose date_time is also reformatted to MM/dd/yyyy (see the RapidMiner process screen capture). Now I am getting all 24 hours for each holiday, but I am not getting the full data set with the other days listed as None in the holiday variable.
Regards, Michael
BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert; Posts: 949; Unicorn)
Hi,
You're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing a left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.www.kenlockard.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs
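To make the difference between the two join directions concrete, here is a minimal Python sketch. It is not RapidMiner code: the `outer_join` helper, the attribute names and the sample rows are all assumptions made up for illustration. The filtered holiday list plays the "left" input and the full hourly data the "right" input, as in the process described above.

```python
# Hypothetical miniature of the two inputs; names and values are illustrative.
full_rows = [  # unfiltered hourly data (the "right" input in the thread)
    {"date": "12/24/2012", "hour": 23},
    {"date": "12/25/2012", "hour": 0},
    {"date": "12/25/2012", "hour": 1},
    {"date": "12/26/2012", "hour": 0},
]
holiday_rows = [  # filtered list of holiday dates (the "left" input)
    {"date": "12/25/2012", "holiday": "Christmas Day"},
]


def outer_join(left, right, key, keep):
    """Join two lists of dicts on `key`, keeping every row of the `keep` side.

    keep="left" mimics a left outer join, keep="right" a right outer join.
    Attributes with no match on the other side are simply absent here
    (they would be missing values in a real join).
    """
    kept, other = (left, right) if keep == "left" else (right, left)
    joined = []
    for row in kept:
        matches = [o for o in other if o[key] == row[key]]
        if matches:
            joined.extend({**m, **row} for m in matches)
        else:
            joined.append(dict(row))
    return joined


left_result = outer_join(holiday_rows, full_rows, "date", keep="left")
right_result = outer_join(holiday_rows, full_rows, "date", keep="right")

print(len(left_result))   # only the hours of the holiday survive
print(len(right_result))  # every hourly row survives
```

With the inputs wired this way, the left join keeps only rows that belong to a holiday date, which matches the symptom reported above; the right join keeps all hourly rows and leaves the holiday attribute missing on ordinary days.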
Answers
How are your data structured? Do you have a list of public holidays (dates) and the timestamps (at hour level)?
You could try something like this in Generate Attributes to create an additional attribute with just the date part of your timestamp:
Balázs
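As an illustration of the idea only (this is Python, not a Generate Attributes expression), the sketch below reduces an hourly timestamp to its date part. The timestamp layout in `TIMESTAMP_FORMAT` and the helper name `date_part` are assumptions; the real date_time format in the data set may differ.

```python
from datetime import datetime

# Assumed timestamp layout; adjust the pattern to the real date_time format.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M"


def date_part(timestamp: str) -> str:
    """Return only the calendar date of an hourly timestamp, as MM/dd/yyyy."""
    return datetime.strptime(timestamp, TIMESTAMP_FORMAT).strftime("%m/%d/%Y")


# Every hour of the same day maps to the same date string.
print(date_part("25/12/2012 00:00"))  # 12/25/2012
print(date_part("25/12/2012 17:00"))  # 12/25/2012
```

Because all 24 hourly rows of a day end up with the same date value, that value can serve as the key that links each hour to its holiday.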
Many thanks. The data set has a date-time variable, so yes, I can extract the hour of day for days that are flagged as public holidays. The holiday variable is either "None" or the name of the public holiday, such as "Christmas Day", so it is a polynominal variable in RapidMiner.
However, the problem I have is that the data set only flags the first hour of each public holiday, so the other 23 hours of that holiday are incorrectly flagged as "None". Using the example above, the first hour of Christmas Day, 25/12/2012, is flagged as "Christmas Day", but the other 23 hours are currently flagged as "None" rather than "Christmas Day".
I am thinking that an if statement to identify a public holiday (holiday != "None"), combined with a macro and a loop that sets the other 23 hours to "Christmas Day" and runs through the entire ExampleSet for each holiday, could work. See the snapshot of the data set below.
Regards Michael
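The loop idea described above can be sketched in plain Python as two passes over the rows. This only illustrates the logic and is not a RapidMiner loop-and-macro process; the row layout and the sample values are hypothetical.

```python
# Hypothetical hourly rows; only the first hour of the holiday carries its name.
rows = [
    {"date": "12/24/2012", "hour": 23, "holiday": "None"},
    {"date": "12/25/2012", "hour": 0, "holiday": "Christmas Day"},
    {"date": "12/25/2012", "hour": 1, "holiday": "None"},
    {"date": "12/25/2012", "hour": 2, "holiday": "None"},
    {"date": "12/26/2012", "hour": 0, "holiday": "None"},
]

# Pass 1: remember the holiday name for each date that has one.
holiday_by_date = {}
for row in rows:
    if row["holiday"] != "None":
        holiday_by_date[row["date"]] = row["holiday"]

# Pass 2: copy that name onto every hour of the same date.
for row in rows:
    row["holiday"] = holiday_by_date.get(row["date"], "None")

for row in rows:
    print(row["date"], row["hour"], row["holiday"])
```

The first pass plays the role of the if statement and macro (capturing the holiday name), and the second pass plays the role of the loop that overwrites the remaining hours.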
If you have the large table (the one with the timestamps) and create the date attribute, it will have 24 entries for each day, all with the same date.
Then you have a second table, I assume, with the holiday dates.
In this case you can simply join the two tables using the Join operator, selecting the date in both tables as the join attribute. This will assign the holiday to the date/time entries. Use "left join" to keep all entries of the timestamp table, then fill missing values in the resulting table with "None".
Regards,
Balázs
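The join-then-fill sequence suggested above could look like this in a minimal Python sketch. The two tables, their values and the variable names are assumptions for illustration; in RapidMiner the fill step would be done on the joined ExampleSet rather than in code.

```python
# Hypothetical timestamp table: (date, hour) pairs, one row per hour.
timestamps = [
    ("12/24/2012", 23),
    ("12/25/2012", 0),
    ("12/25/2012", 13),
    ("12/26/2012", 0),
]

# Hypothetical holiday table, keyed by the same date attribute.
holidays = {"12/25/2012": "Christmas Day"}

# Left join: every timestamp row is kept; unmatched dates get a missing value.
joined = [(date, hour, holidays.get(date)) for date, hour in timestamps]

# Fill the missing values in the joined table with the label "None".
filled = [
    (date, hour, name if name is not None else "None")
    for date, hour, name in joined
]

for row in filled:
    print(row)
```

Here the timestamp table is the side whose rows are all kept, so no hourly row is lost and ordinary days end up labelled "None".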