Training/testing ratio in sliding window validation

maurits_freriks  Member  Posts: 28  Contributor I
edited November 2018 in Help

Hi all,

I have been struggling with a prediction problem for months now. After a few optimization runs (each taking days), I have probably run into an overfitting problem.

As you can see in the picture below, the 6th column shows the performance of the model, and the 3rd, 4th and 5th columns are the parameters of the Sliding Window Validation (training window width, step width, test window width). The training/testing ratio is probably too high, but if I decrease the ratio the performance drops as well. So I don't know which ratio to choose so that the performance is no longer suspect.

Could anyone advise me what ratio to use with respect to my datasets:

https://drive.google.com/open?id=12XjPKw2diSLnc9-MtAv_--SVfntA3nR-

Screen Shot 2018-01-16 at 20.21.34.png
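
To make the three window parameters concrete, here is a minimal Python sketch of how a sliding-window (walk-forward) validation generates its train/test splits. This is only an illustration of the idea, not the RapidMiner operator itself, and the window sizes are made-up example values.

# Illustrative sketch only: how training width, step width and test width
# carve a time series into walk-forward splits. Window sizes are examples.
def sliding_window_splits(n_rows, train_width, test_width, step_width):
    """Yield (train_indices, test_indices) pairs for a sliding-window validation."""
    start = 0
    while start + train_width + test_width <= n_rows:
        train_idx = list(range(start, start + train_width))
        test_idx = list(range(start + train_width, start + train_width + test_width))
        yield train_idx, test_idx
        start += step_width

# Example: 1000 rows, an 80/20-style split with an absolute test width well above 5.
for train_idx, test_idx in sliding_window_splits(n_rows=1000, train_width=80,
                                                 test_width=20, step_width=20):
    ratio = len(train_idx) / len(test_idx)
    # Each split trains on 80 consecutive rows and tests on the next 20 (ratio 4:1).
    print(f"train rows {train_idx[0]}-{train_idx[-1]}, "
          f"test rows {test_idx[0]}-{test_idx[-1]}, ratio {ratio:.0f}:1")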

Below is an outline of the process (taken from the operator descriptions in the XML). I used the stored model to score these values against my test set in a separate scoring process.

  • Select the 'A' column
  • Lag the 'A' column for stripping out spikes
  • Calculate the std dev of 'A' and push it to a macro (additional_macros); extract the std dev value to use in Generate Attributes
  • Create a Maintenance attribute to help filter out the days it's in maintenance mode
  • Select only non-maintenance-mode days
  • Select 'A' again
  • Train an SVM inside the Sliding Window Validation (operator class support_vector_machine, compatibility 7.6.001, parameter C = 9000.0)
  • Optimize and store the optimized model
  • Sanity check: review the 'A' time series against the predicted 'A' time series from the training data set

Screen Shot 2018-01-16 at 23.20.38.png
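
For reference, here is a rough Python sketch of the preprocessing idea outlined above: lag 'A', compute its standard deviation, flag maintenance days and keep only the rest. The column names and the spike-detection rule (a drop of more than 2 standard deviations below the lagged value) are assumptions for illustration; the real process does this with macros and Generate Attributes in RapidMiner.

import pandas as pd

# Sketch of the preprocessing steps above, assuming a DataFrame `df` with a
# daily 'A' column. The threshold rule (spike = drop of more than n_std
# standard deviations versus the lagged value) is an assumption, not the
# rule used in the actual process.
def filter_maintenance_days(df: pd.DataFrame, n_std: float = 2.0) -> pd.DataFrame:
    df = df.copy()
    df["A_lag1"] = df["A"].shift(1)          # lag 'A' (like the Lag operator)
    std_a = df["A"].std()                    # std dev of 'A' (pushed to a macro in RapidMiner)
    # Maintenance attribute: flag days where 'A' drops far below its lagged value
    df["Maintenance"] = df["A"] < df["A_lag1"] - n_std * std_a
    # Keep only non-maintenance days, then select 'A' again for modeling
    return df.loc[~df["Maintenance"], ["A"]]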

Answers

  • lionelderkrikor  Moderator, RapidMiner Certified Analyst, Member  Posts: 1,195  Unicorn

    Hi @maurits_freriks,

    For the ratio, I would say a training width of 0.7 / 0.8 and, respectively, a test width of 0.3 / 0.2, with an increased absolute value of the test width (a test width of 5 is too low in my opinion).

    Alternatively, as mentioned in the PM, you can use the RMSE of the Performance (Regression) operator to measure the performance of your model(s) in a more objective way (see the sketch after these answers).

    Best regards,

    Lionel

  • maurits_freriks  Member  Posts: 28  Contributor I

    @lionelderkrikor

    You mean change the performance (forecasting performance) into performance (regression)?

  • lionelderkrikor  Moderator, RapidMiner Certified Analyst, Member  Posts: 1,195  Unicorn

    Hi Maurits,

    Exactly. The best model is the one that minimizes RMSE.

    Best regards,

    Lionel

  • Thomas_Ott  RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 1,761  Unicorn

    Yep, RMSE is definitely another way to look at this. My main concern has been those lower spikes. Can they be removed, or is there a specific reason that they must remain in?

  • maurits_freriks  Member  Posts: 28  Contributor I

    @Thomas_Ott

    Sorry for the late reply.

    Yes, there is a specific reason why those spikes are in the dataset: this was the actual flow on those days in the past. The spikes are caused by maintenance (planned) or tripping (unpredictable). The final goal is to automate the prediction process, so you have to pay attention to those spikes. I do have a planning_dump where you can find what happened during the spikes.

    @Thomas_Ott Could I send you a PM so that you could think about how to implement this in a RapidMiner process?

    With kind regards,

    Maurits Freriks


  • Thomas_Ott  RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 1,761  Unicorn

    @maurits_freriks My suggestion is to ask your question in the community. I'm very crunched for time this week and won't be able to look at anything.
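
As a footnote to the RMSE suggestion above, here is a minimal sketch of the root-mean-square-error calculation that Performance (Regression) reports, assuming the actual and predicted 'A' values from the scoring process are available as plain lists (the numbers below are made up):

import math

# Minimal RMSE sketch, matching the metric reported by Performance (Regression).
# `actual` and `predicted` stand for the labeled and scored 'A' values.
def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10.2, 9.8, 10.5, 3.1, 10.1]     # 3.1 would be one of the maintenance spikes
predicted = [10.0, 9.9, 10.4, 9.7, 10.2]

print(rmse(actual, predicted))  # the best model is the one that minimizes this value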
