"SVM Regression returning same values for all test records ?!?!"
I set up an SVM using nu-SVR in RapidMiner.
As a test I trained it on a sparse data set containing 1000 records.
Then, I tested it against a new data set of about 14 records.
Every record of the test set returned the exact same prediction. This seems highly unlikely since there are over 140 dimensions to the SVM and a significant amount of variation in the data.
One guess is that maybe I'm not loading in the sparse data correctly for testing.
I can't seem to discover where my error is. Maybe someone here can offer some help/suggestions.
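(For readers outside RapidMiner: a quick way to separate a loading bug from a degenerate model is to apply the trained model back to its own training data. Here is a minimal sketch in Python with scikit-learn; the file name is hypothetical, since the actual process and data aren't attached.)

from sklearn.datasets import load_svmlight_file
from sklearn.svm import NuSVR

# Hypothetical file name; the poster's actual training data isn't attached.
X_train, y_train = load_svmlight_file("train.sparse")
model = NuSVR().fit(X_train, y_train)

# If these predictions vary but the test-set predictions are all
# identical, suspect the test data loading rather than the model.
print(model.predict(X_train[:10]))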
Here is the training XML
Here is the test XML.
Here are 2 rows of training data:
0.99307958477511 1:2 2:12 3:0.982609455619486 4:0 5:14 6:5 7:0.8 8:0.0348258706467662 9:201 10:0.0496977837474815 11:1489 12:1 13:1 14:0.00477630731561417 15:133 16:10.81 17:5.5 101:1 116:1 117:1 119:1 125:1
0.989655172413817 1:3 2:2 3:0.973641810178274 4:0 5:63 6:3 7:1 8:0.0631443298969072 9:776 10:0.0769704433497537 11:1624 12:1 13:0.5 14:0.0049596226732805 15:123 16:-0.09 17:6 101:1 116:1 117:1 119:1 125:1
Here are 2 rows of test data:
1:0 2:14 3:0.979392741314451 4:0.0909090909090909 5:28 6:22 7:0.227272727272727 8:0.0436046511627907 9:1376 10:0.0735090152565881 11:1442 12:0 13:2 14:0.0104266852405951 15:133 16:9.64 17:8.09 103:1 116:1 117:1 119:1 125:1
1:0 2:1 3:0.980626115895827 4:0.0357142857142857 5:20 6:28 7:0.178571428571429 8:0.0338541666666667 9:768 10:0.0653008962868118 11:781 12:0.321428571428571 13:0.2 14:0.0067155135256289 15:130 16:6.64 17:8.32 102:1 111:1 117:1 119:1 125:1
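(A classic pitfall with this sparse format: if the loader infers the attribute count from each file separately, the test matrix can come out narrower than the training matrix, since its highest index may be lower than the training set's 125, and columns silently shift. A hedged sketch of loading both files consistently; the file names are made up, and a dummy label column is assumed on the test rows, since the libsvm format expects one:)

from sklearn.datasets import load_svmlight_file

X_train, y_train = load_svmlight_file("train.sparse")

# Force the test matrix to the training dimensionality so that
# attribute index 125 means the same column at prediction time.
X_test, _ = load_svmlight_file("test.sparse", n_features=X_train.shape[1])
assert X_test.shape[1] == X_train.shape[1]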
Answers
This process does not contain any obvious errors. (To cite my favorite error message.)
Perhaps you only need to tune the SVM parameters?
As a second hint: it is much more convenient to use the built-in validation operators instead of splitting the data manually and using two processes. You could use the XValidation operator, which is explained in the 04_Validation/03_XValidation_Numerical.xml sample in the sample directory.
To tune your SVM parameters, you could take a look at the 07_Meta/01_ParameterOptimization sample.
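(The same workflow, cross-validation combined with a grid search over the SVM parameters, looks roughly like this in Python with scikit-learn; the file name and parameter ranges are illustrative only, not the samples Sebastian refers to:)

from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import GridSearchCV
from sklearn.svm import NuSVR

X, y = load_svmlight_file("train.sparse")  # hypothetical file name

# Grid-search nu, C and gamma with 10-fold cross-validation,
# analogous to wrapping XValidation inside ParameterOptimization.
search = GridSearchCV(
    NuSVR(),
    param_grid={"nu": [0.25, 0.5, 0.75],
                "C": [0.1, 1, 10],
                "gamma": ["scale", 0.01, 0.1]},
    cv=10,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)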
Greetings,
Sebastian
I HAVE performed the parameter optimization and XValidation to build a good model.
What you are seeing in my earlier post is the model being applied to "real-world" data. This was an actual application of the SVM to learn something about unlabeled data.
My concern is this: if the XValidation during training showed decent results, why would the SVM predict the exact same output for the REAL data? It is possible, but highly unlikely.
-N
This is strange indeed. The attribute header is exactly the same as in the training set?
Without the data I cannot imagine any other possible error, since I cannot reproduce the behavior.
Greetings,
Sebastian
I think I found the problem. My data is a 2 class problem.
14% is class 1
86% is class 0
From what I've recently read, having "unbalanced" training sets can cause the SVM to develop a model that heavily favors the larger class. This would explain the results I've been seeing.
My question is: Is there a way to have RapidMiner weight the classes or account for the unbalanced training data?
Thanks,
-N
There is an operator called EqualLabelWeighting which distributes an equal total weight across all classes. Hence, examples of a dominating class will be down-weighted.
But you will then need a learner capable of using example weights. You should check this in the operator info of the learning operator.
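(To make the idea concrete outside RapidMiner: a sketch of the same equal-total-weight scheme, computed by hand and passed to a learner that accepts example weights. The data here is a made-up placeholder that mimics the 86/14 split; EqualLabelWeighting computes such weights for you inside RapidMiner.)

import numpy as np
from sklearn.svm import NuSVR

# Placeholder data mimicking the poster's 86% / 14% class split.
y = np.array([0]*86 + [1]*14, dtype=float)
X = np.random.default_rng(0).normal(size=(100, 5))

# Weight each example so every class contributes the same total weight,
# which down-weights examples of the dominating class.
classes, counts = np.unique(y, return_counts=True)
per_class_total = len(y) / len(classes)
weight_of = {c: per_class_total / n for c, n in zip(classes, counts)}
sample_weight = np.array([weight_of[v] for v in y])

model = NuSVR().fit(X, y, sample_weight=sample_weight)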
Greetings,
Sebastian