Feature weightage vs. domain inputs
Hi all,
When I use 'Explain Predictions', it reports feature weights that vary with the choice of algorithm as well.
For example, kNN picks feature A, feature B and feature C as the top 3, ahead of feature D.
1. However, my domain knowledge says feature D is the most important one. In that case, will choosing kNN (for which feature D is not important) do the job, even if it gives good accuracy during training and testing?
2. Or, in the above scenario, should I go for a model such as SVM, which naturally considers feature D the most important attribute? The performance of SVM is lower than that of kNN on the given data set during training and testing, though.
Can I have some clarity on how to approach this, particularly when the order of feature weights suggested by the Explain Predictions operator conflicts with domain inputs? Thanks.
regards
thiru
Answers
Explain Predictions and feature weighting are diagnostic tools, and your models are tools to achieve your goal, too. Don't overestimate the precision of Explain Predictions and feature weights; a complex model will have complex interactions between attributes.
Is it easy or hard to get all the features at the same time without missing values? Are you interested in accuracy or in an explainable model? Might your attributes have some potential for discriminating against people? And so on.
Sometimes our domain knowledge betrays us or it is just too simplistic. That's why we use machine learning. A, B, C probably contain additional knowledge and they help improve the model beyond just looking at D.
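If you want a second opinion on how much a trained model actually relies on D, one model-agnostic check is permutation importance: shuffle one feature in held-out data and see how much the accuracy drops. The sketch below is only an illustration in plain Python, not what the Explain Predictions operator does. The synthetic data (label depends on A and D only), the small kNN, and all function names are assumptions made up for the example.

```python
import random

FEATURES = ["A", "B", "C", "D"]

def make_data(n, rng):
    """Synthetic rows [A, B, C, D]. Assumption: the label depends on A and D only."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in FEATURES]
        X.append(row)
        y.append(1 if row[0] + row[3] > 0 else 0)
    return X, y

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train_X, train_y, test_X, test_y):
    """Share of held-out rows that kNN classifies correctly."""
    hits = sum(knn_predict(train_X, train_y, x) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)

def permutation_importance(train_X, train_y, test_X, test_y, col, baseline, rng, repeats=5):
    """Average drop in held-out accuracy after shuffling one column of the test set."""
    drops = []
    for _ in range(repeats):
        values = [row[col] for row in test_X]
        rng.shuffle(values)
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(test_X, values)]
        drops.append(baseline - accuracy(train_X, train_y, permuted, test_y))
    return sum(drops) / repeats

rng = random.Random(0)  # fixed seed so the run is repeatable
train_X, train_y = make_data(200, rng)
test_X, test_y = make_data(100, rng)

baseline = accuracy(train_X, train_y, test_X, test_y)
importances = {
    name: permutation_importance(train_X, train_y, test_X, test_y, col, baseline, rng)
    for col, name in enumerate(FEATURES)
}

print(f"held-out accuracy: {baseline:.2f}")
for name, drop in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"feature {name}: accuracy drop {drop:.3f}")
```

A large drop for D would suggest the model does lean on it even if another weighting ranks it low; a drop near zero would suggest A, B and C already carry most of what the model needs. Either way, treat the numbers as a diagnostic on validated data, not as a verdict on your domain knowledge.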
All this said: Use the model (after proper validation) that solves your problem best, however the problem is defined.
Regards,
Balázs
Dortmund, Germany