Performance Classifier for Naive Bayes and Decision Tree -- getting an error
I have built a Naive Bayes and a Decision Tree model and have one column = label in the training data so I can predict the outcome -- I connected the performance classification operator and keep getting an error that says InputSet does not have a label attribute. I set the column to label using the Set Role operator. What classifier should I be using -- or what do I need to do to the data?
Best Answer
rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
OK, let's go from the general to the particular: when you measure performance, you basically want to know how many times your trained algorithm was able to find the truth. For this, you need labeled data as input for both the Decision Tree and the Apply Model operators, because the Performance operator you are using just reads two columns: one with a label attribute and another with a prediction attribute. Since you are not passing labeled data to the Apply Model operator, it is telling you that it cannot measure performance.
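To make the two-column idea concrete, here is a minimal Python sketch. It is not RapidMiner code and not how the Performance operator is implemented; the function name `accuracy`, the keys `label` and `prediction`, and the sample rows are all made up for illustration. It only shows why scoring fails when the scored data has no label column.

```python
def accuracy(rows, label_key="label", prediction_key="prediction"):
    """Fraction of rows whose prediction equals the true label.

    Hypothetical helper: each row is a dict standing in for one example.
    """
    if not rows:
        raise ValueError("no examples to score")
    for row in rows:
        # Without the true label there is nothing to compare against.
        if label_key not in row:
            raise ValueError("input set does not have a label attribute")
        if prediction_key not in row:
            raise ValueError("input set does not have a prediction attribute")
    correct = sum(1 for row in rows if row[label_key] == row[prediction_key])
    return correct / len(rows)


# Made-up scored data: the true label plus the model's prediction.
scored = [
    {"label": "pass", "prediction": "pass"},
    {"label": "fail", "prediction": "pass"},
    {"label": "pass", "prediction": "pass"},
    {"label": "fail", "prediction": "fail"},
]
print(accuracy(scored))  # 3 of 4 predictions match the label
```

If the rows passed in carry only a prediction (for example, unlabeled data that was scored), the helper raises an error, which mirrors the situation described above.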
A few days ago I wrote an answer on how to perform Split Validation, Cross Validation and the kind of validation you are trying to do, which I call DIY Validation. I believe that the entire thread is a good source of information for you. Since you are learning, you might want to experiment with both the Split Validation and the Cross Validation operators to see how they differ. Beware that these are super-operators, which means they can contain other operators inside. Here is your process with Split Validation:
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="238">
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role (2)" width="90" x="246" y="238">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="112" y="34">
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<operator activated="true" class="split_validation" compatibility="8.2.000" expanded="true" height="124" name="Validation" width="90" x="380" y="34">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="112" y="34">
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="514" y="238">
If you don't have a lot of data (e.g. dozens of MB), I recommend using Cross Validation. Beware that it consumes a lot of RAM.
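For anyone unsure how the two approaches differ, the following Python sketch shows only the splitting logic, not RapidMiner's operators. The function names `split_indices` and `k_fold_indices`, the default ratio of 0.7, and the way rows are assigned to folds are assumptions chosen for illustration. A split validation holds out one part of the data once, while a k-fold cross validation rotates the held-out part so every row is used for testing exactly once.

```python
def split_indices(n, train_ratio=0.7):
    """Single split: the first part trains, the remainder tests."""
    cut = round(n * train_ratio)
    idx = list(range(n))
    return idx[:cut], idx[cut:]


def k_fold_indices(n, k):
    """Yield (train, test) index lists, one pair per fold."""
    # Assign rows to folds round-robin: row i goes to fold i % k.
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


print(split_indices(10))
for train, test in k_fold_indices(6, 3):
    print(train, test)
```

With k folds a model is trained k times rather than once, which is one reason cross validation costs more resources than a single split.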
Another small issue: check the Set Role operator connected to the Decision Tree. It assigns the label once in the Parameters view and then assigns it again inside the list. Remove the one in the list, and everything will be fine.
Hope it helps,
Answers
Hello, Melissa:
Would you mind sharing your XML process with us? That way we can see what is not working. If you need help with sharing XML processes, please read this article.
All the best,
Thanks! Please see below -- will this work?
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="187">
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="187">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="112" y="34">
<parameter key="11" value="CUMULATIVEGPA.true.real.attribute"/>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="380" y="34">
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="136">
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="648" y="34">
I'm also getting the same error when I try to use the Deep Learning operator -- so there must be something wrong with the input?
This helped a lot -- I was going about it completely wrong. I had learned about cross and split validation a few weeks ago in the course I'm taking but hadn't put the pieces together when moving on to applying the models.
Thanks!