"Confusion matrix in RapidMiner for clustering"
Hi ...
In RapidMiner, how can I compute the confusion matrix for clustering results, assuming the actual classes are provided with the data, in order to evaluate the performance of a clustering algorithm such as k-medoids?
Thanks.
Answers
A confusion matrix is not strictly applicable to clustering: its purpose is to show the difference between a model's predictions and the actual values of the target variable in supervised classification, whereas clustering is unsupervised by nature.
However, if your data is labelled with the actual classes and also carries a predicted class value (the cluster assigned by the model), you can use the PERFORMANCE (CLASSIFICATION) operator to generate a confusion matrix.
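To illustrate the idea outside RapidMiner, here is a minimal Python sketch of that table: it counts how many examples of each actual class land in each cluster. The function name `contingency_table` and the toy data are assumptions made for the example; they are not part of RapidMiner.

```python
from collections import Counter


def contingency_table(labels, clusters):
    """Count how many examples of each actual class fall into each cluster."""
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    counts = Counter(zip(labels, clusters))
    classes = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    # Rows are actual classes, columns are cluster ids.
    table = [[counts[(c, k)] for k in cluster_ids] for c in classes]
    return classes, cluster_ids, table


# Hypothetical toy data: three actual classes and three clusters.
labels = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1",
            "cluster_1", "cluster_2", "cluster_2", "cluster_0"]

classes, cluster_ids, table = contingency_table(labels, clusters)
print("class", *cluster_ids)
for c, row in zip(classes, table):
    print(c, *row)
```

Note that cluster ids are arbitrary, so the "diagonal" of such a table only means something once each cluster has been matched to a class.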
Vladimir
http://whatthefraud.wtf
I am not sure PERFORMANCE (CLASSIFICATION) would solve my issue, although some of its outputs are "weighted mean recall" and "weighted mean precision". As far as I understand, this operator is meant for two-class problems.
How can I measure clustering performance for multiple classes using external validity indices?
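For reference, external validity indices are not limited to two classes. The sketch below computes two common ones, purity and the Rand index, in plain Python. It is an illustration under assumed toy data, not a RapidMiner process, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def purity(labels, clusters):
    """Fraction of examples that belong to the majority class of their cluster."""
    counts = Counter(zip(clusters, labels))
    best = {}
    for (cluster, _label), n in counts.items():
        best[cluster] = max(best.get(cluster, 0), n)
    return sum(best.values()) / len(labels)


def rand_index(labels, clusters):
    """Fraction of example pairs on which the classes and the clustering agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum(
        (labels[i] == labels[j]) == (clusters[i] == clusters[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: three actual classes and three clusters.
labels = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusters = [0, 0, 1, 1, 1, 2, 2, 0]

print("purity:", purity(labels, clusters))
print("Rand index:", rand_index(labels, clusters))
```

Both measures work for any number of classes and clusters, and neither requires the cluster ids to be matched to class names first.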
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I'd add one point here: technically you can use the PERFORMANCE (CLASSIFICATION) operator on an arbitrary dataset. You only need an attribute with the 'label' role, which indicates the actual class, and another attribute with the 'prediction' role, which indicates the class predicted by the model. If you already have a dataset like this, you can use the SET ROLE operator to define the label and prediction columns respectively.
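One practical catch is that cluster ids do not carry class names, so a prediction column has to be derived from them first. A minimal Python sketch of one way to do that, assuming a simple majority vote per cluster (the helper names and toy data are hypothetical, and this is not how any particular RapidMiner operator is implemented):

```python
from collections import Counter, defaultdict


def map_clusters_to_classes(labels, clusters):
    """Assign each cluster the class that occurs most often inside it.
    Ties go to whichever class Counter.most_common returns first."""
    votes = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        votes[cluster][label] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in votes.items()}


def per_class_recall(labels, predictions):
    """Recall for every actual class, so it works for more than two classes."""
    total = Counter(labels)
    correct = Counter(l for l, p in zip(labels, predictions) if l == p)
    return {c: correct[c] / total[c] for c in sorted(total)}


# Hypothetical toy data: the 'label' column and the raw cluster column.
labels = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1",
            "cluster_1", "cluster_2", "cluster_2", "cluster_0"]

mapping = map_clusters_to_classes(labels, clusters)
predictions = [mapping[k] for k in clusters]  # plays the 'prediction' role
accuracy = sum(l == p for l, p in zip(labels, predictions)) / len(labels)

print(mapping)
print("accuracy:", accuracy)
print("recall per class:", per_class_recall(labels, predictions))
```

With a majority vote, several clusters can map to the same class, and some classes may receive no cluster at all; a one-to-one matching would need a different assignment method.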
Vladimir
http://whatthefraud.wtf
Can you please provide the sequence of operators needed, along with their parameter settings?
Appreciate it ...
It would be easier to help if you could share the actual dataset on which you want to produce the confusion matrix and evaluate performance metrics.
Vladimir
http://whatthefraud.wtf