Inconsistency of ROC curves

bernardo_pagnon (Member, University Professor, Posts: 60)
edited March 2020 in Help
Hello,

I generated a ROC curve for a logistic regression model on a data set using the Performance operator, with AUC selected as the criterion. Fine.

Then I used the same data set with the Compare ROCs operator, choosing logistic regression and a decision tree as the models. The ROC curves appear, but the ROC curve for the logistic regression differs from the one I obtained before! How can this be?

Best,
Bernardo

Best Answer

  • bernardo_pagnon (Member, University Professor, Posts: 60)
    Solution Accepted
    What can I say?
    1 - Big, big thanks!
    2 - I was indeed training and testing on the same data, to illustrate that one should not do that (it is for a class).
    3 - Great idea to switch to 10-fold cross-validation so that both cases can be compared.

    Best,
    Bernardo

Answers

  • varunm1 (Moderator, Member, Posts: 1,207, Unicorn)
    edited March 2020
    Hello @bernardo_pagnon

    Can you share the process here? You can export it using FILE --> Export Process and attach the .rmp file here. Please also attach the data. I suspect some of the test samples differ between the two runs. Are you using the same type of validation, with a fixed random seed, both for Compare ROCs and for the regular model with the Performance operator? If you provide the process and data, I will check and let you know.

    If you can't share it here, you can send me a PM with the requested files.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • bernardo_pagnonbernardo_pagnon Member, University ProfessorPosts:60University Professor
    There it goes!
  • varunm1 (Moderator, Member, Posts: 1,207, Unicorn)
    edited March 2020
    Hello @bernardo_pagnon

    Thanks for sharing your process files and data.

    I used the complete data sheet in the attached Excel file; I believe it's the correct one. Now, on to the problem.

    Case 1 process: In the case 1 process, you are training and testing on the same data. This is not correct, because you need to test on data that is independent of the training data. If you are doing this on purpose for your requirements, then it's fine.
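To see why testing on the training data is misleading, here is a toy sketch in plain Python (not RapidMiner). It deliberately swaps logistic regression for a 1-nearest-neighbour model and AUC for accuracy, because a model that memorizes its training data makes the effect obvious; the data generator `make_data`, the 20% label noise and the seed are all hypothetical.

```python
import random

def make_data(n, noise, seed):
    """Hypothetical toy data: one feature x, label = (x > 0.5), with a fraction `noise` of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that no model can genuinely learn
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, test):
    """Fraction of test examples whose 1-NN prediction matches the true label."""
    return sum(predict_1nn(train, x) == y for x, y in test) / len(test)

data = make_data(200, noise=0.2, seed=42)
train, test = data[:100], data[100:]

# Testing on the training data: every point finds itself, so the score is 1.0.
print("test on training data:", accuracy(train, train))
# Testing on independent data gives an honest (lower) estimate.
print("test on held-out data:", accuracy(train, test))
```

The perfect score in the first line says nothing about how the model generalizes; only the held-out figure does.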


    Case 2 process: In case 2, you are using the Compare ROCs operator. Based on the parameter settings shown below, it uses 10-fold cross-validation: it divides your data set into 10 subsets, trains on 9 of them and tests on the remaining one. This repeats until every subset has been tested once, and the final performance is an aggregate of the performance on all subsets.
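A minimal sketch of the k-fold splitting described above, in plain Python rather than RapidMiner; the function names and the seed value are hypothetical. The fixed seed plays the role of a local random seed: rerunning with the same seed reproduces exactly the same folds, which is what makes two processes comparable.

```python
import random

def k_fold_indices(n, k=10, seed=42):
    """Shuffle example indices with a fixed seed, then deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # same seed -> same shuffle -> same folds
    return [idx[i::k] for i in range(k)]

def cross_validation_splits(n, k=10, seed=42):
    """Yield (train, test) index lists: each fold is the test set exactly once,
    and the remaining k-1 folds together form the training set."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Usage: 25 hypothetical examples split into 10 folds of 3 or 2 examples each.
for train, test in cross_validation_splits(25, k=10, seed=42):
    print(len(train), len(test))
```

Each fold's model would be evaluated on its own test fold and the per-fold results aggregated into the final performance; how RapidMiner combines the per-fold ROC curves is not modelled here.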



    This is why you are getting different ROC curves: the test data and the evaluation procedure differ between the two cases, so the results (AUC and ROC) differ as well.
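A small illustration of that point, using hypothetical scores: one fixed set of model confidences yields different AUC values depending on which examples end up in the test set. The `auc` function below is the pairwise (Mann-Whitney) form of the area under the ROC curve, counting ties as one half; it is a sketch, not RapidMiner's implementation.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical confidences from one already-trained model, with the true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]

print(auc(scores, labels))            # all 8 examples: 0.8125
print(auc(scores[:4], labels[:4]))    # first half only: about 0.667
print(auc(scores[::2], labels[::2]))  # every other example: 1.0
```

The model is identical in all three lines; only the evaluation sample changes.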

    I changed your case 1 to 10-fold cross-validation, and as the image below shows, the ROC curves of case 1 and case 2 are now similar (left: case 1, right: case 2). I attached the modified processes; you can open them in RapidMiner using FILE --> Import Process.



    Modified case 1 process (image): I added 10-fold cross-validation with a local random seed set in its parameters. I also set a local random seed on the Compare ROCs operator in the case 2 process, with roc bias set to neutral.
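On the roc bias setting: it offers optimistic, neutral and pessimistic evaluation. The sketch below assumes this refers to how examples with tied confidence values are ordered when the ROC curve is traced; that is my reading, not RapidMiner's source code, and `roc_points` and `area` are hypothetical helpers.

```python
from itertools import groupby

def roc_points(scores, labels, bias="neutral"):
    """ROC points (FPR, TPR), sweeping the threshold from the highest score down.

    Assumed handling of tied scores:
      optimistic  - positives in a tie group are counted before negatives
      pessimistic - negatives in a tie group are counted before positives
      neutral     - the whole tie group is added at once (a diagonal segment)
    """
    P = sum(labels)
    N = len(labels) - P
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]
        pos, neg = sum(ys), len(ys) - sum(ys)
        if bias == "optimistic":
            steps = [(pos, 0), (0, neg)]
        elif bias == "pessimistic":
            steps = [(0, neg), (pos, 0)]
        else:
            steps = [(pos, neg)]
        for dtp, dfp in steps:
            tp, fp = tp + dtp, fp + dfp
            points.append((fp / N, tp / P))
    return points

def area(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# One positive and one negative share the confidence 0.5.
scores, labels = [0.9, 0.5, 0.5, 0.1], [1, 1, 0, 0]
for bias in ("optimistic", "neutral", "pessimistic"):
    print(bias, area(roc_points(scores, labels, bias)))  # 1.0, 0.875, 0.75
```

Under this assumption, neutral gives the same value as counting ties as one half, and the choice only matters when confidences tie.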


    Hope this helps. Please let us know if you need more information.
    Regards,
    Varun
  • varunm1 (Moderator, Member, Posts: 1,207, Unicorn)
    @bernardo_pagnon I got 100 points on this assignment in your class, then :wink:
    Regards,
    Varun
  • bernardo_pagnon (Member, University Professor, Posts: 60)
    ahahahahahaha
    Can't argue with that!!!

    Best,
    Bernardo