Bulk scoring in RapidMiner Server
Hi everyone,
What is the best way to bulk score new records (hundreds of thousands, originating from an enterprise DB) using a deployed model (deployed via Deployment) in RapidMiner Server?
> I have tried using the web service, but it does not scale. The response time for a single record is currently around 3 seconds.
> There's no 'real-time' scoring requirement. It is a daily single bulk request.
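A rough back-of-envelope calculation shows why the per-record web service call does not fit this volume. The 3-second latency comes from the post above; the record count of 100,000 is only an assumed lower bound for "hundreds of thousands", and the calls are assumed to run one after another.

```python
def sequential_scoring_hours(n_records: int, seconds_per_call: float) -> float:
    """Wall-clock hours if each record is scored by its own web service call, sequentially."""
    return n_records * seconds_per_call / 3600


# 3 s per call is the latency reported above; 100_000 records is an assumed lower bound.
hours = sequential_scoring_hours(100_000, 3.0)
print(f"{hours:.1f} hours")  # 83.3 hours
```

Even at this lower bound the run takes more than three days, so it cannot finish inside a daily window.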
Best Answer
IngoRM (RM Founder)
Hi,
With the upcoming RM 9.6 version you can turn off the explanations for predictions, which slow down the scoring a lot. But for true bulk scoring, a single-row web service approach does not seem like a good fit anyway, IMHO.
If you check the repository folder of the deployed models, you will find a process called "score_set" which you can use as a blueprint. Make a copy of it and adapt it a bit (in particular, turn on the parameter "only predictions" for the operator "Explain Prediction" to speed things up!) and add a data source (reading from your DB) at the beginning. If you also want monitoring, you can add the operator MDMLogging as well (which is a bit more complicated; I suggest dealing with this last, once everything else works and you want the logging...).
Hope this helps,
Ingo
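The general shape of such a batch job (read from the DB in chunks, score each chunk in one call, write the predictions back) can be sketched outside RapidMiner as follows. This is only an illustration of the pattern, not the "score_set" process or any RapidMiner API: the table names `new_records` and `predictions`, the columns `x1` and `x2`, and the `dummy_model` function are all hypothetical, and an in-memory SQLite database stands in for the enterprise DB.

```python
import sqlite3
from typing import Callable, List, Sequence


def dummy_model(rows: Sequence[tuple]) -> List[int]:
    """Hypothetical stand-in for the deployed model: scores a whole chunk at once."""
    return [1 if x1 + x2 > 1.0 else 0 for _, x1, x2 in rows]


def bulk_score(conn: sqlite3.Connection,
               score_rows: Callable[[Sequence[tuple]], List[int]],
               chunk_size: int = 10_000) -> int:
    """Score all rows of new_records chunk by chunk; return the number of rows scored."""
    last_id = -1  # assumes non-negative integer ids
    total = 0
    while True:
        # Keyset pagination: fetch the next chunk of rows after the last seen id.
        rows = conn.execute(
            "SELECT id, x1, x2 FROM new_records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        preds = score_rows(rows)
        conn.executemany(
            "INSERT INTO predictions (id, prediction) VALUES (?, ?)",
            [(row[0], pred) for row, pred in zip(rows, preds)],
        )
        last_id = rows[-1][0]
        total += len(rows)
    conn.commit()
    return total


# Tiny demo with an in-memory database and made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_records (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, prediction INTEGER)")
conn.executemany(
    "INSERT INTO new_records VALUES (?, ?, ?)",
    [(i, i / 10, 0.5) for i in range(25)],
)
scored = bulk_score(conn, dummy_model, chunk_size=10)
print(scored)  # 25
```

The point of the chunking is that the per-call overhead is paid once per chunk rather than once per record, which is the same reason a single scheduled process suits a daily bulk request better than a per-row web service.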
Answers
Thank you. I was able to re-purpose the "score_set" process for bulk scoring by:
1. setting "select which=1" for the "Define Target" block, as there shouldn't be a target column for prediction.
2. setting "select which=1" for the "Define ID" block, as training mode doesn't need an identifier (it is optional) but prediction needs one.
It would actually be great to have a standard "bulk-score" process auto-generated from the deployment.
Cheers,
Neel