What's New in RapidMiner Studio 9.9.0?
Released: March 24th, 2021
The following sections describe what's new in RapidMiner Studio 9.9.0:
New Features
- Data is the central piece of any RapidMiner process. With the new Data Core (codename Belt), the way RapidMiner handles data internally has fundamentally changed in this release. Its new columnar table representation brings a major improvement in processing speed and memory efficiency for RapidMiner processes. Multiple operators already use it internally, and it is now fully available to extension developers for building fast and efficient operators.
- Added a Set Positive Value operator for the new Data Core, which can make nominal attributes binominal or change the positive value of binominal attributes
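To illustrate what "making a nominal attribute binominal with a chosen positive value" means, here is a minimal sketch in plain Java. It does not use the RapidMiner or Belt APIs; the `Binominal` record and the `setPositiveValue` method are hypothetical, and rejecting columns with more than two distinct values, as well as treating missing values (null) as not positive, are assumptions rather than documented operator behavior.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class SetPositiveValueSketch {

    /** Hypothetical binominal column: the negative label, the positive label, and a flag per row. */
    record Binominal(String negative, String positive, boolean[] isPositive) {}

    /**
     * Marks {@code positive} as the positive value of a nominal column.
     * Assumption: columns with more than two distinct non-missing values are rejected.
     */
    static Binominal setPositiveValue(List<String> values, String positive) {
        Set<String> distinct = new LinkedHashSet<>();
        for (String v : values) {
            if (v != null) {               // null stands for a missing value
                distinct.add(v);
            }
        }
        if (distinct.size() > 2) {
            throw new IllegalArgumentException("not binominal, distinct values: " + distinct);
        }
        if (!distinct.contains(positive)) {
            throw new IllegalArgumentException("positive value not present: " + positive);
        }
        // The other value (if there is one) becomes the negative value.
        String negative = null;
        for (String v : distinct) {
            if (!v.equals(positive)) {
                negative = v;
            }
        }
        // Missing values are flagged as not positive (assumption).
        boolean[] flags = new boolean[values.size()];
        for (int i = 0; i < flags.length; i++) {
            flags[i] = positive.equals(values.get(i));
        }
        return new Binominal(negative, positive, flags);
    }

    public static void main(String[] args) {
        Binominal churn = setPositiveValue(List.of("no", "yes", "no"), "yes");
        System.out.println(churn.negative() + " / " + churn.positive()); // prints "no / yes"
    }
}
```

Changing the positive value of an attribute that is already binominal amounts to calling the same method with the other label, which flips every flag.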
Enhancements
- Replaced the Rename by Example Values operator with a new and improved version
- Replaced the Rename operator with a new one that can additionally handle a renaming dictionary
- Replaced the Sort operator with one that can sort by multiple attributes (already available as part of the Operator Toolbox extension)
- Improved the FP-Growth operator so that it only works with explicitly defined positive values (either via binominal attributes or the positive value parameter) for items in dummy-coded columns
- Improved the memory consumption of Cross Validation in certain circumstances
- The operators Read CSV and Read Excel were improved to use the new data core
- Pivot now supports Least and Mode aggregations for numerical attributes as well
- Annotate now also adds the annotations to the meta data
- Added a warning when trying to run a process on an AI Hub with a lower feature version than the current Studio version
- Added a reason when displaying incompatible extensions in the dialog after startup to show why an extension failed to load. Details available via tooltip.
- Upgraded integrated Chromium to version 84
- Improved some meta data transformations with respect to nominal value sets
- The splash screen no longer shows duplicate extension icons during startup if more than one copy of an extension is installed
- Visualizations now also support Least and Mode aggregations for numerical attributes
- Improved concurrent execution in some corner cases
- Deprecated the Exchange Roles operator
- The model viewer for Gradient Boosted Tree models now respects the number format settings in the Studio preferences
- Auto Model uses new clustering algorithms which no longer require one-hot encoding on the data set and therefore reduce the memory footprint for data sets with nominal columns with many values. As a result, users can no longer specify the minimum number of clusters in the X-Means case (automatic determination of the optimal number of clusters). The minimum is now fixed at 2.
- Time Series: Added the option ignore invalid values to the Moving Average Filter operator: invalid values (missing values, positive and negative infinity) are now ignored when calculating the filtered value
- This also results in valid values at the beginning and end of the filtered time series
- As the Classic Decomposition and the Function and Seasonal Component Forecast operators are based on the Moving Average Filter, they now also have the "ignore invalid values" option
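The Least and Mode aggregations now offered for numerical attributes in Pivot and in Visualizations pick the least frequent and the most frequent value of a group. The following plain-Java sketch shows that idea on a single group of values. It is not RapidMiner code; skipping missing values (NaN) and breaking ties in favor of the smallest value are assumptions, not documented RapidMiner behavior.

```java
import java.util.Map;
import java.util.TreeMap;

public final class LeastModeSketch {

    /** Occurrence count per value; TreeMap iterates in ascending value order. NaN (missing) is skipped. */
    private static TreeMap<Double, Integer> counts(double[] values) {
        TreeMap<Double, Integer> counts = new TreeMap<>();
        for (double v : values) {
            if (!Double.isNaN(v)) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Most frequent value; on ties the smallest value wins (assumption). NaN if there is no value. */
    static double mode(double[] values) {
        double result = Double.NaN;
        int best = 0;
        for (Map.Entry<Double, Integer> e : counts(values).entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                result = e.getKey();
            }
        }
        return result;
    }

    /** Least frequent value; on ties the smallest value wins (assumption). NaN if there is no value. */
    static double least(double[] values) {
        double result = Double.NaN;
        int best = Integer.MAX_VALUE;
        for (Map.Entry<Double, Integer> e : counts(values).entrySet()) {
            if (e.getValue() < best) {
                best = e.getValue();
                result = e.getKey();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] price = {3, 1, 3, 2, 1, 3};
        System.out.println(mode(price) + " " + least(price)); // prints "3.0 2.0"
    }
}
```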
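The "ignore invalid values" option of the Moving Average Filter can be pictured with a simple centered moving average that skips NaN and infinite values. The sketch below assumes equal weights and a window of `2 * halfWidth + 1` points; the actual operator supports other filter types and its exact edge handling may differ. Treating positions outside the series like invalid values is what yields valid results at the beginning and end of the series, matching the note above.

```java
public final class MovingAverageSketch {

    /**
     * Centered simple moving average with window size 2 * halfWidth + 1.
     * NaN and +/-Infinity are skipped; positions outside the series are
     * treated like invalid values, so the edges are averaged over fewer points.
     * Returns NaN where a window contains no valid value.
     */
    static double[] filter(double[] series, int halfWidth) {
        if (halfWidth < 0) {
            throw new IllegalArgumentException("halfWidth must be >= 0");
        }
        double[] out = new double[series.length];
        for (int i = 0; i < series.length; i++) {
            double sum = 0;
            int n = 0;
            for (int j = i - halfWidth; j <= i + halfWidth; j++) {
                if (j >= 0 && j < series.length && Double.isFinite(series[j])) {
                    sum += series[j];
                    n++;
                }
            }
            out[i] = n > 0 ? sum / n : Double.NaN;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {1, Double.NaN, 3, Double.POSITIVE_INFINITY, 5};
        System.out.println(java.util.Arrays.toString(filter(series, 1)));
        // prints "[1.0, 2.0, 3.0, 4.0, 5.0]"
    }
}
```

Without skipping, a single infinity would turn every window that touches it into an infinite or NaN result.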
Bugfixes
- Fixed Data Table reading/writing when LFS light checkout is enabled
- Fixed a problem where an uncaught exception could go through when using date/time attributes with values in the far future/past
- Fixed an uncaught exception that could happen when the process run via Execute Process failed, the user opened it via the popup, and ran it directly after fixing the problem
- Fixed wrong attribute weights for Random Forest regression
- Fixed an error in the Store operator when used after applying a k-Means model
- Fixed an issue where Save dialogs did not accept any selection if a wildcard (.*) filter was provided (e.g. for Write Document)
- Fixed Pivot meta data column names not matching the real data
- Fixed missing text for the file restoring confirm dialog in projects
- Fixed an issue that could cause Studio startup to silently fail
- Fixed a possible error during startup related to port preconditions on some operators
- Fixed a bug that could cause project creation to not show an error and appear to do nothing
- Removed the check for preprocessing models in model deployments for custom models. This check caused certain grouped models to fail if they contained models that were technically not preprocessing models (e.g. PCA).
- Time Series: Fixed a bug in the Lag operator that caused the original data at preceding ports to be changed as well
- Time Series: Fixed some small errors in the description of two tutorial processes for Sliding Window Validation
- Time Series: Fixed an error that occurred in time-based windowing when the end of the last window was equal to the last timestamp in the input data. This affects all windowing operators (Windowing, Process Windows, Forecast Validation, Sliding Window Validation).
- Cloud Connectivity: File browser now adds the correct path separator character on Windows, and resolves macros properly for AWS, Azure, and Google Cloud file operators
Development
New Data Core
- ExampleSet and ExampleSetMetaData are officially deprecated! From now on, any new operators should be built using Belt Tables (com.rapidminer.belt.table.Table). Existing operators that use ExampleSets will continue to work for the time being. See the following resources for help:
Retrieving tables/ExampleSets from the non-legacy repositories now returns an IOTable, with TableMetaData as meta data. Code similar to the following will no longer work:
    IOObjectEntry dataEntry = dataLoc.locateData();
    if (!ExampleSet.class.isAssignableFrom(dataEntry.getObjectClass())) {
        return false;
    }
    MetaData metaData = dataEntry.retrieveMetaData();
    if (!(metaData instanceof ExampleSetMetaData)) {
        return false;
    }
    ...
    IOObject ioObject = dataEntry.retrieveData(null);
    if (!(ioObject instanceof ExampleSet)) {
        return false;
    }
    ExampleSet exampleSet = (ExampleSet) ioObject;
    ...
and should be replaced by
    IOObjectEntry dataEntry = dataLoc.locateData();
    if (!IODataTable.class.isAssignableFrom(dataEntry.getObjectClass())) {
        return false;
    }
    MetaData metaData = dataEntry.retrieveMetaData();
    ExampleSetMetaData esMD = BeltConversionTools.asExampleSetMetaDataOrNull(metaData);
    if (esMD == null) {
        return false;
    }
    ...
    IOObject ioObject = dataEntry.retrieveData(null);
    ExampleSet exampleSet = BeltConversionTools.asExampleSetOrNull(ioObject);
    if (exampleSet == null) {
        return false;
    }
    ...
The meta data at ports can now be TableMetaData. All meta data transformations will continue to work, since Port#getMetaData() automatically transforms TableMetaData to ExampleSetMetaData. However, this method has been deprecated and should be replaced by Port#getMetaData(ExampleSetMetaData.class) or Port#getMetaDataAsOrNull(ExampleSetMetaData.class), which automatically convert to the desired class if possible. The new methods are analogous to those for data, e.g. Port#getAnyDataOrNull(), which was already deprecated in 9.4 and should be replaced by Port#getDataAsOrNull(ExampleSet.class), which automatically converts to the desired class if possible. While nothing has changed for the data methods at the ports, with 9.9 more operators now deliver IOTable instead of ExampleSet to ports. The operators Read CSV and Read Excel were improved to use the new data core; if you use the corresponding classes CSVExampleSource or ExcelExampleSource in any shape or form, please use CSVTableSource and ExcelTableSource in the future.
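The "converts to the desired class if possible" behavior of the new port methods can be pictured with the following self-contained sketch. Every type in it is hypothetical: `NewTable`, `LegacyExampleSet`, and this `Port` class only stand in for IOTable, ExampleSet, and RapidMiner's port, and the real conversion logic is far more involved. The point is the calling pattern: the caller names the class it wants and receives either an instance of it or null.

```java
import java.util.List;

public final class ConvertingPortSketch {

    // Hypothetical stand-ins; these are NOT the RapidMiner IOTable/ExampleSet classes.
    record NewTable(List<String> columns) {}
    record LegacyExampleSet(List<String> attributes) {}

    /** Hypothetical port that stores whatever the upstream operator delivered. */
    static final class Port {
        private final Object data;

        Port(Object data) {
            this.data = data;
        }

        /** Returns the data as {@code type}, converting NewTable to LegacyExampleSet on request; null if impossible. */
        <T> T getDataAsOrNull(Class<T> type) {
            if (type.isInstance(data)) {
                return type.cast(data);                                   // already the requested class
            }
            if (type == LegacyExampleSet.class && data instanceof NewTable table) {
                return type.cast(new LegacyExampleSet(table.columns()));  // convert on demand
            }
            return null;                                                  // no known conversion
        }
    }

    public static void main(String[] args) {
        Port port = new Port(new NewTable(List.of("age", "income")));
        LegacyExampleSet legacy = port.getDataAsOrNull(LegacyExampleSet.class);
        System.out.println(legacy.attributes());                 // prints "[age, income]"
        System.out.println(port.getDataAsOrNull(String.class));  // prints "null"
    }
}
```

Because the caller asks for a class instead of casting, upstream operators can switch from delivering the old representation to the new one without breaking downstream code that still expects the old one.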
Extension logging I18N
Logging now also supports i18n! To use it, follow one of these steps:
- for a RapidMiner extension: add a LogMessagesXYZ.properties file next to your existing UserErrorMessages.properties etc. files. It is only respected by Studio 9.9+ and ignored by earlier Studio versions.
- when using the logging module, simply register your LogMessagesXYZ.properties via com.rapidminer.tools.I18N#registerLoggingBundle(ResourceBundle)
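As a rough picture of what a log message bundle contains, the sketch below loads properties text into a java.util.PropertyResourceBundle and resolves a key into a formatted message with java.text.MessageFormat. It uses only the JDK, not RapidMiner's I18N class. The key names, the `{0}`-style placeholders, and the fall-back-to-key behavior are assumptions for illustration, not RapidMiner's documented conventions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public final class LogMessagesSketch {

    // Hypothetical contents of a LogMessagesXYZ.properties file (key names are assumptions).
    static final String PROPERTIES =
            "com.example.myextension.read_started = Reading file {0}\n"
          + "com.example.myextension.read_failed = Failed to read file {0}: {1}\n";

    /** Loads the properties text as a bundle, as if it were read from the .properties file. */
    static ResourceBundle load() {
        try {
            return new PropertyResourceBundle(new StringReader(PROPERTIES));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Resolves a log key to its pattern and fills in the arguments; falls back to the key itself. */
    static String localize(ResourceBundle bundle, String key, Object... args) {
        try {
            return MessageFormat.format(bundle.getString(key), args);
        } catch (MissingResourceException e) {
            return key;
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load();
        System.out.println(localize(bundle, "com.example.myextension.read_failed", "sales.csv", "file not found"));
        // prints "Failed to read file sales.csv: file not found"
    }
}
```

Keeping only keys in the code and the wording in the bundle is what allows a translated LogMessagesXYZ file to change the log output without touching the operator code.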