What's New in RapidMiner Studio 9.7.0?

Released: June 2nd, 2020

The following describes the new features, enhancements, and bug fixes in RapidMiner Studio 9.7.0:

New Features

  • Added versioned projects with RapidMiner Server. You can have as many versioned projects as you like, no limits! The versioning is backed by Git and can be accessed by any regular Git client. This means sharing between Python/R coders and RapidMiner users has never been easier!
    • Added dialog to select which version of a file to keep in case of a conflict in the versioned projects while getting Snapshots from Server.
    • Versioning happens on a project level. As you can now have as many projects as you like, this is the most sensible behavior because most of the time many entries are interconnected in a project. Thus the entire state is saved and can be later restored, without having to worry about dependency versions.
    • Projects support ALL files you may have on your computer! You can put your .py scripts, your .md files, your .png files, your .pdf files, etc. all into a project. They will be neatly displayed in RapidMiner Studio.
    • Of course, all those files can be versioned together, so RapidMiner users and Python coders can share the same Git repository. The Python coders can even use their native Git client to do so, no magic required. This will make collaboration between RapidMiner users and Python coders easier than ever before!
    • Processes in versioned projects can also be run and scheduled on RapidMiner Server, just as they can for an existing Server central repository.
    • All the files live locally on your computer, but are also shared via Git. This gives you the performance of a local repository when working with it during prototyping, but also allows for easy collaboration with your colleagues.
  • Added new panel "Snapshot History" which lets you browse the history of your versioned projects, as well as see the changes you've made since the latest snapshot. It can also be used to restore an earlier state of the project, view past versions of individual files, and restore those past versions.
  • ExampleSets are now written to disk in a new file format: HDF5. This is a well-established format used e.g. by NASA to store large amounts of data. This also means that Python and RapidMiner Studio can exchange data via HDF5 files much more easily and faster than ever before.
  • Local repositories created with RapidMiner Studio 9.7 or later can also take advantage of supporting all files you may have on your computer (.py, .jpeg, .pdf, etc.).
  • New operator Target Encoding, which can remove nominal attributes with too many values and performs a target encoding (also known as mean encoding) on the remaining attributes
  • Auto Model: some processes (e.g. SVM, FLM, or weight calculations) now use the new Target Encoding instead of one-hot encoding which reduces memory usage and run times
  • Time Series: New operator Integrate to integrate time series with different methods (cumulative sum / left and right Riemann sum / trapezoidal rule)
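
Target (mean) encoding, as performed by the Target Encoding operator listed above, replaces each nominal value with a statistic of the target over the rows that carry that value, most simply the mean. The following Python sketch illustrates the idea only. It is not RapidMiner's implementation, and the function name `target_encode` and the sample data are made up for illustration.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each nominal value with the mean target of its group.

    Hypothetical helper for illustration; smoothing and the handling of
    unseen values, which real implementations usually add, are left out.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    # One mean per distinct nominal value.
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories], means


colors = ["red", "blue", "red", "green", "blue", "red"]
labels = [1, 0, 0, 1, 1, 1]  # numeric target, e.g. 1 = positive class
encoded, mapping = target_encode(colors, labels)
```

Unlike one-hot encoding, which adds one column per nominal value, this keeps a single numeric column per attribute.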

Enhancements

  • Both local repositories and versioned projects (tied to RM Server) have been completely rebuilt to get rid of many old limitations. Benefits include:
    • Enhanced throughput and performance
    • Better meta data caching
    • Concurrent access support
    • Displaying all files (no matter what they are, e.g. Python scripts, images, ...)
    • Allowing different file types (e.g. data, processes) and folders to share the same name
    • Note: Your existing local repositories have (Legacy) after their name, indicating they still run on the old technology and still have some of the limitations! If you create a new local repository, it will have (Local) after its name and have all the capabilities listed above. You can copy your data over via Studio from the old repository to a new one to migrate.
  • It is now possible to have a folder with the same name as a data entry in the repository (might not work for some old repositories)
  • It is now possible to have a process and a data entry with the same name in the repository (might not work for some old repositories)
  • Replaced Send Mail operator with a new version which supports file attachments
  • Improved memory usage for […] and Pivot operators for nominal columns with potentially a lot of unused values
  • Improved handling of whitespace in repository entry names
  • Improved cleanup of temp files, to reduce disk space clutter when Studio runs for a long time, e.g. in a Server environment
  • Made log tables in Result View behave more like other results, adding more actions and a shortcut to the context menu
  • Process background images now use a relative path to the image if possible, instead of an absolute path. This only applies to background images set from now on; it does not work retroactively
  • For binominal attributes the Statistics tab shows the positive and the negative value
  • Renamed RapidMiner Server to RapidMiner AI Hub
  • The Process panel is now opened or moved into the foreground when opening a process while in the Design view, to make it more obvious that something happened
  • Auto Model: remote executions on Server require the central repository as storage location
  • Turbo Prep: only local file based repositories can now be used as temporary repositories for the handover to Auto Model
  • Model Ops: only local repositories or central Server repositories can be used as storage locations for deployed models (also known as "deployment location")
  • Model Ops: keep unused and ID columns in the results after scoring
  • The operators Explain Predictions and Model Simulator now also support grouped models where arbitrary models have been grouped instead of only preprocessing models
  • The operator Explain Predictions now offers a parameter to limit the number of important features also for the "importances" output
  • Time Series
    • Added options to use padding for Fast Fourier Transformation and to calculate the frequency of the amplitude value.
    • Added the option to specify negative lags for the Lag operator
    • Added the option to specify a default lag for a set of attributes (selected by an attribute subset selector) to the Lag operator
      • Unfortunately, due to parameter key incompatibilities, the old version of the Lag operator is deprecated and a new version with the same name but a different operator key has been added.
  • H2O
    • Updated H2O library to version 3.30.0.1.
    • Added monotonicity constraints to Gradient Boosted Trees
    • Added weights port to Deep Learning
    • Expanded whitelist of accepted expert parameters, now supports all parameters provided by H2O
    • Deep Learning and Logistic Regression now work with datasets that have nominal columns with only one value

Bugfixes

  • Fixed an issue that could cause Studio startup to never complete
  • Made Studio startup more robust: the process now quits instead of silently hanging on the splash screen forever
  • Fixed an issue that could cause panels to sometimes not open if they had been closed previously in this session
  • Fixed an issue that caused CTAs to not work when HTML5 safe mode was enabled
  • Fixed an issue with back propagation of changes to performance vectors
  • Fixed a problem for JDBC drivers that do not implement a certain set of functionality by adding a fallback (e.g. SQLite writing)
  • Fixed potential cause for complete UI freeze when interacting with a CTA notification banner
  • Fixed an issue with process navigation and property panel if operator names contain HTML
  • Generate Multi-Label Data now works correctly in non-regression mode
  • Fixed memory leak caused by the Visualizations
  • Fixed rare issue where data sets could not be downsampled automatically if license limit was exceeded
  • Fixed an issue in Automatic Feature Engineering if all input features are nominal in the feature selection case
  • Fixed "Edit Access Rights" dialog for Server repositories not getting the permissions correctly when using Enterprise SSO
  • Fixed an issue that caused Studio to lag and increase memory consumption when using the right-click "Insert operator" popup menu in the Process panel.
  • Fixed broken replacing when moving data entries to a different repository (the entry was duplicated instead of replaced)
  • Auto Model: remote executions now show new submission screens which only allow the reset of Auto Model to load the results, which avoids problems with multiple remote submissions within the same session
  • Auto Model: reordering the columns in the column selection table no longer leads to graphics problems
  • Time Series: Fixed a bug in Extract Peaks that caused all "_position" features to have an offset of 1 to the Example number

Known issues

  • One Hot Encoding does not produce the desired results; this will be fixed with the next patch release.

Special notes

  • Columns of type "Integer" that were previously stored as integers are now stored as their double representation. This of course means more range (~53 bit precision), but also means that values are no longer capped. This might have an impact when storing data to disk and rereading it.
  • Columns of type "Date" no longer store the milliseconds due to the new file format. This might have an impact on equality tests and matching when storing data to disk and rereading it.
  • Visualizations that have been created locally for data sets stored in repositories will not be found anymore after the update, causing the result visualization to reset to its default. If you have set up complex visualizations that you absolutely want to restore, you can follow these steps:
    1. Open the data set in the Results view of RapidMiner Studio.
    2. Navigate on your disk via your filesystem explorer into the "USER_HOME/.RapidMiner/internal cache/content mapper" folder. There you can find a folder structure matching your repository names and structure.
    3. Find the exact path to the data set (e.g. "C:/Users/xyz/.RapidMiner/internal cache/content mapper/Local Repository/Charts/Demo/12. Pie")
    4. You should see a very similar path right next to it, either ending in ".ioo" or ".rmhdf5table" (e.g. "C:/Users/xyz/.RapidMiner/internal cache/content mapper/Local Repository/Charts/Demo/12. Pie.ioo")
    5. Go into the folder from step 3 (the one without the .ioo ending), and copy the "pc.json" file from it to the folder from step 4 (the one with the .ioo ending)
    6. Close the data set in the Results view
    7. Open it again. It should now have its configuration back!
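
Steps 3 to 5 above can be sketched as a small Python script. This is only an illustration under the folder layout described in those steps; the function name `restore_visualization_config` is hypothetical and not part of RapidMiner. Steps 1, 6, and 7 (opening and closing the data set in Studio) still have to be done by hand.

```python
import shutil
from pathlib import Path

# Folder suffixes named in step 4 above.
DATA_SUFFIXES = (".ioo", ".rmhdf5table")


def restore_visualization_config(data_set_dir):
    """Copy pc.json from the data set's folder (step 3) into the sibling
    folder ending in .ioo or .rmhdf5table (step 4).

    Returns the path of the copied file. Hypothetical helper for
    illustration only.
    """
    data_set_dir = Path(data_set_dir)
    source = data_set_dir / "pc.json"
    if not source.is_file():
        raise FileNotFoundError(f"No pc.json found in {data_set_dir}")
    for suffix in DATA_SUFFIXES:
        target_dir = data_set_dir.parent / (data_set_dir.name + suffix)
        if target_dir.is_dir():
            return Path(shutil.copy2(source, target_dir / "pc.json"))
    raise FileNotFoundError(
        f"No .ioo or .rmhdf5table folder next to {data_set_dir}"
    )


# Example call, using the path layout from step 3:
# restore_visualization_config(
#     Path.home() / ".RapidMiner" / "internal cache" / "content mapper"
#     / "Local Repository" / "Charts" / "Demo" / "12. Pie"
# )
```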

Development

The introduction of versioned projects (backed by Git) has forced a major redesign of the Repository API. Up until 9.7, a RepositoryLocation was represented by a string like "//RepositoryName/folder/test" and "test" was guaranteed to be unique. It was either a folder, a process, an IOObject (data), or a blob. This is no longer the case!

Since collaboration with Git can introduce naming conflicts which are not actually file-level conflicts (so Git is fine with them), we had to allow these "non-conflicts" into the Repository world as well.

Now a repository location that ends with "test" as the last path element can either depict a folder (RepositoryLocationType#FOLDER), or data (RepositoryLocationType#DATA_ENTRY). Sometimes this is unknown, which is also fine: RepositoryLocationType#UNKNOWN can be used in that case. However, it does not stop there. Since for Git, "test.rmp" and "test.ioo" are also perfectly fine, we had to go one step further and also allow that. Therefore, a RepositoryLocation now also has an expected DataEntry (sub-)type which is used to determine what specific type of a DataEntry to locate (a ProcessEntry, an IOObjectEntry, a ConnectionEntry, or a BinaryEntry).

You can even end up in the undesirable situation of having a "test.ioo" and a "test.rmhdf5table" (both IOObjects) in the same location. Because we cannot determine which IOObject a process should potentially use, these situations must be rectified by the user: the Retrieve operator will throw an error in that case! Looking at the data and renaming one of the entries will work fine, though. This scenario can only happen after a Git pull with the new versioned projects.

In other words, "test" can in our example now be a folder, a process, a data ioobject, a connection entry, or a binary entry. And they can all exist at the very same time in the very same folder. So be sure to specify in the new RepositoryLocationBuilder what exactly you want from the repository, or you may end up getting the first name match it finds, which may be of an unexpected type.
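
To make the ambiguity concrete, here is a toy Python model of a single folder. It is not the RapidMiner API: the `Folder` class, its methods, and the string type names are hypothetical stand-ins, chosen only to show why a lookup by name alone is no longer enough and why two IOObjects with the same name are treated as an error.

```python
class DuplicateIOObjectError(Exception):
    """Stand-in for the error raised when two IOObjects share a name."""


class Folder:
    """Toy folder whose entries are (name, kind, file suffix) tuples."""

    def __init__(self):
        self.entries = []

    def add(self, name, kind, suffix=""):
        self.entries.append((name, kind, suffix))

    def locate_by_name(self, name):
        # Name-only lookup: returns the first match, whatever its kind.
        for entry in self.entries:
            if entry[0] == name:
                return entry
        return None

    def locate(self, name, kind):
        # Typed lookup: the caller states what it expects to find.
        matches = [e for e in self.entries if e[0] == name and e[1] == kind]
        if kind == "ioobject" and len(matches) > 1:
            raise DuplicateIOObjectError(
                f"{len(matches)} IOObjects named {name!r}"
            )
        return matches[0] if matches else None


folder = Folder()
folder.add("test", "folder")
folder.add("test", "process", ".rmp")
folder.add("test", "ioobject", ".ioo")

first_match = folder.locate_by_name("test")  # a folder, maybe not what was wanted
process = folder.locate("test", "process")   # unambiguous

folder.add("test", "ioobject", ".rmhdf5table")  # e.g. arrives with a Git pull
try:
    folder.locate("test", "ioobject")
    duplicate_detected = False
except DuplicateIOObjectError:
    duplicate_detected = True
```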

  • Repositories now distinguish between data and folders, and even between different data subtypes (process, ioobject, connection, binary entry) which means you can have a folder called "A" and e.g. a process called "A" at the same time. This has implications for a large number of APIs, most notably:
    • com.rapidminer.repository.Repository interface:
      • locateFolder(String) and locateData(String, Class) have been added and can be implemented; their default implementation points to the RepositoryManager()#locateFolder(String) and locateData(String, Class…) methods
      • getIOObjectEntrySubtype(Class ioObjectClass) has been added and can be implemented, the default implementation returns IOObjectEntry.class. This is used for the new file-based repository implementations (Local and versioned Project) that will ultimately have different file suffixes on disk for every distinct IOObject type (instead of all of them sharing the legacy .ioo suffix)
      • isTransient() has been added, defaults to false. This is used to hide temporary repositories from the repositories panel and from the Global Search if true.
      • locate(String) has been deprecated and should not be used anymore because it cannot know whether a file or a folder is requested
    • com.rapidminer.repository.RepositoryManager class:
      • locate(Repository, String, boolean) has been deprecated and replaced with locateFolder(Repository, String, boolean) and locateData(Repository, String, Class…)
    • com.rapidminer.repository.Folder interface:
      • containsFolder(String) and containsData(String, Class…) have been added and must be implemented
      • containsEntry(String) has been deprecated and should not be used anymore because it cannot know whether a file or a folder is requested
      • canRefreshChildFolder(String) and canRefreshChildData(String) have been added and must be implemented
      • canRefreshChild(String) has been deprecated and should not be used anymore because it cannot know whether a file or a folder is requested
    • com.rapidminer.operator.Operator class:
      • getParameterAsRepositoryLocation(String) has been deprecated and should not be used anymore because it cannot know whether a file or a folder is requested
      • getParameterAsRepositoryLocationData(String, Class) has been added for looking for data
      • getParameterAsRepositoryLocationFolder(String) has been added for looking for folders
    • com.rapidminer.repository.RepositoryLocation class:
      • locateEntry() has been deprecated and replaced with locateFolder() and locateData() (same as above)
      • ALL constructors have been deprecated and replaced with a builder: com.rapidminer.repository.RepositoryLocationBuilder
      • getRepositoryLocation(String, Operator) has been deprecated and replaced with getRepositoryLocationFolder(String, Operator) and getRepositoryLocationData(String, Operator, Class)
      • added getLocationType() and setLocationType(RepositoryLocationType), which are used to specify whether a RepositoryLocation references a folder, a data entry, or that it is not known what it references
      • added getExpectedDataEntryType() and setExpectedDataEntryType(Class), which are used to specify what data entry (sub-)type is expected. Not used if RepositoryLocationType#FOLDER is expected.
      • added isFailIfDuplicateIOObjectExists() and setFailIfDuplicateIOObjectExists(boolean), which control whether a RepositoryIOObjectEntryDuplicateFoundException is thrown when locateData() is called (an IOObjectEntry is requested), but there are at least two IOObject entry subtypes with the same name (prefix). As this is an undesirable situation, operators will refuse to work with such locations when retrieving data.
      • These changes are very important to adapt to, otherwise you can end up for example getting a folder when expecting a file, or a process when expecting an IOObject!
  • ParameterTypeRepositoryLocation now has a new getter and setter for a predicate to limit the available UI choices for the user when selecting entries. It is used in the RepositoryLocationValueCellEditor if getRepositoryFilter() is not overridden in it. Note that the operator still has to check the validity of the repository location for its use case; the filter is purely for UI purposes and does not validate the returned value.
    • setRepositoryFilter(Predicate)
    • getRepositoryFilter()
  • Added secure encryption framework to Studio, based on Google Tink. See com.rapidminer.tools.encryption.EncryptionProvider for a starting point. The old CipherTools have been deprecated and must not be used for new encryptions anymore!
    • This means that any access methods to process XML and connections have been deprecated and replaced with a version where you can specify the desired encryption context! Failure to use these new methods may lead to decryption failure of encrypted values in connections, and ParameterTypePassword in processes!
    • See Repository#getEncryptionContext() for getting the encryption context for a repository. The default implementation uses the EncryptionProvider.DEFAULT_CONTEXT which is used by all local repositories. Implement this method if you need a custom encryption key for each of your repository instances.
    • All Process constructors have been deprecated and replaced by a version that takes an encryption context String identifier:
      • Process(String) has been deprecated and replaced by Process(String, String)
      • Process(File, ProgressListener) has been deprecated and replaced by Process(File, String, ProgressListener)
      • Process(Reader) has been deprecated and replaced by Process(Reader, String)
      • Process(InputStream) has been deprecated and replaced by Process(InputStream, String)
      • Process(URL) has been deprecated and replaced by Process(URL, String)
  • ParameterTypePassword and ParameterTypeOAuth have been deprecated and must never be used in operators again! Use the connection framework introduced in version 9.3 instead to avoid having sensitive values in process XML.
  • com.rapidminer.repository.BlobEntry has been deprecated. All new repositories must support the new com.rapidminer.repository.BinaryEntry instead. That is a direct view onto binary content that is not interpreted in any way, shape or form. No magic bytes, nothing. Just like it would be on the filesystem.
  • Added com.rapidminer.repository.gui.BinaryEntryResultRendererRegistry for registering custom renderers to display in the Results view when the new BinaryEntry is passed as a process result.
    • Multiple renderers can be registered per suffix
    • Depends on the suffix of a file, e.g. jpeg
  • Added com.rapidminer.gui.dnd.DropBinaryEntryIntoProcessActionRegistry for registering custom hooks when a user drags the new BinaryEntry into the canvas.
    • Can create an operator or trigger any custom action
    • Depends on the suffix of a file, e.g. py
  • Added com.rapidminer.gui.dnd.DropFileIntoProcessActionRegistry for registering custom hooks when a user drags a binary file from disk into the canvas.
    • Can create an operator or trigger any custom action
    • Depends on the suffix of a file, e.g. py
  • Added com.rapidminer.repository.gui.OpenBinaryEntryActionRegistry for registering custom hooks when the user double-clicks or otherwise opens the new BinaryEntry
    • Depends on the suffix of a file, e.g. py
  • Added com.rapidminer.repository.gui.BinaryEntryIconRegistry for registering custom icons for each binary entry suffix which are shown in the repository panel
    • Depends on the suffix of a file, e.g. py
  • Refactored the com.rapidminer.operator.ports.Port and com.rapidminer.operator.ports.Ports interfaces and sub-interfaces/-classes. Port is now a self-referencing generic type, allowing more convenient and type-safe methods
    • There are only input and output ports, they are always opposite, so the types reflect that and the method getOpposite() returns a value of the opposite type; getSource() and getDestination() are now present in both subclasses and might return themselves
    • Connecting and disconnecting can now be done by either side of a connection; there also is a method canConnectTo that checks if two ports can be connected
    • Ports and its implementations were updated to reflect the generic nature of Port. Custom implementations should not be affected at runtime, but might need a small adjustment in code to compile properly
  • Added com.rapidminer.gui.tools.ResourceDockKey#PROPERTY_KEY_NEXT_TO_DOCKABLE and com.rapidminer.gui.tools.ResourceDockKey#PROPERTY_KEY_DEFAULT_FALLBACK_LOCATION which can be used to define where a Dockable should be opened by default
  • Added com.rapidminer.TestUtils which contains utility methods for testing, e.g. a method to do the most barebones setup of RapidMiner required to at least create and use empty processes in unit tests
  • Removed quite a few deprecated methods and classes, which were deprecated for over 10 years at this point. This should have no impact on extensions, unless very old methods annotated with @Deprecated since RapidMiner 5 have been used.
  • Added com.rapidminer.tools.io.NotifyingOutputStreamWrapper, which is an OutputStream wrapper that can execute a runnable after (manual and automatic) close of the stream
  • Added com.rapidminer.tool.TempFileTools with methods to create temporary files which are automatically removed during RapidMiner#cleanup() and RapidMiner#shutdown(). Always use these methods to ensure unused files are cleaned up.