Using the Azure Blob Storage Connector

The Azure Blob Storage Connector allows you to access your Azure Blob Storage directly from RapidMiner Studio. Both read and write operations are supported. This document walks you through how to connect to your Azure Blob Storage account and how to read from Azure Blob Storage.

Connect to your Azure Blob Storage account

To configure a new Azure Blob Storage Connection, you will need the connection details of your Azure Blob Storage account (at least the account name and the account key).

  1. In RapidMiner Studio, right-click on the repository you want to store your Azure Blob Storage Connection in and choose Create Connection.

    You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.

  2. Give a name to the new Connection, and set Connection Type to Azure Blob Storage.

  3. Click on Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Azure Blob Storage account.

    While not required, we recommend testing your new Azure Blob Storage Connection by clicking the Test connection button. If the test fails, please check whether the details are correct.

  5. Click Save to save your Connection and close the Edit connection dialog. You can now start using the Azure Blob Storage operators!

Read from Azure Blob Storage

The Read Azure Blob Storage operator reads data from your Azure Blob Storage account. The operator can be used to load arbitrary file formats, since it only downloads and does not process the files. To process the files, you will need to use additional operators such as Read Document, Read Excel, or Read XML.
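The split between downloading and processing can be pictured with a short Python sketch. It is only an illustration of the idea, not how RapidMiner Studio implements the operator: `FAKE_STORAGE`, `read_blob`, and `read_document` are hypothetical names, and an in-memory dictionary stands in for the storage account.

```python
# Illustration only: an in-memory dict stands in for an Azure Blob Storage
# container. All names here are hypothetical, not RapidMiner or Azure APIs.
FAKE_STORAGE = {
    "logs/app.log": b"2024-01-01 INFO started\n2024-01-01 ERROR disk full\n",
}


def read_blob(path: str) -> bytes:
    """Download step: return the raw bytes without interpreting them."""
    return FAKE_STORAGE[path]


def read_document(raw: bytes, encoding: str = "utf-8") -> str:
    """Processing step: interpret the raw bytes as plain text."""
    return raw.decode(encoding)


raw = read_blob("logs/app.log")   # bytes, file format not yet interpreted
document = read_document(raw)     # text, ready for further processing
print(type(raw).__name__, type(document).__name__)
print(document.splitlines()[1])
```

Because the download step returns uninterpreted bytes, the same step could in principle feed a spreadsheet or XML parser instead of a text decoder, which is the role the additional operators play in a process.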

Let us start with reading a simple log file from Azure Blob Storage.

  1. Drag a Read Azure Blob Storage operator into the Process Panel. Select your Azure Blob Storage Connection for the connection entry parameter from the Connections folder of the repository you stored it in by clicking on the repository chooser button next to it.

    Alternatively, you can drag the Azure Blob Storage Connection from the repository into the Process Panel and connect the resulting operator with the Read Azure Blob Storage operator.

  2. Click on the file chooser button to view the files in your Azure Blob Storage account. Select the file that you want to load and click Open.

    As mentioned above, the Read Azure Blob Storage operator does not process the contents of the specified file. In our example, we have chosen a log file (a plain text file). This file type can be processed via the Read Document operator, which is part of the Text Processing extension for RapidMiner Studio.

  3. If you have not already installed the Text Processing extension for RapidMiner Studio, please go to the marketplace and do so now. Then add a Read Document operator between the Read Azure Blob Storage operator and the result port.

  4. Run the process! In the Results perspective, you should see a single document containing the content of the log file.

You could now use further text processing operators to work with this document, e.g., to determine how frequently certain events occur. To write results back to Azure Blob Storage, you can use the Write Azure Blob Storage operator. It uses the same Connection Type as the Read Azure Blob Storage operator and has a similar interface. You can also read from a set of files in an Azure Blob Storage directory, using the Loop Azure Blob Storage operator. For this, you need to specify the connection entry and the folder that you want to process, as well as the steps of the processing loop with nested operators. For more details, please read the help of the Loop Azure Blob Storage operator.
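As a rough illustration of looping over a folder and counting events, the Python sketch below applies one processing step to every file under a folder prefix and sums the results. It runs outside RapidMiner Studio and makes several assumptions: `FAKE_FOLDER`, `count_events`, and `loop_folder` are hypothetical names, the file contents are invented sample data, and the first word of each log line is assumed to be the event level.

```python
from collections import Counter

# Hypothetical stand-in for the files in a container: blob name -> content.
FAKE_FOLDER = {
    "logs/monday.log": "INFO start\nERROR disk full\nINFO stop\n",
    "logs/tuesday.log": "INFO start\nWARN slow\nERROR timeout\nERROR timeout\n",
    "reports/summary.txt": "not a log\n",
}


def count_events(text: str) -> Counter:
    """Count the first word of each non-empty line (assumed event level)."""
    return Counter(line.split()[0] for line in text.splitlines() if line.strip())


def loop_folder(files: dict, folder: str) -> Counter:
    """Run count_events on every file whose name starts with the folder prefix."""
    total = Counter()
    for name in sorted(files):
        if name.startswith(folder):
            total += count_events(files[name])
    return total


totals = loop_folder(FAKE_FOLDER, "logs/")
print(totals.most_common())
```

Here `count_events` plays the part of the nested operators, while `loop_folder` corresponds to choosing the folder to process. Files outside the chosen folder, such as `reports/summary.txt`, are skipped.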