A pipeline is a series of processes represented by blocks connected in sequence, where the output of one block serves as the input to the next. A typical pipeline includes blocks such as:
- Data source(s) to upload files and/or connect to databases
- Data transformations to merge dataframes or generate new features
- Wand ML block to train AI models or to generate predictions and analytics with trained AI models
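The chaining idea above can be sketched in a few lines of Python. This is only an illustration of "the output of one block is the input of the next", not how Wand is implemented: `run_pipeline`, `load_source`, `add_feature` and `predict` are hypothetical names, and a fixed threshold stands in for a trained model.

```python
from functools import reduce
from typing import Any, Callable, List

Block = Callable[[Any], Any]


def run_pipeline(blocks: List[Block], initial: Any = None) -> Any:
    """Run blocks in order, feeding each block's output into the next."""
    return reduce(lambda data, block: block(data), blocks, initial)


# Hypothetical blocks, for illustration only.
def load_source(_: Any) -> List[dict]:
    # Stands in for a data source block (file upload or database).
    return [{"price": 10.0, "qty": 3}, {"price": 4.5, "qty": 2}]


def add_feature(rows: List[dict]) -> List[dict]:
    # Stands in for a transformation block that generates a new feature.
    return [{**row, "total": row["price"] * row["qty"]} for row in rows]


def predict(rows: List[dict]) -> List[int]:
    # Stands in for an ML block; a fixed threshold replaces a trained model.
    return [1 if row["total"] > 20 else 0 for row in rows]


result = run_pipeline([load_source, add_feature, predict])
print(result)
```

Reordering or swapping entries in the list changes the pipeline without touching the individual blocks, which is the main appeal of the block-by-block design.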
We support the following data sources:
- Files (CSV format)
- Databases (Snowflake, PostgreSQL, CloudSQL, BigQuery)
Drag and drop the source block onto the canvas and configure it from the left bar menu: use “Upload File” for files and “Configure” for databases. We suggest trying Wand with files first: either your own CSV file or a sample dataset from Kaggle Datasets.
Once the data is uploaded, you can review it on the Sneakpeek page. The page shows all columns, their types, and some sample data.
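If you want to check a CSV file before uploading it, a similar summary (columns, types, sample values) can be produced locally. The sketch below is an assumption about what such a preview might look like, not the logic behind the Sneakpeek page; `infer_type` and `preview` are hypothetical helpers, and the type names are chosen for illustration.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Guess a column type from its string values: integer, float, or text."""
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "text"


def preview(csv_text: str, sample_size: int = 2) -> Dict[str, dict]:
    """Return each column's inferred type and a few sample values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {
        col: {
            "type": infer_type([row[col] for row in rows]),
            "sample": [row[col] for row in rows[:sample_size]],
        }
        for col in columns
    }


data = "id,price,city\n1,9.99,Oslo\n2,15,Lima\n3,7.5,Rome\n"
summary = preview(data)
for column, info in summary.items():
    print(column, info["type"], info["sample"])
```

To run it on a real file, read the file contents into a string and pass that to `preview`.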
Each external database has its own configuration. Enter your credentials for Snowflake, PostgreSQL, CloudSQL, or BigQuery to connect your data to the Wand platform.
We support 2 types of transformations:
- Predefined Transformations – e.g. Join 2 Sources
- User-defined Transformations – using our SQL Console
To combine several data sources in one block, use Join 2 Sources, or add extra input ports to a Raw SQL block and join the sources manually in SQL.
The Raw SQL block accepts SQL queries. To bring all data sources into one transformation, add extra input ports to the transformation block.
Note! In the SQL code, refer to sources by their Connector names, not by the DB name. To copy a Port name quickly, use the copy button on the Port block on the right.
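A manual join of two inputs might look like the query below. It is run here against an in-memory SQLite database only so the example is self-contained; the SQL dialect that Wand's Raw SQL block accepts may differ. The table names `customers_csv` and `orders_db` are hypothetical and stand in for the Connector names of two input ports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each table stands in for one input port; the table names play the role of
# the (hypothetical) Connector names that the query must reference.
conn.execute("CREATE TABLE customers_csv (customer_id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE orders_db (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO customers_csv VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
)
conn.executemany(
    "INSERT INTO orders_db VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)],
)

# Join the two sources on their shared key and aggregate per customer.
query = """
SELECT c.name, SUM(o.amount) AS total_amount
FROM customers_csv AS c
JOIN orders_db AS o ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

The point to carry over is the `FROM` and `JOIN` clauses: they name the connectors, not the underlying database.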
The predefined Join 2 Sources block joins two data sources. Drag and drop it onto the canvas and go to Configuration. Note! You can configure the join only after you have connected the data source blocks to the Join block.
You can check the result of the transformation in the block menu with the Data Preview button.
Wand continuously adds more data sources and transformations.
Configuring the ML block
In the Wand ML block, you define the type of AI task and the prediction key. The block must be connected to a data source to enable the prediction key selector.
The AI task type can be:
- Binary Classification – predicts one of two options, such as 0 and 1 (or yes / no).
- Multiclass Classification – predicts one of several classes, such as high, medium, or low.
- Regression – predicts a numeric value, such as a price or a balance.
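If you are unsure which task type fits your prediction key, the rough heuristic below may help. It is a sketch based only on the definitions above, not Wand's own logic: `suggest_task_type` is a hypothetical helper, and it will misjudge some columns (for example, integer class labels such as 0, 1, 2 look numeric and would be suggested as regression).

```python
from typing import List, Union

Value = Union[int, float, str]


def suggest_task_type(target: List[Value]) -> str:
    """Suggest a task type from the values of the prediction key column."""
    distinct = set(target)
    # Exactly two distinct values (0/1, yes/no): binary classification.
    if len(distinct) == 2:
        return "binary classification"
    # Numeric values such as price or balance: regression.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "regression"
    # Anything else (text labels such as high/medium/low): multiclass.
    return "multiclass classification"


print(suggest_task_type([0, 1, 1, 0]))
print(suggest_task_type(["high", "medium", "low", "high"]))
print(suggest_task_type([10.5, 20.0, 13.2, 8.75]))
```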
Training on some datasets can take hours. If you want quicker results, you can set a limit on the training time.
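The trade-off behind a training time limit can be illustrated with a generic time-budgeted loop: training stops when the budget runs out and the best result so far is kept. This is a conceptual sketch, not Wand's training procedure; `train_with_time_limit`, its parameters, and the toy loss values are all hypothetical.

```python
import time
from typing import Callable


def train_with_time_limit(
    step: Callable[[], float], max_seconds: float, max_steps: int = 1000
) -> dict:
    """Call `step` repeatedly until the time budget or the step cap is reached.

    `step` runs one unit of training and returns the current loss.
    """
    deadline = time.monotonic() + max_seconds
    best_loss = float("inf")
    steps_done = 0
    while steps_done < max_steps and time.monotonic() < deadline:
        best_loss = min(best_loss, step())
        steps_done += 1
    return {"steps": steps_done, "best_loss": best_loss}


# Toy "training": each step just returns the next precomputed loss value.
losses = iter([0.9, 0.7, 0.8, 0.5, 0.6])
report = train_with_time_limit(lambda: next(losses), max_seconds=60, max_steps=5)
print(report)
```

A shorter budget means fewer steps and usually a worse best loss, so the limit trades model quality for speed.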