Accuracy – in a classification task, accuracy measures how well a machine learning model predicts the class labels for a given set of data. It is calculated as the number of correct predictions made by the model divided by the total number of predictions. For example, if a model trained to predict whether an email is spam or not spam makes 1000 predictions and 950 of them are correct, the model’s accuracy is 950/1000 = 0.95, or 95%. Accuracy can be a useful metric for evaluating a classification model, but it can be misleading in some cases, because it ignores class imbalance and the relative costs of different types of errors. For example, a model trained to predict whether a patient has a rare illness can miss most of the patients who actually have it and still achieve high accuracy, because patients with the illness make up only a small share of the data.
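The rare-illness case can be made concrete with a small sketch. The data below is invented for illustration (1000 patients, 10 of whom have the illness), and the helper names `accuracy` and `recall` are hypothetical, not part of the Wand Platform.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive samples that the model predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        return 0.0
    hits = sum(p == positive for p in predictions_for_positives)
    return hits / len(predictions_for_positives)


# Invented data: 10 ill patients (label 1) and 990 healthy patients (label 0).
y_true = [1] * 10 + [0] * 990
# A model that always predicts "not ill".
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The model reaches 99% accuracy while finding none of the ill patients, which is why accuracy is usually read together with a class-specific metric such as recall when one class is rare.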
AI model – the output of a training environment run. The model learns patterns in your historical data in order to make predictions and provide additional analytics for your current and future data.
AI task – a specific problem that an AI (in particular, machine learning) model is designed to solve. There are many different types of AI tasks, including classification, regression, clustering, dimensionality reduction, and anomaly detection. See Wand Hierarchy explained.
Binary Classification – a type of AI task that predicts one of exactly two options, such as 0 and 1 or yes and no.
Block – one of the possible components within a pipeline. When a pipeline is running, data flows between blocks that are connected to each other via input and output ports.
Classification is a task in which the model is given a set of labeled data and is asked to predict the class label for a new, unseen sample. For example, a classifier might be trained to predict whether an email is spam or not spam. Classification is called Binary on the platform if there are exactly two classes to predict, one identified as positive and the other as negative. The following pairs of labels (after converting to string and lower case) are identified as positive/negative: “true”/“false”, “yes”/“no”, “positive”/“negative”, “1”/“0”, “1.0”/“0.0”, “1.0”/“0”, “1”/“0.0”, “1”/“-1”, “1.0”/“-1.0”, “1”/“-1.0”, “1.0”/“-1”. Otherwise, classification is referred to as Multiclass.
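The label rule above can be sketched as follows. This is one reading of the glossary text, not the platform’s actual implementation: the function name `classification_type` and the constant `POSITIVE_NEGATIVE_PAIRS` are hypothetical, and the sketch assumes that any label set not matching a listed pair counts as Multiclass.

```python
# (positive, negative) pairs listed in the glossary, already lower-cased strings.
POSITIVE_NEGATIVE_PAIRS = {
    ("true", "false"), ("yes", "no"), ("positive", "negative"),
    ("1", "0"), ("1.0", "0.0"), ("1.0", "0"), ("1", "0.0"),
    ("1", "-1"), ("1.0", "-1.0"), ("1", "-1.0"), ("1.0", "-1"),
}


def classification_type(labels):
    """Return "binary" if the distinct labels form a listed pair, else "multiclass"."""
    # Convert every label to a lower-case string, as the glossary describes.
    normalized = {str(label).lower() for label in labels}
    if len(normalized) == 2:
        first, second = normalized
        if (first, second) in POSITIVE_NEGATIVE_PAIRS or (second, first) in POSITIVE_NEGATIVE_PAIRS:
            return "binary"
    return "multiclass"


print(classification_type([True, False, True]))       # binary
print(classification_type(["Yes", "no", "YES"]))      # binary
print(classification_type(["high", "medium", "low"]))  # multiclass
```

Under this reading, a two-class target with labels outside the list (for example “spam” and “ham”) would not be treated as Binary, so relabeling such a column to one of the listed pairs may be needed.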
Confidence – the estimated probability that a model’s prediction is correct. A prediction made with high confidence is more likely to be accurate, while a prediction made with low confidence is less likely to be accurate. The Wand Platform provides confidence estimates for each prediction.
Cron – a part of an environment that allows you to schedule automatic runs of the environment at specified times.
Data connectors – blocks in a pipeline that allow users to connect to databases such as Snowflake.
Data Quality refers to the characteristics of the data that you use to train your models. The quality of the data significantly impacts the accuracy and performance of the model. Some factors that can affect the quality of data for machine learning include:
- Completeness: does the data contain all of the necessary information to address the problem at hand?
- Relevance: does the data relate to the problem you are trying to solve?
- Accuracy: is the data accurate and free from errors?
- Consistency: are the data sources consistent with each other?
- Timeliness: is the data current enough for the problem at hand?
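Two of the factors above, completeness and timeliness, can be checked with a few lines of code before training. The following is a minimal sketch on invented rows; the helper names and the column names (`customer_id`, `amount`, `updated`) are hypothetical and do not describe how the Wand Platform assesses data quality.

```python
from datetime import date


def completeness(rows, columns):
    """Share of non-missing values per column (None and "" count as missing)."""
    total = len(rows)
    result = {}
    for column in columns:
        present = sum(1 for row in rows if row.get(column) not in (None, ""))
        result[column] = present / total if total else 0.0
    return result


def newest_record_age_days(rows, date_column, today):
    """Age in days of the most recent record, or None if no dates are present."""
    dates = [row[date_column] for row in rows if row.get(date_column) is not None]
    return (today - max(dates)).days if dates else None


# Invented example data.
rows = [
    {"customer_id": 1, "amount": 120.0, "updated": date(2024, 1, 10)},
    {"customer_id": 2, "amount": None, "updated": date(2024, 1, 12)},
    {"customer_id": 3, "amount": 75.5, "updated": None},
    {"customer_id": 4, "amount": 10.0, "updated": date(2024, 1, 5)},
]

print(completeness(rows, ["customer_id", "amount", "updated"]))
# {'customer_id': 1.0, 'amount': 0.75, 'updated': 0.75}
print(newest_record_age_days(rows, "updated", today=date(2024, 1, 15)))  # 3
```

Relevance, accuracy, and consistency usually need domain knowledge or a second data source to verify, so they are not covered by this sketch.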
Deploy is the process of publishing elements of one environment (AI model, pipeline, settings) into another.
Environment is an entity that represents a specific step in the AI lifecycle. More about Wand Hierarchy
Experiment is an entity that represents all the steps in the AI lifecycle for one data schema. More about Wand Hierarchy
Explainability – in machine learning, the ability of a model to provide explanations for its predictions. Our model provides two types of explainability: local and global. Local explainability explains the model’s prediction for each individual sample. Each feature of the sample is assigned an importance, which quantifies the contribution of that feature to the prediction. A positive importance indicates that the feature’s value for the considered sample increases the probability of the predicted class in classification tasks or increases the predicted numerical value in regression tasks. A negative importance implies the opposite. Global explainability represents the aggregate feature importance over the whole dataset on which predictions are made. Unlike the importance for individual samples, the global feature importance is always nonnegative and indicates how much, on average, the model used the corresponding feature when making predictions.
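One common way to obtain a nonnegative global importance from signed local importances is to average their absolute values over all samples. The sketch below assumes that aggregation; the glossary does not state which formula the platform uses, and the function name, feature names, and numbers are invented for illustration.

```python
def global_importance(local_importances):
    """Mean absolute local importance per feature.

    local_importances: one dict per sample, mapping feature name to a signed
    importance. Averaging absolute values is an assumption of this sketch.
    """
    if not local_importances:
        return {}
    n_samples = len(local_importances)
    features = local_importances[0].keys()
    return {
        feature: sum(abs(sample[feature]) for sample in local_importances) / n_samples
        for feature in features
    }


# Invented local importances for three samples and two features.
local = [
    {"age": 0.30, "income": -0.10},   # age pushes the prediction up, income down
    {"age": -0.20, "income": 0.05},
    {"age": 0.10, "income": -0.15},
]

for feature, value in global_importance(local).items():
    print(f"{feature}: {value:.2f}")
```

Because the signs are discarded, a feature that pushes some predictions up and others down still receives a high global importance, which matches the description of global importance as a measure of how much a feature was used rather than in which direction.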
Input port – a block element that provides the ability to transfer data to this block from the previous one in the pipeline. Via the name of the input port you can refer to the dataset, e.g. inside a SQL transformation block.
Multiclass classification – a type of AI task that predicts one of several classes, for example high, medium, or low.
Output port of a block provides the ability to transfer data from this block to the next one in the pipeline.
Pipeline is a series of processes represented by blocks connected in sequence, with the output of one block serving as the input for the next. A typical pipeline might include blocks such as uploading files, connecting to databases, transforming data, training an AI model, or generating predictions and analytics.
Playground – See Training Playground
Predefined Transformations is a block within a pipeline that contains a graphical interface with which you can define a specific data transformation. An example of such a block is “Join 2 sources”, which allows you to merge two data sources into one table according to the rules you specify.
Production – the environment where users generate predictions for their data. More about Wand Hierarchy
Regression – a task in which the model is given a set of labeled data and is asked to predict a continuous numerical value for a new, unseen sample. For example, a regression model might be used to predict the price of a house based on its size, location, and other features.
Reports are CSV files containing results of successful runs. You can configure conditions and filters for email messages containing a link to download reports.
Runs are executions of an environment. Each run executes the environment again to produce the relevant output.
Solution is an entity that represents a business problem you would like to solve using the Wand Platform. More about Wand Hierarchy
Staging – an optional environment to which users deploy a model for testing purposes. More about Wand Hierarchy
SQL Transformation is a process in which data is transformed using Structured Query Language (SQL). In the context of machine learning, SQL transformations might be used to prepare data for model training or evaluation. On the platform, it is represented by one of the blocks within a pipeline.
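As an illustration of the kind of query such a block might hold, the sketch below aggregates order rows per customer. It runs against an in-memory SQLite database from Python so that it is self-contained; the table name `input_1` stands in for an input port name, and the columns and values are invented. The SQL dialect supported by the platform is not specified here, so treat the query as an assumption.

```python
import sqlite3

# In-memory database standing in for the dataset behind an input port.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_1 (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO input_1 VALUES (?, ?)",
    [(1, 20.0), (1, 35.5), (2, 10.0)],
)

# Feature preparation: one row per customer with order count and total amount.
query = """
SELECT customer_id,
       COUNT(*)    AS n_orders,
       SUM(amount) AS total_amount
FROM input_1
GROUP BY customer_id
ORDER BY customer_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2, 55.5), (2, 1, 10.0)]
conn.close()
```

Aggregations like this turn raw transactional rows into one row per entity, which is the shape most training and prediction steps expect.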
Training Playground – the environment where users bring training data in order to create an AI model. More about Wand Hierarchy