Machine Learning (ML)
Machine Learning is a subfield of artificial intelligence. It covers all methods that enable machines to learn from data and, in this way, to generate knowledge and, where possible, draw conclusions. More specifically, the algorithms optimize and adjust themselves with each new data point. Machine learning methods generally cope well with many explanatory variables, but they often require more training data than statistical methods. A general distinction is made between Supervised and Unsupervised Learning.
Supervised and Unsupervised Learning
Supervised Learning
In Supervised Learning, labeled data is used: the solutions that the system is supposed to produce are already known in the training and test datasets. For every prediction it can therefore be determined whether the algorithm's decision is correct or incorrect, which makes it possible to measure how well the trained model performs. The goal is, of course, to keep this error as small as possible. Examples of Supervised Learning methods include decision tree methods (CART and Random Forests), regression methods (SVM and MARS), and Artificial Neural Networks (ANNs). These methods can be used for both regression and classification of data.
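The workflow described above can be sketched in a few lines. The following minimal example assumes scikit-learn and its built-in Iris toy dataset, purely for illustration: a model is trained on labeled data, and its error is then measured on held-out test data.

```python
# Minimal supervised-learning sketch (assumes scikit-learn; Iris serves only as a toy dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # features and known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn from labeled data
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))  # measure the error
```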
Unsupervised Learning
In Unsupervised Learning, there are no labels and thus no objective right or wrong. Clustering and dimensionality reduction (compression) of data are the most common goals of Unsupervised Learning methods. They are used primarily during data preparation rather than in the actual prediction phase.
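As an illustration of the two goals mentioned above, the following sketch (again assuming scikit-learn and its Iris data, purely for illustration) clusters the data with k-means and compresses it with a principal component analysis; the labels are never used.

```python
# Minimal unsupervised-learning sketch (assumes scikit-learn):
# clustering with k-means and dimensionality reduction (compression) with PCA.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels are ignored: no objective "right or wrong"

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_compressed = PCA(n_components=2).fit_transform(X)  # 4 features compressed to 2 components
print(clusters[:10], X_compressed.shape)
```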
Supervised Learning Methods
Classification And Regression Tree (CART)
CART is a term coined by the American statistician Leo Breiman for decision tree algorithms in which a binary tree determines the class (or, in the regression case, the value) assigned to the data.
In time series forecasting, the forecast is then derived from this class or value.
CART algorithms are often used in machine learning and serve as the basis for Random Forests.
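As a rough illustration, the following sketch (assuming scikit-learn, whose decision trees are CART-based) grows a shallow binary tree and prints its yes/no splits; the leaf a data point ends up in determines its predicted class.

```python
# Illustrative sketch of a CART-style binary decision tree (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each internal node is a binary split on one explanatory variable;
# the leaf reached by a data point determines its predicted class.
print(export_text(tree))
```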
Random Forest
A Random Forest is a supervised learning method for the classification and regression of data in which many decision trees, each as different from the others as possible, are generated.
The values or classes produced by the individual decision trees (see also CART) are combined, for example by averaging or majority vote, into a final result, which can be more accurate than that of a single decision tree.
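A minimal sketch of this idea, assuming scikit-learn and its diabetes toy dataset purely for illustration: an ensemble of trees is trained on a regression task, and their predictions are averaged into one result.

```python
# Minimal Random Forest sketch (assumes scikit-learn): many diverse trees are trained
# on random subsets of rows and features, and their predictions are averaged.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, forest.predict(X_test)))
```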
Gradient Boosting Decision Tree Algorithm
The Gradient Boosting Decision Tree algorithm is a supervised learning method that can be used for classification and regression of data. As with a Random Forest, an ensemble of decision trees is generated according to certain rules. The advantage of this method is that during training further decision trees are added one after another, each optimized to correct the errors of the existing ensemble.
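A minimal sketch of this stage-wise training, again assuming scikit-learn and its diabetes toy dataset purely for illustration:

```python
# Minimal gradient boosting sketch (assumes scikit-learn): trees are added one after
# another, each new tree fitted to reduce the remaining error of the current ensemble.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbrt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, random_state=0)
gbrt.fit(X_train, y_train)

# staged_predict makes the stage-wise behavior visible: the error typically
# shrinks as further trees are added to the ensemble.
errors = [mean_absolute_error(y_test, p) for p in gbrt.staged_predict(X_test)]
print("MAE after 10 trees:", errors[9], "| after 300 trees:", errors[-1])
```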
Multivariate Adaptive Regression Splines (MARS)
Multivariate Adaptive Regression Splines is a method that autonomously develops a model based on data, taking into account non-linearities and interactions between various explanatory variables.
A model is built up step by step from so-called hinge functions (and their products). A hinge function is zero up to a certain threshold (the knot) and then continues as a straight line with positive or negative slope; it resembles a hockey stick. By cleverly combining such hinge functions, complex relationships can be approximated better than with purely linear terms.
MARS methods have an internal strategy to select the best combination of hinge functions and available explanatory variables.
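The following sketch is not a full MARS implementation (which would also select the knots and terms automatically); it only illustrates, using plain NumPy, what a hinge function max(0, x − t) looks like and how a linear combination of two mirrored hinges can fit a kinked relationship.

```python
# Conceptual hinge-function sketch (assumes only NumPy; not a full MARS implementation).
import numpy as np

def hinge(x, knot):
    """max(0, x - knot): zero up to the knot, then a straight line with slope 1."""
    return np.maximum(0.0, x - knot)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
# A kinked ("hockey stick") relationship with some noise
y = np.where(x < 4, 2.0, 2.0 + 1.5 * (x - 4)) + rng.normal(0, 0.3, x.size)

# Least-squares fit of: intercept + a*max(0, x - 4) + b*max(0, 4 - x)
basis = np.column_stack([np.ones_like(x), hinge(x, 4.0), hinge(-x, -4.0)])
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print("fitted coefficients:", coef)  # roughly [2.0, 1.5, 0.0]
```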
Support Vector Machine (SVM)
Support Vector Machine is a supervised learning method for classifying data.
To separate data points of different classes as well as possible, SVM searches for class boundaries that keep the largest possible margin to the data points of the different classes, so that a wide area around the boundary remains free of data points. Initially, SVM works with linear boundaries, i.e., lines or (hyper)planes. Using the so-called kernel trick, a clever transformation of the data, complex non-linear class boundaries can also be found.
SVM can also be used for regression. This is referred to as Support Vector Regression.
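A minimal sketch, assuming scikit-learn and its synthetic two-moons data purely for illustration, compares a linear boundary with an RBF-kernel boundary on data that is not linearly separable:

```python
# Minimal SVM sketch (assumes scikit-learn): linear vs. RBF-kernel class boundary.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel, "test accuracy:", clf.score(X_test, y_test))
```

For regression, scikit-learn offers the analogous sklearn.svm.SVR interface (Support Vector Regression).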
Artificial Neural Network (ANN)
An artificial neural network is based on a network of nodes (neurons) and their connections (synapses), inspired by the human brain. The nodes are arranged in successive layers through which the input data is passed. Each node weights the outputs of the nodes in the previous layer and, according to its activation rule, passes its degree of activation on to the nodes of the next layer. In this way, complex relationships can be represented and information from different input variables can be combined. In a training phase, a neural network first learns suitable weights and activation rules from known data before it can be applied to new, unknown data.
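A minimal sketch of this layered structure, assuming scikit-learn's MLPClassifier and its digits toy dataset purely for illustration: a small feed-forward network with two hidden layers is trained on known data and then evaluated on unseen data.

```python
# Minimal neural network sketch (assumes scikit-learn).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(64, 32),  # two successive layers of nodes
                    activation="relu",            # activation rule applied in each node
                    max_iter=500, random_state=0)
ann.fit(X_train, y_train)                         # training phase: learn suitable weights
print("test accuracy:", ann.score(X_test, y_test))  # apply to unseen data
```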
When a complex network architecture with many deeply interconnected layers is used, it is referred to as Deep Learning or Deep Neural Networks.
Typical application areas for neural networks include image and speech recognition, where so-called Convolutional Neural Networks (CNNs) are often used. Neural networks can also be applied effectively to time series, for example for pattern recognition or forecasting; Long Short-Term Memory networks (LSTMs) and Transformer-based approaches are particularly noteworthy here.