Machine Learning (ML)

Machine Learning is a subfield of artificial intelligence. It covers all methods that enable machines to learn from data and, in this way, generate knowledge and draw conclusions. More specifically, the algorithms optimize and adjust themselves with each new data point. Machine learning methods can generally handle many explanatory variables well, but they often require more training data than classical statistical methods. A general distinction is made between Supervised and Unsupervised Learning.

Supervised and Unsupervised Learning

Supervised Learning

In Supervised Learning, labeled data is used. This means that the solutions the system is supposed to generate are known for the training and test datasets. It can therefore be determined for each case whether the algorithm's decision is correct or incorrect, which makes it possible to measure how well the trained model performs. The goal is, of course, to achieve as small an error as possible. Examples of Supervised Learning methods include decision tree methods (CART, Random Forests), Support Vector Machines (SVM), Multivariate Adaptive Regression Splines (MARS), and Artificial Neural Networks (ANNs). These methods can be used for both regression and classification of data.
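As a minimal sketch of this workflow (scikit-learn, the Iris dataset, and a decision tree are illustrative choices, not prescribed by the text): because the true labels of the held-out test data are known, the model's error can be measured directly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # labeled data: features X, labels y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)     # hold out labeled test data

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                  # learn from the labeled training data

# Since the test labels are known, the quality of the model is measurable:
accuracy = accuracy_score(y_test, model.predict(X_test))
```
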

Unsupervised Learning

In Unsupervised Learning, there are no labels and thus no objective right or wrong. In addition to clustering, dimensionality reduction (compression) of data is the most common goal of Unsupervised Learning methods. Unsupervised Learning methods are primarily used during data preparation, rather than during the actual prediction phase.
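Both goals can be sketched in a few lines (scikit-learn, k-means, and PCA are illustrative, assumed choices); note that the labels are never used.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # the labels are deliberately ignored

# Clustering: group the data into 3 clusters without any labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction (compression): 4 features down to 2 components.
X_2d = PCA(n_components=2).fit_transform(X)
```
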

Supervised Learning Methods

Classification And Regression Tree (CART)

Two-stage binary decision tree
CART learns a decision tree. Based on the given features X and Z, a decision is made about class membership, or a forecast for the target variable is produced.

CART is a term coined by the American statistician Leo Breiman for decision tree algorithms, where the class to which data belong is determined through binary trees.

In the case of time series forecasting, the forecast follows from this class.

CART algorithms are often used in machine learning and serve as the basis for Random Forests.
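The binary-tree structure can be made visible with a small sketch (scikit-learn's `DecisionTreeClassifier` implements a CART-style algorithm; dataset and depth are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each internal node performs one binary split on a single feature;
# the leaves determine the class membership.
print(export_text(tree, feature_names=["sepal_len", "sepal_wid",
                                       "petal_len", "petal_wid"]))
```
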

Random Forest

Multiple decision trees contribute to the Random Forest classification.
Random Forest trains multiple decision trees and combines their results into a final, consolidated forecast.

A Random Forest is a supervised learning method for the classification and regression of data, in which different, preferably diverse, decision trees are generated.

The values or classes resulting from the various decision trees (see also CART) are combined into a final result, which can provide more accurate outcomes than a single decision tree.
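A minimal sketch of this idea (scikit-learn and the Iris dataset are assumed, illustrative choices): the forest trains many diverse trees on bootstrap samples with random feature subsets and combines their votes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 diverse trees; each is trained on a bootstrap sample with a random
# feature subset per split, and their votes form the final prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
```
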


Gradient Boosting Decision Tree Algorithm

The Gradient Boosting Decision Tree Algorithm is a supervised learning method that can be used for classification and regression of data. Like a Random Forest, an ensemble of decision trees is generated based on certain rules. The advantage of this method is that during the training process, additional decision trees are added, which are optimized to correct the misclassifications of the existing ensemble of decision trees.
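The sequential nature of boosting can be observed directly (scikit-learn and the breast-cancer dataset are illustrative, assumed choices): each added tree is fitted to the gradient of the loss, i.e. to the remaining errors of the current ensemble.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Trees are added one after another; each new tree is fitted to the
# errors of the existing ensemble.
gbdt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                  random_state=0).fit(X_tr, y_tr)

# staged_predict yields the ensemble's prediction after each added tree,
# so the gradual correction of misclassifications becomes visible.
staged = list(gbdt.staged_predict(X_te))
```
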

Multivariate Adaptive Regression Splines (MARS)

Visualization of hinge functions in the forecasting method MARS
MARS combines multiple hinge functions to describe the target variable Y as accurately as possible through X. The example above shows that the effect of X on Y depends on the magnitude of X.

Multivariate Adaptive Regression Splines is a method that autonomously develops a model based on data, taking into account non-linearities and interactions between various explanatory variables.

A model is progressively built from so-called hinge functions (and their products). Hinge functions are zero up to a certain threshold, then transition into a straight line with a positive or negative slope. They resemble a hockey stick. By cleverly chaining such hinge functions, complex relationships can be better approximated than with just linear terms.

MARS methods have an internal strategy to select the best combination of hinge functions and available explanatory variables.
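The hinge-function building block is easy to sketch in plain NumPy (the knot positions 3 and 7 and the weights below are hypothetical, illustrative values):

```python
import numpy as np

def hinge(x, knot):
    """Hinge function: zero up to the knot, then a straight line (hockey stick)."""
    return np.maximum(0.0, x - knot)

def mirrored_hinge(x, knot):
    """Mirrored variant: linear below the knot, zero above it."""
    return np.maximum(0.0, knot - x)

x = np.linspace(0.0, 10.0, 101)

# A MARS-style model is a weighted sum of such hinge functions,
# here with hypothetical knots at 3 and 7:
y_hat = 1.0 + 0.5 * hinge(x, 3.0) - 1.2 * mirrored_hinge(x, 7.0)
```
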

Support Vector Machine (SVM)

Support Vector Machine is a supervised learning method for classifying data.

To separate data points from different classes as well as possible, SVM searches for class boundaries that have the largest possible margin from the data points of the different classes, leaving a wide area around the class boundary free of data points. SVM initially works with lines or planes for the class boundaries. Using the so-called kernel trick, i.e., a clever data transformation, complex, non-linear separation lines or planes can be found.

SVM can also be used for regression. This is referred to as Support Vector Regression.

Visualization of classification into two classes using SVM with a linear separation line
SVM separates the points of two classes (dark blue, pink) with a linear separation line.
Visualization of classification into two classes using SVM with a non-linear separation line
SVM separates the points with a non-linear separation line.
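The contrast between the two separation lines can be sketched as follows (scikit-learn's `SVC` and the two-moons toy dataset are assumed, illustrative choices): the RBF kernel applies the kernel trick and finds a non-linear boundary where a straight line fails.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: the classes are not linearly separable.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # straight separation line
rbf_svm = SVC(kernel="rbf").fit(X, y)        # kernel trick: non-linear boundary

lin_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)                # typically clearly higher here
```
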

Artificial Neural Network (ANN)

Simple network architecture of an artificial neural network (ANN) with an input layer, a hidden layer, and an output layer
The simplest case of a network architecture: in addition to the input and output layers, there is a hidden layer. In the example above, it consists of 4 nodes. Since all 4 nodes are connected to all nodes of the previous (Input) and subsequent (Output) layers, this is called a fully connected layer.

An artificial neural network is based on a network of nodes (neurons) and their connections (synapses) inspired by the human brain. The nodes of a neural network are arranged in sequential layers. The input data is passed through each of these layers. Each node weights the outputs of nodes in preceding layers and passes its activation level—according to specific activation rules—on to nodes in subsequent layers. This allows complex relationships to be mapped and information from various input variables to be linked. Using known data, a neural network first learns appropriate weights and activation rules during a training phase before it can be applied to new, unknown data.
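One forward pass through such a network can be written out in NumPy (the layer sizes 3-4-2, the random weights, and the ReLU activation rule are hypothetical stand-ins; training would adjust the weights):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # a simple activation rule

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs, 4 hidden nodes, 2 outputs
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden -> output weights

x = np.array([0.5, -1.0, 2.0])       # one input sample

hidden = relu(x @ W1 + b1)           # each hidden node weights all inputs
output = hidden @ W2 + b2            # output layer combines the activations
```

During training, these weights would be learned from known data (e.g. via backpropagation) before the network is applied to new inputs.
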

Deep Learning (DL)

When a network architecture has many deeply interconnected layers, this is referred to as Deep Learning or Deep Neural Networks. Deep Learning extends the concept of artificial neural networks by using networks with a very large number of layers (“deep” networks). These deep networks enable the learning of hierarchical representations of data, with each layer capturing features on a more abstract level. As a result, deep learning models can recognize extremely complex patterns in data and have enabled revolutionary advances in areas such as image recognition and language processing.

Multi-Layer Perceptron (MLP)

Multi-Layer Perceptrons (MLPs) are a fundamental architecture in neural networks. They consist of multiple fully connected layers and use activation functions such as ReLU or sigmoid. MLPs are particularly well-suited for structured data and are frequently used for classification and regression problems. Since they do not have a specialized architecture for spatial or sequential data, they are usually combined with other networks or replaced by specialized architectures.
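A compact MLP sketch on structured data (scikit-learn's `MLPClassifier`, the Iris dataset, and the layer sizes are illustrative, assumed choices):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two fully connected hidden layers with ReLU activation.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0).fit(X_tr, y_tr)
acc = mlp.score(X_te, y_te)
```
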

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are designed to process sequential data by storing previous information and using it for subsequent calculations. This allows RNNs to capture temporal dependencies, which is why they are often used in language processing, time series analysis, and music generation. An evolution of this architecture is provided by Long Short-Term Memory Networks (LSTMs) and Gated Recurrent Units (GRUs), which solve the vanishing gradient problem and thereby model long-term dependencies more effectively. However, due to their sequential nature, RNNs are not very efficient as parallel computations are hardly possible, leading to longer training times.
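The recurrence can be reduced to a few lines of NumPy (sizes and random weights are hypothetical stand-ins for learned parameters): the hidden state carries information from all previous time steps into the next calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 input features, 5 hidden units
W_x = rng.normal(scale=0.5, size=(2, 5))   # input -> hidden
W_h = rng.normal(scale=0.5, size=(5, 5))   # hidden -> hidden (the recurrence)
b = np.zeros(5)

sequence = rng.normal(size=(10, 2))        # 10 time steps of input data
h = np.zeros(5)                            # initial hidden state

for x_t in sequence:
    # Each step must wait for the previous one: this sequential dependency
    # is what prevents parallel computation across time steps.
    h = np.tanh(x_t @ W_x + h @ W_h + b)
```
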

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) have been specifically developed for processing image data. They use convolution operations to extract relevant features from images. Through the concept of filter matrices and pooling layers, CNNs are particularly efficient at recognizing patterns regardless of their position in the image. 1D-CNNs use one-dimensional convolution filters to capture local patterns in the time or sequence domain. Due to their high efficiency and parallel computability, CNNs have established themselves in many areas of applications.
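A 1D convolution (more precisely, the cross-correlation used in CNNs) can be written out directly; the signal and the peak-detecting filter below are hypothetical, and a real CNN would learn the filter values during training.

```python
import numpy as np

signal = np.array([0., 0., 1., 2., 1., 0., 0., 1., 2., 1., 0.])

# Hypothetical filter that responds to a rising-then-falling "peak":
kernel = np.array([-1., 2., -1.])

# Slide the filter across the sequence; the same local pattern is
# detected regardless of where it occurs (position invariance).
feature_map = np.array([signal[i:i + 3] @ kernel
                        for i in range(len(signal) - 2)])
```

The feature map responds most strongly wherever the peak pattern occurs, here at both peak positions, illustrating why the same (shared) filter finds a pattern anywhere in the input.
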

Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) are a class of neural networks specifically designed to process graph-structured data. Unlike traditional neural networks that operate on grids (CNNs) or sequences (RNNs), GNNs leverage the relationships between entities represented as nodes and edges in a graph. Through iterative message passing, nodes aggregate information from their neighbors, allowing the network to learn complex patterns in relational data.
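One message-passing round can be sketched in NumPy (the 4-node toy graph, the mean aggregation with self-loops, and the random weight matrix are illustrative assumptions; real GNNs learn the weights and stack several such rounds):

```python
import numpy as np

# Hypothetical toy graph with 4 nodes; A[i, j] = 1 means an edge i—j.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

H = np.eye(4)                      # initial node features (one-hot)
A_hat = A + np.eye(4)              # self-loops: a node keeps its own state
D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # normalize by node degree

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))        # shared weight matrix (random stand-in)

# One message-passing round: each node averages its neighbors' features,
# then applies the shared linear transform and a ReLU.
H_next = np.maximum(0.0, D_inv @ A_hat @ H @ W)
```
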

Transformer

An important advancement in the field of deep learning architectures are Transformer Networks. In contrast to traditional neural networks that process data sequentially, transformers use what are known as self-attention mechanisms (Vaswani et al., 2017). Self-attention allows the network to simultaneously weight the relevance of different parts of the input data and capture relationships over longer distances in the input. By chaining together multiple attention and MLP layers, transformers have set new standards in the field of natural language processing (NLP) and form the basis of many modern language models, while also boosting applications in other domains.
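The core of self-attention is compact enough to write out in NumPy (sequence length, embedding size, and the random projection matrices are hypothetical stand-ins for learned parameters):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

seq_len, d = 5, 8                  # hypothetical: 5 tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d))

# Learned query/key/value projections (random stand-ins here):
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product self-attention (Vaswani et al., 2017): every token
# weights every other token in one parallel step, regardless of distance.
weights = softmax(Q @ K.T / np.sqrt(d))
out = weights @ V
```
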

Foundation Models (FMs)

Building on the advances in deep learning, and especially the Transformer architecture, Foundation Models have become established. Foundation models are extremely large models that are pre-trained on enormous amounts of data. These models, such as GPT (Generative Pre-trained Transformer), learn very general representations of data. Through this extensive pre-training, foundation models can then be very efficiently fine-tuned for a wide variety of specific tasks. Moreover, they possess the ability to make zero-shot predictions. This means they are capable of solving tasks for which they have not seen explicit examples during training. Their deep understanding of patterns and relationships in the training data allows them to generalize to new contexts and deliver meaningful results without additional adjustments. This leads to enormous time savings, as the costly training process for each individual task is eliminated. Additionally, some foundation models are evolving to support multimodality, enabling them to integrate and process different types of data simultaneously, such as text, images, and audio.

Large Language Models (LLMs)

Large Language Models are a special class of foundation models that focus on processing natural language. They are based on the transformer architecture and are pre-trained on enormous amounts of text, which allows them to develop a deep understanding of language, grammar, semantics, and even logical reasoning. Well-known examples of LLMs include GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), and T5 (Text-to-Text Transfer Transformer).

