Binary Classification

This section considers a problem in which a classifier divides data or objects into two classes (binary classification).

As a running example, consider a classifier that assigns individuals to one of two categories based on blood test results:

| Category | Meaning |
|---|---|
| Positive | The person has a particular medical condition |
| Negative | The person does not have the medical condition |

Confusion Matrix

The confusion matrix describes the quality of a classifier by comparing the true class labels of the known training data with the predictions made by the classifier, presented in matrix form.

General structure of the confusion matrix for binary classification:

| | Positive (actual) | Negative (actual) |
|---|---|---|
| Positive (classifier) | True positive [TP] | False positive [FP] (type I error) |
| Negative (classifier) | False negative [FN] (type II error) | True negative [TN] |
| Sum | Sum truly positive [SWP] | Sum truly negative [SWN] |

An ideal classifier would only have True Positives (TP) and True Negatives (TN). In practice, however, classifiers make errors, so training examples may be misclassified. For example, a person might be classified as sick (positive) by the classifier even though they are actually healthy (negative). These errors appear in the off-diagonal cells of the confusion matrix and correspond to the two types of errors a binary classifier can make: false positives (type I errors) and false negatives (type II errors).

Note: In the context of classification, the terms positive and negative in TP, FP, FN, and TN refer specifically to the classifier's predictions, not to the actual state of the data (i.e., the reality). A rate (such as the true positive rate or the false positive rate) is always computed relative to the sum of truly positive elements (SWP) or truly negative elements (SWN).
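The four cells of the confusion matrix can be counted directly from pairs of actual and predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the toy data are assumptions made for illustration and do not come from a particular library.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (True = positive)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1  # predicted positive, actually positive
        elif p and not a:
            fp += 1  # predicted positive, actually negative (type I error)
        elif not p and a:
            fn += 1  # predicted negative, actually positive (type II error)
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical toy data: six people, True = has the condition.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

counts = confusion_counts(actual, predicted)
swp = counts["TP"] + counts["FN"]  # sum truly positive (column sum)
swn = counts["FP"] + counts["TN"]  # sum truly negative (column sum)
print(counts, "SWP:", swp, "SWN:", swn)
```

The column sums SWP and SWN depend only on the actual labels, which is why they serve as the denominators of the rates.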

Sensitivity, Specificity, Precision, Accuracy

To quickly assess the shortcomings of a binary classifier, several key metrics are commonly used.

| Metric | Answers the question | Formula |
|---|---|---|
| Specificity/True Negative Rate (TNR) | What proportion of really negative examples do I recognize as negative? | $\text{Specificity} = \frac{TN}{SWN}$ |
| Recall/Sensitivity/True Positive Rate (TPR) | What proportion of really positive examples do I recognize as positive? | $\text{Sensitivity} = \frac{TP}{SWP}$ |
| Precision | What proportion of the examples recognized as positive are really positive? (How reliable is a positive classification?) | $\text{Precision} = \frac{TP}{TP + FP}$ |
| False Positive Rate (FPR) | What proportion of really negative examples do I not recognize as negative? | $\text{FPR} = \frac{FP}{SWN}$ |
| False Negative Rate (FNR) | What proportion of really positive examples do I not recognize as positive? | $\text{FNR} = \frac{FN}{SWP}$ |
| Accuracy | How large is the proportion of correctly recognized examples in the total quantity? | $\text{Accuracy} = \frac{TP + TN}{SWN + SWP}$ |

In binary classification, the values of Specificity, Sensitivity, Precision, and Accuracy range from 0% to 100%. The closer these values are to 100%, the better the classifier is at correctly identifying instances according to the specific measure.

We can compute these metrics for the example above:

| Category | Ill (actual) | Healthy (actual) |
|---|---|---|
| Ill (classifier) | 95 | 7 |
| Healthy (classifier) | 11 | 80 |
| Sum | 106 | 87 |

The specificity here is 80/87, i.e. approx. 92%. The sensitivity is 95/106, which is about 90%. The precision is 95/(95+7), which is about 93%, and the accuracy is (95+80)/(106+87), which is about 91%.
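The same arithmetic can be reproduced in a few lines of code. This is a minimal sketch that plugs the four counts from the example table into the formulas for the metrics; the function name `binary_metrics` is an assumption for illustration, and the sketch assumes that both classes are present (otherwise a denominator would be zero).

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the common binary classification metrics from the four counts."""
    swp = tp + fn  # sum truly positive
    swn = fp + tn  # sum truly negative
    return {
        "specificity": tn / swn,
        "sensitivity": tp / swp,
        "precision": tp / (tp + fp),
        "fpr": fp / swn,
        "fnr": fn / swp,
        "accuracy": (tp + tn) / (swp + swn),
    }


# Counts taken from the ill/healthy example table.
metrics = binary_metrics(tp=95, fp=7, fn=11, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Note that FPR and specificity always add up to 1, as do FNR and sensitivity, because each pair shares the same denominator.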

The example also shows that classifiers with comparatively high sensitivity and specificity (around 90% or more) can still generate a considerable number of misclassifications (here 18 of 193 people). Therefore, the appropriate sensitivity and specificity must be chosen depending on the application.

Classification Threshold

In many classification models, the classifier not only predicts a class but also provides a probability that indicates how confident it is about the predicted class. Depending on the classification method, it may be necessary to define a threshold value, i.e. the minimum probability from which an object is considered positive (e.g. from which classification probability a person is considered ill).

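This threshold affects sensitivity, specificity and accuracy. Sensitivity and specificity are in competition: raising the threshold classifies fewer objects as positive, which tends to increase specificity and decrease sensitivity, while lowering the threshold has the opposite effect.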

It is worth mentioning that precision can change with different thresholds. However, the extent of this change is usually smaller compared to sensitivity and specificity.

A good starting point for the threshold in binary classification is 50% (generally: 1/number of classes).
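The effect of the threshold can be illustrated with a small sketch. The probabilities and labels below are hypothetical values chosen for illustration, and the helper names `classify` and `sensitivity_specificity` are assumptions rather than part of any library.

```python
def classify(probabilities, threshold=0.5):
    """Label an object positive when its probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for boolean labels (True = positive)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    swp = sum(actual)         # sum truly positive
    swn = len(actual) - swp   # sum truly negative
    return tp / swp, tn / swn


# Hypothetical data: predicted probability of being ill and the true state.
probabilities = [0.95, 0.80, 0.65, 0.40, 0.70, 0.35, 0.20, 0.05]
actual = [True, True, True, True, False, False, False, False]

for threshold in (0.3, 0.5, 0.75):
    predicted = classify(probabilities, threshold)
    sens, spec = sensitivity_specificity(actual, predicted)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

On this toy data, the low threshold finds every ill person at the cost of more false alarms, while the high threshold avoids false alarms but misses ill people.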

Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)

Figure: ROC-AUC curve for a classifier at different thresholds.

The Receiver Operating Characteristic (ROC) is a curve that plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold is varied. The area under this characteristic curve is referred to as the AUC (Area Under the Curve). Such a ROC curve, together with its AUC, is shown in the figure above.

The larger the AUC, the better the classifier performs; a perfect classifier has an AUC of 1. A classifier with an AUC of 0 would be a perfect anti-classifier (which could actually be of practical use, since inverting its predictions yields a perfect classifier). A particularly poor, uninformative classifier is one whose ROC curve coincides with the dashed line, i.e. the diagonal from (FPR=0, TPR=0) to (FPR=1, TPR=1), which gives an AUC of 0.5. For such a classifier, TPR equals FPR at every threshold, so its output is no more informative than random guessing.
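To make the construction concrete, the following sketch computes ROC points by sweeping the threshold over all observed scores and approximates the AUC with the trapezoidal rule. The scores and labels are hypothetical, and the function names `roc_points` and `auc` are assumptions for illustration; libraries offer their own, more efficient implementations.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) points, one per distinct threshold, from (0, 0) to (1, 1)."""
    swp = sum(actual)         # sum truly positive
    swn = len(actual) - swp   # sum truly negative
    points = [(0.0, 0.0)]     # threshold above every score: nothing is positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(a and s >= threshold for a, s in zip(actual, scores))
        fp = sum((not a) and s >= threshold for a, s in zip(actual, scores))
        points.append((fp / swn, tp / swp))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: classifier scores and the true state (True = positive).
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.35, 0.20, 0.05]
actual = [True, True, True, True, False, False, False, False]

points = roc_points(actual, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.3f}")

# Inverting the scores mirrors the classifier: its AUC becomes 1 - AUC.
inverted_area = auc(roc_points(actual, [1 - s for s in scores]))
print(f"AUC of inverted scores = {inverted_area:.3f}")
```

The last lines illustrate the anti-classifier idea: a classifier whose AUC is below 0.5 can be turned into one above 0.5 by inverting its scores.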

Based on the ROC curve, two threshold points are good starting points:
- The ROC point with the smallest distance to (FPR=0, TPR=1), which is the value of an optimal classifier.
- The ROC point at which the curve intersects the diagonal from (FPR=0, TPR=1) to (FPR=1, TPR=0). This corresponds to the Equal Error Rate (EER) threshold.
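The first starting point, the ROC point closest to (FPR=0, TPR=1), can be found by a simple search over candidate thresholds. This is a minimal sketch that assumes the observed scores themselves are used as candidates; the data and the function names `rates_at` and `closest_to_corner` are hypothetical.

```python
import math


def rates_at(actual, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are labeled positive."""
    swp = sum(actual)         # sum truly positive
    swn = len(actual) - swp   # sum truly negative
    tp = sum(a and s >= threshold for a, s in zip(actual, scores))
    fp = sum((not a) and s >= threshold for a, s in zip(actual, scores))
    return fp / swn, tp / swp


def closest_to_corner(actual, scores):
    """Pick the candidate threshold whose ROC point is nearest to (FPR=0, TPR=1)."""
    def distance(threshold):
        fpr, tpr = rates_at(actual, scores, threshold)
        return math.hypot(fpr - 0.0, tpr - 1.0)  # Euclidean distance to (0, 1)

    return min(sorted(set(scores)), key=distance)


# Hypothetical data: classifier scores and the true state (True = positive).
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.35, 0.20, 0.05]
actual = [True, True, True, True, False, False, False, False]

best_threshold = closest_to_corner(actual, scores)
print(best_threshold, rates_at(actual, scores, best_threshold))
```

Euclidean distance weights FPR and TPR equally; if one kind of error is more costly in the application, a weighted distance may be the better choice.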

Equal Error Rate (EER)

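A classifier operates in its most balanced way at the threshold where sensitivity and specificity are equal (and ideally high). In other words, a good balance point is where the False Positive Rate (FPR) and the False Negative Rate (FNR) are equal (and ideally low). The common value of FPR and FNR at this threshold is the Equal Error Rate.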
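On a finite data set, FPR and FNR are rarely exactly equal at any threshold, so a practical approach is to pick the threshold where the gap between them is smallest. The following sketch does this by searching the observed scores; the data and the function names `error_rates` and `eer_threshold` are assumptions for illustration.

```python
def error_rates(actual, scores, threshold):
    """Return (FPR, FNR) when scores >= threshold are labeled positive."""
    swp = sum(actual)         # sum truly positive
    swn = len(actual) - swp   # sum truly negative
    fp = sum((not a) and s >= threshold for a, s in zip(actual, scores))
    fn = sum(a and s < threshold for a, s in zip(actual, scores))
    return fp / swn, fn / swp


def eer_threshold(actual, scores):
    """Pick the candidate threshold where |FPR - FNR| is smallest."""
    def gap(threshold):
        fpr, fnr = error_rates(actual, scores, threshold)
        return abs(fpr - fnr)

    return min(sorted(set(scores)), key=gap)


# Hypothetical data: classifier scores and the true state (True = positive).
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.35, 0.20, 0.05]
actual = [True, True, True, True, False, False, False, False]

threshold = eer_threshold(actual, scores)
fpr, fnr = error_rates(actual, scores, threshold)
print(f"threshold={threshold}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

On this toy data the two rates coincide exactly at the selected threshold; with real data, the EER is often reported as the average of FPR and FNR at the threshold with the smallest gap.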

