Binary Classification
The following considers a problem in which a classifier is tasked with dividing data/objects into two classes (binary classification). Specifically, imagine a classifier that categorizes individuals into two categories based on blood test results:
Category | Meaning |
---|---|
Positive | The person has a particular medical condition |
Negative | The person does not have the medical condition |
Confusion Matrix
The confusion matrix describes the quality of a classifier by comparing the true class labels of the known training data with the predictions made by the classifier, presented in matrix form.
General structure of the confusion matrix for binary classification:
 | Positive (actual) | Negative (actual) |
---|---|---|
Positive (classifier) | True positive [TP] | False positive [FP] (type I error) |
Negative (classifier) | False negative [FN] (type II error) | True negative [TN] |
Sum | Sum of actual positives [P] = TP + FN | Sum of actual negatives [N] = FP + TN |
- In this context, TP (True Positives) refers to the training examples that were correctly classified by the classifier as positive (i.e., possessing a certain property or characteristic).
- TN (True Negatives) refer to the instances that the classifier has correctly identified as negative. These examples do not possess the sought-after property, and the classifier appropriately classified them as such.
An ideal classifier would only have True Positives (TP) and True Negatives (TN). However, in practice, classifiers make errors, meaning training examples may be misclassified, such as being wrongly labeled as positive when they are actually negative. For example, a person might be classified as sick (positive) by the classifier, even though they are actually healthy (negative). These errors are reflected in the off-diagonal cells of the confusion matrix, and they describe the two types of errors a binary classifier can make:
- FP: Instances where the classifier incorrectly predicts a positive class, even though the actual class is negative. (Type I error)
- FN: Instances where the classifier incorrectly predicts a negative class, even though the actual class is positive. (Type II error)
Note: In the terms TP, FP, FN, and TN, positive and negative refer specifically to the classifier's prediction, not to the actual state of the data (i.e., the reality); the true/false prefix indicates whether that prediction matches reality. Rates are always computed relative to the sum of actually positive ([P]) or actually negative ([N]) elements.
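As a small illustration, the four cells of the confusion matrix can be tallied directly from paired lists of actual and predicted labels. The following sketch uses made-up labels and hypothetical variable names purely for demonstration:

```python
# Sketch: tally the confusion-matrix cells from actual vs. predicted labels.
# The two label lists are made-up illustrative data.
actual    = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]

tp = sum(1 for a, p in zip(actual, predicted) if a == "pos" and p == "pos")
fp = sum(1 for a, p in zip(actual, predicted) if a == "neg" and p == "pos")
fn = sum(1 for a, p in zip(actual, predicted) if a == "pos" and p == "neg")
tn = sum(1 for a, p in zip(actual, predicted) if a == "neg" and p == "neg")

print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")  # TP=2  FP=1  FN=1  TN=2
```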
Sensitivity, Specificity, Precision, Accuracy
To quickly assess the shortcomings of a binary classifier, several key metrics are commonly used.
Metric | Answers the question | Formula |
---|---|---|
Specificity / True Negative Rate (TNR) | What proportion of actually negative examples do I recognize as negative? | TN / (TN + FP) |
Recall / Sensitivity / True Positive Rate (TPR) | What proportion of actually positive examples do I recognize as positive? | TP / (TP + FN) |
Precision | What proportion of the examples recognized as positive are actually positive? | TP / (TP + FP) |
False Positive Rate (FPR) | What proportion of actually negative examples do I not recognize as negative? | FP / (FP + TN) |
False Negative Rate (FNR) | What proportion of actually positive examples do I not recognize as positive? | FN / (FN + TP) |
Accuracy | How large is the proportion of correctly recognized examples in the total? | (TP + TN) / (TP + TN + FP + FN) |
In binary classification, the values of Specificity, Sensitivity, Precision, and Accuracy range from 0% to 100%. The closer these values are to 100%, the better the classifier is at correctly identifying instances according to the specific measure.
We can compute these metrics for the example above:
Category | Ill (actual) | Healthy (actual) |
---|---|---|
Ill (classifier) | 95 | 7 |
Healthy (classifier) | 11 | 80 |
Sum | 106 | 87 |
The specificity here is 80/87, i.e. approx. 92%. The sensitivity is 95/106, which is about 90%. The precision is 95/(95+7), which is about 93% and the accuracy is (95+80)/(106+87), which is about 90%.
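The same calculation can be written as a short Python sketch; the cell counts are taken directly from the example table above:

```python
# Confusion-matrix cells from the example table: TP=95, FP=7, FN=11, TN=80.
tp, fp, fn, tn = 95, 7, 11, 80

sensitivity = tp / (tp + fn)                   # 95/106  ≈ 0.90
specificity = tn / (tn + fp)                   # 80/87   ≈ 0.92
precision   = tp / (tp + fp)                   # 95/102  ≈ 0.93
accuracy    = (tp + tn) / (tp + fp + fn + tn)  # 175/193 ≈ 0.91

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"Precision:   {precision:.1%}")
print(f"Accuracy:    {accuracy:.1%}")
```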
The example also shows that classifiers with comparatively high sensitivity and specificity (over 90%) can still produce a considerable number of misclassifications. The sensitivity and specificity that are actually adequate therefore depend on the application.
Classification Threshold
In many classification models, the classifier not only predicts a class but also provides a probability indicating how confident it is in that classification. Depending on the classification method, it may be necessary to set a threshold that specifies the minimum probability at which an object is considered positive (e.g., the classification probability from which a person is considered ill).
This threshold can have an effect on sensitivity, specificity and accuracy:
- Reducing the threshold generally leads to more positive classifications, which increases both the number of True Positives (TP) and the number of False Positives (FP). As a result, sensitivity increases while specificity decreases (not necessarily to the same extent).
- Increasing the threshold generally leads to more negative classifications (both TN and FN). The number of True Positives (TP) usually drops, so sensitivity decreases while specificity increases.
Depending on the threshold value, sensitivity and specificity are in competition.
It is worth mentioning that precision can change with different thresholds. However, the extent of this change is usually smaller compared to sensitivity and specificity.
A good starting point for the threshold in binary classification is 50% (generally: 1/number of classes).
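The trade-off can be made concrete with a small sketch that sweeps the threshold over a set of classification probabilities. The scores and labels below are made-up illustrative data, not the example from the table above:

```python
import numpy as np

# Made-up classification probabilities and true labels (1 = positive/ill, 0 = negative/healthy).
scores = np.array([0.95, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05])
labels = np.array([1,    1,    1,    0,    1,    0,    0,    1,    0,    0])

for threshold in (0.3, 0.5, 0.7):
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    tn = np.sum((pred == 0) & (labels == 0))
    sens = tp / (tp + fn)   # sensitivity rises as the threshold falls
    spec = tn / (tn + fp)   # specificity rises as the threshold increases
    print(f"threshold={threshold:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

On these made-up scores, the sweep shows sensitivity falling and specificity rising as the threshold increases, which is exactly the competition described above.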
Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)
The Receiver Operating Characteristic (ROC) is a curve that describes how the True Positive Rate (TPR) behaves relative to the False Positive Rate (FPR) as different classification thresholds are set. The area under this curve is referred to as the AUC (Area Under the Curve).
The larger the AUC, the better the classifier performs. A classifier with an AUC of 0 would be a perfect anti-classifier (which could actually be of practical use, since inverting its predictions would yield a perfect classifier). A particularly poor or uninformative classifier is one whose ROC curve coincides with the diagonal (AUC = 0.5). For such a classifier, at every threshold it is entirely random whether a positive element is classified as positive or a negative element is classified as negative.
Based on the ROC curve, several threshold points are good starting points:
- The ROC point with the smallest distance to (FPR=0, TPR=1), i.e., the point closest to the optimal classifier value of (FPR=0, TPR=1).
- The ROC point that intersects the diagonal from (FPR=0, TPR=1) to (FPR=1, TPR=0). This corresponds to the Equal Error Rate (EER) threshold.
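Assuming scikit-learn is available, the ROC curve, the AUC, and the threshold closest to the ideal corner can be computed as in the sketch below; the scores and labels are the same made-up data used in the threshold sketch above:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up classification probabilities and true labels (1 = positive, 0 = negative).
scores = np.array([0.95, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05])
labels = np.array([1,    1,    1,    0,    1,    0,    0,    1,    0,    0])

fpr, tpr, thresholds = roc_curve(labels, scores)
print("AUC:", roc_auc_score(labels, scores))

# Candidate threshold: the ROC point with the smallest distance to (FPR=0, TPR=1).
dist = np.sqrt(fpr**2 + (1 - tpr)**2)
best = np.argmin(dist)
print(f"Closest to (0, 1): threshold={thresholds[best]}, FPR={fpr[best]:.2f}, TPR={tpr[best]:.2f}")
```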
Equal Error Rate (EER)
A classifier operates in its most balanced state when sensitivity and specificity are equal (and high). In other words, a good balance point is where the False Positive Rate (FPR) and the False Negative Rate (FNR) are equal (and low).
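Building on the `fpr`, `tpr`, and `thresholds` arrays from the ROC sketch above, the EER point can be approximated as the ROC point where FPR and FNR (= 1 - TPR) come closest to each other:

```python
# Continuing the ROC sketch: approximate the Equal Error Rate on the discrete ROC curve.
fnr = 1 - tpr                                # false negative rate at each threshold
eer_index = np.argmin(np.abs(fpr - fnr))     # point where FPR and FNR are closest
eer = (fpr[eer_index] + fnr[eer_index]) / 2  # approximate EER
print(f"EER ≈ {eer:.2f} at threshold {thresholds[eer_index]}")
```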