Goodness Measures
What makes a 'good' forecast? The answer to this central question depends heavily on the underlying application. In statistics, a wide variety of goodness measures are used to evaluate the quality of a forecast, which already suggests that there is no single, optimal goodness measure. Rather, when choosing a goodness measure, both the nature of the underlying data and the requirements placed on the forecast must be considered. It is therefore crucial to understand the principles on which these goodness measures are based.
Below, we will look at some of the most commonly used goodness measures:
- Forecast Error
- Absolute Error (AE)
- Percentage Error (PE)
- Absolute Percentage Error (APE)
- Mean Error (ME)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
- Symmetric Mean Absolute Percentage Error (sMAPE)
- Mean Absolute Scaled Error (MASE)
- Mean Squared Error (MSE)
- Periods in Stock (PIS)
What these goodness measures have in common is that their evaluation is based on the so-called forecast error.
Forecast Error
The forecast error is the difference between the actual value that occurred and the forecasted value. Specifically, the forecast error for a forecast looking $i$ time units ahead is given by:

$$e_i = act_i - fc_i$$

where $fc_i$ is the forecasted value after $i$ time units and $act_i$ is the actual value that occurred at the corresponding time.
Properties
- Sensitivity to Over- and Underestimation: The sign of the forecast error indicates whether the actual value was overestimated ($e_i < 0$) or underestimated ($e_i > 0$).
- Non-scaled Measure: The forecast error must always be interpreted in relation to the magnitude of the data. A forecast error of 10 for data in the range of 10,000 suggests greater accuracy than if the data were in the range of 100.
- Unbounded Range of Values: The forecast error has no upper or lower bound on the values it can take.
- Measure at the Single Observation Level: The forecast error measures the quality of the forecast for a single temporal observation. This differs from aggregated goodness measures that summarize multiple observations.
- Optimality Criterion: The closer the forecast error is to 0, the better the forecast.
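As a minimal sketch, the sign convention can be illustrated in Python (the function name and example values are illustrative; `act` and `fc` mirror the notation above):

```python
def forecast_error(act, fc):
    """Forecast error e_i = act_i - fc_i for a single observation."""
    return act - fc

print(forecast_error(160, 100))  #  60 -> positive: actual value underestimated
print(forecast_error(40, 100))   # -60 -> negative: actual value overestimated
```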
Absolute Error (AE)
The absolute value of the deviation of the forecasted value from the actual value is called the absolute forecast error (AE). It is calculated as:

$$AE_i = |e_i| = |act_i - fc_i|$$
Properties
- Insensitive to Over- and Underestimation: The AE does not differentiate whether the forecast overestimated or underestimated the actual value.
- Non-scaled Measure: Similar to the forecast error, this is a non-scaled goodness measure.
- Non-negative: Due to the absolute value, the AE is always non-negative. It is unbounded above.
- Measure at the Single Observation Level: Like the forecast error, the AE is a non-aggregated measure.
- Optimality Criterion: The closer the AE is to 0, the better the forecast.
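The AE is then simply the absolute value of this difference; a minimal sketch:

```python
def absolute_error(act, fc):
    """Absolute error AE_i = |act_i - fc_i|; the direction is discarded."""
    return abs(act - fc)

print(absolute_error(160, 100))  # 60
print(absolute_error(40, 100))   # 60 -> over- and underestimation look identical
```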
Percentage Error (PE)
The percentage forecast error (PE) measures the forecast error relative to the actual value that occurred. It is given by:

$$PE_i = \frac{e_i}{act_i} \cdot 100\% = \frac{act_i - fc_i}{act_i} \cdot 100\%$$
Properties
- Asymmetry in Over- and Underestimation: Forecast errors of the same magnitude can lead to very different percentage errors depending on the actual value. For example, if the forecast is $fc_i = 100$ and the actual value is $act_i = 160$, the forecast error is $e_i = 60$, which leads to a PE of $+37.5\%$. If, however, the actual value is $act_i = 40$, the forecast has overestimated the actual value by the same 60 units, but the PE is $-150\%$, not $-37.5\%$ as one might intuitively expect. This distortion becomes more pronounced as the absolute error $|e_i|$ grows relative to $act_i$.
- Scaled Measure: Unlike the forecast error, the PE accounts for the magnitude of the data values. It can be used to compare the quality of forecasts across data of different magnitudes.
- Unbounded Range of Values: The PE can take any value.
- Measure at the Single Observation Level: The PE is a non-aggregated goodness measure.
- Not Suitable for Time Series with Zero Values: The PE is not defined for time series that can take the value 0, as this would result in a division by zero.
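A small sketch that reproduces the asymmetry example from the first property (names are illustrative):

```python
def percentage_error(act, fc):
    """PE_i = (act_i - fc_i) / act_i * 100; undefined for act_i == 0."""
    return (act - fc) / act * 100

print(percentage_error(160, 100))  #  37.5  -> underestimated by 60 units
print(percentage_error(40, 100))   # -150.0 -> overestimated by the same 60 units
```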
Absolute Percentage Error (APE)
The absolute percentage forecast error (APE) measures the magnitude of the forecast error relative to the actual value. It is given by:

$$APE_i = \left| \frac{e_i}{act_i} \right| \cdot 100\%$$
Apart from being restricted to non-negative values, the APE shares the properties of the percentage forecast error.
Properties
- Asymmetry in Over- and Underestimation
- Scaled Measure
- Non-negative Range of Values
- Measure at the Single Observation Level
- Not Suitable for Time Series with Zero Values
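A minimal sketch, combining the absolute value with the relative scaling:

```python
def absolute_percentage_error(act, fc):
    """APE_i = |(act_i - fc_i) / act_i| * 100; undefined for act_i == 0."""
    return abs((act - fc) / act) * 100

print(absolute_percentage_error(160, 100))  # 37.5
print(absolute_percentage_error(40, 100))   # 150.0
```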
Mean Error (ME)
The mean error (ME) represents the average forecast error, usually calculated over all forecast steps from 1 to $n$:

$$ME = \frac{1}{n} \sum_{i=1}^{n} e_i$$

With the ME, we arrive at the aggregated goodness measures.
Properties
- Forecast errors in individual forecast steps can cancel each other out: If the forecast overestimates and underestimates to the same extent, the individual errors cancel each other out in the mean. An ME that deviates significantly from 0 indicates that the forecast has systematically over- or underestimated the actual values (depending on the sign) and may point to a structural problem in the forecasting model.
- All forecast steps equally weighted: In most cases, the ME assigns equal weight to all forecast errors, regardless of whether the corresponding forecast values are in the near or distant future.
- Non-scaled measure
- Unbounded value range
- Aggregated goodness measure
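A small sketch illustrating the cancellation property (example values are illustrative):

```python
def mean_error(act, fc):
    """ME = average of the signed forecast errors act_i - fc_i."""
    return sum(a - f for a, f in zip(act, fc)) / len(act)

# One over- and one underestimation of equal size cancel out completely:
print(mean_error([90, 110], [100, 100]))  # 0.0, despite two errors of size 10
```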
Mean Absolute Error (MAE)
The mean absolute error (MAE) is the average of the absolute forecast errors, usually calculated over all forecast steps from 1 to $n$:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i|$$
Apart from the fact that forecast errors cannot cancel each other out due to the restriction to non-negative values, the MAE shares the properties of the mean error.
Properties
- Forecast errors in individual forecast steps cannot cancel each other out
- All forecast steps equally weighted
- Non-scaled measure
- Non-negative value range
- Aggregated goodness measure
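The same example as for the ME shows that absolute errors no longer cancel out; a minimal sketch:

```python
def mean_absolute_error(act, fc):
    """MAE = average of the absolute forecast errors |act_i - fc_i|."""
    return sum(abs(a - f) for a, f in zip(act, fc)) / len(act)

print(mean_absolute_error([90, 110], [100, 100]))  # 10.0 (the ME here would be 0.0)
```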
Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error (MAPE) describes the average of the absolute percentage forecast errors, i.e., the absolute forecast errors relative to the magnitude of the actual values:

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{e_i}{act_i} \right| \cdot 100\%$$
Because division by zero would occur, the MAPE cannot be used for time series containing zero values, and it does not provide meaningful results for time series with many values close to zero. However, unlike the MAE or MSE, the MAPE is unitless and therefore more useful for comparing the quality of forecasts across data of different magnitudes.
Properties
- The direction of individual forecast steps is not considered in the averaging
- All forecast steps equally weighted
- Scaled measure
- Non-negative value range
- Not suitable for time series with zero values
- Aggregated goodness measure
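A minimal sketch; note that a single zero in `act` would raise a ZeroDivisionError:

```python
def mape(act, fc):
    """MAPE = average of |(act_i - fc_i) / act_i| * 100."""
    return sum(abs((a - f) / a) for a, f in zip(act, fc)) / len(act) * 100

print(mape([90, 110], [100, 100]))  # ~10.1: average of 10/90 and 10/110, in percent
```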
Mean Squared Error (MSE)
The mean squared error (MSE) corresponds to the average of the squared forecast errors:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} e_i^2$$
Like the MAE and MAPE, the MSE only takes into account the absolute deviation of the forecast from the actual value, not its direction. In comparison to the MAE, large errors carry more weight due to the squaring. As a result, the MSE is more sensitive to outliers. The mean squared error is often used as an optimization criterion in model building, such as in classical linear regression.
Properties
- The direction of individual forecast steps is not considered
- All forecast steps equally weighted
- Non-scaled measure
- Non-negative value range
- Aggregated goodness measure
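A small sketch illustrating how squaring lets a single large error dominate:

```python
def mse(act, fc):
    """MSE = average of the squared forecast errors (act_i - fc_i)**2."""
    return sum((a - f) ** 2 for a, f in zip(act, fc)) / len(act)

# Two errors of size 1 and one outlier of size 10:
print(mse([10, 10, 10], [11, 11, 20]))  # (1 + 1 + 100) / 3 = 34.0
```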
Mean Absolute Scaled Error (MASE)
The mean absolute scaled error (MASE) is the MAE of the considered forecast divided by the MAE of a one-step naive forecast (in-sample) of the actual values from 1 to $n$:

$$MASE = \frac{\frac{1}{n} \sum_{i=1}^{n} |e_i|}{\frac{1}{n-1} \sum_{i=2}^{n} |act_i - act_{i-1}|}$$
Therefore, a MASE greater than 1 implies that the considered forecast is worse than a one-step naive forecast; a MASE less than 1 implies that it is better. While a good one-step forecast should clearly have a MASE below 1, for a multi-step forecast, a MASE greater than 1 does not necessarily mean that the forecast is not good.
Like the MAPE, the MASE is unitless and therefore suitable for comparisons. Compared to the MAPE, the MASE handles (individual) zero values in a time series better. However, the MASE is not well suited for nearly constant time series: there, the forecast errors of the naive forecast are often zero, so the denominator becomes very small and the MASE becomes extremely large.
Properties
- The direction of individual forecast steps is not considered in the averaging
- All forecast steps equally weighted
- Scaled measure
- Non-negative value range
- Enables comparison between models
- Aggregated goodness measure
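A minimal sketch under the definition above, using the same series both for the forecast comparison and for the in-sample naive benchmark:

```python
def mase(act, fc):
    """MASE = MAE of the forecast / in-sample MAE of a one-step naive forecast."""
    mae_fc = sum(abs(a - f) for a, f in zip(act, fc)) / len(act)
    # One-step naive forecast: each value is "predicted" by its predecessor.
    mae_naive = (sum(abs(act[i] - act[i - 1]) for i in range(1, len(act)))
                 / (len(act) - 1))
    return mae_fc / mae_naive

act = [100, 120, 110, 130]
print(mase(act, [105, 115, 115, 125]))  # 0.3 -> better than the naive benchmark
```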
Symmetric MAPE (sMAPE)
The symmetric mean absolute percentage error (sMAPE) averages the absolute forecast errors divided by the mean of the absolute values of the actual and forecasted values:

$$sMAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|e_i|}{(|act_i| + |fc_i|)/2} \cdot 100\%$$
Compared to the MAPE, where the weighting of the forecast errors is based only on the actual value, the sMAPE also takes into account the magnitude of the forecasted value. Like the MAPE, the sMAPE does not yield good results when many of the actual or forecasted values are near or equal to zero.
The sMAPE takes values between 0% and 200%.
Properties
- The direction of individual forecast steps is not considered in the averaging
- All forecast steps equally weighted
- Scaled measure
- Value range between 0 and 200%
- Aggregated goodness measure
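A minimal sketch of the formula above:

```python
def smape(act, fc):
    """sMAPE = average of |act_i - fc_i| / ((|act_i| + |fc_i|) / 2) * 100."""
    return (sum(abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(act, fc))
            / len(act) * 100)

print(smape([90], [100]))  # 10 / 95 * 100 ~ 10.53
```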
MAPE vs. sMAPE
The per-observation errors that are averaged to obtain the MAPE and the sMAPE behave quite similarly; the difference between the two metrics is most clearly seen in the following example. For simplicity, only a single forecast time point is considered.
Case 1: The forecast is fixed at $fc = 100$. The actual value varies around $100 \pm 10$. This leads to an absolute forecast error of 10 in both cases.
| | act = 90, fc = 100 | act = 110, fc = 100 |
|---|---|---|
| MAPE | 10/90 | 10/110 |
| sMAPE | 10/95 | 10/105 |
Case 2: Now we fix the actual value at $act = 100$ and vary the forecast around $100 \pm 10$. As in Case 1, the absolute forecast error is 10 in both cases.
| | act = 100, fc = 90 | act = 100, fc = 110 |
|---|---|---|
| MAPE | 10/100 | 10/100 |
| sMAPE | 10/95 | 10/105 |
One can observe:
- The sMAPE is symmetric with respect to swapping $fc \leftrightarrow act$: it remains the same when the actual and forecasted values are interchanged.
- The MAPE is symmetric with respect to $fc = act + e \leftrightarrow fc = act - e$: for a fixed actual value $act$, the MAPE is the same whether the forecast over- or underestimates it by an error $e$.
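The two symmetries can be checked directly; the following snippet reproduces all four cells of the tables above:

```python
# Reproduce both cases from the tables above (one forecast point at a time):
for act, fc in [(90, 100), (110, 100), (100, 90), (100, 110)]:
    mape_val = abs(act - fc) / act * 100
    smape_val = abs(act - fc) / ((act + fc) / 2) * 100
    print(f"act={act:3d}, fc={fc:3d}: MAPE={mape_val:6.2f}%, sMAPE={smape_val:5.2f}%")

# sMAPE: (90, 100) and (100, 90) give the same value (10.53%) -> swap-symmetric.
# MAPE:  (100, 90) and (100, 110) give the same value (10.00%) -> symmetric in +/-e.
```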
Periods in Stock (PIS)
Periods in Stock is a metric that sums up how long forecast errors stay as stock in a hypothetical warehouse before they are balanced out by corresponding forecast errors in the opposite direction.
The direction of the forecast errors is important for this metric. PIS, unlike other metrics such as MAE, takes into account the duration of mismatches between the forecast and the actual value and is therefore well-suited for evaluating forecasts of sporadic time series, i.e., time series with many zero values.
For example, a forecast that is several time points too early (Forecast 1) results in a higher PIS, i.e., a worse forecast quality, than a forecast that is shifted by only one time point (Forecast 2), while other metrics such as the MAE rate both cases equally.
| Time Point | Actual Value | Forecast 1 | "Stock" 1 | Forecast 2 | "Stock" 2 |
|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 | 1 | 1 |
| 4 | 1 | 0 | 0 | 0 | 0 |

$PIS_1 = 3$, $PIS_2 = 1$, $MAE_1 = MAE_2 = 0.5$
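A sketch of this computation, reading PIS as the sum over time of the running "stock" (cumulative forecast minus actual, as in the table above; sign conventions vary in the literature):

```python
def pis(act, fc):
    """PIS = sum over all periods of the running stock of forecast surpluses."""
    stock, total = 0, 0
    for a, f in zip(act, fc):
        stock += f - a   # surplus forecasts enter the stock, deficits leave it
        total += stock   # every period a unit stays in stock counts once
    return total

act = [0, 0, 0, 1]
print(pis(act, [1, 0, 0, 0]))  # 3 -> the too-early unit sits in stock for 3 periods
print(pis(act, [0, 0, 1, 0]))  # 1 -> only shifted by one period
```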
Properties
- Direction of individual forecast steps is considered
- Duration of the mismatch between forecast and actual value is relevant
- Non-scaled measure
- Unbounded value range
- Suitable for time series with zero values
- Aggregated goodness measure