Goodness Measures
What makes a 'good' forecast? The answer to this central question depends heavily on the underlying application. In statistics, a wide variety of goodness measures are used to evaluate the quality of a forecast, which already suggests that there is no single, optimal goodness measure. Rather, when choosing a goodness measure, both the nature of the underlying data and the requirements placed on the forecast must be considered. It is therefore crucial to understand the principles on which these goodness measures are based.
Below, we will look at some of the most commonly used goodness measures:
- Forecast Error
- Absolute Error (AE)
- Percentage Error (PE)
- Absolute Percentage Error (APE)
- Mean Error (ME)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
- Symmetric Mean Absolute Percentage Error (sMAPE)
- Mean Absolute Scaled Error (MASE)
- Mean Squared Error (MSE)
- Periods in Stock (PIS)
What these goodness measures have in common is that their evaluation is based on the so-called forecast error.
Forecast Error
The forecast error is the difference between the actual value that occurred and the forecasted value. Specifically, the forecast error for a forecast looking $i$ time units ahead is given by:

$$e_i = act_i - fc_i$$

where $fc_i$ is the forecasted value after $i$ time units and $act_i$ is the actual value that occurred at the corresponding time.
Properties
- Sensitivity to Over- and Underestimation: The sign of the forecast error indicates whether the actual value was overestimated ($e_i < 0$) or underestimated ($e_i > 0$).
- Non-scaled Measure: The forecast error must always be interpreted in relation to the magnitude of the data. A forecast error of 10 for data in the range of 10,000 suggests greater accuracy than if the data were in the range of 100.
- Unbounded Range of Values: The forecast error has no upper or lower bound on the values it can take.
- Measure at the Single Observation Level: The forecast error measures the quality of the forecast for a single temporal observation. This differs from aggregated goodness measures that summarize multiple observations.
- Optimality Criterion: The closer the forecast error is to 0, the better the forecast.
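As a minimal sketch, the sign convention can be illustrated in Python (the function name and example values are illustrative; `act` and `fc` mirror the notation above):

```python
def forecast_error(act, fc):
    """Forecast error e_i = act_i - fc_i for a single observation."""
    return act - fc

print(forecast_error(160, 100))  #  60 -> positive: actual value underestimated
print(forecast_error(40, 100))   # -60 -> negative: actual value overestimated
```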
Absolute Error (AE)
The absolute value of the deviation of the forecasted value from the actual value is called the absolute forecast error (AE). It is calculated as:

$$AE_i = |e_i| = |act_i - fc_i|$$
Properties
- Insensitive to Over- and Underestimation: The AE does not differentiate whether the forecast overestimated or underestimated the actual value.
- Non-scaled Measure: Similar to the forecast error, this is a non-scaled goodness measure.
- Non-negative: Due to the absolute value, the AE is always non-negative. It is unbounded above.
- Measure at the Single Observation Level: Like the forecast error, the AE is a non-aggregated measure.
- Optimality Criterion: The closer the AE is to 0, the better the forecast.
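The AE is then simply the absolute value of this difference; a minimal sketch:

```python
def absolute_error(act, fc):
    """Absolute error AE_i = |act_i - fc_i|; the direction is discarded."""
    return abs(act - fc)

print(absolute_error(160, 100))  # 60
print(absolute_error(40, 100))   # 60 -> over- and underestimation look identical
```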
Percentage Error (PE)
The percentage forecast error (PE) measures the forecast error relative to the actual value that occurred. It is given by:

$$PE_i = \frac{e_i}{act_i} \cdot 100\% = \frac{act_i - fc_i}{act_i} \cdot 100\%$$
Properties
- Asymmetry in Over- and Underestimation: Forecast errors of the same magnitude can lead to very different percentage errors depending on the actual value. For example, if the forecast is $fc_i = 100$ and the actual value is $act_i = 160$, the forecast error is $e_i = 60$, which leads to a PE of $+37.5\%$. If, however, the actual value is $act_i = 40$, the forecast has overestimated the actual value by the same 60 units, but the PE is $-150\%$, not $-37.5\%$ as one might intuitively expect. This distortion becomes more pronounced as the absolute error $|e_i|$ grows relative to $act_i$.
- Scaled Measure: Unlike the forecast error, the PE accounts for the magnitude of the data values. It can be used to compare the quality of forecasts across data of different magnitudes.
- Unbounded Range of Values: The PE can take any value.
- Measure at the Single Observation Level: The PE is a non-aggregated goodness measure.
- Not Suitable for Time Series with Zero Values: The PE is not defined for time series that can take the value 0, as this would result in a division by zero.
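A small sketch that reproduces the asymmetry example from the first property (names are illustrative):

```python
def percentage_error(act, fc):
    """PE_i = (act_i - fc_i) / act_i * 100; undefined for act_i == 0."""
    return (act - fc) / act * 100

print(percentage_error(160, 100))  #  37.5  -> underestimated by 60 units
print(percentage_error(40, 100))   # -150.0 -> overestimated by the same 60 units
```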
Absolute Percentage Error (APE)
The absolute percentage forecast error (APE) measures the magnitude of the forecast error relative to the actual value. It is given by:

$$APE_i = \left| \frac{e_i}{act_i} \right| \cdot 100\%$$
Apart from being restricted to non-negative values, the APE shares the properties of the percentage forecast error.
Properties
- Asymmetry in Over- and Underestimation
- Scaled Measure
- Non-negative Range of Values
- Measure at the Single Observation Level
- Not Suitable for Time Series with Zero Values
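A minimal sketch, combining the absolute value with the relative scaling:

```python
def absolute_percentage_error(act, fc):
    """APE_i = |(act_i - fc_i) / act_i| * 100; undefined for act_i == 0."""
    return abs((act - fc) / act) * 100

print(absolute_percentage_error(160, 100))  # 37.5
print(absolute_percentage_error(40, 100))   # 150.0
```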
Mean Error (ME)
The mean error (ME) represents the average forecast error, usually calculated over all forecast steps from 1 to $n$:

$$ME = \frac{1}{n} \sum_{i=1}^{n} e_i$$

With the ME, we arrive at the aggregated goodness measures.
Properties
- Forecast errors in individual forecast steps can cancel each other out: If the forecast overestimates and underestimates to the same extent, the individual errors cancel each other out in the mean. An ME that deviates significantly from 0 indicates that the forecast has systematically over- or underestimated the actual values (depending on the sign) and may point to a structural problem in the forecasting model.
- All forecast steps equally weighted: In most cases, the ME assigns equal weight to all forecast errors, regardless of whether the corresponding forecast values are in the near or distant future.
- Non-scaled measure
- Unbounded value range
- Aggregated goodness measure
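A small sketch illustrating the cancellation property (example values are illustrative):

```python
def mean_error(act, fc):
    """ME = average of the signed forecast errors act_i - fc_i."""
    return sum(a - f for a, f in zip(act, fc)) / len(act)

# One over- and one underestimation of equal size cancel out completely:
print(mean_error([90, 110], [100, 100]))  # 0.0, despite two errors of size 10
```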
Mean Absolute Error (MAE)
The mean absolute error (MAE) is the average of the absolute forecast errors, usually calculated over all forecast steps from 1 to $n$:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i|$$
Apart from the fact that forecast errors cannot cancel each other out due to the restriction to non-negative values, the MAE shares the properties of the mean error.
Properties
- Forecast errors in individual forecast steps cannot cancel each other out
- All forecast steps equally weighted
- Non-scaled measure
- Non-negative value range
- Aggregated goodness measure
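The same example as for the ME shows that absolute errors no longer cancel out; a minimal sketch:

```python
def mean_absolute_error(act, fc):
    """MAE = average of the absolute forecast errors |act_i - fc_i|."""
    return sum(abs(a - f) for a, f in zip(act, fc)) / len(act)

print(mean_absolute_error([90, 110], [100, 100]))  # 10.0 (the ME here would be 0.0)
```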
Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error (MAPE) describes the average of the absolute percentage forecast errors, i.e., the absolute forecast errors relative to the magnitude of the actual values:

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{e_i}{act_i} \right| \cdot 100\%$$
Because division by zero would occur, the MAPE cannot be used for time series containing zero values, and it does not provide meaningful results for time series with many values close to zero. However, unlike the MAE or MSE, the MAPE is unitless and therefore more useful for comparing the quality of forecasts across data of different magnitudes.
Properties
- The direction of individual forecast steps is not considered in the averaging
- All forecast steps equally weighted
- Scaled measure
- Non-negative value range
- Not suitable for time series with zero values
- Aggregated goodness measure
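A minimal sketch; note that a single zero in `act` would raise a ZeroDivisionError:

```python
def mape(act, fc):
    """MAPE = average of |(act_i - fc_i) / act_i| * 100."""
    return sum(abs((a - f) / a) for a, f in zip(act, fc)) / len(act) * 100

print(mape([90, 110], [100, 100]))  # ~10.1: average of 10/90 and 10/110, in percent
```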
Mean Squared Error (MSE)
The mean squared error (MSE) corresponds to the average of the squared forecast errors:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} e_i^2$$
Like the MAE and MAPE, the MSE only takes into account the absolute deviation of the forecast from the actual value, not its direction. In comparison to the MAE, large errors carry more weight due to the squaring. As a result, the MSE is more sensitive to outliers. The mean squared error is often used as an optimization criterion in model building, such as in classical linear regression.
Properties
- The direction of individual forecast steps is not considered
- All forecast steps equally weighted
- Non-scaled measure
- Non-negative value range
- Aggregated goodness measure
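A small sketch illustrating how squaring lets a single large error dominate:

```python
def mse(act, fc):
    """MSE = average of the squared forecast errors (act_i - fc_i)**2."""
    return sum((a - f) ** 2 for a, f in zip(act, fc)) / len(act)

# Two errors of size 1 and one outlier of size 10:
print(mse([10, 10, 10], [11, 11, 20]))  # (1 + 1 + 100) / 3 = 34.0
```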
Mean Absolute Scaled Error (MASE)
The mean absolute scaled error (MASE) is the MAE of the considered forecast divided by the MAE of a one-step naive forecast (in-sample) of the actual values from 1 to $n$:

$$MASE = \frac{\frac{1}{n} \sum_{i=1}^{n} |e_i|}{\frac{1}{n-1} \sum_{i=2}^{n} |act_i - act_{i-1}|}$$
Therefore, a MASE greater than 1 implies that the considered forecast is worse than a one-step naive forecast; a MASE less than 1 implies that it is better. While a good one-step forecast should clearly have a MASE below 1, for a multi-step forecast, a MASE greater than 1 does not necessarily mean that the forecast is not good.
Like the MAPE, the MASE is unitless and therefore suitable for comparisons. Compared to the MAPE, the MASE handles (individual) zero values in a time series better. However, the MASE is not well suited for nearly constant time series: there, the forecast errors of the naive forecast are often zero, so the denominator becomes very small and the MASE becomes extremely large.
Properties
- The direction of individual forecast steps is not considered in the averaging
- All forecast steps equally weighted
- Scaled measure
- Non-negative value range
- Enables comparison between models
- Aggregated goodness measure
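A minimal sketch under the definition above, using the same series both for the forecast comparison and for the in-sample naive benchmark:

```python
def mase(act, fc):
    """MASE = MAE of the forecast / in-sample MAE of a one-step naive forecast."""
    mae_fc = sum(abs(a - f) for a, f in zip(act, fc)) / len(act)
    # One-step naive forecast: each value is "predicted" by its predecessor.
    mae_naive = (sum(abs(act[i] - act[i - 1]) for i in range(1, len(act)))
                 / (len(act) - 1))
    return mae_fc / mae_naive

act = [100, 120, 110, 130]
print(mase(act, [105, 115, 115, 125]))  # 0.3 -> better than the naive benchmark
```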
Symmetric MAPE (sMAPE)
The symmetric mean absolute percentage error (sMAPE) averages the absolute forecast errors divided by the mean of the absolute values of the actual and forecasted values:

$$sMAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|e_i|}{(|act_i| + |fc_i|)/2} \cdot 100\%$$
Compared to the MAPE, where the weighting of the forecast errors is based only on the actual value, the sMAPE also takes into account the magnitude of the forecasted value. Like the MAPE, the sMAPE does not yield good results when many of the actual or forecasted values are near or equal to zero.
The sMAPE takes values between 0% and 200%.
Properties
- The direction of individual forecast steps is not considered in the averaging
- All forecast steps equally weighted
- Scaled measure
- Value range between 0 and 200%
- Aggregated goodness measure
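A minimal sketch of the formula above:

```python
def smape(act, fc):
    """sMAPE = average of |act_i - fc_i| / ((|act_i| + |fc_i|) / 2) * 100."""
    return (sum(abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(act, fc))
            / len(act) * 100)

print(smape([90], [100]))  # 10 / 95 * 100 ~ 10.53
```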
MAPE vs. sMAPE
The per-observation errors that are averaged to obtain the MAPE and the sMAPE behave quite similarly; the difference between the two metrics is most clearly seen in the following example. For simplicity, only a single forecast time point is considered.
Case 1: The forecast is fixed at $fc = 100$. The actual value varies around $100 \pm 10$. This leads to an absolute forecast error of 10 in both cases.
| | act = 90, fc = 100 | act = 110, fc = 100 |
|---|---|---|
| MAPE | 10/90 | 10/110 |
| sMAPE | 10/95 | 10/105 |
Case 2: Now we fix the actual value at $act = 100$ and vary the forecast around $100 \pm 10$. As in Case 1, the absolute forecast error is 10 in both cases.
| | act = 100, fc = 90 | act = 100, fc = 110 |
|---|---|---|
| MAPE | 10/100 | 10/100 |
| sMAPE | 10/95 | 10/105 |
One can observe:
- The sMAPE is symmetric with respect to swapping $fc \leftrightarrow act$: it remains the same when the actual and forecasted values are interchanged.
- The MAPE is symmetric with respect to $fc = act + e \leftrightarrow fc = act - e$: for a fixed actual value $act$, the MAPE is the same whether the forecast over- or underestimates it by an error $e$.
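The two symmetries can be checked directly; the following snippet reproduces all four cells of the tables above:

```python
# Reproduce both cases from the tables above (one forecast point at a time):
for act, fc in [(90, 100), (110, 100), (100, 90), (100, 110)]:
    mape_val = abs(act - fc) / act * 100
    smape_val = abs(act - fc) / ((act + fc) / 2) * 100
    print(f"act={act:3d}, fc={fc:3d}: MAPE={mape_val:6.2f}%, sMAPE={smape_val:5.2f}%")

# sMAPE: (90, 100) and (100, 90) give the same value (10.53%) -> swap-symmetric.
# MAPE:  (100, 90) and (100, 110) give the same value (10.00%) -> symmetric in +/-e.
```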
Periods in Stock (PIS)
Periods in Stock is a metric that sums up how long forecast errors stay as stock in a hypothetical warehouse before they are balanced out by corresponding forecast errors in the opposite direction.
The direction of the forecast errors is important for this metric. PIS, unlike other metrics such as MAE, takes into account the duration of mismatches between the forecast and the actual value and is therefore well-suited for evaluating forecasts of sporadic time series, i.e., time series with many zero values.
For example, a forecast that is several time points too early (Forecast 1) results in a higher PIS, i.e., a worse forecast quality, than a forecast that is shifted by only one time point (Forecast 2), while other metrics such as the MAE rate both cases equally.
| Time Point | Actual Value | Forecast 1 | "Stock" 1 | Forecast 2 | "Stock" 2 |
|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 | 1 | 1 |
| 4 | 1 | 0 | 0 | 0 | 0 |

$PIS_1 = 3$, $PIS_2 = 1$, $MAE_1 = MAE_2 = 0.5$
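A sketch of this computation, reading PIS as the sum over time of the running "stock" (cumulative forecast minus actual, as in the table above; sign conventions vary in the literature):

```python
def pis(act, fc):
    """PIS = sum over all periods of the running stock of forecast surpluses."""
    stock, total = 0, 0
    for a, f in zip(act, fc):
        stock += f - a   # surplus forecasts enter the stock, deficits leave it
        total += stock   # every period a unit stays in stock counts once
    return total

act = [0, 0, 0, 1]
print(pis(act, [1, 0, 0, 0]))  # 3 -> the too-early unit sits in stock for 3 periods
print(pis(act, [0, 0, 1, 0]))  # 1 -> only shifted by one period
```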
Properties
- Direction of individual forecast steps is considered
- Duration of the mismatch between forecast and actual value is relevant
- Non-scaled measure
- Unbounded value range
- Suitable for time series with zero values
- Aggregated goodness measure