Hierarchical aggregation level
Time Series Hierarchies
When working with time series, the first step is to define what constitutes a time series. Besides the granularity, this includes choosing the hierarchical level at which the data is aggregated. The following example illustrates the term hierarchical level.
A company records data on the sale of its products: the article description, the quantity sold, the customer and the date of sale.
Article description | Date of sale | Quantity | Customer |
---|---|---|---|
Article1 | 2020-01-01 | 10 | Customer1 |
Article1 | 2020-01-01 | 5 | Customer2 |
Article1 | 2020-01-01 | 6 | Customer3 |
Article1 | 2020-01-02 | 10 | Customer1 |
Article1 | 2020-01-02 | 7 | Customer4 |
Article1 | 2020-01-03 | 20 | Customer1 |
Article1 | 2020-01-04 | 10 | Customer1 |
Article1 | 2020-01-05 | 20 | Customer1 |
Article2 | 2020-01-01 | 2 | Customer5 |
Article2 | 2020-01-02 | 2 | Customer5 |
Article2 | 2020-01-03 | 2 | Customer5 |
Article2 | 2020-01-04 | 2 | Customer5 |
Article2 | 2020-01-05 | 2 | Customer5 |
... | ... | ... | ... |
Various hierarchical levels are available for the definition of article time series. The lowest hierarchical level defines a time series as the sales figures of an article per customer. At this level, one time series exists for each combination of article and customer, so all available attributes (article description, date of sale, quantity, customer) are retained.
A higher hierarchical level is the time series of sales figures per article, summed over all customers. For the example above:
Article description | Date of sale | Quantity |
---|---|---|
Article1 | 2020-01-01 | 21 |
Article1 | 2020-01-02 | 17 |
Article1 | 2020-01-03 | 20 |
Article1 | 2020-01-04 | 10 |
Article1 | 2020-01-05 | 20 |
Article2 | 2020-01-01 | 2 |
Article2 | 2020-01-02 | 2 |
Article2 | 2020-01-03 | 2 |
Article2 | 2020-01-04 | 2 |
Article2 | 2020-01-05 | 2 |
... | ... | ... |
The "customer" attribute has been lost in this step.
A time series requires at least one piece of time information and a corresponding value. The highest hierarchical level in this example is therefore the total sales quantity per day, summed over all articles and customers:
Date of sale | Quantity |
---|---|
2020-01-01 | 23 |
2020-01-02 | 19 |
2020-01-03 | 22 |
2020-01-04 | 12 |
2020-01-05 | 22 |
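The two aggregation steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the `aggregate` helper and the tuple layout are assumptions made for this example, and only the rows actually listed in the first table are included (the rows elided with "..." are left out).

```python
from collections import defaultdict

# Rows listed in the example table: (article, date of sale, quantity, customer).
# Rows elided with "..." in the table are not included here.
SALES = [
    ("Article1", "2020-01-01", 10, "Customer1"),
    ("Article1", "2020-01-01", 5, "Customer2"),
    ("Article1", "2020-01-01", 6, "Customer3"),
    ("Article1", "2020-01-02", 10, "Customer1"),
    ("Article1", "2020-01-02", 7, "Customer4"),
    ("Article1", "2020-01-03", 20, "Customer1"),
    ("Article1", "2020-01-04", 10, "Customer1"),
    ("Article1", "2020-01-05", 20, "Customer1"),
    ("Article2", "2020-01-01", 2, "Customer5"),
    ("Article2", "2020-01-02", 2, "Customer5"),
    ("Article2", "2020-01-03", 2, "Customer5"),
    ("Article2", "2020-01-04", 2, "Customer5"),
    ("Article2", "2020-01-05", 2, "Customer5"),
]


def aggregate(rows, key_indices):
    """Sum the quantity (index 2) over all rows that share the given key fields."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        totals[key] += row[2]
    return dict(totals)


# Article level: the customer attribute is summed out.
by_article = aggregate(SALES, key_indices=(0, 1))
# Highest level: only the date of sale remains as key.
total = aggregate(SALES, key_indices=(1,))

print(by_article[("Article1", "2020-01-01")])  # 21
print(total[("2020-01-01",)])                  # 23
```

Passing the key fields as a parameter keeps the same function usable for any level of the hierarchy; the fewer key fields are kept, the higher the level.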
Choice of hierarchical levels
The higher the hierarchical level to which the data is aggregated, the greater the loss of information. However, there are other criteria that need to be considered and weighed up when choosing a suitable hierarchical level.
Motivation
A key aspect is the overall motivation behind data analysis. The primary goal should be to aggregate the data at a hierarchical level that is relevant for subsequent analyses. If multiple levels are of interest, there are several approaches to consider. If working at the lowest target level, results for higher levels can be generated through aggregation. Conversely, if starting at a higher level, time series data must first be disaggregated to the underlying levels. Finally, analyses and forecasts can also be conducted in parallel across different levels.
Predictability
The choice of a hierarchical level has a significant impact on the characteristics of the associated time series. There is often a strong correlation between the selected hierarchical level and predictability, meaning the expected quality of the forecasts.
At lower levels, such as the article-customer level, individual customer purchase behavior can be highly sporadic, resulting in time series that are similarly noisy and erratic. However, when these time series are aggregated across all customers, the resulting article-level time series may exhibit strong seasonal patterns or trends. These characteristics can be hidden when operating at the article-customer level.
Performance
Performance can also be a crucial factor when selecting the aggregation level. The runtime of processing steps is closely tied to the number of time series to be processed. The lower the chosen hierarchical level, the more time series can be generated, and the greater the subsequent processing effort may become.
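To make the effect on the number of time series concrete, the following sketch counts the series per level for the rows listed in the sales example. The `count_series` helper and the level names used as dictionary keys are assumptions made for this illustration.

```python
# Distinct (article, customer) combinations in the listed rows of the example table.
observed = {
    ("Article1", "Customer1"),
    ("Article1", "Customer2"),
    ("Article1", "Customer3"),
    ("Article1", "Customer4"),
    ("Article2", "Customer5"),
}


def count_series(pairs):
    """Return the number of time series that each hierarchical level produces."""
    return {
        "article-customer": len(pairs),
        "article": len({article for article, _ in pairs}),
        "total": 1 if pairs else 0,
    }


series_counts = count_series(observed)
print(series_counts)  # {'article-customer': 5, 'article': 2, 'total': 1}
```

Even in this small example the lowest level yields more than twice as many series as the article level; with real assortments and customer bases the gap is typically far larger.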
Uniqueness
The relationship between time series at different hierarchical levels is not always unique. For instance, if aggregation levels such as item-customer and item-customer_location are chosen, there may not always be a clear relationship between the resulting time series (e.g., a customer might have branches in multiple countries). Ambiguous relationships necessitate additional rules for aggregation and disaggregation.
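One way to detect such ambiguities before aggregating is to check whether every customer maps to exactly one location. The sketch below is only an illustration: the records, the location names and the `find_ambiguous` helper are hypothetical and not part of the sales example above.

```python
from collections import defaultdict

# Hypothetical (customer, customer_location) pairs, for illustration only.
customer_locations = [
    ("Customer1", "CountryA"),
    ("Customer1", "CountryB"),  # Customer1 has branches in two countries
    ("Customer2", "CountryA"),
    ("Customer3", "CountryB"),
]


def find_ambiguous(pairs):
    """Return every customer that is assigned to more than one location."""
    locations = defaultdict(set)
    for customer, location in pairs:
        locations[customer].add(location)
    return {c: sorted(locs) for c, locs in locations.items() if len(locs) > 1}


ambiguous = find_ambiguous(customer_locations)
print(ambiguous)  # {'Customer1': ['CountryA', 'CountryB']}
```

Every customer reported here needs an explicit rule (for example, a fixed split between locations) before its time series can be moved between the two levels.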
Hierarchical forecasts
Hierarchical forecasts refer to predictions that take into account the distinct characteristics and interdependencies of the individual hierarchical levels.
Disaggregation based on quantity shares can be understood as a hierarchical forecasting method. For example, one might forecast sales quantities at the item level and then disaggregate them to the item-customer level, determining each customer's share based on their sales history.
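Disaggregation by quantity shares can be sketched as follows. The historical quantities are the per-customer sums for Article1 from the listed rows of the sales example; the article-level forecast value of 176 and the `disaggregate` helper are assumptions made purely for illustration.

```python
# Historical quantity of Article1 per customer, summed over the listed example rows.
history = {"Customer1": 70, "Customer2": 5, "Customer3": 6, "Customer4": 7}


def disaggregate(forecast, history):
    """Split an article-level forecast according to each customer's historical share."""
    total = sum(history.values())
    if total == 0:
        raise ValueError("history contains no sales to derive shares from")
    return {customer: forecast * qty / total for customer, qty in history.items()}


# Hypothetical article-level forecast of 176 units for Article1.
customer_forecast = disaggregate(176, history)
print(customer_forecast)
# {'Customer1': 140.0, 'Customer2': 10.0, 'Customer3': 12.0, 'Customer4': 14.0}
```

Because the shares sum to one, the customer-level values always add up to the article-level forecast again.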
More complex hierarchical forecasting methods make predictions at all hierarchical levels. Additionally, these methods aim to minimize the forecast error at each level while providing consistent predictions across levels, meaning that aggregating the lower-level forecasts yields the upper-level forecast.
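The consistency requirement can be expressed as a simple check. The sketch below uses hypothetical forecast values and hypothetical helper names (`is_coherent`, `bottom_up`). It also shows the bottom-up approach, which is the simplest way to obtain consistent forecasts and is not one of the more complex methods mentioned above.

```python
import math


def is_coherent(upper, lower, tol=1e-9):
    """True if the lower-level forecasts sum to the upper-level forecast."""
    return math.isclose(upper, sum(lower.values()), abs_tol=tol)


def bottom_up(lower):
    """Derive the upper-level forecast by summing the lower-level forecasts."""
    return sum(lower.values())


# Hypothetical forecasts for one article and one date.
lower = {"Customer1": 18.0, "Customer2": 4.0, "Customer3": 5.0}
independent_upper = 30.0  # forecast separately at the article level

print(is_coherent(independent_upper, lower))  # False: 27.0 != 30.0
print(is_coherent(bottom_up(lower), lower))   # True by construction
```

Forecasts made independently at each level will generally fail this check, which is why hierarchical methods adjust them so that all levels agree.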