Do not make a sum with your forecasts - a coconuts study

Although we have tried to make Lokad as simple and intuitive as possible, statistical forecasting is a counter-intuitive science with many traps. In this post, I am going to describe one of the most frequent mistakes that I have encountered within many companies. In a nutshell, it is wrong to make a sum of forecasted values. Since the problem is quite hard to grasp, let’s start with an example.

Let’s say that you have 3 shops, all selling coconuts. Being in charge of the supply chain, you need to forecast how many coconuts must be reordered next week. The coconuts will not be delivered to the shops directly but to a single warehouse. Thus there is only a single coconut replenishment order for the 3 shops.

To perform your replenishment forecast, it is natural to rely on your historical coconut sales data. In the present situation, you have 3 time-series representing the daily coconut sales for each of the 3 shops. How can we perform a single replenishment forecast based on those 3 time-series?

A naive method would consist of making one coconut sales forecast for each time-series (one forecast per shop), and then summing those 3 forecasted values to compute the replenishment order. Unfortunately, this method is wrong. A much more accurate approach consists of first aggregating the 3 time-series into one (i.e. summing the 3 time-series day by day) and then performing a forecast directly on the aggregated time-series.
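To make the two procedures concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sales figures are invented, and `forecast_next_week` is a hypothetical placeholder (7 times the median daily sales) standing in for whatever forecasting model you actually use. It is not Lokad's method. The only point is to show where the sum happens in each approach, and that the two orders of operation generally do not give the same number.

```python
import statistics

# Invented daily coconut sales for the last 14 days (illustrative data only).
shop_a = [3, 0, 5, 2, 0, 4, 1, 0, 6, 2, 3, 0, 5, 1]
shop_b = [0, 4, 1, 0, 7, 0, 2, 5, 0, 1, 0, 6, 0, 3]
shop_c = [2, 1, 0, 8, 0, 3, 0, 1, 4, 0, 2, 0, 1, 5]
shops = [shop_a, shop_b, shop_c]


def forecast_next_week(daily_sales):
    """Hypothetical placeholder model: 7 x the median daily sales.

    A real forecasting model would replace this function.
    """
    return 7 * statistics.median(daily_sales)


def forecast_then_sum(series_list):
    """Naive method: one forecast per shop, then sum the forecasts."""
    return sum(forecast_next_week(series) for series in series_list)


def sum_then_forecast(series_list):
    """Recommended method: sum the series day by day, then forecast once."""
    aggregated = [sum(day) for day in zip(*series_list)]
    return forecast_next_week(aggregated)


naive_order = forecast_then_sum(shops)
aggregated_order = sum_then_forecast(shops)
print("forecast, then sum:", naive_order)
print("sum, then forecast:", aggregated_order)
```

With this toy model and these invented numbers the two replenishment quantities differ, which is all the sketch is meant to show. It says nothing on its own about which one is more accurate.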

You are probably wondering what difference there is between the two methods: forecasting first and then summing, OR summing first and then forecasting. Well, the true explanation requires some statistics that are beyond the scope of this post, so I will try to give an intuitive explanation of the phenomenon. Summing forecasts does not improve the accuracy, whereas making a forecast based on a single smoother time-series does improve the accuracy (the sum of the 3 time-series is smoother than any of the initial time-series).
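The "smoother" claim can be checked with a small simulation. The sketch below rests on assumptions of mine, not the post's: "smooth" is read as "low variability relative to the average level" (the coefficient of variation), and the 3 shops are simulated as independent, each selling a random 0 to 10 coconuts per day. Real shops are usually partly correlated, in which case the smoothing effect would be weaker.

```python
import random
import statistics


def coefficient_of_variation(series):
    """Standard deviation divided by the mean: lower means smoother."""
    return statistics.pstdev(series) / statistics.mean(series)


rng = random.Random(42)  # fixed seed so the run is repeatable
days = 365

# Simulated, independent daily sales for 3 shops (assumption: uniform 0..10).
shops = [[rng.randint(0, 10) for _ in range(days)] for _ in range(3)]

# Aggregated time-series: total sales across the 3 shops for each day.
aggregated = [sum(day) for day in zip(*shops)]

cv_per_shop = [coefficient_of_variation(series) for series in shops]
cv_aggregated = coefficient_of_variation(aggregated)

for index, cv in enumerate(cv_per_shop, start=1):
    print(f"shop {index} relative variability: {cv:.2f}")
print(f"aggregated relative variability: {cv_aggregated:.2f}")
```

For independent shops of similar size, the relative variability of the total should shrink by roughly the square root of the number of shops, which is why the aggregated series gives a forecasting model less noise to contend with.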