When numerical precision can hurt you
The objective was to cure a very deadly disease and the drug was tested on mice. The results were impressive since 33% of the mice survived while only 33% died (the last mouse escaped and its outcome was unknown).
Numerical precision depends on the underlying number type. In .NET, there are 3 choices: float (32 bits), double (64 bits) and decimal (128 bits). Performance aside, more precision cannot hurt, right?
My answer is: it depends. If the only purpose of your number is to be processed by a machine, then fine, more precision never hurts. But what if a user is supposed to read that number? I actually encountered this issue while working on a project of mine, Re-Dox, reduced design of experiments (online analytical software). In terms of usability, providing maximal numerical precision to the user is definitely a very poor idea. Does adding twelve digits to the result of 10/3 = 3.333333333333 make it more readable? Definitely not.
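The sketch below makes the 10/3 example concrete. It uses Python as a stand-in for .NET, which is an assumption since the article names no programming language. Python's float is also an IEEE 754 64-bit double, while its decimal module is a different type from .NET's decimal and defaults to 28 significant digits. The hypothetical helper raw_and_rounded returns the raw text of each type next to a value rounded for display.

```python
from decimal import Decimal


def raw_and_rounded(numerator: int, denominator: int, sig_digits: int):
    """Hypothetical helper: the same quotient as a double, as a Decimal,
    and rounded to sig_digits significant digits for display."""
    as_double = numerator / denominator                      # IEEE 754 binary64 float
    as_decimal = Decimal(numerator) / Decimal(denominator)   # default context: 28 significant digits
    for_display = f"{as_double:.{sig_digits}g}"              # keep only sig_digits significant digits
    return str(as_double), str(as_decimal), for_display


double_text, decimal_text, display_text = raw_and_rounded(10, 3, 3)
print(double_text)   # 3.3333333333333335
print(decimal_text)  # 3.333333333333333333333333333
print(display_text)  # 3.33
```

Neither raw representation helps a reader. Both number types carry far more digits than the data could justify, and only the rounded value is readable.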
A very interesting issue when designing analytical software (i.e. software performing some kind of data analysis) is choosing the right number of digits. Smart rounding can be defined as an approach that seeks to provide all significant digits, but only significant digits, to the user. However, the notion of “significant” digits depends heavily on context and carries a lot of uncertainty. Therefore, for the software designer, smart rounding is more likely to be a tradeoff between usability and user requirements.
Providing general rules for smart rounding is hard, but here are the two heuristics that I use. Both rely on user inputs to define the required level of precision. Key insight: since it is usually not possible to know the accuracy requirements beforehand, the only reliable source of information is the actual user inputs.
Heuristic 1 = the number of digits in your outputs must not exceed the number of digits in the user inputs by more than 1 or 2. Ex: if the user inputs 0.123, then round outputs to 4 or 5 digits. Caution: do not take the user inputs “as such”, because they can include a lot of dummy digits (ex: the user can copy and paste values that look like 10.0000, where the trailing zeros are implicitly not significant). The underlying idea is “no algorithm ever creates any information; an algorithm only transforms the information”.
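Heuristic 1 can be sketched as follows, again in Python. The helpers significant_digits, output_digits and smart_round are hypothetical. The sketch makes three assumptions: "digits" means significant digits, the budget is set by the most precise input, and inputs are typed in plain decimal notation. Trailing zeros after the decimal point are treated as dummy digits, following the 10.0000 caution above.

```python
def significant_digits(text: str) -> int:
    """Hypothetical helper: count significant digits in a number as typed.

    Assumptions: plain decimal notation (no exponent), and trailing zeros
    after the decimal point are dummy digits, so "10.0000" counts as "10".
    """
    s = text.strip().lstrip("+-")
    if "." in s:
        s = s.rstrip("0").rstrip(".")        # drop dummy trailing zeros after the point
    digits = s.replace(".", "").lstrip("0")  # leading zeros are never significant
    return max(len(digits), 1)


def output_digits(user_inputs: list[str], margin: int = 1) -> int:
    """Heuristic 1 (hypothetical helper): allow `margin` (1 or 2) digits
    beyond the most precise user input."""
    return max(significant_digits(t) for t in user_inputs) + margin


def smart_round(value: float, digits: int) -> str:
    """Format value with the given number of significant digits."""
    return f"{value:.{digits}g}"


inputs = ["0.123", "10.0000", "4.5"]
n = output_digits(inputs)         # 0.123 has 3 significant digits, plus 1
print(n, smart_round(10 / 3, n))  # 4 3.333
```

The "g" format may switch to exponent notation for very large or very small values. Depending on the user interface, that switch may or may not be acceptable.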
Heuristic 2 = increase the number of digits from heuristic 1 by ceiling(log10(N) / 2), where N is the number of data inputs. This formula is simply an interpretation of the Central Limit Theorem (Wikipedia) for the purpose of smart rounding. Why the need for such a bizarre heuristic? The underlying idea is slightly more involved here. Basically, no matter how you combine the data inputs, the rate of accuracy improvement is bounded. By the Central Limit Theorem, the standard error of the mean of N independent inputs shrinks like 1/√N, which amounts to log10(√N) = log10(N)/2 additional decimal digits, rounded up to a whole digit. The bound provided here therefore corresponds to an “optimistic” approach where accuracy increases at the maximal possible speed.
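Heuristic 2 can be sketched as follows. The helpers clt_extra_digits and smart_digits are hypothetical, and base_digits stands for whatever budget heuristic 1 produced. The sketch does not evaluate log10 in floating point. Instead, it uses the fact that ceiling(log10(N) / 2) is the smallest integer k with 100^k ≥ N, so exact powers of 10 cannot be misrounded.

```python
def clt_extra_digits(n_inputs: int) -> int:
    """Heuristic 2 (hypothetical helper): ceiling(log10(N) / 2), in integers.

    k >= log10(N) / 2  <=>  10**(2k) >= N  <=>  100**k >= N,
    so the answer is the smallest k with 100**k >= N.
    """
    if n_inputs < 1:
        raise ValueError("need at least one data input")
    k = 0
    while 100 ** k < n_inputs:
        k += 1
    return k


def smart_digits(base_digits: int, n_inputs: int) -> int:
    """Digits to display: the heuristic-1 budget plus the CLT-based bonus."""
    return base_digits + clt_extra_digits(n_inputs)


print([clt_extra_digits(n) for n in (1, 10, 100, 1000, 10_000, 1_000_000)])  # [0, 1, 1, 2, 2, 3]
print(smart_digits(4, 1000))  # 6
```

For example, with a heuristic-1 budget of 4 digits and N = 1000 data inputs, the sketch displays 6 significant digits.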