Last week I had my mind blown. As I am something of a nerd, I had my mind blown by rounding numbers – or more specifically, by the fact that not everyone does it like I do. I know – that’s odd on several levels; but if you stick with me, I think there’s an important and generalizable message.
It started with a tweet*. Mick Watson expressed surprise about the default way that R rounds numbers. And his surprise surprised me, because the way R rounds numbers is how I round numbers, and in fact I was surprised – no, shocked – that anyone would round numbers any other way.
So, mostly for fun but also partly to prove I was right, I did a quick Twitter poll (which is now the image above this post). It worked well for the “fun” part – over 500 votes in a day – but backfired spectacularly for the “prove I was right” part.
You see, I’ve always rounded the third way – with a value ending in 0.5 rounded to the nearest even integer. It turns out that less than 8% of poll respondents do it “my” way. Over 87% round up (the first way), while about 4% round down (the second way). This took me from “surprised – no, shocked” to “mind completely blown”. It didn’t occur to me that anyone would round differently than me; but it turns out, the vast majority of people do.
By the way: you might wonder why one might round “my” way, or why I might care about rounding at all. Well, I did mention being a nerd! But there is a bit more to it, if you work with data much. (I’ll forgive you if you skip the technical content in this paragraph.) “My” way is variously called statistician’s rounding, Dutch rounding, convergent rounding, or bankers’ rounding (among other terms, as I discovered). You round a value ending in 0.5 to the nearest even integer because if you round 0.5’s up (first option) or down (second option) you introduce distortions to the data. It seems to me just utterly obvious that, if you have a set of numbers and round each of them, doing so should change individual numbers but shouldn’t change their sum (or average). Bankers’ rounding achieves this (1.5 + 2.5 + 3.5 + 4.5 = 12; 2 + 2 + 4 + 4 = 12), but rounding up or down does not (1.5 + 2.5 + 3.5 + 4.5 = 12; but 2 + 3 + 4 + 5 = 14; and you can do the arithmetic yourself for rounding down).
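(If you’d like to see that arithmetic in code, here’s a minimal Python sketch. Python’s built-in round(), like R’s, rounds halves to even; round_half_up is just a little helper I’ve written here to stand in for the “first way”.)

import math

def round_half_up(x):
    # push a trailing .5 upward -- the "first way" in the poll
    return math.floor(x + 0.5)

values = [1.5, 2.5, 3.5, 4.5]
print(sum(values))                             # 12.0 -- the true sum
print(sum(round_half_up(v) for v in values))   # 14 -- always rounding up inflates it
print(sum(round(v) for v in values))           # 12 -- bankers' rounding preserves it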
Why might it matter if rounding distorts sets of numbers? Well, in Superman III Richard Pryor discovers that his employer rounds salary cheques down, reprograms the payroll system to harvest the missing half cents, and buys a Ferrari. If you prefer your example from real life, in the 1980s the Vancouver Stock Exchange used a form of rounding down for its main stock index and saw repeated rounding erode the index by almost 50% while the stock market was enjoying a bull run.** And you can probably imagine similar problems in any kind of data crunching that rounds repeatedly. And, since computers normally represent and work with numbers using “floating point” (real numbers stored to a finite number of digits, rather than exactly), almost all data crunching rounds repeatedly. So rounding up (the choice of 88% of my poll respondents) is definitely bad, and bankers’ rounding is definitely better.
At this point you might imagine I was feeling rather smug in my arithmetic superiority. But three things disrupted my smugness:
First, while R and Python default to “my” kind of rounding (“correctly”, I shout into the void), Excel and SAS do not, and C++ – well, C++ seems to revel in its own inscrutability, so I can’t even tell. I’ve used R, Excel, SAS, and C++ (among other software tools) and I wasn’t even aware they rounded differently from each other.
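(For the curious, here’s a quick Python illustration of those differing defaults. The built-in round() rounds halves to even, just as R’s does, while the decimal module can be told to round halves up – a reasonable stand-in for spreadsheet-style rounding, though this is a sketch of the behaviour, not a claim about Excel’s internals.)

from decimal import Decimal, ROUND_HALF_UP

# Python's (and R's) default: halves go to the even neighbour
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])            # [0, 2, 2, 4]

# Spreadsheet-style "round half up", emulated with the decimal module
print([int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
       for x in (0.5, 1.5, 2.5, 3.5)])                     # [1, 2, 3, 4]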
Second, I realized that I was so entirely clueless that I wasn’t even aware that most people rounded differently from me. I’ll come back to this point in a moment.
Third, reading the Wikipedia article about rounding (I think I mentioned that I’m a nerd, not that I really needed to) showed me that bankers’ rounding is not clearly the best kind of rounding, after all. While it doesn’t distort sums and averages, it does distort distributions (by piling up on even digits and underweighting odd ones). What’s better still is stochastic rounding, which rounds a value ending in 0.5 randomly up or down – to either its odd or even neighbour.*** I don’t round that way, but I probably should.
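(In code, tie-breaking stochastic rounding might look something like this – a minimal Python sketch, with round_half_stochastic as my own made-up name. Testing for an exact .5 in floating point is itself a bit shaky, but it serves for illustration.)

import math, random

def round_half_stochastic(x):
    # exact .5 ties go up or down with equal probability;
    # everything else rounds to the nearest integer
    lower = math.floor(x)
    if x - lower == 0.5:
        return lower + random.choice((0, 1))
    return round(x)

# Averaged over many tries, the rounded 2.5's sit near 2.5 -- no pile-up on evens
rounded = [round_half_stochastic(2.5) for _ in range(100_000)]
print(sum(rounded) / len(rounded))   # ~2.5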
Now, I promised a generalizable point about all this, and since you’ve bravely and stubbornly stuck with me this far, here it is. It derives from the second of my three smugness-disrupting points. I was assuming that everyone rounded like me – and this assumption was especially pernicious because the rightness of bankers’ rounding was so obvious to me that I wasn’t even aware of it as an assumption. So how many other assumptions like that am I making – and, possibly, am I wrong about? It’s probably impossible to know. “Question everything” sounds like great advice, until you actually try it. Does everyone interpret P = 0.051 the same way I do? (OK, I know they don’t; that’s just a cheap way to link to this post.) Does everyone pronounce “salmon” the same way I do? Does everyone think a stop sign means the same thing I think it does? To quote King Lear (although he was definitely not talking about rounding), “O, that way madness lies; let me shun that”. So maybe don’t question everything, but question a lot.
You see, while my stop sign example is silly, these deeply buried assumptions can matter. And not just in data crunching. A lot of cultural misunderstandings arise from assumptions that one just doesn’t think to question. Writing issues can, too. I’ve been thinking a bit about mentoring developing writers who speak English as an additional language (for a chapter in this forthcoming book). Native speakers of a language tend to make all kinds of assumptions about how languages work, based on how their own language does (just like I make assumptions about how other people round, based on how I do it). If your first language is English, you don’t need to think much about which nouns take a definite article (the artichokes, The Beatles) and which do not (breakfast, Supertramp). Actually, it’s not even that you don’t need to think much about this – I’d wager you never need to think about it at all. But other languages use articles quite differently, and writers who have English as an additional language find articles very difficult. If you’re mentoring such a writer, you can get the impression that they’re careless, or extremely unskilled – but it’s neither. You and they are just making different, but deeply instinctive, assumptions about how articles work. And they can’t make progress, and you can’t help them do so, until you get those assumptions up into the fresh air.
OK, maybe “when to use the” isn’t something on which the future of the universe pivots. But you get the point. Or at least, I assume you do.
© Stephen Heard January 3, 2023
Thanks to Tristan Long for alerting me to Richard Pryor’s Ferrari. There isn’t much else to recommend Superman III, but that sequence is wonderful. Thanks also to the 560 people on Twitter who each blew 0.179% of my mind. Although I rounded that off.
*Although Elon Musk is doing his very best to ruin Twitter, he hasn’t completely succeeded. Yet.
**More specifically: the index was set to 1,000 on day 1, and sank to 520 before someone figured out that that didn’t make sense since the stocks it was based on were appreciating. Turns out the index was rounded (actually truncated) to 3 decimal places, several thousand times a day, and this added up – or rather, down – a lot.
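(If you want to see that erosion happen, here’s a toy simulation in Python – a hedged sketch with invented numbers, not a reconstruction of the actual index calculation: tiny price moves that average zero, with the index truncated to 3 decimal places after each update.)

import math, random

index = 1000.0
for _ in range(200_000):                      # many small recalculations
    change = random.uniform(-0.005, 0.005)    # price moves that average zero
    index = math.trunc((index + change) * 1000) / 1000   # truncate to 3 decimals
print(round(index, 3))   # roughly 900 -- about 100 points eroded by truncation alone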
***An even better solution rounds not just 0.5’s stochastically, but all decimals, with probability based on their position between neighbouring integers. So, for example, 1.4 is rounded to 1 with probability 0.6, and to 2 with probability 0.4. This helps escape some really unfortunate arithmetical behaviour; for example, with other kinds of rounding – even bankers’ rounding – if you start with the number 1, repeating the operations “add 0.49, now round” as many times as you like will never actually move you away from 1. Those 0.49’s all vanish – or, alternatively, are transmogrified into Richard Pryor’s Ferrari.
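(Here’s what that looks like in Python – a minimal sketch, with round_stochastic as my own made-up name. The deterministic loop never escapes 1, while the stochastic one keeps the 0.49’s on average.)

import math, random

def round_stochastic(x):
    # round up with probability equal to the fractional part of x
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if random.random() < frac else 0)

x = 1
for _ in range(1000):
    x = round(x + 0.49)         # bankers' rounding: round(1.49) is 1, forever
print(x)                        # still 1

y = 1
for _ in range(1000):
    y = round_stochastic(y + 0.49)
print(y)                        # somewhere near 1 + 0.49 * 1000, i.e. around 490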
Blew my mind as well – it never occurred to me that anyone used anything other than the first method! That’s what I was taught at school and I never questioned it. I suppose the follow-on question is: how often in big data ecology does this matter?
By the way, at least two album covers by The Beatles refer to just “Beatles” with no definite article. Both of the ones I spotted – Sgt Pepper and Magical Mystery Tour – date from the height of their LSD use, so perhaps they had other things on their minds….
Gosh, articles are even trickier than I thought!!
Same as Jeff upthread. I was taught the first way in school, and hadn’t ever heard of any other way of doing it.
I always thought of the first way as an arbitrary convention, not intrinsically better or worse than the second way. Rather like how the convention to drive on the right hand side of the road is not intrinsically better or worse than the convention to drive on the left.
As for whether banker’s rounding is better or worse than the first two ways, I guess the answer depends on why you’re rounding? Personally, I don’t think I ever have any reason to round numbers in raw data, or in any statistical analyses. Any rounding I do happens at the end, after the analyses are done, when I’m reporting the results. For instance, writing “P=0.041” in a paper rather than “P=0.0405”. I guess I don’t feel like it matters much exactly how I do that sort of rounding. In this I’m agreeing with Jason downthread.
Now I’m wondering–do bankers (real ones, not the ones in Superman III) do lots of rounding? And if so, do they use banker’s rounding?
Yes, real bankers do TONS of rounding – as do you, despite your protests. EVERY calculation you do using a computer is rounded because of the way computers represent numbers. Now, usually that rounding is many decimal places removed from the accuracy you care about (well, except for Richard Pryor’s half cents), but it’s there. When it gets done repeatedly, it matters how it’s done – see the Vancouver Stock Exchange example. So you should certainly hope that real bankers use bankers’ rounding (or some other mathematically desirable form)!
Fascinating (raises ear). I too am surprised by how many in your sample just ‘round up’. I have sometimes done ‘conservative’ rounding where I round the number up if I am hoping for a small number and round down when I am hoping for a big number. Mostly, I try to avoid rounding altogether by allowing one extra digit. Kicking the digit down the road, so to speak.
I think that “rounding up” is a misnomer: what I was taught was that if the digit was less than 5, you rounded down, and if it was 5 or greater, you rounded up.
But Jeff, that is in fact “rounding up” in a broader sense, because the systematic bias in rounding the 0.5’s pushes the overall mean up!
OK, I take your point, though I would never say to anyone that I had “rounded up” 5.3 to 5.0.
I always thought the correct way to round x.5 was if x was even, you round up and if x was odd you round down. I didn’t see that in your poll options specifically, although there is that “something else” category. My rounds would have been 1, 3, 3, 5
Ah, so that’s “round to odd” rather than “round to even”. Makes sense, and has the same advantages as “round to even”!
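(For anyone who wants to check: here’s a minimal Python sketch of round-to-odd – round_half_to_odd is just my own name for it – showing that it preserves the sum exactly as round-to-even does.)

import math

def round_half_to_odd(x):
    # exact .5 ties go to the odd neighbour; everything else rounds normally
    lower = math.floor(x)
    if x - lower == 0.5:
        return lower if lower % 2 == 1 else lower + 1
    return round(x)

values = [1.5, 2.5, 3.5, 4.5]
print([round_half_to_odd(v) for v in values])     # [1, 3, 3, 5]
print(sum(round_half_to_odd(v) for v in values))  # 12 -- same as the unrounded sum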
I always round up. It’s how I was taught and it’s simple. That said, I am also aware that it’s criticised for inflating numbers and there are other rules that are “better” for rounding. But I don’t bother to use them (well, I guess I do with R) because they are not important for me. I use rounding in two cases: mental maths and reducing the number of digits displayed. Mental maths is just an approximation and I’m not worried about being exact. And for the display, it’s a display. I would not round numbers and then use those to calculate something important. Why would I? R can just use the exact number. Rounding is only useful for humans.
You say “R can just use the exact number. Rounding is only useful for humans.” But of course R doesn’t use exact numbers – it rounds, all the time, as computers always do, using floating point. And when that rounding occurs over and over again, you can get the stock market problem!
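(If you want to see that inexactness for yourself, here’s a quick illustration – written in Python rather than R, but R behaves the same way.)

print(0.1 + 0.2)          # 0.30000000000000004 -- the "exact" numbers aren't exact
print(0.1 + 0.2 == 0.3)   # False
print(f"{0.1:.20f}")      # 0.10000000000000000555 -- 0.1 itself is stored rounded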
That’s interesting. I confess, I don’t exactly know how R stores numbers internally but I assume there’s not too much I can do about that.
R and Python are clearly broken, in my view. C++ may be arcane, but it is possible to tell what it does in every case, although as the linked article points out, there are 5 cases.
Despite all my math (and programming) background, including several statistics classes, I’ve never heard of this “rounding to even/odd” or “bankers rounding”. I can see where it helps when doing means and sums for statistical purposes. But I don’t think I’d ever call that “rounding” in the general sense.
Instead, it’s a specialized method only to be used when inadequate precision is available to you when calculating a sum. And that just about never happens in 99.99% of the use cases, given most computers do 64-bit and even 128-bit math these days out of the box. How often do you need to worry about rounding that 20th or 39th decimal place AND the digit in that place is a 5? Not often, I’d bet.
Thanks so much for sharing this mind blowing post! I see your bankers’ rounding as a revelation. Those definite articles matter for translating, too. And the whole matter of assumptions gets very deep very quickly. Lovely.
Mind-blowing indeed! I tried it myself in R and then tried it with decimals.
round(0.00015, digits = 4) gives 0.0001
round(0.00025, digits = 4) gives 0.0003
round(0.00035, digits = 4) gives 0.0004
round(0.00045, digits = 4) gives 0.0004
I then checked the documentation for R’s round() function, which notes that rounding off a trailing 5 is subject to representation error – numbers like these aren’t stored exactly in floating point.
Oh, that’s a neat complication! Shouldn’t have any major implications I think as the “representation error” isn’t biased in either direction. But neat!
> Shouldn’t have any major implications I think as the “representation error” isn’t biased in either direction.
On that topic, there’s another (not unpleasant) surprise lurking. Round the ten numbers 0.05, 0.15, 0.25, …, 0.95 to one decimal place using Banker’s rounding, without that pesky floating-point stuff getting in the way, and they’ll round alternately up and down, as expected. However, convert each one to the nearest IEEE 754 binary64 float first, and things become a mite less predictable, exactly because of this representation error: it turns out that 0.05, 0.45, 0.55, 0.65 and 0.75 round up, while 0.15, 0.25, 0.35, 0.85 and 0.95 round down. Spot the pattern? No, me neither.
Here’s the surprise: despite that unpredictability, it’s still true that _exactly_ half of the ten values round up, and the other half round down. And this is not simply a coincidence: if you round all 3-digit ties (0.005, 0.015, 0.025, …, 0.995) to two decimal places, the same exact split happens: fifty values round up and fifty round down (but again, the exact pattern of rounds up and down is unpredictable). And the same with all 4-digit ties rounded to 3 decimal places, and so on.
Here’s the 4-digit case in Python (not sure how well this will render here):
>>> values = [n/10000 for n in range(5, 10000, 10)]
>>> len(values)
1000
>>> len([x for x in values if round(x, 3) > x])
500
>>> len([x for x in values if round(x, 3) < x])
500
OK, my mind is blown AGAIN!
I got the correct answer to Steve’s rounding question because I remember the kerfuffle in Ontario about standard rounding and its implication for sales tax when the 1¢ coin was done away with. This is exciting: I am now anticipating the arrival in the mail of a certificate for my membership in the ScSeSq 8%.
Lawrie Daub