Well, it looks like April 15th is fast approaching, so to get you in the mood for filling out your IRS forms, I thought this month I'd share with you a little-known secret of how the IRS can tell if you're, how should I say it delicately, cooking your books.

So, here is the question: Suppose I ask you to open your checkbook and write down on a sheet of paper the amount of every check you wrote last year. Then, ignore all the numbers after the first digit, and make a list of just the first digits in each amount. For example, if you wrote a check for \$537.46 you would just write down 5. Now count how often each first digit occurs in the list you just created. Your intuition would say that there are 9 possibilities, namely the numbers 1 through 9, and each of them is equally likely, so each one would occur roughly 1/9 or 11.1% of the time. Well, kimosabe, it is thinking like that that will put you face to face with an IRS auditor quicker than you can say "My wife balances the checkbook!"

It turns out that the probability that the first digit is a 1 or a 2 is almost 50-50 (47.7% to be exact), while the probability that the first digit is a 9 is only 4.6%, much lower than the 11% you expected. How is this possible, you ask? The real answer is a little complicated, and is known as Benford's law. You can look it up on the net, but I will try to give an intuitive, though not 100% rigorous, explanation for what is going on here.
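If you'd rather check those percentages than take my word for it, here is a minimal Python sketch. It assumes the usual statement of Benford's law, which I haven't spelled out above: the digit d leads with probability log10(1 + 1/d). The function name `benford_probability` is just my own label.

```python
import math

def benford_probability(digit: int) -> float:
    """Chance that `digit` (1 through 9) is the leading digit,
    assuming the standard Benford formula log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)

# Print the whole distribution, one leading digit per line.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")

# The two figures quoted in the text.
p_one_or_two = benford_probability(1) + benford_probability(2)
print("1 or 2:", f"{p_one_or_two:.1%}")            # 47.7%
print("9:     ", f"{benford_probability(9):.1%}")  # 4.6%
```

The nine probabilities add up to exactly 1, which is a quick sanity check that the formula describes a proper distribution.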

The first thing to notice is that it shouldn't make any difference to the distribution of the numbers in your checkbook if you went back and converted all the amounts to, say, pesos, marks, francs, or any other national currency. In other words, the numbers are what is called in the trade scale invariant. A clue to the strange behavior I stated above comes from our grade school friend, the multiplication table. I've reproduced such a table below, to refresh our childhood memories:

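Our old friend, the multiplication table

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 |
| 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | 27 |
| 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 | 36 |
| 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 |
| 6 | 12 | 18 | 24 | 30 | 36 | 42 | 48 | 54 |
| 7 | 14 | 21 | 28 | 35 | 42 | 49 | 56 | 63 |
| 8 | 16 | 24 | 32 | 40 | 48 | 56 | 64 | 72 |
| 9 | 18 | 27 | 36 | 45 | 54 | 63 | 72 | 81 |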

Now if you count up the number of occurrences of each first digit in the table above, you will find that the frequency is as follows:

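First digit frequencies

| First digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Occurrences | 18 | 15 | 11 | 12 | 6 | 7 | 4 | 5 | 3 |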
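If counting 81 entries by hand isn't your idea of fun, the tally can be reproduced with a few lines of Python. This is only a sketch; the helper names `first_digit` and `table_first_digit_counts` are mine.

```python
from collections import Counter

def first_digit(n: int) -> int:
    """Leading digit of a positive whole number, e.g. 56 -> 5."""
    return int(str(n)[0])

def table_first_digit_counts(size: int = 9) -> dict[int, int]:
    """Tally the leading digits of every product in a size-by-size
    multiplication table."""
    counts = Counter(
        first_digit(row * col)
        for row in range(1, size + 1)
        for col in range(1, size + 1)
    )
    return {d: counts[d] for d in range(1, 10)}

print(table_first_digit_counts())
# {1: 18, 2: 15, 3: 11, 4: 12, 5: 6, 6: 7, 7: 4, 8: 5, 9: 3}
```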

What we notice is that the digits 1 and 2 occur much more frequently than, say, the digits 6, 7, 8, or 9. Let's look at this another way that perhaps will be even more convincing. I don't know about you, but I rarely write a check for more than \$10,000. Now, of the checks you write that are between \$1000 and \$9999, how many are closer to \$1000 than to \$9999? Similarly, of the checks you write between \$100 and \$999, how many are closer to \$100 than to \$999? You might think that this is just because we are all a bunch of low rollers, but actually this will be true for anyone who can count, i.e., anyone. Why?

Imagine you have two people: one of them is counting 1, 2, 3, ... and the other tells him to stop at random intervals. Whenever he stops, they write down the first digit of the number he just said, and then the count continues. Now let's look at how often the digit 1 comes up compared to the digit 9. Suppose I ask you to stop counting within the first hundred numbers. By the time you reach 20, the digit 1 has been the leading digit 11 times, namely for 1 and for 10, 11, ..., 19, while the digit 9 has been the leading digit only once. This continues all the way up to 89, and the digit 9 only catches up when you reach 99. Now keep counting. Once you have counted from 100 through 199, the digit 1 has come up 111 times in all, while the digit 9 stays stuck at 11 until we pass 899. Then the digit 9's count climbs until it too has been used 111 times when we reach 999. Do you see what is going on here? When I ask you to stop counting at random times, the window of opportunity for a 1 is much larger than the window of opportunity for a 9. If I stop you at 899, 111 of the 899 numbers you have said so far started with a 1, while only 11 started with a 9, a share about 10 times smaller. The 9s don't pull even, at 111 out of 999, until we reach the number 999, so it is much more likely that you will stop while the 1s are ahead of the 9s.
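The running tallies in the counting game are easy to check by brute force. Here is a minimal Python sketch; `leading_digit_count` is a name I made up for the occasion.

```python
def leading_digit_count(digit: int, limit: int) -> int:
    """How many of the numbers 1, 2, ..., limit start with `digit`."""
    target = str(digit)
    return sum(1 for n in range(1, limit + 1) if str(n)[0] == target)

# Stop the counter at a few interesting places and compare 1s with 9s.
for stop in (20, 99, 199, 899, 999):
    ones = leading_digit_count(1, stop)
    nines = leading_digit_count(9, stop)
    print(f"stop at {stop}: 1 led {ones} times, 9 led {nines} times")
# stop at 20: 1 led 11 times, 9 led 1 times
# stop at 99: 1 led 11 times, 9 led 11 times
# stop at 199: 1 led 111 times, 9 led 11 times
# stop at 899: 1 led 111 times, 9 led 11 times
# stop at 999: 1 led 111 times, 9 led 111 times
```

Notice that the 9s are only level with the 1s at the very end of each stretch (99, 999), while the 1s are ahead everywhere in between.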

What does this have to do with checkbooks? Well, ask yourself the question: where do the numbers in my checkbook come from? Most of mine come from bills, like phone, electricity, gas, restaurants, taxes, etc. Each of these bills has been arrived at by counting. The phone company counts the number of calls you make. The gas and electric companies count how much energy you've been using. The little wheels in the meter spin around until someone at the electric company comes out and writes down the number he sees. Restaurants count how much food you've eaten. Unless you're in the habit of writing checks where you randomly select each digit, chances are the number you are writing a check for was arrived at by counting something, and thus the distribution of leading digits follows our observations in the previous paragraph, namely 1s are more likely than 9s.

So what does all this have to do with the IRS? Well, you can bet your sweet bippy that they know about this phenomenon, and are counting on the fact that you probably don't. If you start making up numbers on your IRS forms, you will probably pick each digit of the number at random, thinking that you are creating a "random" number. If you do this, the distribution of the leading digits will indeed be uniform, i.e., a 1 is just as likely to appear as a 9, but this is not what should be happening if the numbers are coming from an accounting procedure. The computers at the IRS can spot this in a jiffy, and flag your return for more scrutiny.
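I have no idea what the IRS computers actually run, so take the following as a rough sketch of how a leading-digit check could work, not as their method. It assumes the standard Benford formula, log10(1 + 1/d), and measures the mismatch with a chi-square statistic, which is my choice for illustration. Both data sets are synthetic: the "made-up" amounts are seeded random three-digit numbers, and the stand-in for honest counting is the powers of 2, whose leading digits are known to follow Benford's law closely.

```python
import math
import random
from collections import Counter

def first_digit(amount) -> int:
    """Leading nonzero digit of a positive amount, e.g. 537.46 -> 5."""
    return int(str(abs(amount)).lstrip("0.")[0])

def benford_chi_square(amounts) -> float:
    """Chi-square distance between the observed leading digits and the
    counts Benford's law would predict. Bigger means a worse fit."""
    counts = Counter(first_digit(a) for a in amounts)
    total = len(amounts)
    chi2 = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2

# Hypothetical "cooked" books: every leading digit equally likely.
rng = random.Random(0)  # fixed seed so the run is repeatable
made_up = [rng.randint(100, 999) for _ in range(1000)]

# Synthetic stand-in for numbers that grow by counting: powers of 2.
doubling = [2 ** n for n in range(1000)]

print("made-up amounts:", round(benford_chi_square(made_up), 1))
print("powers of 2:    ", round(benford_chi_square(doubling), 1))
```

The made-up amounts score far higher than the powers of 2, which is the kind of gap a screening program could use to decide which returns deserve a second look.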

I'll leave you with an opportunity to make a little money. Next time you want to pick up some quick cash, make a bet with the guy sitting next to you at the bar: the bartender picks any country in the world, and you win if the population of that country starts with a 1, 2, 3, or 4, while your sucker friend gets the digits 5 through 9. To him it will seem like a good deal, since he has 5 digits to your 4, but in fact your probability of winning is 69.9%, much better odds than you'll ever get in Vegas.
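For the curious, here is where the 69.9% comes from, assuming country populations follow Benford's law in its standard form, where digit $d$ leads with probability $\log_{10}(1 + 1/d)$ (a formula I haven't spelled out above). Adding up your four digits, the fractions inside the logarithm cancel one after another:

$$
P(\text{first digit} \le 4) = \sum_{d=1}^{4} \log_{10}\!\left(1 + \frac{1}{d}\right) = \log_{10}\!\left(\frac{2}{1}\cdot\frac{3}{2}\cdot\frac{4}{3}\cdot\frac{5}{4}\right) = \log_{10} 5 \approx 0.699
$$

That leaves your friend with $1 - 0.699 = 0.301$, or about 30.1%, for his five digits.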

Quote of the day:
The reason there are so few female politicians is that it is too much trouble to put makeup on two faces.
Maureen Murphy