BENFORD'S LAW PROVIDES A DATA analysis method that can help alert CPAs to possible errors, potential fraud, manipulative biases, costly processing inefficiencies or other irregularities.
A PHYSICIST AT GE RESEARCH LABORATORIES in the 1920s, Frank Benford found that numbers with low first digits occurred more frequently in the world and calculated the expected frequencies of the digits in tabulated data.
CPAs CAN USE BENFORD'S DISCOVERY in business applications ranging from accounts payable to Y2K problems. In addition, subset tests identify small lists of serious anomalies in large data sets, making an analysis more manageable.
DIGITAL ANALYSIS IS WELL SUITED to finding errors and irregularities in large data sets when auditors need computer-assisted technologies to direct their attention to anomalies.
MARK J. NIGRINI, CA (SA), PhD, MBA, is an assistant professor at the Edwin L. Cox School of Business, Southern Methodist University, Dallas, and a Research Fellow at the Ernst & Young Center for Auditing Research and Advanced Technology, University of Kansas, Lawrence.
When physicist Frank Benford tested the first digits in lists of numbers during the 1920s and 1930s, he found that about 31% of the numbers had 1 as the first digit, 19% had 2, and only 5% had 9.
(ILLUSTRATION COURTESY OF ARTOVERNITE.COM )
Is it possible to tell that a number is wrong just by looking at it? In some cases, you bet. Using Benford's law, a mathematical phenomenon that provides a unique method of data analysis, CPAs can spot irregularities indicating possible error, fraud, manipulative bias or processing inefficiency. Benford's law is used to determine the normal level of number duplication in data sets, which in turn makes it possible to identify abnormal digit and number occurrence. Accountants and auditors have begun to apply Benford's law to corporate data to discover number-pattern anomalies. For large data sets, CPAs use highly focused tests that concentrate on finding deviations in subsets.
EUREKA!
Frank Benford made a simple observation while working as a physicist at the GE Research Laboratories in Schenectady, New York, in the 1920s. He noticed that the first few pages of his books of logarithm tables were more worn than the last few, and from this he surmised that he was consulting the first pages (which gave the logs of numbers with low digits) more often. The first digit of a number is its leftmost digit; for example, the first digit of 45,002 is 4. (Zero cannot be a first digit.) Benford extrapolated that he was looking up the logs of numbers with low first digits more frequently because there were more numbers with low first digits in the world.
Exhibit 1: Benford's Law: Expected Digital Frequencies
Benford then tested this idea by looking at the first digits of 20 lists of numbers with a total of 20,229 observations. His lists came from varied sources, such as geographic, scientific and demographic data. One list contained all the numbers in an issue of Reader's Digest. He found that about 31% of the numbers had 1 as the first digit, 19% had 2, and only 5% had 9 as a first digit. Benford then made some physics-related assumptions about the distribution of naturally occurring data and, using integral calculus, he computed the expected frequencies of the digits and digit combinations.
The expected frequencies of the digits in the first four positions can be seen in exhibit 1, which shows a large bias in favor of low digits in the first position. The probability that the first digit is a 1, 2 or 3 is 60.2%.
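The first-digit column of such a table can be reproduced in a few lines. The sketch below is only an illustration: it applies the formula P(D1 = d1) = log10(1 + 1/d1) from the article's formula box, and the function and variable names are invented for this example.

```python
import math

def first_digit_probability(d: int) -> float:
    """Expected proportion of numbers whose first digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

# Expected first-digit frequencies under Benford's law
expected = {d: first_digit_probability(d) for d in range(1, 10)}
for d, p in expected.items():
    print(f"first digit {d}: {p:.1%}")

# Combined share of the three lowest first digits
low_digit_share = expected[1] + expected[2] + expected[3]
print(f"first digit is 1, 2 or 3: {low_digit_share:.1%}")  # 60.2%
```

The nine probabilities sum to 1, and the last line reproduces the 60.2% figure quoted above.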
Not all data sets follow Benford's law. The data sets most likely to conform have the following characteristics:
- The numbers describe the sizes of similar phenomena (for example, market values of corporations).
- The numbers do not contain a built-in maximum or minimum value (as deductible IRA contributions and hourly wage rates do).
Assigned numbers, such as Social Security numbers, zip codes or bank account numbers, will not conform to Benford's law.
Mutual fund math. An intuitive explanation of Benford's law is to consider the total assets of a mutual fund that is growing at 10% per year. When the total assets are $100 million, the first digit of total assets is 1. The first digit will continue to be 1 until total assets reach $200 million. This will require a 100% increase (from 100 to 200), which, at a growth rate of 10% per year, will take about 7.3 years (with compounding). At $500 million the first digit will be 5.
Growing at 10% per year, the total assets will rise from $500 million to $600 million in about 1.9 years, significantly less time than assets took to grow from $100 million to $200 million. At $900 million, the first digit will be 9 until total assets reach $1 billion, or about 1.1 years at 10%. Once total assets are $1 billion the first digit will again be 1, until total assets again grow by another 100%. The persistence of a 1 as a first digit will occur with any phenomenon that has a constant (or even an erratic) growth rate.
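The waiting times in the mutual fund example can be checked with a short calculation. This is a minimal sketch assuming annual compounding at a constant rate; the function name is made up for the illustration.

```python
import math

def years_with_first_digit(d: int, growth_rate: float = 0.10) -> float:
    """Years for assets to grow from d to d + 1 (in any units) at a compound annual rate."""
    return math.log((d + 1) / d) / math.log(1 + growth_rate)

for d in (1, 5, 9):
    print(f"first digit {d}: about {years_with_first_digit(d):.1f} years")

# Share of one full cycle (from 1 up to 10) spent with a first digit of 1
cycle_years = sum(years_with_first_digit(d) for d in range(1, 10))
share_with_one = years_with_first_digit(1) / cycle_years
print(f"share of time with first digit 1: {share_with_one:.1%}")  # 30.1%
```

The share of time the fund spends with a given first digit works out to log10(1 + 1/d) whatever the growth rate, which is why steady growth produces Benford's proportions.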
Benford's law has been found to apply to many sets of financial data, including income tax or stock exchange data, corporate disbursements and sales figures, demographics and scientific data. Since the 1940s, more than 150 academic papers on Benford's law have been published by mathematicians, statisticians, engineers, physicists and, recently, accountants. None disputes it or offers a competing law related to digits. Perhaps the most convincing support came from Roger Pinkham in 1961, when he showed that Benford's law was scale invariant. In other words, if a set of numbers followed Benford's law closely, and if all the numbers in the set were multiplied by a nonzero constant (such as 22.04 or 0.323), then the new set of numbers would also follow Benford's law closely. Only the probabilities of Benford's law had this property. This scale invariance helps us to understand why Benford's law works on financial data throughout the world, even though the data are expressed in different currencies. A recent review of the theory underlying Benford's law is by the mathematician Ted Hill in American Scientist (July-August 1998).
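Scale invariance is easy to see numerically. The sketch below is an illustration only, not Pinkham's proof: it builds an artificial data set (values evenly spaced on a logarithmic scale, an assumption chosen because such data follow Benford's law closely), multiplies it by the two constants mentioned above, and reports the largest gap between actual and expected first-digit proportions.

```python
import math
from collections import Counter

def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])

def first_digit_proportions(values):
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

def max_gap_from_benford(proportions) -> float:
    """Largest absolute difference from log10(1 + 1/d) over the nine first digits."""
    return max(abs(proportions[d] - math.log10(1 + 1 / d)) for d in range(1, 10))

# Artificial data (an assumption): 6,000 values evenly spaced on a log scale,
# running from 1 to just under 1,000,000
values = [10 ** (k / 1000) for k in range(6000)]

for constant in (1, 22.04, 0.323):
    scaled = [constant * v for v in values]
    gap = max_gap_from_benford(first_digit_proportions(scaled))
    print(f"multiplied by {constant}: largest gap {gap:.4f}")
```

In each case the largest gap stays well under half a percentage point, so rescaling (a currency conversion, for instance) leaves the digit pattern essentially intact.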
Exhibit 2: First Digits of Census Data
Exhibit 2 shows the results of an analysis of the population counts of the 3,141 U.S. counties, according to the 1990 census. The Benford's law proportions are shown as the diamonds on the line, and the bars show the actual proportions, one bar for each of the nine possible first digits. From the graph it's clear that the actual proportions follow Benford's law quite closely, which is what would be expected from authentic, unmanipulated data. The mean absolute deviation of the first digits of the census data is 0.7%, which means that, on average, the actual proportion differed from the expected proportion by seven-tenths of one percentage point. Auditors usually consider a difference of this magnitude to be immaterial. The underlying thesis of digital analysis is that Benford's law makes it possible to spot data anomalies.
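The mean absolute deviation is the average, over the nine first digits, of the absolute difference between the actual and expected proportions. A minimal sketch follows; the census proportions themselves are not reproduced in this article, so the input used here is a made-up one in which every first digit is equally common.

```python
import math

def mean_absolute_deviation(actual: dict) -> float:
    """Average |actual - expected| over first digits 1-9, with Benford's law as the expectation."""
    return sum(abs(actual[d] - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9

# Hypothetical input (not the census data): every first digit equally likely
evenly_spread = {d: 1 / 9 for d in range(1, 10)}
mad = mean_absolute_deviation(evenly_spread)
print(f"mean absolute deviation: {mad:.1%}")
```

An evenly spread set of first digits gives a deviation of roughly 6%, many times the 0.7% reported for the census counts.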
TELLTALE THRESHOLD
In 1993, in State of Arizona v. Wayne James Nelson (CV92-18841), the accused was found guilty of trying to defraud the state of nearly $2 million. Nelson, a manager in the office of the Arizona State Treasurer, argued that he had diverted funds to a bogus vendor to demonstrate the absence of safeguards in a new computer system. The amounts of the 23 checks issued are shown in exhibit 3.
Exhibit 3: Check Fraud in Arizona
Because human choices are not random, invented numbers are unlikely to follow Benford's law. Here are some warning signs that a Benford's law analysis would have drawn attention to:
- As is often the case in fraud, the embezzler started small and then increased dollar amounts.
- Most of the amounts were just below $100,000. It's possible that higher dollar amounts received additional scrutiny or that checks above that amount required human signatures instead of automated check writing. By keeping the amounts just below an additional control threshold, the manager tried to conceal the fraud.
- The digit patterns of the check amounts are almost opposite to those of Benford's law. Over 90% have 7, 8 or 9 as a first digit. Had each vendor been tested against Benford's law, this set of numbers also would have had a low conformity, signaling an irregularity.
- The numbers appear to have been chosen to give the appearance of randomness. Benford's law is quite counterintuitive; people do not naturally assume that some digits occur more frequently. None of the check amounts was duplicated; there were no round numbers; and all the amounts included cents. However, subconsciously, the manager repeated some digits and digit combinations. Among the first two digits of the invented amounts, 87, 88, 93 and 96 were all used twice. For the last two digits, 16, 67 and 83 were duplicated. There was a tendency toward the higher digits; note that 7 through 9 were the most frequently used digits, in contrast to Benford's law. A total of 160 digits were used in the 23 numbers. The counts for the ten digits from 0 to 9 were 7, 19, 16, 14, 12, 5, 17, 22, 22, and 26, respectively. A CPA familiar with Benford's law could have easily spotted the fact that these numbers, invented to seem random by someone ignorant of Benford's law, fall outside expected patterns and thus merit closer examination.
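The digit counts quoted in the last point can be tallied directly. The sketch below uses only those ten counts; the 30% comparison figure is a crude yardstick assumed for this illustration (three digits out of ten, evenly spread), not a Benford's law expectation.

```python
# Counts for digits 0 through 9 across the 23 check amounts, as quoted in the text
digit_counts = [7, 19, 16, 14, 12, 5, 17, 22, 22, 26]

total_digits = sum(digit_counts)
high_digit_share = sum(digit_counts[7:]) / total_digits   # digits 7, 8 and 9

print(total_digits)                  # 160
print(f"{high_digit_share:.2%}")     # 43.75%

# Crude yardstick (an assumption): if all ten digits were used equally often,
# 7 through 9 would account for 3 / 10 of the digits
print(f"{3 / 10:.2%}")               # 30.00%
```

Digits 7 through 9 make up about 44% of all the digits used, well above what an even spread would produce, let alone a pattern that favors low digits.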
Exhibit 4: First Two Digits of Accounts Payable Data
ON-THE-JOB APPLICATIONS
Corporate accounts payable data are a favorite target of the digital analysis technology. The first- and second-digit tests are used as high-level examinations of reasonableness (data authenticity). The graph of the first two digits of an accounts payable file of a NASDAQ-listed software company is shown in exhibit 4.
The line plots Benford's law and the bars show the actual proportions. When the bars extend above the Benford's law line, the actual proportion exceeds the proportion predicted by Benford's law, creating an abnormal level of duplication for that first-two-digit combination.
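A first-two-digits test of this kind can be sketched as follows. Everything about the data is an assumption for illustration: the amounts are artificial (a Benford-like base plus 60 repeated payments of $24.50), the expected proportion for a two-digit combination is taken to be log10(1 + 1/d), which extends the article's formula to two-digit values, and the cutoff of two percentage points is arbitrary rather than an audit standard.

```python
import math
from collections import Counter

def first_two_digits(amount: float) -> int:
    """First two significant digits of a positive amount, e.g. 24.50 -> 24."""
    return int(f"{amount:.15e}".replace(".", "")[:2])

def expected_proportion(d: int) -> float:
    """Expected proportion for a first-two-digits combination d (10 through 99)."""
    return math.log10(1 + 1 / d)

def spikes(amounts, min_excess=0.02):
    """List (digits, actual, expected) where actual exceeds expected by more than min_excess."""
    counts = Counter(first_two_digits(a) for a in amounts if a > 0)
    n = sum(counts.values())
    found = []
    for d in range(10, 100):
        actual = counts[d] / n
        if actual - expected_proportion(d) > min_excess:
            found.append((d, round(actual, 4), round(expected_proportion(d), 4)))
    return found

# Artificial data (an assumption): 1,500 Benford-like amounts from $10 to about $10,000,
# plus 60 identical payments of $24.50
amounts = [10 ** (k / 500) for k in range(500, 2000)] + [24.50] * 60
print(spikes(amounts))
```

Only the combination 24 is reported, because the repeated $24.50 payments push its actual proportion far above the expected level of roughly 1.8%.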
An analysis of the actual dollar amounts showed that the numbers $25, $30 and $10 occurred most frequently. The follow-up audit showed that invoices with these amounts were mainly for courier charges. Repeated low dollar amounts highlight inefficiencies if they are being processed for the same type of purchase. At one company, the follow-up audit showed that accounts payable was processing about 12,000 invoices annually for employee business card purchases from the same vendor. Monthly billing could make steep reductions in processing costs. Other problems that have been found include:
- Biases in corporate data. In one company's accounts payable data, there was a large first-two digit spike (excess of actual over expected) at 24. An analysis showed that the amount $24.50 occurred abnormally often. The audit revealed that these were claims for travel expenses and that the company had a $25 voucher requirement. Employees were apparently biased toward claiming $24.50.
- Ducking authorization levels. Sometimes managers concentrate their purchases just below their authorization levels so their choices won't be scrutinized. Managers with $3,000 purchasing levels might have a lot of invoices for $2,800 to $2,999, which would show up in data analysis by spikes at 28 and 29.
Benford's Law Formula
$P(D_1 = d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right)$ for $d_1 \in \{1, 2, \ldots, 9\}$
During one bank audit, the auditors analyzed the first two digits of credit card balances written off as uncollectible. The graph showed a large spike at 49. An analysis of the related dollar amounts (that is, from $480 to $499 and from $4,800 to $4,999) showed that the spike was caused mainly by amounts between $4,800 and $4,999, and that one officer was responsible for the bulk of these write-offs. The write-off limit for internal personnel was $5,000. It turned out that the officer was operating with a circle of friends who would apply for credit cards. After they ran up balances of just under $5,000, he would write the debts off.
It's also possible to test for excessive round numbers when an accountant wants to check for excessive estimating (perhaps royalty receivable schedules) and to test the last two digits to find number invention (perhaps in inventory counts).
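Both of these checks reduce to simple tallies. The sketch below is a rough illustration with made-up figures; it assumes whole-dollar amounts and whole-unit counts, treats a multiple of 1,000 as "round", and the function names are invented for the example.

```python
from collections import Counter

def round_number_share(amounts_in_dollars, base=1000):
    """Share of whole-dollar amounts that are exact multiples of `base`."""
    hits = sum(1 for a in amounts_in_dollars if a % base == 0)
    return hits / len(amounts_in_dollars)

def last_two_digits_tally(counts):
    """How often each two-digit ending (00 through 99) appears in whole-number counts."""
    return Counter(c % 100 for c in counts)

# Made-up figures for illustration only
royalty_estimates = [12000, 8500, 15000, 7321, 20000, 9000]
inventory_counts = [137, 237, 1037, 412, 37, 590]

print(f"{round_number_share(royalty_estimates):.0%} of amounts are multiples of 1,000")
print(last_two_digits_tally(inventory_counts).most_common(1))   # [(37, 4)]
```

In a large set of genuine counts, one would expect no single ending to dominate, so an ending that keeps recurring (37 in this toy list) is a reason to look more closely.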
Do It Yourself
For a hands-on introduction to Benford's law, open The Wall Street Journal and pick a random starting point in the stock tables for the two major exchanges. Tabulate the first digits of the daily volume (in hundreds) for 100 stocks. About 50 of the numbers on the list should start with a 1 or a 2. Only about 5 numbers should start with a 9, just as Benford's law would predict.
REFINING THE TESTS
Corporate data sets are becoming larger and larger. The first-two-digits test could be fine-tuned to a first-three-digits test to keep sample sizes manageable, but there is still the potential for large samples. For example, the first two digits 30 might have an actual proportion of 0.02, which is higher than the 0.0142 expected, but an audit of 2% of the population would be excessive and expensive.
Subset tests identify small lists of serious anomalies in large data sets, making an analysis much more manageable. They focus on errors as opposed to biases, fraud or processing inefficiencies. Data subsets are natural groupings of the data. In accounts payable, the subsets are usually vendor numbers. In banking data, the subsets are usually account numbers. Other subset variables could be data for sales associates in retailing, transaction dates, travel agents in airline data, cost centers and employees in payroll data.
Relative size factor. The relative size factor (RSF) test finds subsets where the largest number is out of line with the remaining numbers and is possibly an error. The RSF for a subset is: RSF = Largest number in subset / Second largest number in subset. The test has detected errors in accounts payable when staff miscoded the decimal point in the invoice amount. In one case, an amount of $452.47 was coded as $45,247. The erroneous $45,247 greatly exceeded all the other payments to that vendor, and the high RSF exposed the error.
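The RSF calculation can be sketched in a few lines. In the example below the vendor numbers and most of the payment amounts are hypothetical; only the miscoded $45,247 comes from the case described above.

```python
from collections import defaultdict

def relative_size_factors(records):
    """records: iterable of (subset_key, amount) pairs.
    Returns {subset_key: largest / second largest} for subsets with at least two amounts."""
    by_subset = defaultdict(list)
    for key, amount in records:
        by_subset[key].append(amount)

    factors = {}
    for key, amounts in by_subset.items():
        if len(amounts) < 2:
            continue  # RSF is undefined for a single payment
        largest, second = sorted(amounts, reverse=True)[:2]
        if second > 0:
            factors[key] = largest / second
    return factors

# Hypothetical payments keyed by vendor number
payments = [
    ("V100", 452.10), ("V100", 398.00), ("V100", 45247.00), ("V100", 510.25),
    ("V200", 1200.00), ("V200", 1150.00), ("V200", 980.00),
]
rsf = relative_size_factors(payments)
for vendor, factor in sorted(rsf.items(), key=lambda item: item[1], reverse=True):
    print(vendor, round(factor, 1))
```

Sorting the subsets by RSF puts the vendor with the miscoded payment at the top of the list, which is where an auditor would start.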
Put It to Work
Here are some possible practical applications for Benford's law and digital analysis.
Accounts payable data.
Estimations in the general ledger.
The relative size of inventory unit prices among locations.
Duplicate payments.
Computer system conversion (for example, old to new system; accounts receivable files).
Processing inefficiencies due to high quantity/low dollar transactions.
New combinations of selling prices.
Customer refunds.
A company in the Midwest wired $600,000 to what it thought was a vendor but actually was a charity. The $600,000 was significantly in excess of the amount usually donated to the charity. Had the company run the RSF test using the recipient's checking account numbers as the subset variable, the test would have detected that an amount of this magnitude had never before been wired to that account number. The test is designed to detect data errors. For example, a high RSF in payroll data could signal an overtime error and a high RSF for inventories could signal a calculation or count error.
Same, same, different. This test also detects errors by identifying near-identical entries. In accounts payable data the test is often used to identify cases in which the invoice number is the same, the dollar amount is the same and the vendor numbers are different. These near-identical entries could occur if the wrong vendor is paid (perhaps the vendor number is miskeyed) and at a later stage the correct vendor is paid (because the system does not register payment of the invoice to that vendor). Companies that have used this test have reaped large paybacks of misdirected funds.
The same, same, different criteria can find many different types of near-identical entries. In airline ticket refund data, it can find cases where the ticket number is the same, the dollar amount is the same and the credit card number is different. In payroll data it can find instances where the employee number is the same, the date is the same and the checking account number is different. This test works very well in large data sets where the matches signal a serious error. The "hit" list is usually short enough to allow an audit of all the matches.
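The same, same, different logic amounts to grouping records on the two fields that must match and reporting any group in which the third field takes more than one value. A minimal sketch, with hypothetical field names and records:

```python
from collections import defaultdict

def same_same_different(rows, same=("invoice_no", "amount"), different="vendor_no"):
    """Return {matching key: distinct values of `different`} for near-identical entries."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[field] for field in same)
        groups[key].add(row[different])
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}

# Hypothetical accounts payable records
invoices = [
    {"invoice_no": "A-1041", "amount": 1875.00, "vendor_no": "V310"},
    {"invoice_no": "A-1041", "amount": 1875.00, "vendor_no": "V301"},  # possibly miskeyed vendor
    {"invoice_no": "B-2207", "amount": 640.50, "vendor_no": "V112"},
    {"invoice_no": "B-2207", "amount": 640.50, "vendor_no": "V112"},   # exact duplicate, not flagged here
]
print(same_same_different(invoices))
# {('A-1041', 1875.0): ['V301', 'V310']}
```

Changing the `same` and `different` arguments adapts the same function to the airline refund and payroll cases, for example same ticket number and amount but a different credit card number.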
Read All about It
For more on Benford's law and digital analysis:
The First Digit Problem, by R. Raimi, American Mathematical Monthly 83 (Aug.-Sept.): 521-538, 1976.
The Use of Benford's Law as an Aid in Analytical Procedures, by M. J. Nigrini and L. I. Mittermaier, Auditing: A Journal of Practice and Theory 16 (Fall): 52-67, 1997.
Using Digital Frequencies to Detect Fraud, by M. J. Nigrini, The White Paper (April/May): 3-6, 1996.
Same, same, same. This test finds identical entries, such as duplicate payments in accounts payable. While many AP systems can make this identification, duplicates may still occur if some of the purchase details are miskeyed or when there are a number of payment centers or multiple payment systems. Duplicates are detected when all the payment data are analyzed together. This test can also be used in inventory, payroll, accounts receivable and sales.
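Finding identical entries is a matter of counting how often each combination of key fields occurs once the data from all payment centers have been pooled. The sketch below assumes three hypothetical key fields and made-up records from two payment centers.

```python
from collections import Counter

def exact_duplicates(rows, fields=("vendor_no", "invoice_no", "amount")):
    """Return {key: count} for every combination of `fields` that appears more than once."""
    counts = Counter(tuple(row[field] for field in fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical payments from two payment centers, analyzed together
center_a = [
    {"vendor_no": "V112", "invoice_no": "B-2207", "amount": 640.50},
    {"vendor_no": "V450", "invoice_no": "C-0099", "amount": 98.00},
]
center_b = [
    {"vendor_no": "V112", "invoice_no": "B-2207", "amount": 640.50},  # paid again by a second center
]
print(exact_duplicates(center_a + center_b))
# {('V112', 'B-2207', 640.5): 2}
```

Neither center's file contains a duplicate on its own; the repeated payment shows up only when the two files are combined.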
ADDRESSING ANOMALIES
Using digital analysis on corporate data requires the use of a computer. There are digital analysis programs that operate in SAS, IDEA, ACL and Excel. Auditors can also write their own programs to calculate the digit and number frequencies.
Digital analysis requires knowledge of Benford's law and some professional judgment to identify anomalies worthy of investigation. It can be used in ongoing applications, such as accounts payable, and for one-time needs, such as Y2K problems. It is a surprising answer to the problem of data irregularities and a powerful tool for CPAs.