Auditors can use Benford's Law to identify general ledger irregularities — both those that can indicate fraud and those that do not. In this article, I introduce audit data analytics techniques that practitioners can use to identify deviations from Benford's Law.
As a reminder, Benford's Law gives us the expected proportions of the first (as well as the first-two, the first-three, and so on) digits in tabulated data. Accounting data, including journal entry amounts, have been found to conform reasonably closely to these expected proportions. Deviations from Benford's Law could indicate that the general ledger includes large counts of fictitious journal entries that are below the auditor's testing threshold. They could also indicate that the general ledger includes irregularities in the form of unusually high duplications of same-dollar transactions. These irregularities are not necessarily fraudulent but might be due to factors such as processing inefficiencies or high duplications of amounts that are per diem travel allowances.
Below, I discuss data analytics techniques that can be used to identify such irregularities: using the Runs test to detect ridges and valleys in the results, and identifying actionable spikes. (View an instructional video on Benford's Law-based journal entry testing and download an Excel spreadsheet that can be used to prepare the first-two digits graph and to calculate the number of runs, the spikes, and the mean absolute deviation.)
Benford's Law-based journal entry testing
Benford's Law-based testing could direct auditors to large counts of journal entries that are below their testing threshold, which could be an indicator of a certain type of fraud mechanism. In the HealthSouth fraud, for instance, the accounting personnel abused the testing threshold of $5,000 by creating thousands of fraudulent journal entries that were just below that threshold. Below, I demonstrate how Benford's Law-based tests could have worked well (admittedly with the benefit and wisdom of many years of hindsight) to detect their fraud techniques.
Weston Smith was the controller and the CFO at HealthSouth from 2000 to 2002. From 1997 through 2002 the company overstated its assets, and its net income amounts, by several billion dollars. In his 2013 article, "Lessons of the HealthSouth Fraud: An Insider's View," Smith describes the methods that he and his co-conspirators used to carry out the fraud. Their end-of-days method, which he called "Rabbits pulled out of hats," involved calculating the pro forma earnings at the end of each quarter, identifying "the hole" (the shortfall between the actual and the desired earnings), and then filling the hole with "dirt" (fictitious journal entries to eliminate the shortfall). The shortfall was then "exploded" out into a half-million journal entries per year, all below the auditors' $5,000 testing threshold.
I ran a simulation to show that Benford's Law-based testing could have detected a fraud scheme executed using below-the-testing-threshold dollar amounts, and I show the results below. In a real-world audit, the audit population would be all the journal entries posted from the date of the trial balance at the end of the prior year to the date of the trial balance in the current year. The testing objective would be to identify journal entries made in amounts below the testing threshold ($5,000 in HealthSouth's case) that materially changed the ledger account balances and created a fraudulent set of financial statements.
I used data from the annual journal entries of 29 anonymous organizations. I analyzed the data for the organizations as if it were a single organization with 29 divisions. There were a total of 493,621 journal entry amounts. The first-two digits of the journal entries are shown in the figure "The First-Two Digits of the Population of Journal Entries," below.
The x-axis of this figure shows the first-two digits, which range from 10 to 99. Negative signs, which might indicate a credit entry, and leading zeroes (as in $0.05) are always ignored. For example, the first-two digits of 1,805, 0.018, and -1.80 are all 18. The y-axis shows the proportions, with the bars showing the actual proportions and the line showing the Benford's Law proportions. The count of the amounts with first-two digits 50 was 9,807 items, which gave a proportion of 0.02 (9,807 ÷ 493,621 records). The Benford's Law proportions range from a high of 0.0414 for the 10 on the left side of the graph, down to a low of 0.0044 for the 99 on the right side of the graph. As the first-two digits increase (10, 11, 12, ..., 99), the expected proportions decrease. Where the top of the bar touches the line, it means that the Benford's Law proportion and the actual proportion are almost equal. The conformity of the data to Benford's Law can be measured by calculating the average of all the deviations (the spikes above the Benford's Law line and the gaps below the Benford's Law line). By this metric (called the mean absolute deviation) the conformity level is a little better (the deviations or differences are slightly smaller) than what I've usually seen in populations of corporate journal entries.
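The digit extraction, the Benford's Law proportions, and the mean absolute deviation described above can be sketched in Python. This is a minimal illustration, not the Excel spreadsheet mentioned earlier; the function names are hypothetical, and the sketch assumes that zero amounts are simply dropped from the analysis.

```python
import math
from collections import Counter
from decimal import Decimal


def first_two_digits(amount):
    """Return the first-two digits (10-99), ignoring sign and leading zeroes."""
    # Decimal(str(...)) avoids binary floating-point surprises such as 0.018
    d = abs(Decimal(str(amount)))
    if d == 0:
        return None  # assumption: zero amounts are excluded from the test
    # Shift the decimal point so that 10 <= value < 100, then truncate
    return int(d.scaleb(1 - d.adjusted()))


def benford_expected(d):
    """Benford's Law proportion for first-two digits d (10 through 99)."""
    return math.log10(1 + 1 / d)


def first_two_profile(amounts):
    """Return (actual proportions by first-two digits, number of records used)."""
    digits = [first_two_digits(a) for a in amounts]
    digits = [d for d in digits if d is not None]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(10, 100)}, n


def mean_absolute_deviation(actual):
    """Average absolute gap between actual and Benford proportions (90 bars)."""
    return sum(abs(actual[d] - benford_expected(d)) for d in range(10, 100)) / 90
```

Calling first_two_digits on 1,805, 0.018, and -1.80 returns 18 in each case, and benford_expected gives roughly 0.0414 for 10 and 0.0044 for 99, matching the ends of the Benford's Law line in the figure.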
I chose to analyze the first-two digits, even though Benford's Law is usually associated with the first digits of the amounts in tabulated data. I did this because of the bluntness of the first-digit test and for reasons related to audit efficiency. First, individuals can fictitiously increase the dollar amount of a valid journal entry (impacting accuracy) by a reasonably large percentage that leaves the first digit unchanged. For example, $1,510, $2,204, and $3,100 can all be increased by 28.5%, and the new inflated amounts ($1,940.35, $2,832.14, and $3,983.50) will still have first digits of 1, 2, and 3, respectively. Second, a first-digit test will give audit samples that are large and inefficient. In the figure "The First-Two Digits of the Population of Journal Entries," the first-digit 5 amounts (with first-two digits from 50 through 59) are significantly overstated, and testing all the first-digit 5 amounts would mean testing 8.9% of the population. The first-digit 5 excess is almost entirely explained by the spike at 50, and testing the first-two digit 50 amounts would mean testing only 2% of the population, a marked reduction in audit work.
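The bluntness of the first digit can be checked with a few lines of Python. The sketch below applies the 28.5% increase to the three example amounts and compares the first digit with the first-two digits before and after; the helper name leading_digits is an illustrative assumption.

```python
from decimal import Decimal


def leading_digits(amount, k):
    """Return the first k significant digits of a nonzero amount."""
    d = abs(Decimal(str(amount)))
    return int(d.scaleb(k - 1 - d.adjusted()))


INCREASE = Decimal("1.285")  # the 28.5% inflation used in the example

rows = []
for original in (Decimal("1510"), Decimal("2204"), Decimal("3100")):
    inflated = original * INCREASE
    rows.append((
        original,
        inflated,
        leading_digits(original, 1), leading_digits(inflated, 1),
        leading_digits(original, 2), leading_digits(inflated, 2),
    ))

for original, inflated, fd_old, fd_new, f2_old, f2_new in rows:
    print(f"{original} -> {inflated}: first digit {fd_old} -> {fd_new}, "
          f"first-two digits {f2_old} -> {f2_new}")
```

In each case the first digit is unchanged while the first-two digits move (for instance, from 15 to 19), which is why a first-two digits test has a better chance of registering this kind of inflation.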
As the next step in simulating the HealthSouth fraud method, I added fictitious journal entries, all with dollar amounts from $2,000 to $4,999, to the authentic data shown in the figure "The First-Two Digits of the Population of Journal Entries." I chose these amounts because the fictitious entries in the HealthSouth fraud were all below $5,000, as the CFO knew that the auditors only tested journal entries of $5,000 and higher. I assumed that the fictitious entries were evenly distributed from $2,000 to $4,999 because anything smaller than $2,000 would only have had a small net income impact. The fictitious entries would have been "extra" journal entries, like the cream floating on top of Irish coffee, with their first-two digits evenly distributed from 20 to 49. Part 1 of the figure "Authentic Journal Entries With Added Fictitious Amounts," below, shows the new first-two digits graph after 49,000 amounts (about 10% of the authentic total) ranging from $2,000 to $4,999 were added to the authentic journal entries. Part 2 of that figure, also below, shows the results after adding still another 49,000 such amounts to the authentic journal entries.
The fictitious journal entries inflated the actual proportions from 20 to 49 ($2,000 to $4,999). In Part 1 of the figure "Authentic Journal Entries With Added Fictitious Amounts," these inflated proportions, together with the preexisting spike at 50, have formed a ridge from 20 to 50, a ridge being a range in the graph where the actual proportions are (mostly) above the Benford's Law line. Because the proportions must sum to 1.00, adding the first-two digit 20 to 49 amounts also caused the other actual proportions to decrease, creating corresponding valleys from 10 to 19 and from 51 to 99. A valley is a range in the graph where the actual proportions are (mostly) below the Benford's Law line.
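The valley-ridge-valley effect can be reproduced on made-up data. The sketch below is not the simulation behind the figures: the 29 organizations' journal entries are not available here, so it assumes synthetic "authentic" amounts drawn log-uniformly between $10 and $1,000,000 (a distribution that conforms closely to Benford's Law), adds 10% fictitious amounts from $2,000 to $4,999, and compares the share of records in each band. The record counts and the random seed are arbitrary assumptions.

```python
import random
from collections import Counter
from decimal import Decimal


def first_two_digits(amount):
    """Return the first-two digits (10-99) of a nonzero amount."""
    d = abs(Decimal(str(amount)))
    return int(d.scaleb(1 - d.adjusted()))


def proportions(amounts):
    """Actual proportion of records for each first-two digits value."""
    counts = Counter(first_two_digits(a) for a in amounts)
    n = len(amounts)
    return {d: counts.get(d, 0) / n for d in range(10, 100)}


def band_share(props, low, high):
    """Total proportion of records with first-two digits from low to high."""
    return sum(props[d] for d in range(low, high + 1))


rng = random.Random(2024)  # fixed seed so the run is repeatable

# ASSUMPTION: synthetic stand-in for authentic journal entry amounts
authentic = [round(10 ** rng.uniform(1, 6), 2) for _ in range(20000)]
# Fictitious entries spread evenly from $2,000 to $4,999 (10% of authentic count)
fictitious = [rng.randint(2000, 4999) for _ in range(2000)]

before = proportions(authentic)
after = proportions(authentic + fictitious)

for low, high in ((10, 19), (20, 49), (50, 99)):
    print(f"{low}-{high}: before {band_share(before, low, high):.4f}, "
          f"after {band_share(after, low, high):.4f}")
```

The 20 to 49 band gains share and the bands on either side lose share, which is the mechanical reason a ridge is always accompanied by valleys. The synthetic data has no preexisting spike at 50, so here the ridge ends at 49 rather than 50.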
This clustering of the unders and the overs (actual less than Benford's Law, or actual more than Benford's Law) might not be noticeable to an untrained eye. To identify this valley-ridge-valley pattern, auditors could use the Runs test. A run is defined as a series of successive (in a row) overs or a series of successive unders. A run could consist of a single over if it is followed by an under. For example, the pattern under, under, over, under, under has three runs. Also, a series such as over, under, over, under, under, over, over, under, under, under (exactly the results from 10 to 19 in the figure "The First-Two Digits of the Population of Journal Entries") has six runs.
In data that conforms almost perfectly to Benford's Law, we would expect 46 runs in the first-two digits graph. However, in journal entry and accounts payable data there is a tendency to have overs (spikes) at the multiples of 10 (10, 20, 30, ..., 90) because of our tendency to use round numbers for pricing services and for estimates. In the figure "The First-Two Digits of the Population of Journal Entries" there is an over at each of the multiples of 10. This tendency slightly reduces the number of runs in accounting data, making our practical expectation about 43 runs. My experience with running these tests with internal auditors as a part of general ledger analytics indicated that journal entry data with 35 or fewer runs might include some combination of ridges and valleys. The graphs in the figure "Authentic Journal Entries With Added Fictitious Amounts" have 19 and 11 runs, respectively, both far below the suggested critical value of 35 runs. The original data shown in the figure "The First-Two Digits of the Population of Journal Entries" had 37 runs, which is above the critical value and which correctly signals that there are no noticeable ridges or valleys. As a side issue, the conformity to Benford's Law has deteriorated, and the mean absolute deviations of the graphs in the figure "Authentic Journal Entries With Added Fictitious Amounts" are significantly higher than those of the authentic data.
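Counting runs is simple to automate. The sketch below labels each of the 90 bars as an over or an under and counts the runs; the function names are illustrative, and treating an exact tie as an under is an assumption. The expected_runs helper uses the standard mean of the Wald-Wolfowitz runs test, which gives 46 when the 90 bars split evenly into 45 overs and 45 unders; the article does not state how its figure of 46 was derived, so that link is an inference.

```python
def over_under(actual, expected):
    """Label each bar 'over' or 'under' relative to the Benford's Law line."""
    # Assumption: an exact tie is treated as an under
    return ["over" if a > e else "under" for a, e in zip(actual, expected)]


def count_runs(labels):
    """Count runs: maximal stretches of consecutive identical labels."""
    labels = list(labels)
    if not labels:
        return 0
    # A new run starts at every position where the label changes
    return 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def expected_runs(n_over, n_under):
    """Mean number of runs when overs and unders are randomly ordered."""
    return 2 * n_over * n_under / (n_over + n_under) + 1


# The two examples from the text
example_1 = ["under", "under", "over", "under", "under"]
example_2 = ["over", "under", "over", "under", "under",
             "over", "over", "under", "under", "under"]
print(count_runs(example_1), count_runs(example_2), expected_runs(45, 45))
```

A result of 35 or fewer runs across the 90 first-two digit bars would then be the signal, per the critical value suggested above, to look for ridges and valleys.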
The tighter the range ($2,000 to $4,999 in the simulations) of the fictitious entries, and the larger the percentage of fictitious entries in the population, the more effective Benford's Law-based audit data analytics will be at detecting a below-the-testing-threshold fraud scheme, provided that the follow-up audit work is done competently.
Benford's Law-based audit planning
AU-C Section 315, Understanding the Entity and Its Environment and Assessing the Risks of Material Misstatement, requires auditors to identify journal entries that might represent specific risks, including unusual transactions, events, amounts, ratios, and trends. Benford's Law-based audit analytics can also be used to identify unusual transactions in journal entries, as I demonstrate in the following example.
The data I used is a file of invoices processed for payment by an electricity supply company over a 16-month period. The file was made available to me for in-house training and for research purposes. There were 189,470 invoices processed for payment for a total of $490,277,625. About one-half of the total dollars went to pay the 370 invoices for invoice amounts of $100,000 and higher. The first-two digits of the invoice amounts are shown in the figure "The First-Two Digits of the Invoice Amounts," below.
Note that in this figure there is an extreme spike at 50, together with two noticeable twin spikes (overs), at 10 and 11, and at 98 and 99. In this application, I wanted to identify a selection of the largest spikes that would become actionable (or notable) spikes (spikes that an auditor should investigate).
To help with this selection I added a threshold line to the graph (see the figure "The First-Two Digits of the Invoice Amounts With a Threshold Line," below) above which a spike is statistically significant. For large datasets, setting a threshold line based on statistical significance will make it too tight (too close to the Benford's Law line) to be practical. My recent experience with journal entry populations has shown that fixing the threshold line based on an audit population of 2,000 records gives an ideal balance between doing too little audit work and missing potential irregularities and doing too much audit work chasing false positives. I therefore set the threshold line loosely based on statistical significance with a lenient adjustment upward because we expect accounting data to only closely approximate, as opposed to perfectly conform to, Benford's Law.
In a separate, unpublished analysis, I found that there was only a small chance that a spike caused by random variation in the data would peak above the threshold line. The result is that a spike above the threshold line is most likely caused either by an irregularity or by a nonfraud issue such as the excessive use of round numbers, rather than by chance.
I also concluded that analyzing the four largest spikes would be enough to detect the material data issues. Furthermore, if the four spikes all identified irregularities, then the auditor should either investigate more spikes or stop and reevaluate the effectiveness of the client's internal controls and the audit risk. In "The First-Two Digits of the Invoice Amounts With a Threshold Line," the actionable spikes, ranked by the Z-statistics, are 50, 11, 10, and 98. We would not investigate those cases where the actual proportion is below the Benford's Law line: the actual proportions must sum to 1.00, so every over essentially causes an under somewhere else.
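The selection of actionable spikes can be sketched as follows. The exact formula for the threshold line is not given here, so the version below is an assumption: the Benford's Law proportion plus 1.96 standard errors computed as if the population had 2,000 records, plus a continuity correction. The Z-statistic uses a commonly cited form with a continuity correction of 1/(2N). The function names, the 1.96 multiplier, and the default parameters are all illustrative.

```python
import math


def benford_expected(d):
    """Benford's Law proportion for first-two digits d (10 through 99)."""
    return math.log10(1 + 1 / d)


def z_statistic(actual, expected, n):
    """Z-statistic for one bar, with a 1/(2n) continuity correction."""
    diff = abs(actual - expected)
    correction = 1 / (2 * n)
    if correction < diff:  # apply the correction only when it is smaller
        diff -= correction
    return diff / math.sqrt(expected * (1 - expected) / n)


def threshold(expected, n_cap=2000, z_crit=1.96):
    """ASSUMED threshold line: significance bound for a 2,000-record file."""
    standard_error = math.sqrt(expected * (1 - expected) / n_cap)
    return expected + z_crit * standard_error + 1 / (2 * n_cap)


def actionable_spikes(counts, top=4, n_cap=2000, z_crit=1.96):
    """Return up to `top` (digits, Z) pairs above the threshold, largest Z first."""
    n = sum(counts.values())
    if n == 0:
        return []
    spikes = []
    for d in range(10, 100):
        expected = benford_expected(d)
        actual = counts.get(d, 0) / n
        # Only overs are candidates; unders are never investigated
        if actual > threshold(expected, n_cap, z_crit):
            spikes.append((d, z_statistic(actual, expected, n)))
    spikes.sort(key=lambda item: item[1], reverse=True)
    return spikes[:top]
```

Here counts is a dictionary mapping each first-two digits value to its record count. Capping the sample size in the threshold keeps the line from hugging the Benford's Law line in very large files, while the ranking still uses the Z-statistic computed on the full population.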
The audit work would thus concentrate on the reasons for the spikes at 50, 11, 10, and 98. To start, I selected (filtered) the transactions with first-two digits 50. Then I ran a duplications test to see which dollar amounts with 50 occurred most often and found in this case there were 6,022 transactions for $50 each. The company's explanation for this finding was that it required a deposit from new customers, and the $50 payments were refunds to customers that had canceled their service in the first year of its term. (As an aside: Experience has shown that there are almost always spikes at 50 in journal entry data because of our tendency to use those first-two digits for estimates and accruals, and auditors may want to spend less time investigating a spike at 50.)
The next duplications test showed that the spike at 11 was mainly caused by 2,263 invoices for exactly $1,153.35 from two vendors. The company confirmed that their usual practice, when items were needed six times a day, was to buy in bulk and issue from inventory, or to be invoiced for batches of the product. The spike at 98 was entirely caused by 1,010 invoices for exactly $988.35 from two vendors with many instances of multiple purchases per day, and the spike at 10 was caused by duplications of several different amounts all with first-two digits 10. None of these duplications were related to vendor fraud, although the possibility of duplicate-paying the invoices by mistake was real.
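A duplications test of this kind amounts to filtering on the first-two digits and counting identical amounts. The sketch below is one minimal way to do it; the function names are illustrative, and it assumes amounts are rounded to cents and that debits and credits (positive and negative amounts) are counted separately.

```python
from collections import Counter
from decimal import Decimal


def first_two_digits(amount):
    """Return the first-two digits (10-99), or None for a zero amount."""
    d = abs(Decimal(str(amount)))
    if d == 0:
        return None
    return int(d.scaleb(1 - d.adjusted()))


def duplications(amounts, digits, top=5):
    """Most frequent exact amounts among records with the given first-two digits."""
    cents = Decimal("0.01")
    selected = [
        Decimal(str(a)).quantize(cents)  # normalize 50 and 50.0 to 50.00
        for a in amounts
        if first_two_digits(a) == digits
    ]
    return Counter(selected).most_common(top)


# Tiny made-up sample, for illustration only
sample = [50, 50.0, 50.00, 5.0, 500.25, 1153.35, 988.35]
print(duplications(sample, 50))
```

In the made-up sample, the $50.00 amount occurs three times among the records with first-two digits 50, so it heads the list. On real data the same call would surface amounts such as the $50 refunds or the $1,153.35 invoices described above.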
Some cautions are needed before using Benford's Law-based journal entry testing or general ledger analytics. Note that Benford's Law-based analysis is not appropriate for all types of journal entries (see the sidebar, "Where to Target Benford's Law").
Auditors might be tempted to remove the transactions that caused the actionable spikes, but they should not do so. In the above example, for instance, auditors may want to remove the $50 deposits, the invoices for $1,153.35, and the other transactions related to the 10 and 98 spikes from the data, and then rerun the Benford's Law tests. Doing this will be like playing whack-a-mole at a carnival because removing those transactions will simply cause other new spikes and the process will repeat itself.
If auditors get a result with, say, 10 or more spikes, they should reevaluate whether the data should have been expected to conform to Benford's Law in the first place. The first check would be to see whether an error had been made where, for instance, the ledger account numbers, foreign exchange rates, or dates (as a serial number) were analyzed instead of the dollar amounts of the journal entries. The second check would be to see whether a few types of journal entries dominate the data — for instance, recurring transactions such as depreciation, bank charges, or travel per diem amounts. If everything seems in order, after reviewing the results for the prior year, auditors might want to reevaluate the effectiveness of the client's internal controls and the audit risk.
Benford's Law-based journal entry or general ledger analytics testing is not suggested for datasets with fewer than 5,000 records. Unpublished statistical research of mine has shown that 5,000 is essentially the lowest number of records for which a first-two digits test on journal entries is effective. Datasets with fewer than 5,000 journal entry amounts have an elevated chance of producing 1, 2, 3, or more spikes due simply to random fluctuations unrelated to any irregularities (essentially false positives). Datasets with more than 5,000 journal entry amounts have only a small chance of producing a single spike that tops the threshold line by a small margin (also essentially a false positive, but a rare false positive).
Substantive analytical procedures are an efficient means of providing audit evidence because of the relative ease of obtaining the data and performing the calculations and comparisons. Running the Benford's Law-based tests on journal entry amounts is reasonably straightforward and can be done in, for example, Excel, R, or IDEA. Internal auditors could run these tests as a part of a continuous monitoring general ledger analytics application.
These tests will almost always bring to light patterns and insights that would otherwise have remained concealed. Note, however, that Benford's Law-based testing only provides an indicator of fraud or misstatements. When presented with a graph with ridges or spikes, auditors should find out whether there are valid business reasons (such as travel per diem amounts) that caused these deviations. If none are forthcoming from the client, then auditors should perform substantive tests of transactions or tests of details of balances to see whether dollar misstatements have occurred. Staff training should cover the documentation needed to explain a ridge or a spike and should ensure that the tests are run, and the follow-up work is performed, consistently across all offices.
Where to target Benford's Law
Not all types of journal entries are well suited to Benford's Law-based testing. Top-side journal entries are (usually) made in a spreadsheet after the consolidation is completed but before the financial statements are prepared. They do not appear as entries in the general ledger and are not subject to standard system controls. They are never formally posted to the ledger accounts of the subsidiaries. Top-side journal entries played a large role in the WorldCom fraud. They are significant audit risks and should be 100% tested.
System or automated journal entries are also not good candidates for Benford's Law-based testing. These journal entries flow from sales transactions, manufacturing production, and purchase transactions. These recurring entries record day-to-day activities. For example, when a product is finished and moves off the factory floor, an automated system will increase finished goods inventory and decrease work-in-process. These entries generally do not conform to Benford's Law, not because of fraud or error, but because a handful of transaction types might dominate the population. For instance, a courier service might have its most frequent "product" be overnight delivery at a $25.35 selling price. This product would cause the system to record perhaps a million journal entries for that amount each day, which would cause an excess of first-two digits 25s when compared with the expectations of Benford's Law.
The manual entries are well suited to Benford's Law-based testing. These entries are used for adjustments, accruals and prepayments, funds transfers, internal billings, cost allocations, reversals (voids), closing entries, and rare activities such as acquisitions. The manual entries are under the control of top management and are exactly what would be used for financial statement fraud.
About the author
Mark J. Nigrini, Ph.D., is an associate professor at West Virginia University in Morgantown, W.Va. His article “Lessons From an $8 Million Fraud,” JofA, Aug. 2014, co-written with Nathan J. Mueller, won the Lawler Award for best JofA article of 2014. To comment on this article or to suggest an idea for another article, contact Courtney Vien at Courtney.Vien@aicpa-cima.com.
"Journal Entry Testing Using Excel," JofA, Nov. 1, 2021
"Using Excel and Benford's Law to Detect Fraud," JofA, April 1, 2017
"Lessons Learned From a Multibillion-Dollar Fraud," JofA, Aug. 22, 2018