Using Excel and Benford’s Law to detect fraud

Learn the formulas, functions, and techniques that enable efficient Benford analysis of data sets.
By J. Carlton Collins, CPA

Using Excel and Benford’s Law to detect fraud
Image by by OlgaYakovenko/iStock

In 2015, Atlanta's Six Flags White Water theme park opened a new 10-story slide called the Dive Bomber. The ride begins with a virtual free fall nearly straight down before it gradually levels out toward the bottom. Most people who look at the Dive Bomber see a thrill ride—one they may or may not want to experience. CPAs who gaze upon the Dive Bomber may see something else—an image they could possibly use to detect and prevent fraud.

What could a water slide possibly have to do with fighting fraud? The answer lies not in the nature of the slide, but in its shape. The Dive Bomber's curve closely matches the curve Frank Benford discovered in 1938 that depicts mathematical expectations related to naturally occurring (or nonfabricated) numbers (see the sidebar, "The History of Benford's Law"). Affixing the shape of this water slide in your mind now may help you detect fraudulent data later. Though the merits of the analogy may not be immediately obvious, this article explains Benford's Law, how it relates to the shape of a water slide, and how to make simple Excel-based calculations that can help spot and stop fraudulent activity.

WHEN TO USE BENFORD'S LAW TO SPOT FRAUD

Briefly explained, Benford's Law maintains that the numeral 1 will be the leading digit in a genuine data set of numbers 30.1% of the time; the numeral 2 will be the leading digit 17.6% of the time; and each subsequent numeral, 3 through 9, will be the leading digit with decreasing frequency. This expected occurrence of leading digits can be illustrated as shown in the chart "Benford's Law."

Benford's Law


The resulting curve pictured in this green bar chart closely resembles a steep water slide and is sometimes referred to as the Benford curve. Today, armed with any version of Microsoft Excel, CPAs can count the leading digits contained in virtually any data set, chart the findings, and compare the results to Benford's curve to see if that data set obeys the expectations set forth by Benford's Law.

The Excel-based procedures described in this article for counting and charting a data set's leading digits are the same for any size data set and can include general ledgers, trial balance reports, income statements, balance sheets, invoice listings, inventory listings, depreciation schedules, investment statements, accounts payable and receivable reports, timesheet data, portfolios, expense reports, and virtually any other group of data containing naturally occurring numbers. A few caveats regarding the application of Benford's Law are as follows:

  • The larger the better. Benford's Law works better with larger sets of data. While the law has been shown to hold true for data sets containing as few as 50 to 100 numbers, some experts believe data sets of 500 or more numbers are better suited for this type of analysis.
  • Equal opportunity. To conform with the law, the data set you use must contain data in which each number 1 through 9 has an equal chance of being the leading digit. Otherwise, Benford's Law doesn't apply. For example, consider a listing of the heights of all the NBA basketball players in history. In this case, because NBA players range in height from 5 feet 3 inches (Muggsy Bogues, who played from 1987 to 2001) to 7 feet 7 inches (Manute Bol and Gheorghe Mureșan, who played during the 1980s and 1990s), there are no player heights that begin with a 1, 2, 3, 4, 8, or 9; hence those digits have no chance of being the first digit in such a listing, making Benford's Law inapplicable. Likewise, a listing of invoices for a web hosting company that charges its customers either $49.99 or $79.99 would be an unsuitable data set because the numerals 4 and 7 (or perhaps the numerals 5 and 8 if you include sales taxes) would have a greater opportunity to be the leading digits in such a listing.
  • No definitive proof. Benford's Law calculations can never definitively prove or disprove the presence or absence of genuine numbers. However, some CPAs attempt to establish mathematical thresholds (e.g., by using z-score calculations) to definitively measure and interpret Benford's Law calculations. In my opinion, such thresholds don't make Benford calculations any more (or less) accurate; they merely help to lessen the subjective interpretation of whether a resulting chart approximates a Benford curve. In my opinion, a simple visual examination of the resulting chart is all it really takes to raise questions or concerns with the data—if the chart doesn't closely follow Benford's curve, then consideration should be given to scrutinizing the data more carefully.

HOW TO APPLY BENFORD'S LAW IN EXCEL

The three basic Excel tools needed to apply Benford's Law are the LEFT and COUNTIF functions, along with the Column Chart tool. Let's go through a simplified example demonstrating this process.

Simplified example using Excel. For this example, we will examine the populations of the world's 258 countries from 2011 through 2015 as reported by the World Bank Group's World DataBank (databank.worldbank.org), using United Nations World Population Prospects data (see the sidebar "Data Source Citation" for the full source information for the data). The first step is to extract the first digit of each population number using the LEFT function (see the screenshot "Extracting the First Digit"). As pictured in cell K2, the function formula =LEFT(F2,1) reads the population in cell F2 (32,526,562 in this example) and returns the first digit of that number (the digit 3 in this example). This simple formula is then copied across and down to extract the first digits for all populations (columns G through K in this example).

Extracting the first digit

Extracting the first digit


The next step is to count the occurrence of each number 1 through 9 within the extracted digits using the =COUNTIF function (see the screenshot "Applying the COUNTIF Function"). This is achieved by numbering a range of cells 1 through 9 (as shown in cells M2 through M10), entering into cell N2 the formula =COUNTIF($G$2:$K$259,M2), and then copying that formula down to cell N10.

Applying the COUNTIF function

Applying the COUNTIF function


In this example, we see that the numeral 1 occurs 318 times; the numeral 2 occurs 174 times; the numeral 3 occurs 162 times; and so on. Charting these occurrences produces the results in the chart "World Populations by Country."

World populations by country


As you can see in the chart above, even with this relatively small data set, the results do roughly (but not exactly) follow Benford's curve. (While you should never expect the results to exactly match Benford's curve, you should expect the curve produced by larger sets of data to match Benford's curve more closely than in this ­relatively small data set example.) As a result, we can conclude that this Benford analysis tends to verify the populations as genuine numbers that have not been fabricated.

You can download an Excel workbook containing the data solutions described above here. At the bottom of this page, you can watch a short video demonstrating the procedures discussed in this article.

ANALYZING A GENERAL LEDGER

As a larger example, I opened my Excel file containing 20 years of my personal general ledger activity (26,879 numbers) and applied the procedures described above. (This is the same general ledger data used to produce an Excel PivotTable and Treemap in the January 2017 JofA article "Data Mining Your General Ledger With Excel.") I repeated the process described above and created the chart "20-Year General Ledger."

20-year general ledger


As you can see in this example, the data set contains a frequency of numerals 1 through 9 that produces a Benford-like curve, with the exception that the numeral 9 occurs slightly more frequently than anticipated. This Benford analysis of my general ledger took less than two minutes, indicating that it requires minimal time and effort.

Why this works. Using a nontechnical description, Benford's Law works whether you are counting dollars, acres, inventory, populations, or anything, because you must first count 1 before counting 2, 3, or 4; you must first count 10 before counting 20, 30, or 40; you must count 100 before counting 200, 300, or 400; and so on. Every counting job starts with lower numbers, but not all counting jobs progress to include the increasingly higher numbers. As an example, when counting to 25, 11 numbers have a leading digit of 1, seven numbers have the leading digit of 2, and only one number leads with the digit 3. Even in this simplified counting exercise, you can see Benford's Law at work—rather than each digit having an equal chance at being the first digit, lower numerals always have a greater chance at being the leading digit compared to higher numerals.

Scientifically, Benford's Law is based on base-10 logarithms that show the probability that the leading digit of a number will be n can be calculated as log10(1+1/n). By substituting the numbers 1 through 9 for n, you can calculate that each subsequent number 1 through 9 has a diminishing probability that it will be the leading digit.

HOW TO SPOT POSSIBLE FRAUD

To illustrate what a bar chart based on fraudulent or fabricated numbers might look like, I replaced the country population figures (from the earlier World DataBank example) with computer-generated random numbers by entering the formula =RAND()*10000 in place of each population number. The result is the chart "Computer-Generated Numbers."

Computer-generated numbers


As you can see in this chart, the top of the bars do not produce anything close to a Benford curve, and this straight-line result tends to repeat even when the random numbers are recalculated multiple times (by pressing the F9 key). This suggests these data were artificially produced, which they were using a standard computerized random number generator program, whereby each numeral 1 through 9 has an equal chance of being the leading digit. If the data you analyze produce a chart with bars of approximately the same height, this suggests the underlying data may be fabricated.

Continuing our analysis of fabricated data, let's look at what the chart might look like if someone uses his or her computer keyboard's horizontal number keys to create fictitious data by typing randomly. In this case, it is reasonable to assume that the more dominant index and middle fingers of the person typing would press the 4, 5, 6, and 7 keys more frequently than the pinkie and ring fingers would press the 1, 2, 8, and 9 keys (as shown below).

Keyboard


In this case, we would expect to see a chart with a bell-shaped curve similar to the yellow bar chart "Manually Generated Numbers Using a Computer Keyboard."

Similarly, if a person uses a computer's number keypad to create random numbers, then the results will likely vary based on many factors. For example, the user's index and middle finger may dominantly type 1s, 4s, and 7s or 2s, 5s, and 8s more than the ring finger types 3s, 6s, and 9s. Or perhaps the user might type more middle-row numbers (4s, 5s, and 6s) compared to the top or bottom row numbers. There is also the possibility that the user may simply type random numbers fairly evenly. In any of these events, there is little reason to believe that any fabricated numbers created on a keyboard's number pad would produce a frequency of numerals that, when charted, would produce a Benford-shaped curve.

Manually generated numbers using a computer keyboard


Even if a person fabricates numbers mentally (using his or her brain rather than a computer), there is little reason to believe such a mental exercise would produce results that adhere closely to a Benford's curve. It is more likely that the person producing numbers mentally would tend to repeat certain patterns, and charting the frequency of the resulting leading digits might reveal those patterns. For example, a person may subconsciously overuse the digits 1, 3, and 4 to produce false data, and underuse the digits 6 and 8. If so, such anomalies would manifest themselves by producing an erratic bar chart that bears little resemblance to Benford's curve.

As another example, a bookkeeper writing fictitious checks may intentionally keep the check amounts below the company's $500 or $1,000 authorization thresholds, and therefore an analysis of those check amounts might show the numbers 4 and 9 occurring more frequently as the leading digits than Benford's Law would predict.

GRADING ON THE CURVE

The results obtained using Benford's Law analysis should not be considered definitive; the process of counting leading digits will never decidedly prove the absence or presence of fraud. The results obtained from this process are merely an analytical tool that may help the CPA gauge whether additional investigative work is warranted. However, when Benford's curve fails to materialize, CPAs should step up their efforts to verify the data, as follows.

1. Reconsider the data's suitability for Benford analysis. Before suspecting fraud, CPAs should reexamine the data set for the possibility of built-in bias toward certain numerals, and if bias is found, the results of your Benford analysis should be disregarded.

2. Apply analytical review procedures. If Benford's predictions do not hold true for a given data set, you should suspect an anomaly and seek additional assurances that the data set is indeed valid. AU-C Section 520, Analytical Procedures, provides guidance for conducting an analytical review and briefly dictates the following. The CPA should:

  • Consider whether specific unusual transactions or events, accounting changes, business changes, random fluctuations, or misstatements may have impacted the data set.
  • Perform a test of transactions to verify the data set. For example, select a sample of data and physically trace the numbers to supporting documentation.
  • Compare the data set to prior- or previous-year data sets. Investigate significant differences.
  • Compare the data set to budgeted or expected amounts, if any. Investigate significant differences.
  • Analyze the data set using ratios or relationships, and compare those results to expected ratio results or industry averages. As examples, financial ratios may be revealing when dealing with financial data sets and square-footage calculations, or per-hour measurements or per-mileage measurements may be revealing when dealing with statistical data sets.
  • Consider using positive, rather than negative, confirmations to verify vendor and customer balances.
  • If standard analytical procedures have been applied and produced no presence of fraud, but the data sets involved skew significantly from Benford's expectations, consider expanding your analytical procedures to include larger-than-normal sample sizes and tightening standard-deviation calculations by using higher levels of confidence.
  • If the data set involves inventory of any kind, perhaps the physical inspection of a sampling of these inventories is in order. AU-C Section 501, Audit Evidence—Specific Considerations for Selected Items, provides guidance for further investigating inventories.

3. Rethink internal controls. Consider whether reliable controls are in place to detect or prevent improprieties.

4. Consider the source. Reconsider the source where the data were obtained. Were they produced internally or obtained from an outside source? If from an outside source, inquire about the measures that source used to verify its data.

Another tool for fighting fraud. AU-C Section 240, Consideration of Fraud in a Financial Statement Audit, requires auditors to employ analytical procedures to help detect the existence of unusual transactions or potential fraud. To that end, CPAs are on a constant lookout for new methods and procedures that can help them detect and prevent fraud. As it turns out, fabricating a set of falsified data that conforms to Benford's Law is a difficult proposition, and many would-be fraudsters are likely unaware of Benford's Law or how to construct fraudulent data that abide by its rules. Therefore, this Excel-based Benford's Law analysis will likely be a handy addition to any CPAs arsenal of fraud detection tools.


The history of Benford's Law

The story of Benford's Law begins in 1881, when astronomer Simon Newcomb noticed that the page numbers in a book of logarithm tables were worn (or smeared) more toward the front of the book and progressively less worn toward the end of the book. Where others would simply dismiss the worn page numbers, Newcomb recognized a distinct pattern related to the occurrence of lower versus higher numbers. He published an article explaining his observations and postulated that the probability of a single number n being the first digit of a number was equal to log(n+1) − log(n). Fifty-seven years later, in 1938, physicist Frank Benford tested Newcomb's hypothesis against 20 sets of data and published a scholarly paper verifying the law. Despite Newcomb's groundwork, Benford has garnered much of the credit for the discovery now commonly referred to as Benford's Law.

The application of Benford's Law to spot signs of accounting fraud grew out of an article published in 1972 by economist Hal Varian, who wrote that Benford's Law might be used to detect the possibility of fraud in socioeconomic data submitted in support of various public planning decisions. Varian's general idea was that a simple comparison of first-digit frequency distributions ought to reveal anomalous results (if any), per Benford's Law. In 1999, a JofA article ("I've Got Your Number," May 1999), written by Mark J. Nigrini introduced how forensic accountants and auditors could apply Benford's Law to search for indicators of potential accounting and expenses fraud.

Benford attempted to explain his law by saying that "it's easier to own one acre than nine acres," implying (perhaps) that when people purchase land, it is reasonable to assume that more people purchase one acre as a starting point, rather than nine acres as their starting point.


About the author

J. Carlton Collins (carlton@asaresearch.com) is a technology consultant, a conference speaker, and a JofA contributing editor.

To comment on this article or to suggest an idea for another article, contact Jeff Drew, senior editor, at Jeff.Drew@aicpa-cima.com or 919-402-4056.

Editor's note

Instructions for Microsoft Excel in this article refer to the 2007 through 2016 versions, unless otherwise specified.


AICPA resources

Articles

CPE self-study

  • Fraud Prevention, Detection, and Response (#159966, online access; #GT-FA.LFH.EL, group training)

Conference

  • Practitioners Symposium and Tech+ Conference at ENGAGE, June 12—15, Las Vegas

For more information or to make a purchase or register, go to aicpastore.com or call the Institute at 888-777-7077.

Information Management and Technology Assurance (IMTA) Section and CITP credential

The Information Management and Technology Assurance (IMTA) division serves members of the IMTA Membership Section, CPAs who hold the Certified Information Technology Professional (CITP) credential, other AICPA members, and accounting professionals who want to maximize information technology to provide information management and/or technology assurance services to meet their clients' or organization's operational, compliance, and assurance needs. To learn about the IMTA division, visit aicpa.org/IMTA. Information about the CITP credential is available at aicpa.org/CITP.


Data source citation

The total population data used in this article and the associated Excel file and video are used with permission from the World Bank's World Databank. Total population data are based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. Sources of data used by the World Bank include (1) United Nations Population Division, World Population Prospects; (2) census reports and other statistical publications from national statistical offices; (3) Eurostat: Demographic Statistics; (4) United Nations Statistical Division, Population and Vital Statistics Report (various years); (5) U.S. Census Bureau: International Database; and (6) Secretariat of the Pacific Community: Statistics and Demography Programme.

SPONSORED REPORT

CPEOs provide peace of mind around payroll services

The creation of these new IRS-certified service providers for small businesses clarifies some issues around traditional professional employer organizations.

QUIZ

8 sentences to help you master subject-verb agreement

When professionals prepare written material for readers inside their organization or outside, they should make sure that no errors distract from the message they need to convey. Take this short quiz for practice in subject-verb agreement.