Trying to uncover evidence of fraud in a data set of millions of records is somewhat akin to searching for a needle in a haystack. Fortunately, the successful employment of data analysis techniques can clear away most of the “hay” and leave the fraud examiner or auditor with a much smaller stack to dig through. Do you know how to effectively analyze data for the red flags of fraud? Are you making the most of data analysis in your fraud detection efforts? Take this Fraud IQ quiz and find out.
1. One of the main benefits of using data analysis techniques to detect fraud is that:
a. They can provide insight into the details of how a fraud occurred.
b. They can be used to establish predication for a full fraud examination.
c. They are easily performed using off-the-shelf tools that enable anyone to undertake an in-depth analysis without specific technological knowledge.
d. They can take the place of the fraud risk assessment in identifying key areas of fraud risk within the organization.
2. Which is the most effective order of steps in the data analysis process?
a. Build a profile of potential frauds; obtain the data; verify the data; cleanse the data; analyze the data.
b. Obtain the data; cleanse the data; analyze the data; verify the data; build a profile of potential frauds.
c. Obtain the data; analyze the data; cleanse the data; verify the data; build a profile of potential frauds.
d. Build a profile of potential frauds; verify the data; obtain the data; cleanse the data; analyze the data.
3. Mitch, a CPA, is attempting to sort through a very large amount of production data for his organization’s various manufacturing plants to identify anomalies that might indicate fraudulent activity. Which of the following techniques would be most helpful for Mitch in establishing expected values for data in a population?
a. Fuzzy logic matching.
b. Gap testing.
c. Compliance verification.
d. Regression analysis.
4. The audit department of a large financial institution received a tip that a few employees have been colluding to siphon off customer data and sell it to an organized crime ring for use in identity theft schemes. To help identify whether such a scheme is occurring and who might be involved, the internal audit team decides to apply textual analytics techniques to interemployee communications. Which of the following combinations of keywords or phrases likely would be the most helpful in identifying communications between the data thieves regarding their scheme?
a. “Confidential” and “customer information.”
b. “Confidential”; “unauthorized copying”; and “nobody will notice.”
c. “Nobody will notice” and “not hurting anyone.”
d. “Unauthorized copying” and “customer information.”
5. Diana, a CPA, is the controller for Square Box Co. She receives a call from Joshua, the accounts receivable manager of Circle Corp., one of Square Box’s vendors, regarding numerous double payments received from Square Box during the past six months. Joshua says he does not understand why the invoices are being paid twice and that he would like to get the situation straightened out to avoid having to continue issuing refund checks to Square Box. He also says that he usually deals with Amanda, Square Box’s accounts payable manager, but that she has not been able to curb the situation, so he thought he would try taking it to her boss. After hanging up the phone, Diana pulls up Circle Corp.’s accounts payable history to try to figure out what’s going on, but she sees no sign of duplicate payments or refunds. Growing concerned, she decides to run some data analytics tests on payments to vendors to see if she can find any other anomalies. Which of the following fields would be LEAST helpful in searching for clues regarding duplicate payments to vendors in Square Box’s accounting system?
a. Vendor address.
b. Vendor number.
c. Invoice number.
d. Payment amount.
6. For which of the following data sets would a Benford’s Law analysis be LEAST appropriate?
a. Employee hourly wage rates.
b. Customer balances.
c. Expense reimbursement claims.
d. Inventory prices.
7. Analyzing data using Robert Gunning’s Fog Index is most useful in uncovering which of the following fraud schemes?
a. Kickbacks paid to overseas vendors.
b. Financial statement manipulation.
c. Theft of proprietary information.
d. Skimming of incoming cash receipts.
8. Link analysis and geospatial analysis can be particularly useful in uncovering red flags of which of the following types of schemes?
c. False billing.
d. Financial statement manipulation.
9. The audit team for Shady Business Inc. is applying data analytics techniques to help identify areas where fraud might be occurring. In which of the following situations would examining the ratio of maximum values to minimum values within a data set be most useful?
a. Amount of raw materials on hand by part number.
b. Net payroll check by employee.
c. Unit prices paid for a product by purchase transaction.
d. Total quantity of products purchased by customer.
10. The results of the fraud risk assessment for XYZ Corp. indicate that the risk of fraud in the company’s purchasing function is high due to frequent turnover in the department and several other control weaknesses. In response, XYZ’s audit team decides to employ data analysis tests on the purchasing system to identify internal control breaches and anomalies that might indicate fraud. The audit team begins by extracting payments to vendors that lack required information in the vendor master file; this test returns thousands of transactions as exceptions. Which of the following procedures would be LEAST helpful in reducing the number of false positives included in the analysis results?
a. Combining multiple data analysis tests and weighting the results based on number of tests for which each record returns an exception.
b. Supplementing the results with consideration of known behavioral red flags, such as financial difficulties or unusually close associations with vendors, displayed by particular employees in the purchasing function.
c. Combining the data in the system with data from outside sources, such as industry codes of vendors.
d. Filtering the results to include only those transactions recorded during normal business hours by employees with access to the purchasing system.
1. (b) Data analysis techniques provide a means to explore specific areas for evidence of potential fraud without undertaking formal investigation procedures. As such, these techniques are an effective way to help establish—or disprove—predication for a fraud examination. (Predication is the totality of circumstances that would lead a reasonable, prudent, and professionally trained person to believe a fraud has occurred, is occurring, or will occur.)
For example, a hotline tip is received from an employee claiming that another employee, with whom the caller has a history of personal conflict, is embezzling company funds. Provided the company has granted the auditors (or whoever is in charge of responding to the tip) access to its financial records, data analysis procedures generally can be used to explore the financial data relative to the reported embezzlement without alerting anyone involved. If, as a result of the analysis, anomalies are detected and predication is confirmed, more formal investigation procedures, such as interviews, can commence.
However, data analysis calls for careful planning. Those performing it must thoroughly understand both the data itself and the software used to house and analyze it. This often requires working closely with information technology experts to ensure that the data is acquired and analyzed in a sound manner. Additionally, data analysis procedures should be closely tied to the results of the organization’s fraud risk assessment to ensure that the approach is efficient and based on the organization’s true risks and operations. Even with this foundational knowledge, however, those performing data analysis engagements must recognize that the anomalies identified do not, in themselves, indicate fraudulent activity. Instead, they highlight outliers in the data that might, or might not, be the result of fraud and that should be followed up with additional procedures to determine their legitimacy.
2. (a) Although the core of data analysis involves running targeted tests on data to identify anomalies, the ability of such tests to help detect fraud depends greatly on what the fraud examiner does before and after actually performing the data analysis techniques. Consequently, to ensure the most accurate and meaningful results, examiners should employ a formal data analysis process that begins several steps before the tests are run and concludes with active and ongoing review of the data. While the specific process will vary based on the realities and needs of the organization, the following approach contains steps that should be considered and implemented, to the appropriate extent, in searching for anomalies that might indicate fraud:
A. Planning phase:
i. Understand the data and the data environment.
ii. Articulate examination objectives.
iii. Build a profile of potential frauds.
iv. Determine whether predication exists.
B. Preparation phase:
i. Identify the relevant data.
ii. Obtain the data.
iii. Verify the data.
iv. Cleanse and transform the data.
C. Testing and interpretation phase:
i. Analyze the data.
D. Post-analysis phase:
i. Respond to the analysis findings.
ii. Monitor the data.
3. (d) Regression analysis, which is closely related to correlation analysis, is a statistical technique that uses a series of records to model the relationship between a dependent variable and one or more independent variables. For example, regression analysis could be used to model and predict the number of widgets manufactured based on amounts of materials and labor used, maintenance costs, utilities, and other related factors. Mitch then could use the resulting model to identify anomalies in the data. A period or facility in which reported production output is significantly lower or higher than predicted based on this model would merit further examination.
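To make the idea concrete, here is a minimal Python sketch of how Mitch might flag plants whose reported output strays far from a fitted model. It is an illustration only: it uses a single independent variable (labor hours) rather than the several factors mentioned above, and the plant names, figures, and two-standard-deviation cutoff are all hypothetical assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def flag_outliers(records, threshold=2.0):
    """Return plants whose residual exceeds `threshold` standard errors."""
    xs = [r["labor_hours"] for r in records]
    ys = [r["units"] for r in records]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom.
    std = (sum(e ** 2 for e in residuals) / (len(residuals) - 2)) ** 0.5
    return [r["plant"] for r, e in zip(records, residuals)
            if abs(e) > threshold * std]


# Hypothetical production data: output tracks labor hours except at Plant D.
records = [
    {"plant": "Plant A", "labor_hours": 100, "units": 1000},
    {"plant": "Plant B", "labor_hours": 200, "units": 2000},
    {"plant": "Plant C", "labor_hours": 300, "units": 3000},
    {"plant": "Plant D", "labor_hours": 400, "units": 5500},
    {"plant": "Plant E", "labor_hours": 500, "units": 5000},
    {"plant": "Plant F", "labor_hours": 600, "units": 6000},
    {"plant": "Plant G", "labor_hours": 700, "units": 7000},
    {"plant": "Plant H", "labor_hours": 800, "units": 8000},
]

flagged = flag_outliers(records)
```

In this made-up data set only Plant D is flagged. As with any such test, a flag is a prompt for follow-up, not evidence of fraud.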
4. (c) As with other forms of data analysis, the objective of using textual analysis on nonstructured data, such as emails and other text, is to narrow down an enormous data population to a smaller group that meets the specified criteria and can be examined further for signs of fraud. Consequently, in determining keywords to use, the audit team must avoid words that would result in a huge number of false positives. For example, the words “confidential” and “unauthorized copying” appear in many individuals’ email signatures as part of a standard warning about unintended recipients. Consequently, including such terms in a search likely would result in thousands of emails—or more—that are unrelated to any fraud. Similarly, the term “customer information” likely also will appear in a vast number of legitimate communications. Instead, focusing on words that might indicate the mindset or motives of a scheme—terms such as “nobody will notice” or “not hurting anyone” and variations of those phrases—will yield fewer and more meaningful search results. Identifying patterns within those communications that contain such phrases, such as employees with an unusually high occurrence of the keywords or particular dates and times when the use of the words or phrases appears to spike, can help the audit team direct its subsequent examination activities.
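A simple keyword search of this kind might look like the following Python sketch. The message structure, sender labels, and sample text are hypothetical, and the search uses plain substring matching, so it would not catch the variations of the phrases that a real textual analytics tool would need to handle.

```python
from collections import Counter

# Phrases that suggest mindset or rationalization, taken from the answer above.
MINDSET_PHRASES = ("nobody will notice", "not hurting anyone")


def count_hits_by_sender(messages, phrases=MINDSET_PHRASES):
    """Count, per sender, the messages containing at least one phrase."""
    hits = Counter()
    for msg in messages:
        body = msg["body"].lower()
        if any(phrase in body for phrase in phrases):
            hits[msg["sender"]] += 1
    return hits


# Hypothetical messages for illustration only.
messages = [
    {"sender": "employee_a", "body": "Relax, nobody will notice a few records."},
    {"sender": "employee_a", "body": "We're not hurting anyone, really."},
    {"sender": "employee_b", "body": "Are you sure nobody will notice?"},
    {"sender": "employee_c", "body": "This message is confidential."},
]

hits = count_hits_by_sender(messages)
```

Sorting the senders by hit count, or grouping hits by date, would give the audit team the kind of pattern described above.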
5. (b) Diana did not see any signs of duplicate payments on Circle Corp.’s vendor account in the company’s system, indicating that the duplicate payments likely were made under another vendor’s account. However, the payments were made for the same amounts, received at the same address, and noted by Circle Corp. as being applied to the same invoice as the legitimate payments on Circle Corp.’s account. Consequently, searching for duplicates in each of these fields, as well as the vendor name field, should provide some information on how these payments were recorded in Square Box’s accounting system. To help address the risk of slight variations of duplicate information, fuzzy logic searching (which can help identify records with similar or potentially duplicate—though not identical—values, such as First Street, First St., and 1st St.) should be used. Conversely, Square Box’s system, like most accounting systems, does not allow for duplicates in the primary key fields—which, for vendor records, is the vendor number. So running a duplicate check on this field is unlikely to yield useful information.
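The two ideas in this answer, checking for duplicates in non-key fields and tolerating small variations in addresses, can be sketched in Python as follows. The field names, the abbreviation table, and the 0.85 similarity threshold are assumptions made for illustration; the standard library's `difflib.SequenceMatcher` stands in here for whatever fuzzy matching a commercial tool would provide.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Illustrative, far from exhaustive, list of address abbreviations.
ABBREVIATIONS = {"street": "st", "first": "1st"}


def normalize_address(address):
    """Lowercase, strip punctuation, and apply common abbreviations."""
    words = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def similar_address(a, b, threshold=0.85):
    """Treat two addresses as a likely match if they are similar enough."""
    ratio = SequenceMatcher(None, normalize_address(a),
                            normalize_address(b)).ratio()
    return ratio >= threshold


def duplicate_payments(payments):
    """Group payments that share an invoice number and amount
    but were recorded under more than one vendor number."""
    groups = defaultdict(list)
    for p in payments:
        groups[(p["invoice"], p["amount"])].append(p)
    return {key: group for key, group in groups.items()
            if len({p["vendor_no"] for p in group}) > 1}


# Hypothetical payment records.
payments = [
    {"vendor_no": "V100", "invoice": "INV-2001", "amount": 1500.00,
     "address": "100 First Street"},
    {"vendor_no": "V245", "invoice": "INV-2001", "amount": 1500.00,
     "address": "100 1st St."},
    {"vendor_no": "V100", "invoice": "INV-2002", "amount": 820.00,
     "address": "100 First Street"},
]

suspects = duplicate_payments(payments)
```

Here invoice INV-2001 surfaces because it was paid under two vendor numbers, and the two addresses on those payments match once normalized.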
6. (a) Benford’s Law states that within many large data sets, such as corporate sales statistics or U.S. city populations, the distribution of digits follows an unequal but consistent pattern. For example, the first digit of a multidigit number is 1 approximately 30% of the time—far more than the expected frequency of one out of nine. The likelihood decreases for each digit from 2 to 9, which is the first digit only 4.6% of the time. Predictable patterns also occur in the second and third digits of multidigit numbers. Applying Benford’s Law to a data set can help identify numbers that have been manipulated as part of a fraud scheme, as most fraudsters’ concealment efforts will result in the data not conforming to the law’s expected digit distribution. However, Benford’s Law is applicable only to randomly generated numeric data. Consequently, such an analysis will not yield meaningful results if used on data with preassigned digits (such as invoice numbers) or data confined to a predetermined range (such as hourly wage rates).
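The expected first-digit frequencies come from the formula log10(1 + 1/d), which yields about 30.1% for the digit 1 and about 4.6% for the digit 9. The Python sketch below compares those expectations with the observed first digits of a data set. The sample amounts are hypothetical, and the sketch stops at producing the two sets of frequencies; deciding how much deviation is meaningful is left to the examiner.

```python
import math
from collections import Counter


def benford_expected(d):
    """Expected share of values whose first digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


def first_digit(value):
    """First nonzero digit of a nonzero number."""
    return int(str(abs(value)).lstrip("0.")[0])


def first_digit_frequencies(values):
    """Observed share of each first digit, ignoring zero values."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical amounts; a real test needs a far larger data set.
amounts = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]

observed = first_digit_frequencies(amounts)
comparison = {d: (observed[d], benford_expected(d)) for d in range(1, 10)}
```

Digits whose observed share departs sharply from the expected share point to the ranges of values worth examining first.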
7. (b) The notes to a company’s financial statements are notoriously difficult to decipher, particularly for readers without a financial or accounting background or education. Consequently, the notes can be an excellent candidate for fraudulent manipulation by management. One tool for assessing the readability of the notes to financial statements is the Fog Index developed by Robert Gunning. The Fog Index uses an algorithm to measure the readability of a sample of English writing; the score that results from the calculation represents the number of years of formal education needed to understand the text upon an initial reading. Because notes to financial statements are inherently complex, it is not surprising that many receive a Fog Index score well beyond what would be considered easily readable by almost anyone—including their intended audience. (Test the Fog Index at gunning-fog-index.com.) Therefore, a high Fog Index alone is not necessarily an indicator of fraudulent activity. The real value in applying the Fog Index to financial statement fraud detection lies in using the index to make comparisons between particular notes within the same period, to similar notes in other periods, or to the notes of other organizations in the same industry. Any significant changes or deviations in a Fog Index score that are highlighted by these types of comparisons could indicate fraudulent activity and warrant a closer look.
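Gunning's formula is 0.4 times the sum of the average sentence length (in words) and the percentage of complex words, where complex words are generally those of three or more syllables. The Python sketch below applies it to a passage. The syllable counter is a rough heuristic that counts vowel groups, and it ignores the formula's usual exclusions (such as proper nouns and common suffixes), so treat the scores as approximations suited to comparison rather than as exact values.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(text):
    """Approximate Gunning Fog Index for a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


# Hypothetical passages for comparison.
plain_note = "The cat sat. The dog ran."
dense_note = "Management capitalized extraordinary expenditures."

plain_score = fog_index(plain_note)
dense_score = fog_index(dense_note)
```

Running the same function over the same note from several periods, or over peer companies' notes, gives the comparisons in which the answer says the real value lies.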
8. (a) Link analysis and geospatial analysis can both be particularly useful in uncovering corruption schemes, such as bribery and conflicts of interest. These schemes often involve off-book aspects that typically make them among the most difficult frauds to detect. Link analysis provides visual representations (such as charts with lines showing connections) of data from multiple sources to discover communications, locations, patterns, trends, associations, relationships, and hidden networks. For example, link analysis can be used to demonstrate complex networks of parties and uncover indirect relationships, including those connected through several intermediaries. Similarly, geospatial analysis provides a visual model of the geographical locations of transactions, assets, customers, vendors, or other data. Using such an analysis to examine cash disbursements in regions where bribery is prevalent can provide insight into potential corruption schemes. (Transparency International’s Corruption Perceptions Index is a good source for determining such regions.)
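Link analysis tools are largely visual, but the underlying idea of tracing indirect relationships can be sketched in a few lines of Python. The example below runs a breadth-first search over a small set of connections to find the shortest chain linking two parties. The parties and connections are entirely hypothetical.

```python
from collections import deque


def shortest_link(edges, start, target):
    """Return the shortest chain of connections from start to target,
    or None if the two are not connected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical connections drawn from HR, vendor, and public records.
edges = [
    ("Purchasing employee", "Home phone number"),
    ("Home phone number", "Shell company"),
    ("Shell company", "Vendor X"),
    ("Vendor Y", "Warehouse address"),
]

chain = shortest_link(edges, "Purchasing employee", "Vendor X")
```

A chain like this one, running from an employee through a shared phone number and a shell company to a vendor, is the kind of indirect relationship that could signal a conflict of interest.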
9. (c) Calculating the ratio of maximum values to minimum values can provide quick, high-level visibility into a data set for which a small range of values would be expected. Specifically, a maximum-to-minimum ratio close to 1 indicates that there is not much variance between the highest and lowest number in a data set. Such a calculation would be useful in examining unit prices for product purchases, since large ratios would highlight large variations in the price paid for the product—and possibly instances of being overcharged, which might be the result of a kickback scheme involving a purchasing employee.
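As a rough illustration, the following Python sketch computes the maximum-to-minimum ratio of unit prices for each product and lists the products whose ratio exceeds a cutoff. The purchase data and the 1.5 cutoff are hypothetical, and the sketch assumes all prices are positive.

```python
from collections import defaultdict


def max_min_ratios(purchases):
    """Ratio of highest to lowest unit price paid, per product.
    Assumes every unit price is greater than zero."""
    prices = defaultdict(list)
    for product, unit_price in purchases:
        prices[product].append(unit_price)
    return {product: max(p) / min(p) for product, p in prices.items()}


# Hypothetical purchase transactions: (product, unit price paid).
purchases = [
    ("widget", 10.0), ("widget", 10.5), ("widget", 10.2),
    ("gasket", 4.0), ("gasket", 4.1), ("gasket", 9.0),
]

ratios = max_min_ratios(purchases)
flagged = [product for product, ratio in sorted(ratios.items()) if ratio > 1.5]
```

The widget ratio stays close to 1, while the gasket ratio of 2.25 stands out and would justify a look at the transaction priced at 9.0.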
10. (d) Limiting the number of false positive results is one of the biggest challenges in effectively designing and using data analysis techniques to detect fraud. A test that provides thousands of exceptions can be useful in identifying control weaknesses or unenforced policies, but is less helpful in detecting specific transactions that are part of a fraud scheme. To reduce the need to sift through a huge number of transactions, the data analysis team can combine several analysis techniques—e.g., payments to vendors with incomplete profiles, payments just below approval thresholds, and payments made unusually soon after the invoice date—and weight the results by the number of exceptions each record shows. In this situation, a transaction showing as an exception for all three tests would merit closer scrutiny than a transaction that appeared in the results of just one of these tests. Similarly, examining a transaction through the lens of its circumstances—particularly, who is involved and what nondata red flags might be present—can also be helpful. If the data analysis team can identify employees with known fraud risk factors (such as purchasing employees who have unusually close relationships with vendors or who are known to live beyond their means), giving close attention to their role in any transactions that come up during the data analysis process can help focus the team’s efforts on areas with increased risk. Further, combining the data in the purchasing system with external sources of information—such as vendor industry codes to identify payments outside legitimate purchase categories or maps to identify vendors with addresses in residential areas—can also be helpful.
In contrast, filtering the transactions to include only those recorded by expected employees during expected times would be counterproductive, because it would screen out the very transactions most likely to warrant attention. Running tests specifically to identify transactions recorded during nonbusiness hours (e.g., nights, weekends, and holidays) or recorded by employees who should not be involved in such transactions can help illuminate internal control breaches and potentially fraudulent transactions.
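The approach of combining several tests and weighting each record by its number of exceptions can be sketched in Python as follows. The three tests mirror the examples given in this answer, but the field names, the approval limit, the 5% margin for "just below," and the two-day cutoff for "unusually soon" are all assumptions chosen for illustration.

```python
from datetime import date

APPROVAL_LIMIT = 5000      # hypothetical approval threshold
NEAR_LIMIT_MARGIN = 0.05   # "just below" means within 5% of the limit (assumption)
FAST_PAYMENT_DAYS = 2      # "unusually soon" cutoff in days (assumption)


def incomplete_vendor(p):
    return not p["vendor_tax_id"] or not p["vendor_address"]


def just_below_limit(p):
    floor = APPROVAL_LIMIT * (1 - NEAR_LIMIT_MARGIN)
    return floor <= p["amount"] < APPROVAL_LIMIT


def paid_too_quickly(p):
    return (p["paid_on"] - p["invoiced_on"]).days <= FAST_PAYMENT_DAYS


TESTS = (incomplete_vendor, just_below_limit, paid_too_quickly)


def rank_payments(payments):
    """Score each payment by the number of tests it fails and return
    the exceptions, highest score first."""
    scored = [(sum(1 for test in TESTS if test(p)), p["payment_id"])
              for p in payments]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


# Hypothetical payments.
payments = [
    {"payment_id": "P1", "vendor_tax_id": "", "vendor_address": "1 Main St.",
     "amount": 4900, "invoiced_on": date(2012, 3, 1), "paid_on": date(2012, 3, 2)},
    {"payment_id": "P2", "vendor_tax_id": "12-345", "vendor_address": "9 Elm St.",
     "amount": 1200, "invoiced_on": date(2012, 3, 1), "paid_on": date(2012, 3, 31)},
    {"payment_id": "P3", "vendor_tax_id": "98-765", "vendor_address": "",
     "amount": 800, "invoiced_on": date(2012, 3, 1), "paid_on": date(2012, 3, 31)},
]

ranking = rank_payments(payments)
```

Payment P1 fails all three tests and rises to the top of the list, P3 fails one, and P2 drops out entirely, which shrinks the population the team has to review.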
If you answered nine or 10 questions correctly, congratulations. Your solid knowledge of data analysis will assist you in detecting fraud for your clients or employer.
If you answered seven or eight questions correctly, you’re on the right track. Continue to build your knowledge of data analytics to help uncover the red flags of fraud in the data.
If you answered fewer than seven questions correctly, consider strengthening your understanding of data analysis and fraud detection concepts to help ensure that you have what it takes to stay one step ahead of fraud perpetrators.
Andi McNeal (firstname.lastname@example.org) is director of research for the Association of Certified Fraud Examiners.