Q: We frequently need to select samples of populations for test of transaction work when auditing or reviewing client financial data. Currently, each staff member uses his or her own sampling method, some of which seem to produce skewed results. We would like to deploy a consistent sampling method using Excel, but we are frequently faced with producing a single sample from multiple populations of different sizes, and we want to produce unbiased samples. Can you tell us how we might accomplish this in Excel?
A: It is possible to produce an unbiased sample from multiple population ranges in Excel using either the Rand or Randbetween functions, as follows. For both examples, let us assume that you want to select 30 random invoices from the following three population ranges—invoices numbered 2232 through 3723; 4000 through 5709; and 6400 through 8727.
Start by listing all invoice numbers in a single column using Excel’s Fill tool, as follows. Enter the value 2232 into Excel (cell A4 in this example), and then from the Home tab, select Fill, Series, and in the Series dialog box select the Columns radio button, enter 3723 as the Stop value, and then click OK.
This process will list all invoices from 2232 to 3723. Position your cursor in the next available cell in this column (cell A1496 in this example), enter the next invoice number in your population (4000 in this example), and repeat the Fill tool process until you have listed all three ranges of invoice numbers to be sampled.
In the adjacent column enter the function =RAND() (cell B4 in this
example), and copy it downward so that each invoice number has its own
corresponding random number, as pictured below.
To prevent the values from recalculating further, convert the random numbers to values by selecting them, then copy and paste them to the same location as Values. Sort the data according to the random numbers and select the first 30 numbers from the top of the list as your random sample (shown below).
For an approach that is a little more complicated but takes far less Excel screen real estate, use the Randbetween function as follows. ( Note: To use Randbetween in Excel 2003, you must first activate the Analysis ToolPak add-in by selecting Tools, Add-ins, checking the box labeled Analysis ToolPak, and then clicking OK. Because Randbetween is included as a standard function in later editions of Excel, the Analysis ToolPak does not need to be activated.)
List your various population ranges, calculate the population sizes
for each range and in total, and then calculate the percentage of the
total population for each range, as pictured.
Next, create a columnar table for summarizing the 30 random invoice numbers, and in the first column (cell H3 in this example) enter the formula =RANDBETWEEN(1,100)/100, and then copy it downward to fill 30 rows, as suggested in the image below.
These formulas will generate 30 random values between 1% and 100%, to be used in the next step as a basis for randomly selecting invoice numbers from the three populations, according to each range’s respective size within the total population. (The general idea, for example, is that values from a range representing just 10% of the total population will have only a 10% chance of being selected.)
In the adjacent column (cell I3 in this example), enter the following formula:
While this formula may appear intimidating, it simply contains three Randbetween functions, each designed to generate random invoices numbers from one of the three population ranges, depending on the random numbers (from 1% to 100%) generated in column H.
For example, as shown below, the formula in cell H3 produces the random percentage of 8%. Based on this result, the Randbetween formula in cell I3 generates a random invoice number from within the first population range (2232 and 3723), because 8% falls within the 1% to 27% range.
In theory, cell H3 can be expected to generate random numbers between 1% and 27% twenty-seven percent of the time. This means there is a 27% likelihood that the random invoice number generated in cell I3 will be generated from within the first range of invoice numbers (which represents 27% of the total population). Likewise, the formula ensures that invoice numbers will be generated from within the second and third ranges of data 31% and 42% of the time, respectively—hence, all values within the total population have an equal chance of being selected.
Copy the Randbetween formula downward to complete the table (shown below), and then convert the entire table of random numbers to values by selecting them, then copying and pasting them to a new location as values.
Although this second method is more complicated, once it is created, the Excel template can be used repeatedly by entering the population ranges in the yellow highlighted cells. Download this workbook at carltoncollins.com/random.xlsx.
Note: To select a sample of random items from a list of text values (e.g., customer names) or a list of nonsequentially numbered items, list the population in Excel and number them sequentially in an adjacent column, then apply one of the methods described above to the numbered column.