A while ago I wrote a routine using the .Find method to find missing numbers in a sequence. So if we had 1, 3, 5, the missing numbers are 2 and 4.
I had some feedback that as the sequence grew longer, the .Find method became increasingly slower, to the point that your hair grew quicker.
I am surprised to hear this as I’d expect a routine provided by Microsoft to be as least as fast as anything us mortals could write.
I chose to use .Find really for its ease of use, I wouldn’t have to write any loops to check each number. But when the length of time it takes to list missing numbers becomes frustratingly long, it’s time to look for another way.
One of our readers provided some code (thanks Bryan) where he compared two ways of listing missing numbers in a sequence, the first using .Find and the second using loops to check a collection created from the number sequence.
What the code revealed was that by the time you got to 10,0000 numbers in the sequence, the difference between the two approaches was already noticeable .Find was taking 50+ seconds, whereas looping through a collection took less than 2 seconds.
I then received a message from another blog reader, Dennis, saying he was trying to find missing numbers in a 90,000+ long sequence of numbers, with .Find taking over half an hour
So, I thought I’d better check this out myself and thanks to the code Bryan provided I came up with this.
What The Code Does
- Find missing numbers (integers) in a sequence
- The lowest number in the sequence is the starting point, the highest number is the end point
- Can include negative integers
- Numbers can be repeated
- No sorting of the sequence is required
- The sequence (input range) can contain empty cells
- The input range can be non-contiguous
The purpose of this testing was to compare the .Find method against an approach looping through a collection created from the range of cells containing the number sequence.
Whether or not this approach is totally scientific isn’t really important, what is important is that the two approaches are compared in the same way. Apples with apples.
The most important thing is what you, or the user, experiences.
If one approach is lightning fast and the other is slower than a snail going uphill, then we don’t really need to make sure our testing is accurate down to the microsecond.
That said, the testing routines use a function called MicroTimer, provided by Microsoft for accurate timing.
It uses Windows API calls to the computer's high-resolution timer and is faster and more accurate than the VBA Time function. It can measure intervals down to microseconds.
The results of these tests will be different each time you run them. Things like the number of missing numbers, whether you are using Debug.Print or not, and other processes running on your computer will mean each test will be slightly quicker or slower than the last one.
But to give you an idea of the variation I’m talking about, in four runs of the Collection routine with 10,000 numbers the code executed in 1.73s, 1.78s, 1.75s and 1.70s. So unless you are The Flash, you won’t even notice this.
The testing code was run in Excel 2013 and 2010, and each routine was run four times. Firstly with a sequence of 10 numbers, then 100, 1 000, 10 000 and finally 100 000 numbers in the sequence. The execution times for the 4 runs was averaged.
Actually that’s a bit of a lie. By the time I got to 100,000 numbers in the sequence the .Find method was taking so long I just ran it twice. But I did test the others four times each.
The Testing Routines
The testing code uses the RANDBETWEEN function to randomly create numbers between 1 and the largest number in the sequence (10, 100, 1 000, 10 000 or 100 000) in a column starting from A1.
Because RANDBETWEEN is volatile, if we wrote the missing numbers to the worksheet, RANDBETWEEN would recalculate each time we did this, changing the numbers in our sequence and this would mess up our test.
So the missing numbers, and any other output we want, is written to the Immediate window in the VB editor using Debug.Print. To show the Immediate window in the VB Editor either press CTRL+G or go to the View menu ->Immediate Window
With only 10, 100 or 1000 numbers in the sequence, the difference between .Find and the collection/loops routine isn’t perceptible, all the results coming back in under a second.
But with 10,000 numbers in the sequence, .Find takes about 51 seconds, the collection/loop code finds all the missing numbers in less than 2 seconds!
With 100,000 the difference is even more staggering with .Find taking more than an hour and 20 minutes to find all the missing numbers in the sequence. The collection/loop code did it in under 9 seconds.
I was amazed.
I don’t know what is going on in the .Find method but beyond 1000 numbers in a sequence, it seems to be incredibly slow at this particular job. .Find is very good if you want to do a one-off location of a single thing within a range, but for this particular application, looping is by far the better approach.
Below is a table of results (measured in seconds) for each of the tests starting with 10 numbers in a sequence and working through to 100 000 numbers.
The Finished Code
After the testing I had to modify the code so that it gave us results we could use in the workbook, not just printing to the Immediate Window. All the code can be found in this workbook.
When you open the VB Editor (ALT+F11) you’ll see 2 modules named Testing_Code (guess what’s in there!), and FindMissingNumbers.
Testing_Code contains the MicroTimer function, and two subs called Using_Collection_Loops and Using_Find. You can run either of these subs to do your own testing.
The array lngNumValues is set to hold five values which are how many numbers in the sequence you test, i.e. 10, 100, 1000, 10 000 or 100 000. You can change these values to whatever you like.
Specifying which value in the array lngNumValues to use in testing determines how many numbers are tested. For example lngMax = lngNumValues(4) sets lngMax to 10 000, so the code will use a random sequence of 10 000 numbers.
When you run either of the testing subs, you’ll see the missing numbers displayed in the Immediate window, followed by a duration for the code execution.
How To Use the FindMissingNumbers Sub In Your Workbook
In the FindMissingNumbers module is the FindMissingNumbers sub.
On the first sheet of the workbook I’ve added a big green button, click this and the macro will run. You can also run the macro in a number of other ways. Read this article to learn how to run a macro using a shape, shortcut keys, an icon on your Ribbon, or the Quick Access Toolbar.
When the macro starts the first thing you need to do is specify the input range. You can either use your mouse to select the range, or type it in
Then you specify where you want the missing numbers to be listed. You can select a single cell and the results will be output into that column.
If you select more than one cell across a row, the results will be output across that row.
Here's the link to download the workbook with all the code. Or copy it from below.
Spread the Word
If you liked this or know someone who could use it please click the buttons below to share it with your friends on LinkedIn, Google+1, Facebook and Twitter.
Disclaimer – Please test the code yourself, it may not work in your environment. All code provided as is without any warranty
Option Explicit '********************************************************** ' Find missing integer numbers in a range ' Author : Philip Treacy, MyOnlineTrainingHub ' With thanks to Bryan upon whose code this is based ' For more info go to http://www.myonlinetraininghub.com/find-missing-numbers-really-fast '********************************************************** Sub FindMissingNumbers() Dim InputRange As Range, OutputRange As Range Dim LowerVal As Single, UpperVal As Single, count_i As Single, count_j As Single Dim NumRows As Long, NumColumns As Long Dim Horizontal As Boolean Dim rng As Range Dim col As Collection Dim varDummy As Variant 'Default is to output the results into a column Horizontal = False On Error GoTo ErrorHandler 'Ask for the range to check Set InputRange = Application.InputBox(Prompt:="Select a range to check :", _ Title:="Find missing values", _ Default:=Selection.Address, Type:=8) 'Find the lowest and highest values in the range/sequence LowerVal = WorksheetFunction.Min(InputRange) UpperVal = WorksheetFunction.Max(InputRange) 'Ask where the output is to go Set OutputRange = Application.InputBox(Prompt:="Select where you want the result to go :", _ Title:="Select cell for Results", _ Default:=Selection.Address, Type:=8) 'Check the number of rows and columns in the output range NumRows = OutputRange.Rows.Count NumColumns = OutputRange.Columns.Count 'If there are more columns selected than rows, output is to go horizontally If NumRows < NumColumns Then Horizontal = True 'Reset the number of rows to 1 so that output is into first row NumRows = 1 Else 'Reset the number of columns to 1 so that output is into first column NumColumns = 1 End If Set col = New Collection ' First add all items to collection (ignore errors when duplicates are added) For Each rng In InputRange On Error Resume Next col.Add rng.Value2, CStr(rng.Value2) On Error GoTo 0 Next rng count_j = 1 ' Loop through every possible nubmer in range (error returned when number not found) On Error Resume Next For count_i = LowerVal To UpperVal varDummy = col(CStr(count_i)) If Not Err.Number = 0 Then 'Output the missing number to the sheet If Horizontal Then OutputRange.Cells(NumRows, count_j).Value = count_i count_j = count_j + 1 Else OutputRange.Cells(count_j, NumColumns).Value = count_i count_j = count_j + 1 End If Err.Clear End If Next count_i On Error Resume Next Exit Sub ErrorHandler: If InputRange Is Nothing Then MsgBox "ERROR : No input range specified." Exit Sub End If If OutputRange Is Nothing Then MsgBox "ERROR : No output cell specified." Exit Sub End If MsgBox "An error has occurred. The macro will end." End Sub