RegEx stands for regular expression and is a sequence of characters that defines a pattern that can be used to match against strings.
Or, in simpler language, I can create a regex pattern and use it to search a string for that pattern. I can also use the regex pattern to validate input.
A simple pattern looks like this
[a-z]
and I can use that pattern to look for any lower case letter from a to z.
You can use VBA string functions and the LIKE operator, and you can use functions like MID, LEFT, RIGHT, in the worksheet. But none of these match RegEx for the complexity of pattern matching and data validation it provides.
Once you've learned RegEx you can use it with a huge range of programming languages, not just in Excel.
RegEx Patterns
RegEx patterns can be simple, or very complicated, but all are built using the following syntax.
Matching Characters
Pattern | Description | Example | Matches |
---|---|---|---|
. | Any single character | c.t | cat, cot, cut, c9t, c@t |
[characters] | Matches any single character between brackets [ ] | [at] | Matches only a or t |
[^characters] | Matches any single character not between brackets [ ] | [^at] | Matches any single character except a and t |
[start–end] | Matches any character in range specified between brackets [ ] | [0-9] | Any single digit between 0 and 9 |
[a-z] | Any single lower case letter | ||
[A-Z] | Any single upper case letter | ||
\w | Alphanumeric characters and the underscore. The space character is not included in this. | s0 _d,f.k | s 0 _ d f k |
\W | Any non-alphanumeric characters or underscore | s0 _d,f.k | " " , . |
\s | Any white space character, spaces, tabs etc | ||
\S | Any non-white space character | ||
\d | Any single decimal digit | d | 0 and 1 in a0b1 |
\D | Any single non-decimal digit | D | a and b in a0b1 |
\ | This escapes special characters so you can search for them | \. | . in s0 _d,f.k |
\t | Tab (vbTab) | ||
\r | Carriage Return (vbCr) | ||
\n | New line (vbNewLine) |
Quantifiers
Quantifiers allow you to modify the basic pattern to specify the number of times you want that pattern to match.
Quantifier | Example | Description | Matches |
---|---|---|---|
* | ab*c | Matches zero or more occurrences of the preceding pattern | ac |
abc | |||
abbbbbc | |||
+ | ab+c | Matches one or more occurrences of the preceding pattern | abc |
abbbbbc | |||
? | ab?c | Matches zero or one occurrences of the preceding pattern | ac |
abc | |||
{n} | a\d{3} | Matches preceding pattern n times | Letter a followed by any 3 numeric digits e.g. a000, a123 etc |
{n,} | a\d{2,} | Matches preceding pattern at least n times | Letter a followed by at least 2 numeric digits e.g. a00 a123 a00b a1234z |
{n,m} | a\d{1,3} | Matches preceding pattern between n and m times | Letter a followed by between 1 and 3 numeric digits e.g. a0 a123 a1234z |
Capturing
We can specify that we want to capture a pattern, which means not only are we matching a pattern against a string, we're saying save that portion of the string that matches our pattern so we can do something with it.
Up to now we've only described matching a pattern against a string, capturing allows us to extract portions of the string. I'll get to examples of that later, first let's look at the syntax.
Syntax | Example | Description | Matches |
---|---|---|---|
(pattern) | [a-z](\d+) | Matches a single letter followed by one or more numeric digits. The numbers are captured and can be output to the worksheet or otherwise used. | If our string is 12b345j2r then we capture 345 and 2 |
Anchors
Anchor | Example | Description | Matches |
---|---|---|---|
^ | ^Once | Matches any string that starts with "Once" | Once upon a time |
$ | time$ | Matches any string that ends with "time" | Once upon a time |
VBA RegEx Methods & Properties
To use RegEx in VBA you'll use the RegExp object. To use this you can either set a reference in the VBA editor, or use the CreateObject function.
I prefer to set a reference as it means IntelliSense works for your new object.
Properties
The RegExp object has these properties:
- Pattern - the pattern used for matching against a string
- IgnoreCase - If True, the matching is case insensitive
- Global - If True, find all matches of the pattern in the string. Otherwise, just the first match is found
- MultiLine - If True, patterns are matched across line breaks in multi-line strings
Methods
The methods we can use are:
- Test - check if the pattern can be found in the string. Returns True if it is found, False if it isn't.
- Execute - returns occurrences of the pattern matched against the string
- Replace - replaces occurrences of the pattern in the string
RegEx Functions
We must write our own functions to use the available RegEx methods.
Download the Example Workbook
These functions and lots of sample patterns are available to download.
Enter your email address below to download the sample workbook.
Match
The RegExMatch function is the simplest of the three I have written.
You pass in the string to be searched and the pattern. After setting the Pattern property you just call the Test method and return the result, either True or False.
Extract
The RegExExtract function takes 5 arguments:
- Str - the string to be searched
- Pattern - the regex pattern
- MatchIndex - indicates which matched occurrence we want the function to return
- SubMatchIndex - this is optional. Indicates which SubMatch we want the function to return, more on this below
- IgnoreCase- this is optional. If True, indicates we want the matching to be case insensitive. Default is True
Matches and SubMatches
When the Execute method is executed, it can return multiple capturing groups, depending on what pattern is used. Let's look at an example.
If our string is She sells sea shells on the sea shore and our pattern is s(..)(..)(..) then we are trying to match the letter s followed by any 2 characters, another 2 characters, and another 2. Note that this includes spaces.
The parenthesis, (..), indicate that we want to capture these characters into separate groups.
When I execute the pattern s(..)(..)(..) against She sells sea shells on the sea shore I'm storing the result in an object I've called Matches.
This object has a property SubMatches where all the other captured groups are stored.
To access the matches and submatches we use numbers that I'm calling MatchIndex and SubmatchIndex.
Something a little quirky here is that Matches are indexed from 0, which isn't unusual in programming. But SubMatches are indexed from 1.
So if we look at what matches our pattern
and I want to access the 2nd submatch of the 3rd match, I would use Matches(2).SubMatches(2) and this would give me th.
If I just want the 1st match he, I'd use Matches(0).SubMatches(0)
Matches(1).SubMatches(0) gives me s - this includes the space before the s.
Matches(3).SubMatches(2) gives me ho.
Remember that our pattern specifies we are looking to match the letter s followed by 3 sets of 2 characters, but we are only capturing the 3 sets of 2 characters. We're not capturing the s
Consuming the String
As each match is made, RegEx works its way along the string looking for the next match.
What's important to understand is that every time a match is made, the portion of the string that matches is 'consumed' and is then not used to look for the next match.
If we look at our string She sells sea shells on the sea shore the first match is She sel so you might think that the next match is sells s but it's not.
In matching She sel this portion of the string isn't used when looking for the next match. Effectively the next match is looked for in ls sea shells on the sea shore which means the next match is actually s sea s.
Replace
The RegExReplace function takes four arguments:
- Str - the string to be searched
- Pattern - the regex pattern
- ReplaceWith - the string to replace any matches
- ReplaceAll - this is optional. If True it indicates that you want to replace all matches. Default is False.
Practical Uses for RegEx
Now that you know how to use Regex, let put it to some use.
Data Validation
A very common use is to validate data input/format. You can use the RegExMatch function for this, here are some examples.
Pattern | Description | Matches |
---|---|---|
^[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}$ | A V4 IP address | 1.2.3.4 10.55.55.90 192.168.0.254 |
^(\+61) ([1-9]) ([0-9]{4}) ([0-9]{4})$ | Australian land line including international dialling code | +61 7 1111 2222 |
^3[47][0-9]{13}$ | American Express Credit Card. Amex cards begin with either 34 or 37 and have 15 digits | 341111111111111 371111111111111 |
Replacing Parts of Strings
If the string in cell A1 is ABC-123-XYZ-45-A-6789 I can use RegExReplace to:
Description | Function Call | Result |
---|---|---|
Replace a sequence of numbers with 000 | RegExReplace(A1,"\d+","000",TRUE) | ABC-000-XYZ-000-A-000 |
Remove non-alpha characters | RegExReplace(A1,"-\d+-*","",TRUE) | ABCXYZA |
Remove non-numeric | RegExReplace(A1,"-*[a-zA-Z]*-+","",TRUE) | 123456789 |
Extracting Strings
If the string in cell A1 is IV:INV-32278ABC Widgets PTY LTD I can use RegExExtract to find matches and submatches which enables me to:
Description | Function Call | Result |
---|---|---|
Extract the invoice number | RegExExtract(A1,"(.*[0-9]+)(.*)",0) | IV:INV-32278 |
Extract the company name | RegExExtract(A1,"(.*[0-9]+)(.*)",0,1) | ABC Widgets PTY LTD |
TW
Hi,
It seems that SubMatches are indexed from 0 (not from 1 as indeicated. I entered codes as in List 1 below and got results as in Table 1 below.
List 1:
Sub Test()
Dim text As String
Dim regEx0 As New RegExp
Dim Matches As Object
text = “She sells sea shells on the sea shore”
regEx0.Pattern = “s(..)(..)(..)”
regEx0.IgnoreCase = True
regEx0.Global = True
Set Matches = regEx0.Execute(text)
For I = 0 To Matches.Count – 1
Cells(I + 2, 2).Value = Matches(I)
For J = 0 To Matches(I).SubMatches.Count – 1
Cells(I + 2, J + 3).Value = Matches(I).SubMatches(J)
Next J
Next I
Table 1
Matches(0).Value SubMatches(0) SubMatches(1) SubMatches(2)
Matches(0) She sel he s el
Matches(1) s sea s s ea s
Matches(2) s on th o n th
Matches(3) sea sho ea s ho
Catalin Bombea
Hi TW,
Thank you for spotting that, it is 0 based indeed.
Cheers,
Catalin
Oz
Wonderful !{3}
Philip Treacy
Thanks 🙂
ProEdx
Can anyone please tell me how to regex pattern from this description
V000306 Mark Roney ; Reversed — Computer Expenses:Computer and software – Recurring ; Reversal Expensify Accrual -Jan-2021 ; Raintank, Inc. ; D2220 US East Coast
I need to select only one word from this description
How to create formula
Catalin Bombea
Hi ProEdx,
One word, but which one? can be any word from that string, or a specific word?
Amy
Could someone tell me how to fix the below syntax error? Thank you!
=IF(REGEXMATCH(AL2,”(?i)SATA”), “SATA”, IF(REGEXMATCH(AL2,”(?i)NVMe”), “NVMe”, IF(REGEXMATCH(AL2,”(?i)EDSFF”), “NVMe”, IF(REGEXMATCH(AL2,”(?i)SAS”), “SAS”, “–“))))
Philip Treacy
Hi Amy,
The only thing I can see wrong is that the double quotes aren’t correct.
Your code is using curly quotes like this “
But VBA requires ‘straight’ double quotes for encasing strings, like this
So try replacing all the double quotes with straight ones, like this
Regards
Phil
Jorge Cabral
Hi
The VBA USERFORM color format is like &H(80 or 00)BBGGRR&, and BBGGRR can be any number or letter, including 8 and 0, putting this, how can i extract the sub-string BBGGRR from that string?
Thanks
Catalin Bombea
use the Mid function in vba:
str=Mid(text,4,6)
Jorge Cabral
Hi, with Regular Expressions.
Catalin Bombea
You can try this pattern in A2:
(&H00|80)([0-9|A-Z]{1,6})
Formula:
=RegExExtract($A$1,$A$2,0,1) , in A1 you should have the Text.
Carlos
Hi, is it possible to update the funcions to work in 64 bits Office Packages?
Catalin Bombea
Hi Carlos,
Use conditional declarations, this way will wok on both 32-bit and 64-bit platforms:
#If VBA7 Then
Declare PtrSafe Sub… (use 64 bit declarations)
#Else
Declare Sub… (use 32 bit declarations)
#EndIf
Josef Wagner
Thank you very much! This helps a lot for using RegEx in Excel. But two things dive me really nuts because I can’t find a solution.
1. If I look into my add Ins in Excel I do not find your package. I don’t find the functions and properties and I do not know how to transfer them into other Excel documents.
2. The evaluation seems to be case sensitive but I need it non case sensitive. Due to not seeing the properties of the regex method I can’t change this.
Currently I’m using Office2013 and there is no VBA installed. Nevertheless your functions work properly, but only in the example workbook.
L.E.:
Sorry, now I found it, I needed to activate the developper toos in the menu bar. There I can find and edit the macros and properties.
Dang Duy Khanh
Thanks you so much!
Philip Treacy
You’re welcome.
David Magnuson
Thanks for an interesting and helpful article.
In your first table (Matching Characters), I think there is a typo: vbTab should be \t, while \r is vbCR (which isn’t listed).
Philip Treacy
Thanks David