RegEx (Regular Expressions) in Excel

Philip Treacy

February 22, 2019

RegEx stands for regular expression and is a sequence of characters that defines a pattern that can be used to match against strings.

Or, in simpler language, I can create a regex pattern and use it to search a string for that pattern. I can also use the regex pattern to validate input.

A simple pattern looks like this

[a-z]

and I can use that pattern to look for any lower case letter from a to z.

You can use VBA string functions and the LIKE operator, and you can use functions like MID, LEFT, RIGHT, in the worksheet. But none of these match RegEx for the complexity of pattern matching and data validation it provides.

Once you've learned RegEx you can use it with a huge range of programming languages, not just in Excel.

RegEx Patterns

RegEx patterns can be simple, or very complicated, but all are built using the following syntax.

Matching Characters

Pattern Description Example Matches
. Any single character c.t cat, cot, cut, c9t, c@t
[characters] Matches any single character between brackets [ ] [at] Matches only a or t
[^characters] Matches any single character not between brackets [ ] [^at] Matches any single character except a and t
[start–end] Matches any character in range specified between brackets [ ] [0-9] Any single digit between 0 and 9
[a-z] Any single lower case letter
[A-Z] Any single upper case letter
\w Alphanumeric characters and the underscore. The space character is not included in this. s0 _d,f.k s 0 _ d f k
\W Any non-alphanumeric characters or underscore s0 _d,f.k " " , .
\s Any white space character, spaces, tabs etc
\S Any non-white space character
\d Any single decimal digit d 0 and 1 in a0b1
\D Any single non-decimal digit D a and b in a0b1
\ This escapes special characters so you can search for them \. . in s0 _d,f.k
\t Tab (vbTab)
\r Carriage Return (vbCr)
\n New line (vbNewLine)

 

Quantifiers

Quantifiers allow you to modify the basic pattern to specify the number of times you want that pattern to match.

Quantifier Example Description Matches
* ab*c Matches zero or more occurrences of the preceding pattern ac
abc
abbbbbc
+ ab+c Matches one or more occurrences of the preceding pattern abc
abbbbbc
? ab?c Matches zero or one occurrences of the preceding pattern ac
abc
{n} a\d{3} Matches preceding pattern n times Letter a followed by any 3 numeric digits e.g. a000, a123 etc
{n,} a\d{2,} Matches preceding pattern at least n times Letter a followed by at least 2 numeric digits e.g. a00 a123 a00b a1234z
{n,m} a\d{1,3} Matches preceding pattern between n and m times Letter a followed by between 1 and 3 numeric digits e.g. a0 a123 a1234z

 

Capturing

We can specify that we want to capture a pattern, which means not only are we matching a pattern against a string, we're saying save that portion of the string that matches our pattern so we can do something with it.

Up to now we've only described matching a pattern against a string, capturing allows us to extract portions of the string. I'll get to examples of that later, first let's look at the syntax.

Syntax Example Description Matches
(pattern) [a-z](\d+) Matches a single letter followed by one or more numeric digits. The numbers are captured and can be output to the worksheet or otherwise used. If our string is 12b345j2r then we capture 345 and 2

 

Anchors

Anchor Example Description Matches
^ ^Once Matches any string that starts with "Once" Once upon a time
$ time$ Matches any string that ends with "time" Once upon a time

 

VBA RegEx Methods & Properties

To use RegEx in VBA you'll use the RegExp object. To use this you can either set a reference in the VBA editor, or use the CreateObject function.

vba reference to regex library

I prefer to set a reference as it means IntelliSense works for your new object.

intellisense for regex

Properties

The RegExp object has these properties:

  • Pattern - the pattern used for matching against a string
  • IgnoreCase - If True, the matching is case insensitive
  • Global - If True, find all matches of the pattern in the string. Otherwise, just the first match is found
  • MultiLine - If True, patterns are matched across line breaks in multi-line strings

Methods

The methods we can use are:

  • Test - check if the pattern can be found in the string. Returns True if it is found, False if it isn't.
  • Execute - returns occurrences of the pattern matched against the string
  • Replace - replaces occurrences of the pattern in the string

RegEx Functions

We must write our own functions to use the available RegEx methods.

Download the Example Workbook

These functions and lots of sample patterns are available to download.

Enter your email address below to download the sample workbook.

By submitting your email address you agree that we can email you our Excel newsletter.

Match

The RegExMatch function is the simplest of the three I have written.

regex match function

You pass in the string to be searched and the pattern. After setting the Pattern property you just call the Test method and return the result, either True or False.

Extract

The RegExExtract function takes 5 arguments:

  • Str - the string to be searched
  • Pattern - the regex pattern
  • MatchIndex - indicates which matched occurrence we want the function to return
  • SubMatchIndex - this is optional. Indicates which SubMatch we want the function to return, more on this below
  • IgnoreCase- this is optional. If True, indicates we want the matching to be case insensitive. Default is True

regex extract function

Matches and SubMatches

When the Execute method is executed, it can return multiple capturing groups, depending on what pattern is used. Let's look at an example.

If our string is She sells sea shells on the sea shore and our pattern is s(..)(..)(..) then we are trying to match the letter s followed by any 2 characters, another 2 characters, and another 2. Note that this includes spaces.

The parenthesis, (..), indicate that we want to capture these characters into separate groups.

When I execute the pattern s(..)(..)(..) against She sells sea shells on the sea shore I'm storing the result in an object I've called Matches.

This object has a property SubMatches where all the other captured groups are stored.

regex match and submatch

To access the matches and submatches we use numbers that I'm calling MatchIndex and SubmatchIndex.

regex match and submatch index

Something a little quirky here is that Matches are indexed from 0, which isn't unusual in programming. But SubMatches are indexed from 1.

So if we look at what matches our pattern

table of regex matches

and I want to access the 2nd submatch of the 3rd match, I would use Matches(2).SubMatches(2) and this would give me th.

If I just want the 1st match he, I'd use Matches(0).SubMatches(0)

Matches(1).SubMatches(0) gives me s - this includes the space before the s.

Matches(3).SubMatches(2) gives me ho.

Remember that our pattern specifies we are looking to match the letter s followed by 3 sets of 2 characters, but we are only capturing the 3 sets of 2 characters. We're not capturing the s

Consuming the String

As each match is made, RegEx works its way along the string looking for the next match.

What's important to understand is that every time a match is made, the portion of the string that matches is 'consumed' and is then not used to look for the next match.

If we look at our string She sells sea shells on the sea shore the first match is She sel so you might think that the next match is sells s but it's not.

regex matches in string

In matching She sel this portion of the string isn't used when looking for the next match. Effectively the next match is looked for in ls sea shells on the sea shore which means the next match is actually s sea s.

Replace

The RegExReplace function takes four arguments:

  • Str - the string to be searched
  • Pattern - the regex pattern
  • ReplaceWith - the string to replace any matches
  • ReplaceAll - this is optional. If True it indicates that you want to replace all matches. Default is False.

regex replace function

Practical Uses for RegEx

Now that you know how to use Regex, let put it to some use.

Data Validation

A very common use is to validate data input/format. You can use the RegExMatch function for this, here are some examples.

Pattern Description Matches
^[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}$ A V4 IP address 1.2.3.4
10.55.55.90
192.168.0.254
^(\+61) ([1-9]) ([0-9]{4}) ([0-9]{4})$ Australian land line including international dialling code +61 7 1111 2222
^3[47][0-9]{13}$ American Express Credit Card. Amex cards begin with either 34 or 37 and have 15 digits 341111111111111
371111111111111

 

Replacing Parts of Strings

If the string in cell A1 is ABC-123-XYZ-45-A-6789 I can use RegExReplace to:

Description Function Call Result
Replace a sequence of numbers with 000 RegExReplace(A1,"\d+","000",TRUE) ABC-000-XYZ-000-A-000
Remove non-alpha characters RegExReplace(A1,"-\d+-*","",TRUE) ABCXYZA
Remove non-numeric RegExReplace(A1,"-*[a-zA-Z]*-+","",TRUE) 123456789

 

Extracting Strings

If the string in cell A1 is IV:INV-32278ABC Widgets PTY LTD I can use RegExExtract to find matches and submatches which enables me to:

Description Function Call Result
Extract the invoice number RegExExtract(A1,"(.*[0-9]+)(.*)",0) IV:INV-32278
Extract the company name RegExExtract(A1,"(.*[0-9]+)(.*)",0,1) ABC Widgets PTY LTD

 

AUTHOR Philip Treacy Co-Founder / Owner at My Online Training Hub

Systems Engineer with 30+ years working for companies like Credit Suisse and E.D.S. in roles as varied as Network & Server Support, Team Leader and Consultant Project Manager.

These days Philip does a lot of programming in VBA for Excel, as well as PHP, JavaScript and HTML/CSS for web development.

He's particularly keen on Power Query where he writes a lot of M code.

When not writing blog posts or programming for My Online Training Hub, Philip can be found answering questions on the Microsoft Power BI Community forums where he is a Super User.

19 thoughts on “RegEx (Regular Expressions) in Excel”

  1. Hi,

    It seems that SubMatches are indexed from 0 (not from 1 as indeicated. I entered codes as in List 1 below and got results as in Table 1 below.

    List 1:

    Sub Test()
    Dim text As String
    Dim regEx0 As New RegExp
    Dim Matches As Object
    text = “She sells sea shells on the sea shore”
    regEx0.Pattern = “s(..)(..)(..)”
    regEx0.IgnoreCase = True
    regEx0.Global = True
    Set Matches = regEx0.Execute(text)
    For I = 0 To Matches.Count – 1
    Cells(I + 2, 2).Value = Matches(I)
    For J = 0 To Matches(I).SubMatches.Count – 1
    Cells(I + 2, J + 3).Value = Matches(I).SubMatches(J)
    Next J
    Next I

    Table 1

    Matches(0).Value SubMatches(0) SubMatches(1) SubMatches(2)
    Matches(0) She sel he s el
    Matches(1) s sea s s ea s
    Matches(2) s on th o n th
    Matches(3) sea sho ea s ho

    Reply
  2. Can anyone please tell me how to regex pattern from this description

    V000306 Mark Roney ; Reversed — Computer Expenses:Computer and software – Recurring ; Reversal Expensify Accrual -Jan-2021 ; Raintank, Inc. ; D2220 US East Coast

    I need to select only one word from this description
    How to create formula

    Reply
  3. Could someone tell me how to fix the below syntax error? Thank you!

    =IF(REGEXMATCH(AL2,”(?i)SATA”), “SATA”, IF(REGEXMATCH(AL2,”(?i)NVMe”), “NVMe”, IF(REGEXMATCH(AL2,”(?i)EDSFF”), “NVMe”, IF(REGEXMATCH(AL2,”(?i)SAS”), “SAS”, “–“))))

    Reply
    • Hi Amy,

      The only thing I can see wrong is that the double quotes aren’t correct.

      Your code is using curly quotes like this “

      But VBA requires ‘straight’ double quotes for encasing strings, like this

      "

      So try replacing all the double quotes with straight ones, like this

      =IF(RegexMatch(AL2,"(?i)SATA"), "SATA", IF(RegexMatch(AL2,"(?i)NVMe"), "NVMe", IF(RegexMatch(AL2,"(?i)EDSFF"), "NVMe", IF(RegexMatch(AL2,"(?i)SAS"), "SAS", "–"))))

      Regards

      Phil

      Reply
  4. Hi
    The VBA USERFORM color format is like &H(80 or 00)BBGGRR&, and BBGGRR can be any number or letter, including 8 and 0, putting this, how can i extract the sub-string BBGGRR from that string?

    Thanks

    Reply
    • Hi Carlos,
      Use conditional declarations, this way will wok on both 32-bit and 64-bit platforms:
      #If VBA7 Then
      Declare PtrSafe Sub… (use 64 bit declarations)
      #Else
      Declare Sub… (use 32 bit declarations)
      #EndIf

      Reply
  5. Thank you very much! This helps a lot for using RegEx in Excel. But two things dive me really nuts because I can’t find a solution.

    1. If I look into my add Ins in Excel I do not find your package. I don’t find the functions and properties and I do not know how to transfer them into other Excel documents.
    2. The evaluation seems to be case sensitive but I need it non case sensitive. Due to not seeing the properties of the regex method I can’t change this.

    Currently I’m using Office2013 and there is no VBA installed. Nevertheless your functions work properly, but only in the example workbook.

    L.E.:
    Sorry, now I found it, I needed to activate the developper toos in the menu bar. There I can find and edit the macros and properties.

    Reply
  6. Thanks for an interesting and helpful article.

    In your first table (Matching Characters), I think there is a typo: vbTab should be \t, while \r is vbCR (which isn’t listed).

    Reply

Leave a Comment

Current ye@r *