Regular Expressions (RegEx)

Web analytics tools use regular expressions in filters, goals, searches, and more. This article is a basic refresher.

Please use our free regex tester to test your own regular expressions.

What are Regular Expressions?

Regular expressions (also known as regex) are used to find specific patterns in a list. Regex can be used to find anything that matches a certain pattern. For example, you can find all keywords that start with the phrase “replace”, all pages within a subdirectory, or all pages with a query string more than ten characters long.

Regular expressions provide a powerful and flexible way to describe what the pattern should look like, using a combination of letters, numbers, and special characters.

For example, typing html into the search box in the content reports will return all URLs that contain “html” anywhere in the path. The following pages would all be returned:

  • /index.html
  • /html-definitions.php
  • /search.php?q=html+vs+php
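The same search can be sketched in Python. This is only an illustration, assuming Python's standard `re` module stands in for the report's search box; the analytics tool's own regex engine may differ in details. The `urls` list, including the extra non-matching path, is made up for the example.

```python
import re

# Hypothetical sample of page paths; "/about.php" does not contain "html".
urls = [
    "/index.html",
    "/html-definitions.php",
    "/search.php?q=html+vs+php",
    "/about.php",
]

# re.search looks for the pattern anywhere in the string,
# which mirrors the "contains" behaviour described above.
matches = [u for u in urls if re.search("html", u)]
print(matches)
```

Only the three paths listed above are kept, because the pattern has no anchors and can match at any position.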

The Escape Character: Backslash

Regular expressions use a series of special characters that carry specific meanings. This is a thorough, but not complete, list of the special characters in regex that carry a non-literal meaning.

^ $ . ? * + | [] () {} \

As an example, the question mark means “make the previous character optional” in regex. We’ll show an example of this in action later in this article.

But if you want to search for a literal question mark, you need to “escape” the regex interpretation of the question mark. You accomplish this by putting a backslash just before the question mark, like this:

\?

If you want to match the period character, escape it by adding a backslash before it. For example, \.html would match a dot followed by the string “html”.

If you want to match a series of special characters in a row, just escape each one individually. To match “$?”, you would type \$\?.

You can escape any special character with a backslash – even the backslash! \\

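If you’re unsure whether a punctuation character is special, you can usually escape it anyway without negative consequences. Don’t do this with letters or digits, though: in most regex engines a backslash before a letter or digit creates a special sequence (for example, \d matches any digit).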
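A short sketch of escaping in Python, assuming the standard `re` module (other engines behave similarly for these cases). The sample strings are invented for illustration. Raw strings (`r"..."`) are used so that Python itself does not interpret the backslashes.

```python
import re

# Unescaped, "." matches any character, so this also matches "/xhtml".
unescaped_hit = re.search(r".html", "/xhtml") is not None

# Escaped, "\." matches only a literal dot.
escaped_miss = re.search(r"\.html", "/xhtml") is None
escaped_hit = re.search(r"\.html", "/index.html") is not None

# Each special character in a row gets its own backslash.
dollar_question = re.search(r"\$\?", "price=$?") is not None

# A literal backslash is written as two backslashes in the pattern.
backslash_hit = re.search(r"\\", "C:\\temp") is not None

# re.escape adds the backslashes for you.
escaped_pattern = re.escape("$?")
print(escaped_pattern)  # prints \$\?
```

`re.escape` is handy when the text to match comes from somewhere else, such as a URL copied from a report, and you are not sure which characters in it are special.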

Anchors: Caret and Dollar

Regular expressions match the pattern you specify if it occurs anywhere in the string: beginning, middle or end. You can use anchors to specify that a pattern should only occur at the beginning or end. The anchor characters are:

^ $

Use the caret symbol (^) to anchor a pattern to the beginning. Use a dollar sign ($) to anchor a pattern to the end. You can use either or both in a single expression.

^/page will match “/pages.html”, “/page/site.php” and “/page”. It won’t match “/site/page” or “/pag/es.html”.

html$ will match “/index.html”, “/content/site.html” and “/html”, but not “/html/page.php”, “/index.htm” or “/index.html?q=html+vs+php”.

^car$ will only match “car” and ^$ will match only empty strings.

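$/google.php^ won’t match anything because the anchors are reversed: nothing can come before the start of a string or after its end, so the caret belongs on the left and the dollar on the right. The period should also be escaped so that it only matches a literal dot: ^/google\.php$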
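The anchor examples above can be checked with a small Python sketch, assuming the standard `re` module. The helper name `found` is made up for this example.

```python
import re

def found(pattern, text):
    """Return True if the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

paths = ["/pages.html", "/page/site.php", "/page", "/site/page", "/pag/es.html"]
# ^ anchors the pattern to the beginning of the string.
start_hits = [p for p in paths if found(r"^/page", p)]

pages = ["/index.html", "/content/site.html", "/html",
         "/html/page.php", "/index.htm", "/index.html?q=html+vs+php"]
# $ anchors the pattern to the end of the string.
end_hits = [p for p in pages if found(r"html$", p)]

# Both anchors together require the whole string to match.
exact = (found(r"^car$", "car"), found(r"^car$", "cars"))   # (True, False)
empty = (found(r"^$", ""), found(r"^$", "car"))             # (True, False)

# Reversed anchors compile in Python but can never match.
reversed_anchors = found(r"$/google.php^", "/google.php")   # False
correct_anchors = found(r"^/google\.php$", "/google.php")   # True

print(start_hits)
print(end_hits)
```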

Ranges of Characters

Regex can also be used to match ranges or combinations of characters. Square brackets allow you to specify a variety of characters that can appear in a certain position in the string.

For example, [eio] would match either “e”, “i” or “o”.

You can include a long list of characters in square brackets, but it’s easier to match a range of characters with a hyphen. For example:

[a-z] will match any lowercase letter from a to z.

[a-zA-Z0-9] will match any lowercase letter, uppercase letter, or number.

[a-dX-Z] will match a, b, c, d, X, Y, or Z.

Square brackets look at each individual character, not whole words.

[word] matches a single occurrence of “w”, “o”, “r” or “d”.
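A brief Python sketch of square brackets, assuming the standard `re` module. It uses `re.fullmatch`, which requires the entire string to match, so that each test looks at exactly one character.

```python
import re

# [eio] matches exactly one of the listed characters.
vowel_hits = [c for c in "aeiou" if re.fullmatch(r"[eio]", c)]

# Ranges can be combined inside one pair of brackets.
range_hits = [c for c in "abcdeWXYZ" if re.fullmatch(r"[a-dX-Z]", c)]

# [word] is a set of four characters, not the word "word".
single_letter = re.fullmatch(r"[word]", "w") is not None      # True
whole_word = re.fullmatch(r"[word]", "word") is not None      # False

print(vowel_hits)   # ['e', 'i', 'o']
print(range_hits)   # ['a', 'b', 'c', 'd', 'X', 'Y', 'Z']
```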

To match a string of characters, enclose them in parentheses and use a pipe (|) as an “or” character. For example, to match an instance of “cat” or “dog”, you would type:

(cat)|(dog)

or, equivalently:

(cat|dog)

Finally, use a period to match any character. It’s like a wildcard for a single character:

car.s will match “carrs”, “car?s”, “car5s”, etc.
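Alternation and the period wildcard can be tried out the same way. This is a sketch assuming Python's standard `re` module; the word lists are invented for illustration.

```python
import re

# The pipe means "or": either alternative may appear anywhere in the text.
animal = re.compile(r"(cat|dog)")
animal_hits = [w for w in ["cat", "dog", "hotdog", "cow"] if animal.search(w)]

# The period stands for exactly one character, whatever it is.
wildcard = re.compile(r"car.s")
wildcard_hits = [w for w in ["carrs", "car?s", "car5s", "cars"] if wildcard.search(w)]

print(animal_hits)    # ['cat', 'dog', 'hotdog']
print(wildcard_hits)  # ['carrs', 'car?s', 'car5s']
```

Note that “hotdog” matches because the pattern is not anchored, and “cars” does not match because the period needs one character between “car” and “s”.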

Repeating Patterns

With regex, you can even specify the number of times a pattern should occur.

A question mark after a character will match zero or one occurrence of the character. This makes the character optional:

aa?pple will match “aapple” or “apple”.

A plus sign matches one or more occurrences of the previous character.

a+ will match “a”, “aa”, “aaaaaaaaaa”, etc.

An asterisk matches zero or more of the previous character. Combined with a period, “.*” is commonly used as a wildcard because it matches anything.

.* will match any string, including an empty one, because the asterisk allows zero occurrences.

Curly brackets allow you to match a specific range of occurrences. You specify the minimum and maximum number of occurrences.

ca{3,5}t will match “caaat”, “caaaat”, “caaaaat”, but not “cat” or “caaaaaaaaat”.
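The repetition characters can be compared side by side in a short Python sketch, assuming the standard `re` module. The helper `whole` is a made-up name; it uses `re.fullmatch` so that the entire word must fit the pattern, which keeps an unanchored partial match from hiding the effect of each quantifier.

```python
import re

def whole(pattern, text):
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ? makes the previous character optional (zero or one).
optional = [w for w in ["apple", "aapple", "aaapple"] if whole(r"aa?pple", w)]

# + requires at least one occurrence.
one_or_more = [w for w in ["", "a", "aa"] if whole(r"a+", w)]

# * allows zero or more occurrences.
zero_or_more = [w for w in ["ac", "abc", "abbbc"] if whole(r"ab*c", w)]

# {3,5} requires between three and five occurrences.
words = ["cat", "caaat", "caaaat", "caaaaat", "caaaaaaaaat"]
bounded = [w for w in words if whole(r"ca{3,5}t", w)]

print(optional)      # ['apple', 'aapple']
print(one_or_more)   # ['a', 'aa']
print(zero_or_more)  # ['ac', 'abc', 'abbbc']
print(bounded)       # ['caaat', 'caaaat', 'caaaaat']
```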

Next Steps

Use our free regex tester to test your own regular expressions.