Regular expressions for path filters

Important

Before you apply a regular expression, make sure you are using the correct URL.

A regular expression is a string that describes a search pattern, in this case a pattern for matching URLs.

How to use regular expressions

Use regular expressions on path filters to fine-tune how Clarity selects pages on your site for analysis. The following filters accept regular expressions:

  • Entry URL (path filter)
  • Exit URL (path filter)
  • Visited URL (path filter)
  • Referring site (traffic filter)

Tip

Use regular expressions to group together collections of similar pages so that you can see aggregated results for all of them. Exclude pages from your search if they are outliers with data you don't want to analyze.

Here’s a simple example for a website with six pages:

  • example.com/home
  • example.com/about
  • example.com/contact
  • example.com/a/path
  • example.com/some/path/to/page
  • example.com/another/path/to/page

If you use the regular expression home in this example, your search returns only example.com/home. If you instead use the regular expression path/.*/page, your search matches every page whose URL contains "path/" followed, after any run of characters, by "/page" (that is, example.com/some/path/to/page and example.com/another/path/to/page, but not example.com/a/path).
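The example above can be reproduced with a minimal sketch. It uses Python's built-in re module, which is not RE2 but is assumed to behave the same way for these simple patterns. The helper name matching_pages is hypothetical, and the unanchored search is an assumption inferred from the fact that home matches example.com/home.

```python
import re

# The six example pages from the text above.
PAGES = [
    "example.com/home",
    "example.com/about",
    "example.com/contact",
    "example.com/a/path",
    "example.com/some/path/to/page",
    "example.com/another/path/to/page",
]


def matching_pages(pattern, pages=PAGES):
    """Return the pages where the pattern matches anywhere in the URL."""
    compiled = re.compile(pattern)
    # search() is unanchored: the pattern may match any part of the URL.
    # This is an assumption about how the filter is applied.
    return [page for page in pages if compiled.search(page)]


home_hits = matching_pages("home")
path_hits = matching_pages("path/.*/page")

print(home_hits)
print(path_hits)
```

In this sketch, example.com/a/path is left out of path_hits because nothing follows "path" in that URL, so the "/.*/page" part of the pattern has nothing to match.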

Rules

Clarity uses the regular expression syntax accepted by RE2, so you can make well-formed regular expressions using industry-standard syntax.

Here are some of the major rules used to process the expressions:

  • Except for metacharacters like * + ? ( ) |, characters match themselves.

  • Match a metacharacter by escaping it with a backslash. Example: \+ matches a literal plus character.

  • Two regular expressions can be alternated or concatenated to form a new regular expression. Example: if e1 matches s and e2 matches t, then e1 | e2 matches s or t, and e1e2 matches st.

  • The metacharacters *, +, and ? are repetition operators. Example: e1* matches a sequence of zero or more strings, each of which matches e1; e1+ matches one or more; e1? matches zero or one.

  • Operator precedence from weakest to strongest binding is alternation, concatenation, and finally, the repetition operators.

  • Parentheses can be used to force different meanings, as in arithmetic expressions.
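The rules above can be checked one at a time with a short sketch. It again uses Python's re module as a stand-in, on the assumption that it agrees with RE2 for these basic constructs. The helper full and the sample patterns are illustrative and are not taken from Clarity. Whole-string matching is used here only so that each rule is tested in isolation.

```python
import re


def full(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# Each entry is (pattern, text, expected result).
checks = [
    # Escaping: \+ is a literal plus, while a bare + is a repetition operator.
    (r"a\+b", "a+b", True),
    (r"a\+b", "aab", False),
    (r"a+b", "aab", True),
    # Alternation (e1|e2) and concatenation (e1e2).
    (r"home|about", "about", True),
    (r"homeabout", "homeabout", True),
    # Repetition: * is zero or more, + is one or more, ? is zero or one.
    (r"(ab)*", "", True),
    (r"(ab)*", "abab", True),
    (r"(ab)+", "", False),
    (r"(ab)?", "abab", False),
    # Precedence: ab|cd* is read as (ab)|(c(d*)).
    (r"ab|cd*", "cddd", True),
    (r"ab|cd*", "abd", False),
    # Parentheses change the grouping.
    (r"a(b|c)d*", "acdd", True),
    (r"(ab|cd)*", "abcdab", True),
]

for pattern, text, expected in checks:
    outcome = full(pattern, text)
    status = "ok" if outcome == expected else "MISMATCH"
    print(f"{pattern!r} on {text!r}: {outcome} ({status})")
```

The two precedence rows show why parentheses matter: without them, the * in ab|cd* applies only to d, so "abd" does not match.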

More information

RE2 syntax has many more details, including different kinds of character codes. To learn more, see the RE2 syntax reference.
