Diving Deep into Regex: A Visual Guide

What is Regex?

Regular expressions, often shortened to “regex” or “regexp,” are patterns used to find and manipulate text. Think of them as search filters with far more expressive power: instead of searching for a single word, a regex lets you define rules that match whole families of strings. If you are running a scan with the Captain Compliance Cookie Scanner, then understanding regex may come in handy for interpreting the definitions of 1st-party and 3rd-party cookies.

Think of it this way:

  • Simple Search: Finding the word “cat” in a document.
  • Regex Search: Finding all words that start with “c” and end with “t” (e.g., “cat,” “cut,” “cot”).
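The contrast above can be sketched in a few lines of Python using the standard `re` module. The sample sentence and variable names are made up for illustration; this is a minimal sketch rather than the only way to write the pattern.

```python
import re

# Illustrative sample text (not from any real document).
text = "The cat sat on a cot, then cut the cord and left the coat."

# Simple search: does the literal string "cat" appear anywhere?
has_cat = "cat" in text

# Regex search: whole words that start with "c" and end with "t".
# \b marks a word boundary; \w* allows any word characters in between.
c_t_words = re.findall(r"\bc\w*t\b", text)

print(has_cat)
print(c_t_words)
```

Note that “cord” is skipped because it does not end in “t”, while “coat” is picked up even though it is longer than three letters.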

Key Concepts:

  • Metacharacters: Special characters that have a specific meaning within a regex pattern.
    • . (Period): Matches any single character except a newline.
    • * (Asterisk): Matches the preceding element (a character, set, or group) zero or more times.
    • + (Plus): Matches the preceding element one or more times.
    • ? (Question Mark): Matches the preceding element zero or one time (makes it optional).
    • [] (Square Brackets): Defines a character set. For example, [a-z] matches any lowercase letter.
    • () (Parentheses): Creates a capturing group, allowing you to extract specific parts of the matched text.
    • \d: Matches any digit (0-9).
    • \w: Matches any word character (letters, digits, and underscores).
    • \s: Matches any whitespace character (spaces, tabs, newlines).
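To see these metacharacters at work, here is a small Python sketch that runs several of them against short sample strings. The sample strings are invented for illustration, and `re.findall` is just one convenient way to show what each pattern matches.

```python
import re

# (pattern, sample text) pairs; the sample strings are made up for illustration.
examples = [
    (r"c.t", "cat cut c t"),          # .  any single character (even a space)
    (r"ab*", "a ab abbb"),            # *  zero or more "b"
    (r"ab+", "a ab abbb"),            # +  one or more "b"
    (r"colou?r", "color colour"),     # ?  the "u" is optional
    (r"[a-c]+", "abc xyz cab"),       # [] runs of the letters a, b, or c
    (r"(\w+)=(\d+)", "user_id=42"),   # () two capturing groups
    (r"\s+", "a b\tc"),               # \s runs of whitespace
]

# Collect every match for each pattern.
results = {pattern: re.findall(pattern, sample) for pattern, sample in examples}

for pattern, matches in results.items():
    print(f"{pattern!r:16} -> {matches}")
```

When a pattern contains capturing groups, `re.findall` returns the captured parts (here a name and a number) instead of the whole match.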

Figure: Example of a cookie scanner’s results for 1st-party cookies using regex

Regex in Action: A Visual Example

Let’s say you want to find all email addresses within a block of text.

  1. Identify the Pattern:

    • Email addresses typically follow a pattern:
      • Username (can contain letters, digits, and special characters)
      • @ symbol
      • Domain name (can contain letters, digits, and periods)
      • Top-level domain (e.g., “.com,” “.org,” “.net”)
  2. Construct the Regex:

    \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
    
    • \b: Matches a word boundary.
    • [A-Za-z0-9._%+-]+: Matches the username: one or more letters, digits, or the characters . _ % + -
    • @: Matches the literal “@” symbol.
    • [A-Za-z0-9.-]+: Matches the domain name: one or more letters, digits, periods, or hyphens.
    • \.: Matches a literal period.
    • [A-Za-z]{2,}: Matches the top-level domain: two or more uppercase or lowercase letters.
    • \b: Matches a word boundary.
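As a rough illustration, the pattern can be applied to a block of text in Python. The sample text and the name `EMAIL_RE` are assumptions made for this sketch, and the pattern is a practical approximation, not a full validator for every legal email address.

```python
import re

# The top-level-domain class is written as [A-Za-z]: inside square brackets
# a "|" would be treated as a literal pipe character, not as "or".
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Illustrative sample text with two addresses and one near miss.
text = (
    "Contact alice@example.com or bob.smith+news@mail.example.org; "
    "not-an-email@ is skipped."
)

emails = EMAIL_RE.findall(text)
print(emails)
```

The fragment “not-an-email@” is ignored because nothing that looks like a domain follows the “@” symbol.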

Regex and Cookies

Cookies are small pieces of data stored on a user’s computer by a website. They often contain information like user preferences, login sessions, and tracking data.

Figure: Visual display of regex meanings

Regex can be used to:

  • Extract specific cookie values:

    • Let’s say a cookie is named “user_id” and its value is “12345”.
    • Regex can be used to extract the “12345” part from the cookie string.
  • Identify and categorize cookies:

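    • By matching cookie names, values, or domains against known patterns, regex can help sort cookies into categories (e.g., session cookie, persistent cookie, third-party cookie). A pattern match alone is usually a hint rather than proof: whether a cookie is persistent depends on its expiry attributes, and whether it is third-party depends on the domain that sets it.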
  • Clean cookie data:

    • Remove any unwanted characters or formatting from cookie values.
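The categorizing and cleaning ideas above might look something like the following Python sketch. The rule list, the category names, and the helper functions `categorize` and `clean_value` are all hypothetical and chosen only for illustration; a real scanner would rely on a maintained rule set and on cookie attributes, not on names alone.

```python
import re

# Hypothetical name-based rules, checked in order. Illustration only.
CATEGORY_RULES = [
    ("analytics", re.compile(r"^_(ga|gid)")),
    ("session", re.compile(r"sess(ion)?", re.IGNORECASE)),
    ("preference", re.compile(r"^(lang|language|theme)$")),
]


def categorize(cookie_name: str) -> str:
    """Return the first category whose pattern matches the cookie name."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(cookie_name):
            return category
    return "unknown"


def clean_value(raw: str) -> str:
    """Strip surrounding whitespace and an optional double quote at each end."""
    return re.sub(r'^\s*"?|"?\s*$', "", raw)


for name in ["_ga", "PHPSESSID", "language", "user_id"]:
    print(name, "->", categorize(name))

print(repr(clean_value(' "abc" ')))
```

Ordering the rules matters in this design: the first matching pattern wins, so more specific rules should come before broader ones.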

Example: Reading a Cookie with Regex

Let’s assume a cookie string looks like this: user_id=12345; session_token=abcdef; language=en

To extract the user ID:

  1. Regex pattern: user_id=([0-9]+)
  2. Explanation:
    • user_id=: Matches the literal string “user_id=”.
    • ([0-9]+): Captures one or more digits within a capturing group.

Applied to the cookie string above, this regex matches “user_id=12345”, and the capturing group holds the value “12345”.
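In Python, the same extraction could be sketched as follows. The variable names are illustrative, and the second pattern, which collects every name=value pair, is an addition of this sketch and not part of the example above.

```python
import re

cookie_string = "user_id=12345; session_token=abcdef; language=en"

# Search for the pattern; group(1) is the part captured by ([0-9]+).
match = re.search(r"user_id=([0-9]+)", cookie_string)
user_id = match.group(1) if match else None
print(user_id)

# Optional extension (an assumption of this sketch): collect every
# name=value pair. Names stop at "=", values stop at ";".
pairs = dict(re.findall(r"([^=;\s]+)=([^;]*)", cookie_string))
print(pairs)
```

One caveat: the pattern `user_id=` would also match inside a longer name such as `other_user_id=`, so a stricter version might require the start of the string or a preceding “; ” before the name.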

Applications for Regex

Regex is a powerful and versatile tool with numerous applications. By understanding the core concepts and practicing with different patterns, you can effectively use regex to:

  • Extract information from text: Find specific data within large datasets.
  • Validate data: Ensure data meets specific criteria (e.g., email addresses, phone numbers).
  • Transform data: Modify text strings in various ways.
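The validation and transformation uses listed above can be sketched briefly in Python. The phone format (three digits, three digits, four digits, separated by hyphens) and the helper names are assumptions chosen for this example, not a general standard.

```python
import re

# Assumed format for this sketch: 555-123-4567.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_phone(text: str) -> bool:
    """Validate: the whole string must match the assumed phone format."""
    return PHONE_RE.fullmatch(text) is not None


def collapse_whitespace(text: str) -> str:
    """Transform: replace each run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def mask_digits(text: str) -> str:
    """Transform: hide every digit behind a '#' character."""
    return re.sub(r"\d", "#", text)


print(is_phone("555-123-4567"))
print(is_phone("555-1234"))
print(collapse_whitespace("  too   many\tspaces \n"))
print(mask_digits("user_id=12345"))
```

Using `fullmatch` instead of `search` is the key choice for validation: it rejects strings that merely contain a valid-looking fragment.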

Note: This is a simplified explanation. Regex can become quite complex, with many advanced features and nuances. If you’re an engineer, this will read easily; if you’re just learning, it is a good starting point. If you’re working with a cookie consent management platform and are interested in Captain Compliance, please book a demo below.

Written by: Richart Ruddie

Online Privacy Compliance Made Easy

Captain Compliance makes it easy to develop, oversee, and expand your privacy program. Book a demo or start a trial now.