Online Python Regex Playground

Regular Expressions in Python

Regular expressions are an indispensable tool for text processing and data manipulation. They allow you to search, match, and manipulate text based on specific patterns.

We will cover the basics and some advanced features of regular expressions in Python.

Basic Usage

The re module provides support for regular expressions, making it straightforward to incorporate regex functionality into your Python scripts.

Simple Matching

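Let's first look at a simple example using the re.match function. Note that re.match only checks for a match at the beginning of the string, so the digits must come first for the pattern below to match.

import re

pattern = r'\d+'  # Matches one or more digits
string = "123 apples are in the basket"

# re.match tries the pattern only at the start of the string
match = re.match(pattern, string)
if match:
    print("Match found:", match.group())
else:
    print("No match found")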
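Because re.match only tries the pattern at the start of the string, a string whose digits appear later produces no match with re.match, while re.search still finds them. The following is a minimal sketch of that difference; the sample strings and variable names are illustrative only.

```python
import re

pattern = r'\d+'
text = "There are 123 apples"  # the digits are not at the start

# re.match tries the pattern only at position 0, so this is None
match_result = re.match(pattern, text)
print(match_result)            # None

# re.search scans forward until it finds a match
search_result = re.search(pattern, text)
print(search_result.group())   # 123

# re.fullmatch requires the entire string to match the pattern
full_result = re.fullmatch(pattern, "123")
print(full_result is not None) # True
```

If you need a match anywhere in the string, re.search is usually the function to reach for; re.match and re.fullmatch are better suited to validating input from its first character.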

Searching and Finding Patterns

The re.search function helps you find the first location where the regex pattern matches in the string, while re.findall returns all matches.

import re

pattern = r'\d+'  # Matches one or more digits
string = "There are 123 apples and 456 oranges"

# Search for the first match
search_result = re.search(pattern, string)
if search_result:
    print("Search result:", search_result.group())

# Find all matches
findall_result = re.findall(pattern, string)
print("Find all result:", findall_result)
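re.findall returns only the matched text. If you also need to know where each match occurs, re.finditer yields a match object per match. The sketch below assumes the same sample sentence; the positions list is a hypothetical helper for illustration.

```python
import re

pattern = r'\d+'
text = "There are 123 apples and 456 oranges"

positions = []
for m in re.finditer(pattern, text):
    # m.start() and m.end() give the slice of text covered by the match
    positions.append((m.group(), m.start(), m.end()))
    print(m.group(), "found at", m.span())
```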

Replacing Patterns

The re.sub function allows you to replace matched patterns with a specified string.

import re

pattern = r'\d+'  # Matches one or more digits
string = "There are 123 apples and 456 oranges"
replacement = "X"

# Replace all matches with 'X'
result = re.sub(pattern, replacement, string)
print("Substitution result:", result)
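The replacement does not have to be a fixed string. As a rough sketch, it can refer back to a captured group, be computed by a function, or be limited to the first few matches with the count argument. The variable names here are illustrative.

```python
import re

text = "There are 123 apples and 456 oranges"

# \1 in the replacement refers to the first captured group
bracketed = re.sub(r'(\d+)', r'[\1]', text)
print(bracketed)   # There are [123] apples and [456] oranges

# A function receives each match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
print(doubled)     # There are 246 apples and 912 oranges

# count limits how many matches are replaced
first_only = re.sub(r'\d+', 'X', text, count=1)
print(first_only)  # There are X apples and 456 oranges
```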

Grouping and Capturing

In regular expressions, grouping lets you capture specific parts of a match using parentheses (). This makes it possible to extract and reuse portions of the matched pattern, which is useful for pulling data out of structured text.

import re

pattern = r'(\d+)\s(\w+)'  # Matches "number word" pairs
string = "123 apples"

match = re.search(pattern, string)
if match:
    print("Full match:", match.group(0))
    print("First group:", match.group(1))
    print("Second group:", match.group(2))

Let's break down the above code example:

pattern = r'(\d+)\s(\w+)':
- (\d+): The first group captures one or more digits. This will match the numeric part of the string, "123".
- \s: This matches exactly one whitespace character (space, tab, newline, and so on), ensuring that the number and the word are separated by whitespace.
- (\w+): The second group captures one or more word characters (letters, digits, or underscores). This will match the word part of the string, "apples".
string = "123 apples": This is the string being searched. It contains a number followed by a word, matching the pattern.
re.search(pattern, string): This function searches the string for the first match of the pattern. If a match is found, it returns a match object; otherwise, it returns None.
match.group(0): Returns the entire matched string. In this case, it returns "123 apples".
match.group(1): Returns the first captured group, which is the portion matched by the first set of parentheses (\d+). In this case, it returns "123".
match.group(2): Returns the second captured group, which is the portion matched by the second set of parentheses (\w+). In this case, it returns "apples".

This technique is commonly used in text processing tasks like:

Extracting key-value pairs from structured text
Parsing dates, times, or measurements
Extracting parts of a URL, email addresses, or file names
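As an illustration of the first two tasks, here is a small sketch that pulls key-value pairs and the parts of a date out of text. The input strings and the key=value and YYYY-MM-DD formats are assumptions made for the example.

```python
import re

text = "host=example.com port=8080 mode=debug"

# Two groups: the key before '=' and the value after it.
# With more than one group, re.findall returns a list of tuples.
pairs = re.findall(r'(\w+)=(\S+)', text)
config = dict(pairs)
print(config["port"])  # 8080

# Three groups for an assumed YYYY-MM-DD date format
date_match = re.search(r'(\d{4})-(\d{2})-(\d{2})', "Released on 2024-03-15")
if date_match:
    # groups() returns all captured groups as a tuple of strings
    year, month, day = date_match.groups()
    print(year, month, day)  # 2024 03 15
```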

Lookahead and Lookbehind

Lookaheads and lookbehinds are zero-width assertions: they check for the presence or absence of a pattern before or after the current position without consuming any characters, so the surrounding text is not included in the final match result.

Lookaheads and lookbehinds are especially useful when you need to ensure a match occurs in a specific context but don’t want to include that context in the final result.

Use Case

You can use lookaheads and lookbehinds in scenarios where you want to capture data with specific boundaries or conditions without including those boundaries in your results. For instance:

Extracting numbers following a specific symbol (e.g., prices after a dollar sign)
Finding text between certain markers while excluding the markers from the result
Matching words that are followed or preceded by certain words, but not capturing the additional words

Types of Lookarounds:

Lookahead ((?=...)): Ensures that a certain pattern follows the current position but doesn't include it in the match.
Negative Lookahead ((?!...)): Ensures that a certain pattern does not follow the current position.
Lookbehind ((?<=...)): Ensures that a certain pattern precedes the current position but doesn't include it in the match.
Negative Lookbehind ((?<!...)): Ensures that a certain pattern does not precede the current position.
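To see the difference between the positive and negative lookahead, here is a minimal sketch using made-up CSS-like values. Note the extra \d inside the negative lookahead: without it, the regex engine could backtrack and match just part of a number (for example "10" out of "100px").

```python
import re

text = "100px 200em 300px"

# Positive lookahead: numbers that are followed by "px"
px_values = re.findall(r'\d+(?=px)', text)
print(px_values)     # ['100', '300']

# Negative lookahead: numbers that are NOT followed by "px".
# The \d alternative stops the match from ending in the middle of a number.
other_values = re.findall(r'\d+(?!\d|px)', text)
print(other_values)  # ['200']

# For comparison, the naive version matches partial numbers
naive = re.findall(r'\d+(?!px)', text)
print(naive)         # ['10', '200', '30']
```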

Example of Lookbehind

In the example below, we use a lookbehind to match digits that are preceded by a dollar sign ($), but without including the dollar sign in the result.

import re

pattern = r'(?<=\$)\d+'  # Matches digits preceded by a dollar sign
string = "The price is $123"

search_result = re.search(pattern, string)
if search_result:
    print("Lookbehind result:", search_result.group())

Explanation:

pattern = r'(?<=\$)\d+':
- (?<=...): This is a positive lookbehind assertion. It asserts that what immediately precedes the current position in the string is a dollar sign (\$), but the dollar sign itself will not be part of the match result.
- \d+: Matches one or more digits. These digits are the part of the pattern we want to capture and return.
string = "The price is $123": This is the input string, which contains a dollar sign followed by a number. The goal is to match the number (123) that comes after the dollar sign.
re.search(pattern, string): This searches the string for a match based on the pattern. In this case, it will look for digits (\d+) that are preceded by a dollar sign ($), but the dollar sign will not be included in the match result.
search_result.group(): If a match is found, this returns the matched text, which excludes the dollar sign checked by the lookbehind. In this case, it returns "123".

Additional Examples

Lookahead Example: Matching a word followed by a specific word:

pattern = r'\w+(?=\sis)'  # Matches any word that is followed by " is"
string = "This is a test."

search_result = re.search(pattern, string)
if search_result:
    print("Lookahead result:", search_result.group())

Explanation: The lookahead (?=\sis) asserts that the matched word (\w+) must be followed by the phrase " is", but " is" is not included in the match.
Output: This

Negative Lookbehind Example: Matching digits not preceded by a dollar sign:

pattern = r'(?<![$\d])\d+'  # Matches a run of digits not preceded by a dollar sign
string = "Price: $123 or 456"

search_result = re.search(pattern, string)
if search_result:
    print("Negative lookbehind result:", search_result.group())

Explanation: The negative lookbehind (?<![$\d]) ensures that the match does not start right after a dollar sign or another digit, so it will match "456" and not "123". The \d in the lookbehind matters: with (?<!\$) alone, the search would skip the "1" but then match "23", because "2" is preceded by "1" rather than "$".
Output: 456

Combining Lookahead and Lookbehind:

pattern = r'(?<=\$)\d+(?=\sUSD)'  # Matches digits preceded by $ and followed by " USD"
string = "The price is $123 USD."

search_result = re.search(pattern, string)
if search_result:
    print("Combined lookahead/lookbehind result:", search_result.group())

Explanation: The lookbehind (?<=\$) ensures the digits are preceded by $, and the lookahead (?=\sUSD) ensures the digits are followed by " USD". The final match will only include the digits.
Output: 123
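The same combination covers the "text between markers" use case mentioned earlier. The sketch below assumes square brackets as the markers and a made-up input string; a capturing group is shown alongside for comparison, since it often gives the same result.

```python
import re

text = "Tags: [python] [regex] [tutorial]"

# Lookbehind requires "[" before the match, lookahead requires "]" after it;
# neither bracket becomes part of the result
tags = re.findall(r'(?<=\[)[^\]]+(?=\])', text)
print(tags)  # ['python', 'regex', 'tutorial']

# Equivalent result using a capturing group instead of lookarounds
tags_grouped = re.findall(r'\[([^\]]+)\]', text)
print(tags_grouped == tags)  # True
```

Lookarounds are handy when you want match.group() itself to be the clean value; a capturing group is often easier to read when you are already using findall or group(1).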

Using Flags

Flags modify the behavior of the regex. Common flags include re.IGNORECASE, re.MULTILINE, and re.DOTALL.

import re

pattern = r'apple'
string = "APPLE pie"

# Case-insensitive search
search_result = re.search(pattern, string, re.IGNORECASE)
if search_result:
    print("Case-insensitive search result:", search_result.group())
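The other two flags mentioned above change how ^, $, and . behave. Here is a short sketch of their effect on an illustrative three-line string; flags can be combined with the | operator.

```python
import re

text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ matches only at the start of the whole string
default_starts = re.findall(r'^\w+', text)
print(default_starts)    # ['first']

# With re.MULTILINE, ^ also matches right after each newline
multiline_starts = re.findall(r'^\w+', text, re.MULTILINE)
print(multiline_starts)  # ['first', 'second', 'third']

# By default, . does not match a newline, so this finds nothing
no_dotall = re.search(r'first.*third', text)
print(no_dotall)         # None

# With re.DOTALL, . matches newlines as well
with_dotall = re.search(r'first.*third', text, re.DOTALL)
print(with_dotall is not None)  # True

# Combining flags: case-insensitive and line-by-line anchors
combined = re.findall(r'^SECOND.*$', text, re.IGNORECASE | re.MULTILINE)
print(combined)          # ['second line']
```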

Compiling Regular Expressions

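If the same pattern is used many times, you can compile it once with re.compile and reuse the resulting pattern object. The re module already caches recently used patterns, so the speedup is often modest, but a compiled pattern avoids repeated lookups and keeps the pattern and its flags in one place.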

import re

pattern = r'\d+'
compiled_pattern = re.compile(pattern)

string1 = "123 apples"
string2 = "456 oranges"

# Use the compiled pattern
match1 = compiled_pattern.search(string1)
match2 = compiled_pattern.search(string2)

if match1:
    print("Compiled search result 1:", match1.group())
if match2:
    print("Compiled search result 2:", match2.group())
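A compiled pattern object offers the same operations as the module-level functions (search, findall, sub, and so on), and any flags are supplied once at compile time. A minimal sketch, with an illustrative list of strings:

```python
import re

# Compile once, with a flag, then reuse the pattern object's methods
word_pattern = re.compile(r'apple', re.IGNORECASE)

lines = ["APPLE pie", "banana bread", "Apple and apple juice"]

# Count case-insensitive occurrences per line
counts = [len(word_pattern.findall(line)) for line in lines]
print(counts)    # [1, 0, 2]

# Replace every occurrence using the same compiled pattern
replaced = [word_pattern.sub("pear", line) for line in lines]
print(replaced)  # ['pear pie', 'banana bread', 'pear and pear juice']
```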

Conclusion

Regular expressions offer a wide range of features and syntax for pattern matching, including character classes, quantifiers, anchors, and more. You can refer to the Python documentation for more information on regular expressions and their syntax.

