May 10, 2022

Regular Expressions Demystified: A Practical Guide with Real Examples

Regular expressions (regex) are one of the most powerful tools in computing — and one of the most intimidating. But once you understand the basics, regex becomes indispensable for searching text, validating input, extracting data, and transforming strings.

This guide teaches regex through practical, real-world examples you can use immediately in the terminal, code editors, and programming languages.

What Are Regular Expressions?

A regular expression is a pattern that describes a set of strings. Instead of searching for an exact word, you describe what the text looks like. For example:

\d{3}-\d{4} matches phone numbers like 555-1234
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} matches email addresses
^https?:// matches URLs starting with http:// or https://

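Regex is supported in virtually every programming language, text editor, and command-line tool. The dialects (often called flavors) differ in the details. For example, \d works in Python, JavaScript, and PCRE, but not in the POSIX extended regex used by grep -E.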

Basic Syntax Reference

Literal Characters

Most characters match themselves:

Pattern	Matches
`hello`	The exact string “hello”
`abc123`	The exact string “abc123”

Metacharacters

These characters have special meaning:

Symbol	Meaning	Example	Matches
`.`	Any single character	`h.t`	hat, hit, hot, h9t
`^`	Start of line	`^Hello`	Lines starting with “Hello”
`$`	End of line	`world$`	Lines ending with “world”
`*`	Zero or more of previous	`ab*c`	ac, abc, abbc, abbbc
`+`	One or more of previous	`ab+c`	abc, abbc, abbbc (not ac)
`?`	Zero or one of previous	`colou?r`	color, colour
`\`	Escape a metacharacter	`\.`	A literal dot
`|`	OR (alternation)	`cat|dog`	cat or dog
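Here is a minimal Python sketch that runs each example pattern from the table through `re.search`, to show the metacharacters in action. The sample strings and the `results` dictionary are illustrative choices and are not part of any API.

```python
import re

# (pattern, sample text) pairs based on the table; the samples are illustrative
examples = [
    (r"h.t", "hot"),              # . matches any one character
    (r"^Hello", "Hello there"),   # ^ anchors at the start
    (r"world$", "hello world"),   # $ anchors at the end
    (r"ab*c", "ac"),              # * allows zero b's
    (r"ab+c", "ac"),              # + needs at least one b, so no match
    (r"colou?r", "color"),        # ? makes the u optional
    (r"\.", "a.b"),               # escaped dot matches only a literal dot
    (r"cat|dog", "hotdog"),       # | tries each alternative
]

results = {}
for pattern, text in examples:
    m = re.search(pattern, text)  # scan the text for the first match
    results[(pattern, text)] = m.group() if m else None
    print(f"{pattern:10} {text!r:15} -> {results[(pattern, text)]!r}")
```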

Character Classes

Pattern	Meaning
`[abc]`	Any one of a, b, or c
`[a-z]`	Any lowercase letter
`[A-Z]`	Any uppercase letter
`[0-9]`	Any digit
`[a-zA-Z0-9]`	Any alphanumeric character
`[^abc]`	Any character except a, b, or c

Shorthand Character Classes

Shorthand	Equivalent (ASCII)	Meaning
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace
`\S`	`[^ \t\n\r\f\v]`	Any non-whitespace
`\b`	(zero-width)	Word boundary
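A short Python sketch makes two details of this table concrete. First, the bracket equivalents hold for ASCII text. Second, \b matches a position rather than a character. The sketch assumes Python 3, where `str` patterns are Unicode-aware by default and the `re.ASCII` flag restores the ASCII-only meaning. Other flavors handle Unicode differently, and the variable names are illustrative.

```python
import re

digits = re.findall(r"\d", "a1b22")             # \d behaves like [0-9] here
spaces = re.findall(r"\s", "a b\tc\nd\fe\vf")   # \s also covers \f and \v

# Python 3 str patterns are Unicode-aware, so \w matches the accented letter
unicode_word = re.fullmatch(r"\w+", "café") is not None
# re.ASCII restricts \w to [a-zA-Z0-9_], so "é" no longer counts
ascii_word = re.fullmatch(r"\w+", "café", flags=re.ASCII) is not None

# \b is zero-width: "cat" inside "concatenate" has word characters on both sides
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

print(digits, spaces, unicode_word, ascii_word, whole_words)
```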

Quantifiers

Quantifier	Meaning
`{3}`	Exactly 3 times
`{2,5}`	Between 2 and 5 times
`{3,}`	3 or more times
`*`	0 or more (same as `{0,}`)
`+`	1 or more (same as `{1,}`)
`?`	0 or 1 (same as `{0,1}`)
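Quantifiers are easiest to compare side by side. This Python sketch applies them to an illustrative string of runs of the letter a. The word boundaries are there so that each quantifier has to cover a whole run. Without them, for example, `a{3}` would also find "aaa" inside "aaaa".

```python
import re

text = "a aa aaa aaaa aaaaaa"

# \b on both sides makes each quantifier apply to a whole run of a's
exactly_3 = re.findall(r"\ba{3}\b", text)
two_to_five = re.findall(r"\ba{2,5}\b", text)
three_or_more = re.findall(r"\ba{3,}\b", text)

print(exactly_3)      # ['aaa']
print(two_to_five)    # ['aa', 'aaa', 'aaaa']
print(three_or_more)  # ['aaa', 'aaaa', 'aaaaaa']
```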

Groups and Capturing

Pattern	Meaning
`(abc)`	Capture group — matches “abc” and remembers it
`(?:abc)`	Non-capturing group — matches but doesn’t remember
`\1`	Backreference to first capture group
`(?=abc)`	Lookahead — matches if followed by “abc”
`(?!abc)`	Negative lookahead — matches if NOT followed by “abc”
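The following Python sketch exercises each row of the table on illustrative strings. One subtlety is worth knowing. A negative lookahead right after `\d+` can be defeated by backtracking, because the engine gives back digits until the lookahead succeeds. The sketch guards against this by also refusing a following digit.

```python
import re

# A capture group is returned by groups(); the (?:...) group is not
m = re.search(r"(\d{3})-(?:\d{4})", "Call 555-1234 today")
captured = m.groups()

# \1 must repeat exactly the text group 1 matched (finds doubled words)
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# Lookahead checks what follows without consuming it, so "px" is not returned
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: the \d alternative stops the engine from backtracking
# to a shorter number (plain \d+(?!px) would return "1" from "10px")
non_px = re.findall(r"\d+(?!\d|px)", "10px 20em 30px")

print(captured, doubled, px_values, non_px)
```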

Practical Examples

1. Validate an Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breaking it down:

^ — Start of string
[a-zA-Z0-9._%+-]+ — One or more valid characters before @
@ — Literal @ sign
[a-zA-Z0-9.-]+ — Domain name
\. — Literal dot
[a-zA-Z]{2,} — TLD (at least 2 letters)
$ — End of string
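Here is a minimal Python sketch of how this pattern might be used to validate input. The helper name `looks_like_email` and the sample addresses are illustrative. The sketch uses `fullmatch` because in Python `$` also matches just before a trailing newline, so `match` alone would accept `"a@b.co\n"`. The pattern is a practical approximation and does not implement the full email address grammar.

```python
import re

# The article's email pattern, anchors included
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(candidate):
    """Illustrative helper: True if the whole string fits the pattern."""
    # fullmatch must consume the entire string, so a trailing "\n" is rejected
    return EMAIL_RE.fullmatch(candidate) is not None

for s in ["user@example.com", "first.last+tag@mail.co.uk",
          "no-at-sign.com", "user@localhost", "a@b.co\n"]:
    print(f"{s!r}: {looks_like_email(s)}")
```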

2. Match an IP Address

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

This matches patterns like 192.168.1.1 or 10.0.0.255. Note that this doesn’t validate the range (0-255) — it just matches the format.
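When the range matters, one simple approach is to use the regex to find candidates and ordinary code to check each octet. Here is a sketch in Python. The function name `find_valid_ipv4` and the sample text are illustrative.

```python
import re

IPV4_SHAPE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_valid_ipv4(text):
    """Return dotted quads from text whose four octets are all 0-255."""
    valid = []
    for m in IPV4_SHAPE.finditer(text):
        octets = [int(part) for part in m.group().split(".")]
        if all(0 <= o <= 255 for o in octets):  # the range check the regex skips
            valid.append(m.group())
    return valid

print(find_valid_ipv4("ok 192.168.1.1, bad 999.1.1.1, ok 10.0.0.255"))
# ['192.168.1.1', '10.0.0.255']
```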

3. Extract URLs from Text

https?://[^\s<>"']+

https? — “http” or “https”
:// — Literal characters
[^\s<>"']+ — One or more characters that aren't whitespace, angle brackets, or quotes
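Here is a Python sketch of the extraction applied to an illustrative snippet of HTML and prose. The character class stops the match at quotes and angle brackets. However, a sentence-ending period is not excluded, so it gets captured too. Stripping trailing punctuation afterwards is a common heuristic. It can also remove a legitimate final character, so treat it as an assumption rather than a rule.

```python
import re

URL_RE = re.compile(r"""https?://[^\s<>"']+""")

text = 'See <a href="https://example.com/docs">docs</a> or http://example.org/page?id=7.'
urls = URL_RE.findall(text)

# The period after "id=7" is captured, so trim common trailing punctuation
cleaned = [u.rstrip(".,;:!?") for u in urls]

print(urls)     # ['https://example.com/docs', 'http://example.org/page?id=7.']
print(cleaned)  # ['https://example.com/docs', 'http://example.org/page?id=7']
```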

4. Match a Date (YYYY-MM-DD)

\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

\d{4} — Four-digit year
0[1-9]|1[0-2] — Month 01-12
0[1-9]|[12]\d|3[01] — Day 01-31
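The pattern checks the shape of a date, but not whether that day exists in that month, so `2023-02-30` passes. One common approach, sketched here in Python, is to let the regex check the format and let `datetime.date` reject impossible dates. This version makes the groups capturing so the parts can be extracted. `parse_iso_date` is an illustrative helper name.

```python
import re
from datetime import date

# Same pattern as above, with capturing groups instead of (?:...)
DATE_RE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

def parse_iso_date(text):
    """Return a date for a valid YYYY-MM-DD string, else None."""
    m = DATE_RE.fullmatch(text)
    if not m:
        return None  # wrong shape, e.g. month 13
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)  # rejects Feb 30, Apr 31, non-leap Feb 29
    except ValueError:
        return None

print(parse_iso_date("2024-02-29"))  # 2024-02-29
print(parse_iso_date("2023-02-30"))  # None
```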

5. Validate a Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

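Requires at least one lowercase letter, one uppercase letter, one digit, and one special character from the set @$!%*?&, with a minimum of 8 characters in total. Each lookahead (?=...) checks one condition independently without consuming any characters. Note that the final character class also rejects anything outside that list, such as spaces or #.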
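Here is a quick Python check of the pattern against a few illustrative candidates. The third candidate meets every lookahead but still fails, because the space is not in the allowed character class.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

candidates = ["Passw0rd!", "password1!", "Pass w0rd!", "Sh0rt!"]
# True only if every lookahead passes and all characters are allowed
verdicts = {pw: PASSWORD_RE.fullmatch(pw) is not None for pw in candidates}

for pw, ok in verdicts.items():
    print(f"{pw!r}: {ok}")
```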

Using Regex in the Terminal

grep — Search Files

# Find lines containing an email address
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

# Find lines starting with "Error" (case insensitive)
grep -iE '^error' /var/log/syslog

# Find IP addresses in a log file
grep -oE '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' access.log

# Count matching lines (not individual matches)
grep -cE 'pattern' file.txt

# Search recursively in a directory
grep -rE 'TODO|FIXME|HACK' ./src/

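The -E flag enables POSIX extended regular expressions (ERE), which have no \d shorthand, so spell digits as [0-9] as in the IP example above. GNU grep's -P flag enables Perl-compatible regex (PCRE) with \d, lookaheads, and more. However, -P is not available in every grep; the BSD grep that ships with macOS lacks it.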

sed — Find and Replace

# Replace first occurrence on each line
sed 's/old/new/' file.txt

# Replace ALL occurrences (global)
sed 's/old/new/g' file.txt

# Replace in-place (modifies the file; BSD/macOS sed needs an empty suffix: sed -i '' 's/old/new/g' file.txt)
sed -i 's/old/new/g' file.txt

# Remove empty lines (lines containing only spaces are kept)
sed '/^$/d' file.txt

# Remove HTML tags
sed 's/<[^>]*>//g' page.html

# Print the last double-quoted string on each line that contains one
sed -n 's/.*"\([^"]*\)".*/\1/p' file.txt

# Replace multiple spaces with a single space
sed 's/  */ /g' file.txt
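For comparison, here is a Python sketch of the same substitutions using `re.sub`. By default `re.sub` replaces every match, like sed's `g` flag. Passing `count=1` limits it to the first match in the whole string, whereas sed's plain `s` works once per line, so the two only coincide for single-line input. The sample strings are illustrative.

```python
import re

line = "old old old"
first_only = re.sub(r"old", "new", line, count=1)  # like sed 's/old/new/' on one line
everywhere = re.sub(r"old", "new", line)           # like sed 's/old/new/g'

no_tags = re.sub(r"<[^>]*>", "", "<p>Hi <b>there</b></p>")  # strip HTML tags
squeezed = re.sub(r" +", " ", "a    b  c")                  # collapse runs of spaces
# Similar to the sed quote extraction, but findall returns every quoted string
quoted = re.findall(r'"([^"]*)"', 'say "hi" and "bye"')

print(first_only, everywhere, no_tags, squeezed, quoted, sep="\n")
```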

awk — Pattern-Based Processing

# Print lines matching a pattern
awk '/^ERROR/' logfile.txt

# Print fields 1 and 7 (client IP and request path in the common log format) from lines containing 404 anywhere
awk '/404/ {print $1, $7}' access.log

# Sum the numbers in the third column
awk '{sum += $3} END {print sum}' data.txt

Using Regex in Programming Languages

Python

import re

text = "Contact us at support@example.com or sales@company.org"

# Find all email addresses
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print(emails)  # ['support@example.com', 'sales@company.org']

# Search for a pattern
match = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today')
if match:
    print(match.group())  # 555-1234

# Replace
result = re.sub(r'\bJohn\b', 'Jane', 'John went to John\'s house')
print(result)  # Jane went to Jane's house

# Split by regex
parts = re.split(r'[,;\s]+', 'one, two; three four')
print(parts)  # ['one', 'two', 'three', 'four']

# Compile for reuse
pattern = re.compile(r'^(\d{4})-(\d{2})-(\d{2})$')
match = pattern.match('2024-01-15')
if match:
    year, month, day = match.groups()

JavaScript

const text = "Contact us at support@example.com or sales@company.org";

// Find all matches
const emails = text.match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g);
console.log(emails); // ['support@example.com', 'sales@company.org']

// Test if a string matches
const isEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test('user@example.com');
console.log(isEmail); // true

// Replace
const result = 'Hello World'.replace(/world/i, 'Regex');
console.log(result); // Hello Regex

// Named capture groups
const dateMatch = '2024-01-15'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateMatch.groups.year);  // 2024
console.log(dateMatch.groups.month); // 01

Regex in VS Code

VS Code has a powerful regex find-and-replace. Press Ctrl+H (Cmd+Option+F on macOS) to open Replace, then click the .* icon to enable regex mode.

Common tasks:

# Find all console.log statements
console\.log\(.*?\)

# Find empty HTML tags
<(\w+)>\s*</\1>

# Replace tabs with 2 spaces
Find: \t
Replace: (two spaces)

# Add semicolons to lines that don't have them
Find: ([^;{}\s])\s*$
Replace: $1;

# Convert single quotes to double quotes
Find: '([^']*)'
Replace: "$1"
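The same replacements can be scripted. One difference to watch is how groups are referenced. VS Code, like JavaScript's `replace`, writes a group reference as `$1`, while Python's `re.sub` uses `\1` or `\g<1>`. This sketch also swaps `\s*` for `[ \t]*` in the semicolon pattern. That change is our own adjustment: with `re.MULTILINE`, Python's `\s*` can run across a newline and swallow blank lines. The sample source text is illustrative.

```python
import re

source = "print('hi')\n\tname = 'Ada'\nx = 1\n"

# Single quotes to double quotes: \1 is Python's spelling of VS Code's $1
converted = re.sub(r"'([^']*)'", r'"\1"', source)
# Tabs to two spaces
converted = re.sub(r"\t", "  ", converted)
# Add a semicolon after the last non-space character of each line;
# [ \t]* (not \s*) keeps each match on a single line
converted = re.sub(r"([^;{}\s])[ \t]*$", r"\1;", converted, flags=re.MULTILINE)

print(converted)
```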

Common Mistakes and Tips

Greedy vs. Lazy Matching

By default, quantifiers are greedy — they match as much as possible:

# Greedy: matches everything between the FIRST < and LAST >
<.*>    on "<b>bold</b>" matches "<b>bold</b>"

# Lazy: matches as little as possible (add ?)
<.*?>   on "<b>bold</b>" matches "<b>" then "</b>"
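Here is the same comparison in Python. It adds a third option, a negated character class like the one in the sed tag-stripping example. The negated class avoids the lazy quantifier entirely and is often the clearer choice.

```python
import re

html = "<b>bold</b>"
greedy = re.findall(r"<.*>", html)      # one match from the first < to the last >
lazy = re.findall(r"<.*?>", html)       # stops at the first > each time
negated = re.findall(r"<[^>]*>", html)  # same result without a lazy quantifier

print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```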

Escaping Special Characters

Remember to escape these when you want the literal character: . * + ? ^ $ { } [ ] ( ) | \

# Match a literal dot
\.

# Match a literal dollar sign
\$

# Match a literal backslash
\\
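When the literal text comes from user input or data, escaping by hand is error-prone. Python's standard library provides `re.escape` for this. The sketch below uses an illustrative price string. It shows that the unescaped text fails to match, because `$`, `.`, and the parentheses are read as metacharacters.

```python
import re

needle = "$4.99 (sale)"
haystack = "Now only $4.99 (sale) today!"

unescaped = re.search(needle, haystack)           # $ acts as an end anchor here
escaped = re.search(re.escape(needle), haystack)  # every metacharacter is escaped

print(unescaped)        # None
print(escaped.group())  # $4.99 (sale)
```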

Anchors Matter

Without anchors, regex matches anywhere in the string:

# Without anchors — matches "cat" inside "concatenate"
cat

# With word boundaries — matches only the word "cat"
\bcat\b

# With line anchors — matches only lines that ARE "cat"
^cat$
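This Python sketch contrasts the three versions on a few illustrative strings. It also shows a flavor detail. In Python, `^` and `$` anchor to the whole string unless `re.MULTILINE` is set, which makes them apply to each line, the way grep matches line by line.

```python
import re

samples = ["cat", "concatenate", "the cat sat"]
plain = [bool(re.search(r"cat", s)) for s in samples]       # [True, True, True]
bounded = [bool(re.search(r"\bcat\b", s)) for s in samples]  # [True, False, True]
anchored = [bool(re.search(r"^cat$", s)) for s in samples]   # [True, False, False]

# Without re.MULTILINE, ^ and $ refer to the start and end of the whole string
lines = "dog\ncat\ncats"
whole_string = re.findall(r"^cat$", lines)                    # []
per_line = re.findall(r"^cat$", lines, flags=re.MULTILINE)    # ['cat']

print(plain, bounded, anchored, whole_string, per_line)
```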

Testing and Debugging Regex

Use these tools to build and test your patterns:

regex101.com — https://regex101.com — The best online regex tester. Shows matches, capture groups, and explains each part of your pattern.
regexr.com — https://regexr.com — Interactive regex tester with a cheat sheet.
debuggex.com — https://debuggex.com — Visualizes regex as railroad diagrams.

Quick Reference Cheat Sheet

.        Any character
\d       Digit [0-9]
\w       Word char [a-zA-Z0-9_]
\s       Whitespace
\b       Word boundary
^        Start of line
$        End of line
[abc]    Character class
[^abc]   Negated class
a|b      a OR b
(...)    Capture group
(?:...)  Non-capture group
*        0 or more
+        1 or more
?        0 or 1
{n}      Exactly n
{n,m}    Between n and m

Regex takes practice. Start with simple patterns, test them on regex101.com, and gradually build complexity. Once it clicks, you’ll use regex everywhere — in code, in the terminal, in your editor, and in data processing pipelines.