Regular Expressions Demystified: A Practical Guide with Real Examples
Regular expressions (regex) are one of the most powerful tools in computing — and one of the most intimidating. But once you understand the basics, regex becomes indispensable for searching text, validating input, extracting data, and transforming strings.
This guide teaches regex through practical, real-world examples you can use immediately in the terminal, code editors, and programming languages.
What Are Regular Expressions?
A regular expression is a pattern that describes a set of strings. Instead of searching for an exact word, you describe what the text looks like. For example:
- `\d{3}-\d{4}` matches phone numbers like 555-1234
- `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}` matches email addresses
- `^https?://` matches URLs starting with http:// or https://
Regex is supported in virtually every programming language, text editor, and command-line tool.
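As a quick illustration, the three patterns above can be tried in Python's built-in `re` module. This is a minimal sketch; the sample text and the constant names are made up for the example.

```python
import re

# The three patterns from the list above, written as raw strings.
PHONE = re.compile(r"\d{3}-\d{4}")
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
URL_PREFIX = re.compile(r"^https?://")

# Hypothetical sample text, invented for illustration.
sample = "Call 555-1234 or write to info@example.com"

phone_hit = PHONE.search(sample)
email_hit = EMAIL.search(sample)

print(phone_hit.group())                                # 555-1234
print(email_hit.group())                                # info@example.com
print(bool(URL_PREFIX.search("https://example.com")))   # True
print(bool(URL_PREFIX.search("ftp://example.com")))     # False
```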
Basic Syntax Reference
Literal Characters
Most characters match themselves:
| Pattern | Matches |
|---|---|
| `hello` | The exact string “hello” |
| `abc123` | The exact string “abc123” |
Metacharacters
These characters have special meaning:
| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| `.` | Any single character (except a newline, by default) | `h.t` | hat, hit, hot, h9t |
| `^` | Start of line | `^Hello` | Lines starting with “Hello” |
| `$` | End of line | `world$` | Lines ending with “world” |
| `*` | Zero or more of previous | `ab*c` | ac, abc, abbc, abbbc |
| `+` | One or more of previous | `ab+c` | abc, abbc, abbbc (not ac) |
| `?` | Zero or one of previous | `colou?r` | color, colour |
| `\` | Escape a metacharacter | `\.` | A literal dot |
| `\|` | OR operator | `cat\|dog` | cat or dog |
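The rows of the table can be checked one at a time. The sketch below uses a small hypothetical helper, `matches`, built on Python's `re.fullmatch`, so each candidate has to match the pattern from start to end. The candidate strings are invented for the example.

```python
import re

def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(matches(r"h.t", ["hat", "h9t", "ht", "heat"]))        # ['hat', 'h9t']
print(matches(r"ab*c", ["ac", "abc", "abbbc", "adc"]))      # ['ac', 'abc', 'abbbc']
print(matches(r"ab+c", ["ac", "abc", "abbc"]))              # ['abc', 'abbc']
print(matches(r"colou?r", ["color", "colour", "colouur"]))  # ['color', 'colour']
print(matches(r"cat|dog", ["cat", "dog", "cow"]))           # ['cat', 'dog']
print(matches(r"a\.b", ["a.b", "axb"]))                     # ['a.b']
```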
Character Classes
| Pattern | Meaning |
|---|---|
| `[abc]` | Any one of a, b, or c |
| `[a-z]` | Any lowercase letter |
| `[A-Z]` | Any uppercase letter |
| `[0-9]` | Any digit |
| `[a-zA-Z0-9]` | Any alphanumeric character |
| `[^abc]` | Any character except a, b, or c |
Shorthand Character Classes
| Shorthand | Equivalent | Meaning |
|---|---|---|
| `\d` | `[0-9]` | Any digit |
| `\D` | `[^0-9]` | Any non-digit |
| `\w` | `[a-zA-Z0-9_]` | Any word character |
| `\W` | `[^a-zA-Z0-9_]` | Any non-word character |
| `\s` | `[ \t\n\r\f\v]` | Any whitespace |
| `\S` | `[^ \t\n\r\f\v]` | Any non-whitespace |
| `\b` | (none) | Word boundary (a zero-width position, not a character) |
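A short Python sketch of character classes and shorthands in action. The sample text is hypothetical. The last two lines show a caveat: the ASCII equivalents in the table are a simplification for Python `str` patterns, where `\d` also accepts non-ASCII decimal digits unless `re.ASCII` is set.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Order #42: 3 apples, 15 pears_total"

digits = re.findall(r"\d+", text)       # runs of digits
capitals = re.findall(r"[A-Z]", text)   # single uppercase letters
words = re.findall(r"\w+", text)        # runs of word characters
whole_cat = re.findall(r"\bcat\b", "cat concat cats cat")

print(digits)      # ['42', '3', '15']
print(capitals)    # ['O']
print(words)       # ['Order', '42', '3', 'apples', '15', 'pears_total']
print(whole_cat)   # ['cat', 'cat']

# U+0663 is ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
arabic_three = "\u0663"
unicode_digit = bool(re.fullmatch(r"\d", arabic_three))
ascii_digit = bool(re.fullmatch(r"\d", arabic_three, re.ASCII))
print(unicode_digit, ascii_digit)  # True False
```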
Quantifiers
| Quantifier | Meaning |
|---|---|
| `{3}` | Exactly 3 times |
| `{2,5}` | Between 2 and 5 times |
| `{3,}` | 3 or more times |
| `*` | 0 or more (same as `{0,}`) |
| `+` | 1 or more (same as `{1,}`) |
| `?` | 0 or 1 (same as `{0,1}`) |
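To see the counts at work, here is a small sketch with a hypothetical helper, `full`, that reports whether a whole string matches a pattern. The test strings are invented.

```python
import re

def full(pattern: str, s: str) -> bool:
    """True if the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

print(full(r"\d{3}", "123"))       # True: exactly three digits
print(full(r"\d{3}", "1234"))      # False: one digit too many
print(full(r"a{2,5}", "aaaa"))     # True: four is within 2..5
print(full(r"a{2,5}", "a"))        # False: below the minimum
print(full(r"a{3,}", "aaaaaaa"))   # True: no upper limit

# The shorthand quantifiers behave like their brace forms.
print(full(r"ab*c", "ac"), full(r"ab{0,}c", "ac"))  # True True
print(full(r"ab+c", "ac"), full(r"ab{1,}c", "ac"))  # False False
```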
Groups and Capturing
| Pattern | Meaning |
|---|---|
| `(abc)` | Capture group: matches “abc” and remembers it |
| `(?:abc)` | Non-capturing group: matches but doesn’t remember |
| `\1` | Backreference to first capture group |
| `(?=abc)` | Lookahead: matches if followed by “abc” |
| `(?!abc)` | Negative lookahead: matches if NOT followed by “abc” |
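Each construct in the table can be exercised with a one-line example. This is a sketch in Python; all sample strings are made up for illustration.

```python
import re

# Capture groups: each (...) is remembered and returned by groups().
m = re.search(r"(\w+)@(\w+)\.com", "mail bob@example.com now")
captured = m.groups()
print(captured)  # ('bob', 'example')

# Non-capturing group: groups for repetition, nothing is remembered,
# so findall returns the whole match.
repeats = re.findall(r"(?:ab)+", "ababab ab")
print(repeats)  # ['ababab', 'ab']

# Backreference: \1 must repeat whatever group 1 matched.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()
print(doubled)  # is is

# Lookahead: match a name only if ".py" follows; ".py" is not consumed.
py_names = re.findall(r"\w+(?=\.py\b)", "main.py notes.txt util.py")
print(py_names)  # ['main', 'util']

# Negative lookahead: whole numbers NOT followed by a percent sign.
plain_numbers = re.findall(r"\b\d+\b(?!%)", "50% off 20 items")
print(plain_numbers)  # ['20']
```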
Practical Examples
1. Validate an Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breaking it down:
- `^`: Start of string
- `[a-zA-Z0-9._%+-]+`: One or more valid characters before the @
- `@`: Literal @ sign
- `[a-zA-Z0-9.-]+`: Domain name
- `\.`: Literal dot
- `[a-zA-Z]{2,}`: TLD (at least 2 letters)
- `$`: End of string
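One way to wrap this pattern in a reusable check, sketched in Python. The function name `looks_like_email` and the sample addresses are hypothetical. `fullmatch` is used because in Python `$` also matches just before a trailing newline, which `match` alone would let through.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """True if the whole string has the shape described by EMAIL_RE."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))                # True
print(looks_like_email("first.last+tag@sub.example.org"))  # True
print(looks_like_email("no-at-sign.example.com"))          # False
print(looks_like_email("user@localhost"))                  # False: no dot and TLD
print(looks_like_email("user@example.com\n"))              # False: trailing newline
```

The pattern checks shape only. It still accepts some addresses that mail servers would reject, such as `a..b@example.com`, so treat it as a first filter rather than proof that an address exists.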
2. Match an IP Address
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
This matches patterns like 192.168.1.1 or 10.0.0.255. Note that this doesn’t validate the range (0-255) — it just matches the format.
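A common way to close that gap is to let the regex find candidates and then check each octet numerically. The following is a sketch in Python; the function name `find_ipv4` and the sample text are assumptions made for the example.

```python
import re

# Same shape as the pattern above, with each octet in a capture group.
IPV4_SHAPE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

def find_ipv4(text: str) -> list[str]:
    """Return dotted quads in text whose four octets are all in 0..255."""
    found = []
    for m in IPV4_SHAPE.finditer(text):
        if all(0 <= int(octet) <= 255 for octet in m.groups()):
            found.append(m.group())
    return found

# Hypothetical sample text.
sample = "ok 192.168.1.1 and 10.0.0.255, bad 999.1.1.1 and 256.300.1.2"
print(find_ipv4(sample))  # ['192.168.1.1', '10.0.0.255']
```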
3. Extract URLs from Text
https?://[^\s<>"']+
- `https?`: “http” or “https”
- `://`: Literal characters
- `[^\s<>"']+`: One or more characters that aren’t whitespace, angle brackets, or quotes
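Here is a sketch of the pattern applied with Python's `re.findall`. Because the character class allows punctuation, a URL at the end of a sentence picks up the trailing period. The hypothetical helper `extract_urls` trims that as an extra step that is not part of the pattern itself.

```python
import re

URL_RE = re.compile(r"""https?://[^\s<>"']+""")

def extract_urls(text: str) -> list[str]:
    """Find URLs, then trim trailing sentence punctuation (an added heuristic)."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]

# Hypothetical sample inputs.
html = 'See https://example.com/docs and <a href="http://example.org/a?b=1">link</a>'
print(URL_RE.findall(html))
# ['https://example.com/docs', 'http://example.org/a?b=1']

sentence = "Docs: https://example.com/guide. Also (http://example.org/x)."
print(URL_RE.findall(sentence))
# ['https://example.com/guide.', 'http://example.org/x).']
print(extract_urls(sentence))
# ['https://example.com/guide', 'http://example.org/x']
```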
4. Match a Date (YYYY-MM-DD)
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
- `\d{4}`: Four-digit year
- `0[1-9]|1[0-2]`: Month 01-12
- `0[1-9]|[12]\d|3[01]`: Day 01-31
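The pattern checks the format but not the calendar, so it accepts impossible dates such as 2024-02-31. One possible approach, sketched below in Python, is to pair the regex with `datetime.strptime`. The function name `is_real_date` is hypothetical.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")

def is_real_date(value: str) -> bool:
    """Check the YYYY-MM-DD shape first, then that the date exists."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(is_real_date("2024-01-15"))  # True
print(is_real_date("2024-13-01"))  # False: rejected by the regex
print(is_real_date("2024-02-31"))  # False: passes the regex, fails the calendar
print(is_real_date("2024-02-29"))  # True: 2024 is a leap year
print(is_real_date("2023-02-29"))  # False: 2023 is not
```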
5. Validate a Strong Password
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
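Requires at least one lowercase letter, one uppercase letter, one digit, and one special character from @$!%*?&, with a minimum of 8 characters total. Each lookahead (?=...) checks one condition independently. Note that the final character class also rejects any password containing a character outside that set, such as # or a space.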
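A minimal Python sketch of the check follows. The function name `is_strong` and the candidate passwords are invented for the example. The last case has all four required kinds of character but is still rejected, because `#` is not in the final character class.

```python
import re

STRONG_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

def is_strong(password: str) -> bool:
    """True if the password satisfies every condition in STRONG_RE."""
    return STRONG_RE.fullmatch(password) is not None

print(is_strong("Passw0rd!"))   # True
print(is_strong("password1!"))  # False: no uppercase letter
print(is_strong("Sh0rt!a"))     # False: only 7 characters
print(is_strong("Passw0rd!#"))  # False: '#' is outside the allowed set
```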
Using Regex in the Terminal
grep — Search Files
# Find lines containing an email address
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
# Find lines starting with "Error" (case insensitive)
grep -iE '^error' /var/log/syslog
# Find IP addresses in a log file
grep -oE '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' access.log
# Count occurrences
grep -cE 'pattern' file.txt
# Search recursively in a directory
grep -rE 'TODO|FIXME|HACK' ./src/
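The -E flag enables extended regular expressions (ERE). ERE has no \d shorthand, which is why the IP example above spells out [0-9]. With GNU grep, -P switches to Perl-compatible regex (PCRE), where \d, \w, and lookaheads work; BSD/macOS grep does not support -P.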
sed — Find and Replace
# Replace first occurrence on each line
sed 's/old/new/' file.txt
# Replace ALL occurrences (global)
sed 's/old/new/g' file.txt
# Replace in-place (modify the file); BSD/macOS sed needs: sed -i '' 's/old/new/g' file.txt
sed -i 's/old/new/g' file.txt
# Remove blank lines
sed '/^$/d' file.txt
# Remove HTML tags
sed 's/<[^>]*>//g' page.html
# Extract content between quotes
sed -n 's/.*"\([^"]*\)".*/\1/p' file.txt
# Replace runs of two or more spaces with a single space
sed -E 's/ {2,}/ /g' file.txt
awk — Pattern-Based Processing
# Print lines matching a pattern
awk '/^ERROR/' logfile.txt
# Print specific fields from matching lines
awk '/404/ {print $1, $7}' access.log
# Sum numbers in a column
awk '{sum += $3} END {print sum}' data.txt
Using Regex in Programming Languages
Python
import re
text = "Contact us at support@example.com or sales@company.org"
# Find all email addresses
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print(emails) # ['support@example.com', 'sales@company.org']
# Search for a pattern
match = re.search(r'\d{3}-\d{4}', 'Call 555-1234 today')
if match:
    print(match.group())  # 555-1234
# Replace
result = re.sub(r'\bJohn\b', 'Jane', 'John went to John\'s house')
print(result) # Jane went to Jane's house
# Split by regex
parts = re.split(r'[,;\s]+', 'one, two; three four')
print(parts) # ['one', 'two', 'three', 'four']
# Compile for reuse
pattern = re.compile(r'^(\d{4})-(\d{2})-(\d{2})$')
match = pattern.match('2024-01-15')
if match:
    year, month, day = match.groups()
JavaScript
const text = "Contact us at support@example.com or sales@company.org";
// Find all matches
const emails = text.match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g);
console.log(emails); // ['support@example.com', 'sales@company.org']
// Test if a string matches
const isEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test('user@example.com');
console.log(isEmail); // true
// Replace
const result = 'Hello World'.replace(/world/i, 'Regex');
console.log(result); // Hello Regex
// Named capture groups
const dateMatch = '2024-01-15'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateMatch.groups.year); // 2024
console.log(dateMatch.groups.month); // 01
Regex in VS Code
VS Code has powerful regex find-and-replace. Press Ctrl+H (Cmd+Option+F on macOS) and click the .* icon to enable regex mode.
Common tasks:
# Find all console.log statements
console\.log\(.*?\)
# Find empty HTML tags
<(\w+)>\s*</\1>
# Replace tabs with 2 spaces
Find: \t
Replace: (two spaces)
# Add semicolons to lines that don't have them
Find: ([^;{}\s])\s*$
Replace: $1;
# Convert single quotes to double quotes
Find: '([^']*)'
Replace: "$1"
Common Mistakes and Tips
Greedy vs. Lazy Matching
By default, quantifiers are greedy — they match as much as possible:
# Greedy: matches everything between the FIRST < and LAST >
<.*> on "<b>bold</b>" matches "<b>bold</b>"
# Lazy: matches as little as possible (add ?)
<.*?> on "<b>bold</b>" matches "<b>" then "</b>"
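The same comparison can be run in Python. The third pattern is a common alternative to lazy matching: a negated character class, which is the approach the sed example for removing HTML tags uses.

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.*>", html)      # .* grabs as much as it can
lazy = re.findall(r"<.*?>", html)       # .*? stops at the first possible >
negated = re.findall(r"<[^>]*>", html)  # [^>]* cannot run past a >

print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```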
Escaping Special Characters
Remember to escape these when you want the literal character: . * + ? ^ $ { } [ ] ( ) | \
# Match a literal dot
\.
# Match a literal dollar sign
\$
# Match a literal backslash
\\
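When the literal text comes from a variable or from user input, escaping by hand is error-prone. In Python, `re.escape` does it for you. The sketch below uses made-up sample strings.

```python
import re

# Hypothetical literal text that happens to contain metacharacters.
literal = "3.50$ (approx.)"
safe_pattern = re.escape(literal)

found = re.search(safe_pattern, "Price: 3.50$ (approx.) today")
print(found.group())  # 3.50$ (approx.)

# Unescaped, the dot matches any character; escaped, only a real dot.
unescaped_hit = bool(re.search(r"a.b", "axb"))
escaped_hit = bool(re.search(re.escape("a.b"), "axb"))
print(unescaped_hit, escaped_hit)  # True False

# The pattern \\ matches one literal backslash character.
backslash_hit = bool(re.search(r"\\", "C:\\temp"))
print(backslash_hit)  # True
```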
Anchors Matter
Without anchors, regex matches anywhere in the string:
# Without anchors — matches "cat" inside "concatenate"
cat
# With word boundaries — matches only the word "cat"
\bcat\b
# With line anchors — matches only lines that ARE "cat"
^cat$
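The three cases can be checked in Python. One detail differs from line-oriented tools like grep: in Python, `^` and `$` anchor to the whole string unless `re.MULTILINE` is set. The sample strings are invented.

```python
import re

# No anchors: "cat" is found inside a longer word.
inside = re.search(r"cat", "concatenate") is not None
print(inside)  # True

# Word boundaries: only the standalone word matches.
as_word_in_concat = re.search(r"\bcat\b", "concatenate") is not None
as_word_in_sentence = re.search(r"\bcat\b", "the cat sat") is not None
print(as_word_in_concat, as_word_in_sentence)  # False True

# Line anchors: per-line matching needs re.MULTILINE in Python.
lines = "cat\nconcatenate\ncat nap\ncat"
per_line = re.findall(r"^cat$", lines, flags=re.MULTILINE)
whole_string = re.findall(r"^cat$", lines)
print(per_line)      # ['cat', 'cat']
print(whole_string)  # []
```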
Testing and Debugging Regex
Use these tools to build and test your patterns:
- regex101.com — https://regex101.com — The best online regex tester. Shows matches, capture groups, and explains each part of your pattern.
- regexr.com — https://regexr.com — Interactive regex tester with a cheat sheet.
- debuggex.com — https://debuggex.com — Visualizes regex as railroad diagrams.
Quick Reference Cheat Sheet
.         Any character
\d        Digit [0-9]
\w        Word char [a-zA-Z0-9_]
\s        Whitespace
\b        Word boundary
^         Start of line
$         End of line
[abc]     Character class
[^abc]    Negated class
a|b       a OR b
(...)     Capture group
(?:...)   Non-capture group
*         0 or more
+         1 or more
?         0 or 1
{n}       Exactly n
{n,m}     Between n and m
Regex takes practice. Start with simple patterns, test them on regex101.com, and gradually build complexity. Once it clicks, you’ll use regex everywhere — in code, in the terminal, in your editor, and in data processing pipelines.