
How to use the \d, \w, \s character classes in Perl regex?

In Perl regular expressions, \d, \w, and \s are shorthand character classes that help match common sets of characters without having to spell them out explicitly. These shortcuts make regexes easier to read and write, especially for beginners.

What do \d, \w, and \s match?

  • \d: Matches any decimal digit. Under the /a modifier this is exactly [0-9]; otherwise it can also match digits from other Unicode scripts.
  • \w: Matches any "word" character: letters, digits, and the underscore. Under the /a modifier this is exactly [A-Za-z0-9_]; otherwise it can also match letters and digits from other scripts.
  • \s: Matches any whitespace character, including spaces, tabs, and newlines. Without /a it can also match other Unicode whitespace.

These classes are widely used for pattern matching in Perl due to their conciseness and clarity.
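
As a quick illustration of the three classes, here is a minimal sketch. The sample strings and variable names are made up for this example and are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my $is_digits = ("2024"    =~ /^\d+$/) ? 1 : 0;   # every character is a digit
my $is_word   = ("foo_bar" =~ /^\w+$/) ? 1 : 0;   # letters and an underscore only
my $has_dash  = ("foo-bar" =~ /^\w+$/) ? 1 : 0;   # the hyphen is not a word character
my @fields    = split /\s+/, "a  b\tc\nd";        # split on runs of whitespace

print "digits: $is_digits, word: $is_word, dash: $has_dash\n";
print "fields: @fields\n";
```

This should print "digits: 1, word: 1, dash: 0" followed by "fields: a b c d", since spaces, the tab, and the newline all count as \s.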

Perl-Specific Notes

  • use utf8; only tells Perl that the source file itself is encoded in UTF-8; it does not change what \d, \w, or \s match. What matters is the string being matched and the regex modifiers. Without /a, \d matches any Unicode decimal digit, not just 0-9.
  • \w likewise matches letters and digits from other scripts, not just ASCII. For characters in the 128-255 range, the result depends on whether Unicode rules are in effect, for example via /u or use feature 'unicode_strings'.
  • The /a and /u modifiers (available since Perl 5.14) control this explicitly: /a restricts \d, \w, and \s to their ASCII definitions, while /u forces Unicode rules.
  • They are a good example of Perl's TMTOWTDI ("There's more than one way to do it") philosophy: you could write [0-9], but \d is shorter. Keep in mind that the two are only strictly interchangeable under /a.
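
The effect of the modifiers can be checked with a short sketch like the one below. It assumes Perl 5.14 or later (for /a and /u); the test characters and variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE, written as an escape so the source stays ASCII.
my $arabic_three = "\x{0663}";

my $default_match = ($arabic_three =~ /^\d$/)  ? 1 : 0;  # Unicode rules apply to this string
my $ascii_match   = ($arabic_three =~ /^\d$/a) ? 1 : 0;  # /a limits \d to 0-9

# U+00E9 (e with acute accent) is a letter outside the ASCII range.
my $word = "caf\x{e9}";
my $w_unicode = ($word =~ /^\w+$/u) ? 1 : 0;  # /u: \w accepts the accented letter
my $w_ascii   = ($word =~ /^\w+$/a) ? 1 : 0;  # /a: \w is [A-Za-z0-9_] only

print "\\d default: $default_match, \\d with /a: $ascii_match\n";
print "\\w with /u: $w_unicode, \\w with /a: $w_ascii\n";
```

Both default and /u matches should report 1 and both /a matches 0, which is why /a (or an explicit [0-9]) is the safer choice when input must be plain ASCII.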

Common Pitfalls

  • Expecting \w to match characters like - or . (it does not). Include them explicitly, e.g., [\w.-].
  • Assuming \d matches only ASCII digits. Without /a it also accepts digits from other scripts, so use /a or [0-9] when the input must be ASCII.
  • Forgetting to double the backslash when building a pattern in a double-quoted string: write "\\d" there, or use single quotes or qr// to avoid the problem.
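
The first and last pitfalls can be sketched as follows. The file name and variable names are hypothetical and serve only to show the behavior.

```perl
use strict;
use warnings;

# Hypothetical input for illustration.
my $name = "report-v1.2_final";

my $plain_w = ($name =~ /^\w+$/)     ? 1 : 0;  # fails: "-" and "." are not word characters
my $widened = ($name =~ /^[\w.-]+$/) ? 1 : 0;  # succeeds: the class explicitly adds "." and "-"

# Building a pattern from a string: single quotes keep the backslash,
# double quotes need it doubled, and qr// sidesteps the question.
my $from_single = '^\d+$';
my $from_double = "^\\d+\$";
my $compiled    = qr/^\d+$/;

my @results = map { ("42" =~ $_) ? 1 : 0 } ($from_single, $from_double, $compiled);

print "plain \\w: $plain_w, widened: $widened\n";
print "pattern results: @results\n";
```

All three pattern forms should match "42". Writing "\d" in a double-quoted string instead would drop the backslash, leaving a pattern that looks for a literal d, and with warnings enabled Perl reports an unrecognized escape.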

Runnable Perl Code Example

use strict;
use warnings;
use utf8;

my $text = "User123 lives at 456 Elm St.\nNew user: john_doe\nPrice: 12.50 USD\n";

print "Original text:\n$text\n";

# Match digits using \d+
while ($text =~ /(\d+)/g) {
    print "Found number: $1\n";
}

# Match "words" (letters, digits, underscores)
while ($text =~ /(\w+)/g) {
    print "Found word: $1\n";
}

# Match whitespace characters, show their positions
while ($text =~ /(\s)/g) {
    printf "Found whitespace at position %d\n", pos($text) - 1;
}

This script demonstrates:

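  • Using \d+ to find the digit runs "123", "456", "12", and "50". Note that "12.50" yields two matches because . is not a digit.
  • Using \w+ to extract words such as "User123", "lives", "at", "456", "Elm", "St", "john_doe", and "USD". Purely numeric runs count as words, and punctuation such as : and . is skipped.
  • Using \s to detect each whitespace character, including newlines, and print its zero-based position in the string.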

Save the script to a file and run it with perl, or pipe it to perl - to read it from standard input, and you will see the character classes in action.

Summary

\d, \w, and \s are extremely handy Perl regex shortcuts for matching digits, word characters, and whitespace, respectively. As you advance with Perl regex, you’ll appreciate how these built-in classes keep your expressions clean and readable. Just watch out for Unicode and escaping issues when embedding these in strings.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 10ms

STDOUT
Original text:
User123 lives at 456 Elm St.
New user: john_doe
Price: 12.50 USD

Found number: 123
Found number: 456
Found number: 12
Found number: 50
Found word: User123
Found word: lives
Found word: at
Found word: 456
Found word: Elm
Found word: St
Found word: New
Found word: user
Found word: john_doe
Found word: Price
Found word: 12
Found word: 50
Found word: USD
Found whitespace at position 7
Found whitespace at position 13
Found whitespace at position 16
Found whitespace at position 20
Found whitespace at position 24
Found whitespace at position 28
Found whitespace at position 32
Found whitespace at position 38
Found whitespace at position 47
Found whitespace at position 54
Found whitespace at position 60
Found whitespace at position 64
STDERR
(empty)
