How to use the \d, \w, \s character classes in Perl regex?

In Perl regular expressions, \d, \w, and \s are shorthand character classes that help match common sets of characters without having to spell them out explicitly. These shortcuts make regexes easier to read and write, especially for beginners.

What do `\d`, `\w`, and `\s` match?

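\d: Matches any decimal digit. Under the /a modifier this is exactly [0-9]; without /a it also matches decimal digits from other scripts (Unicode category Nd).
\w: Matches any "word" character: letters, digits, and the underscore. Under /a this is exactly [A-Za-z0-9_]; under Unicode rules it also covers letters and digits from other scripts.
\s: Matches any whitespace character: spaces, tabs, newlines, carriage returns, and form feeds, plus other Unicode whitespace (such as the no-break space) when Unicode rules apply.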
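
As a minimal sketch of the three classes side by side, the snippet below runs each one over a short string. The sample string and variable names are made up for illustration, and the /a modifier (which needs Perl 5.14 or later) is used so that each class is limited to its ASCII definition.

```perl
use strict;
use warnings;

# Hypothetical sample input: two word-like tokens separated by a tab.
my $sample = "id_42\tok";

# /a keeps \d, \w, and \s ASCII-only; /g in list context returns every match.
my @digits = $sample =~ /(\d)/ag;     # each digit on its own
my @words  = $sample =~ /(\w+)/ag;    # runs of word characters
my @spaces = $sample =~ /(\s)/ag;     # each whitespace character

print "digits: @digits\n";
print "words:  @words\n";
print "whitespace characters: ", scalar(@spaces), "\n";
```

Note that the underscore and the digits are part of the first word, because \w covers all three kinds of character.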

These classes are widely used for pattern matching in Perl due to their conciseness and clarity.

Perl-Specific Notes

The use utf8; pragma does not change what these classes match; it only tells Perl that the source file itself is UTF-8 encoded. What they match depends on the regex modifiers and on the string being matched. Without /a, \d matches any Unicode decimal digit, not just 0-9.
\w under Unicode rules also includes letters and digits from other scripts, not just ASCII. For characters in the 128-255 range, the result depends on whether Unicode rules are in effect, for example via /u or use feature 'unicode_strings'.
These character classes are sensitive to regex modifiers, available since Perl 5.14: /a restricts \d, \w, and \s to ASCII, while /u forces Unicode rules.
They are a good example of Perl's TMTOWTDI ("There's more than one way to do it") philosophy: you could write [0-9], but \d is shorter. Keep in mind that the two are only equivalent under /a.
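
The effect of the modifiers can be sketched with two non-ASCII characters. This is an illustrative example, not an exhaustive test: the variable names are hypothetical, the characters are written as escapes so the source stays pure ASCII, and Perl 5.14 or later is assumed for /a and /u.

```perl
use strict;
use warnings;

# ARABIC-INDIC DIGIT THREE (U+0663) and LATIN SMALL LETTER E WITH ACUTE (U+00E9).
my $arabic_three = "\x{0663}";
my $e_acute      = "\xE9";

# \d follows Unicode rules unless /a is given.
my $d_default = $arabic_three =~ /^\d$/  ? 1 : 0;   # matches
my $d_ascii   = $arabic_three =~ /^\d$/a ? 1 : 0;   # does not match

# \w on a non-ASCII letter: /u forces Unicode rules, /a forces ASCII.
my $w_unicode = $e_acute =~ /^\w$/u ? 1 : 0;        # matches
my $w_ascii   = $e_acute =~ /^\w$/a ? 1 : 0;        # does not match

print "\\d default: $d_default, \\d with /a: $d_ascii\n";
print "\\w with /u: $w_unicode, \\w with /a: $w_ascii\n";
```

When input must be plain ASCII digits (for example when validating a numeric ID), /a or an explicit [0-9] is the safer choice.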

Common Pitfalls

Expecting \w to match characters like - or . when it does not. Those need explicit inclusion, e.g., [\w.-].
Assuming \d matches only ASCII digits. Without /a it also matches decimal digits from other scripts, so use /a or [0-9] when only 0-9 should be accepted.
Forgetting to escape the backslash when building a pattern in a double-quoted string: "\d" loses its backslash there (and triggers an "Unrecognized escape" warning), so write "\\d", or use single quotes or qr// instead.
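
The first and last pitfalls can be sketched as follows. The file name, the "order 66" string, and the variable names are hypothetical and chosen only to illustrate the behavior.

```perl
use strict;
use warnings;

# Hypothetical input containing '-' and '.', which \w does not match.
my $name = "report-v1.2_final";

# \w+ stops at '-' and '.', so the name comes back in pieces.
my @pieces = $name =~ /(\w+)/g;

# Adding '.' and '-' to a bracketed class keeps the name whole.
my ($whole) = $name =~ /([\w.-]+)/;

# Patterns stored in variables: a double-quoted string needs "\\d",
# while qr// takes the pattern exactly as written.
my $from_string = "\\d+";
my $compiled    = qr/\d+/;

my ($n1) = "order 66" =~ /($from_string)/;
my ($n2) = "order 66" =~ /($compiled)/;

print "pieces: @pieces\n";
print "whole:  $whole\n";
print "numbers: $n1 $n2\n";
```

Using qr// is usually the more readable option, since the pattern looks the same as it would inside a match operator.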

Runnable Perl Code Example

use strict;
use warnings;
use utf8;

my $text = "User123 lives at 456 Elm St.\nNew user: john_doe\nPrice: 12.50 USD\n";

print "Original text:\n$text\n";

# Match digits using \d+
while ($text =~ /(\d+)/g) {
    print "Found number: $1\n";
}

# Match "words" (letters, digits, underscores)
while ($text =~ /(\w+)/g) {
    print "Found word: $1\n";
}

# Match whitespace characters, show their positions
while ($text =~ /(\s)/g) {
    printf "Found whitespace at position %d\n", pos($text) - 1;
}

This script demonstrates:

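Using \d+ to find the numbers "123", "456", "12", and "50" in the text. The dot in "12.50" is not a digit, so the price is reported as two separate numbers.
Using \w+ to extract words such as "User123", "lives", "at", "New", "user", "john_doe", "Price", and "USD". Purely numeric tokens such as "456", "12", and "50" match as well, because digits are word characters.
Using \s to detect each whitespace character and print its zero-based offset in the string.

Save the code to a file and run it with perl, or pipe it to perl - on standard input, to see the character classes in action.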

Summary

\d, \w, and \s are extremely handy Perl regex shortcuts for matching digits, word characters, and whitespace, respectively. As you advance with Perl regex, you’ll appreciate how these built-in classes keep your expressions clean and readable. Just watch out for Unicode and escaping issues when embedding these in strings.
