How to use the \d, \w, \s character classes in Perl regex?
In Perl regular expressions, \d, \w, and \s are shorthand character classes that help match common sets of characters without having to spell them out explicitly. These shortcuts make regexes easier to read and write, especially for beginners.
What do \d, \w, and \s match?
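- \d: Matches any digit character. With the /a modifier this is exactly [0-9]; otherwise it also matches decimal digits from other Unicode scripts.
- \w: Matches any "word" character: uppercase and lowercase letters, digits, and the underscore. With the /a modifier this is exactly [A-Za-z0-9_]; under Unicode rules it also covers letters and digits from other scripts.
- \s: Matches any whitespace character, including spaces, tabs, and newlines, plus other Unicode whitespace unless /a is in effect.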
These classes are widely used for pattern matching in Perl due to their conciseness and clarity.
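As a minimal sketch of the three definitions, the snippet below checks a few strings against each shorthand. The helper name describe and the sample strings are made up for illustration, and /a is used so that each class is limited to its ASCII meaning.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which shorthand classes describe a string.
# The /a modifier keeps the classes ASCII-only, so \d is [0-9] and
# \w is [A-Za-z0-9_].
sub describe {
    my ($s) = @_;
    return {
        all_digits     => ($s =~ /\A\d+\z/a ? 1 : 0),
        all_word       => ($s =~ /\A\w+\z/a ? 1 : 0),
        has_whitespace => ($s =~ /\s/a      ? 1 : 0),
    };
}

# Sample inputs are made up for illustration.
for my $s ("42", "foo_bar", "a b", "x-y") {
    my $d = describe($s);
    printf "%-10s digits=%d word=%d whitespace=%d\n",
        "'$s'", @{$d}{qw(all_digits all_word has_whitespace)};
}
```

Anchoring with \A and \z makes the first two checks test the whole string, while the \s check only asks whether any whitespace character is present.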
Perl-Specific Notes
- What they match depends on the regex modifiers in effect, not on the use utf8; pragma. That pragma only tells Perl that the source file is UTF-8, so that non-ASCII literals in your code are decoded into characters.
- Unless /a is in effect, \d matches decimal digits from any Unicode script, not only 0-9. \w likewise matches letters and digits from other scripts, although for characters in the 128-255 range this requires Unicode rules (for example /u or use feature 'unicode_strings';).
- The /a and /u modifiers were added in Perl 5.14: /a restricts these classes to ASCII, while /u forces Unicode rules.
- They are a great example of Perl's TMTOWTDI ("There's more than one way to do it") philosophy: you could write [0-9], but \d is shorter and, with /a, matches exactly the same set.
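The effect of the modifiers can be sketched with two non-ASCII characters. The helper name matches is an assumption for illustration; the characters are written as \x{...} escapes so the example does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE, written as an escape so the
# source file itself stays pure ASCII.
my $arabic_three = "\x{0663}";

# Hypothetical helper: 1 if the pattern matches the string, else 0.
sub matches { my ($str, $re) = @_; return $str =~ $re ? 1 : 0 }

printf "default \\d : %d\n", matches($arabic_three, qr/\A\d\z/);
printf "with /a    : %d\n", matches($arabic_three, qr/\A\d\z/a);
printf "[0-9]      : %d\n", matches($arabic_three, qr/\A[0-9]\z/);

# \w behaves the same way: U+00E9 (e with acute) is a word character
# under /u but not under /a.
my $e_acute = "\x{E9}";
printf "\\w with /u : %d\n", matches($e_acute, qr/\A\w\z/u);
printf "\\w with /a : %d\n", matches($e_acute, qr/\A\w\z/a);
```

If input must contain only the digits 0-9, for instance before passing it to numeric code, /a or an explicit [0-9] is the safer choice.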
Common Pitfalls
- Using \w and expecting it to match characters like - or . when it does not. Those need explicit inclusion, e.g., [\w.-].
- Assuming \d matches only ASCII digits. Unless /a is in effect, it also matches digits from other Unicode scripts, so use /a or [0-9] when only 0-9 is acceptable.
- Not escaping \ properly when building a regex from a double-quoted string: write "\\d" so that a literal backslash reaches the regex, or use single quotes or qr// instead.
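The first and third pitfalls can be illustrated with a short sketch. The hostname and the "Room 101" string are made-up sample values, not part of the original example.

```perl
use strict;
use warnings;

# Pitfall 1: \w stops at '-' and '.', so a hostname-like token is split.
my $host = "my-host.example";          # made-up sample value
my @plain    = $host =~ /(\w+)/g;      # three separate pieces
my @extended = $host =~ /([\w.-]+)/g;  # one piece, hyphen and dot included
print "plain: @plain\n";
print "extended: @extended\n";

# Pitfall 2: in a double-quoted string, \d is not a recognized string
# escape, so the backslash would be lost before the regex engine sees it.
my $from_double = "\\d+";    # doubled backslash yields the string \d+
my $from_single = '\d+';     # single quotes keep the backslash as-is
my $compiled    = qr/\d+/;   # qr// avoids the string step entirely

for my $pat ($from_double, $from_single, $compiled) {
    print "matched: $1\n" if "Room 101" =~ /($pat)/;
}
```

All three pattern forms end up as the same regex; qr// is usually the least error-prone because no string quoting rules apply to it.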
Runnable Perl Code Example
use strict;
use warnings;
use utf8;

my $text = "User123 lives at 456 Elm St.\nNew user: john_doe\nPrice: 12.50 USD\n";
print "Original text:\n$text\n";

# Match digits using \d+
while ($text =~ /(\d+)/g) {
    print "Found number: $1\n";
}

# Match "words" (letters, digits, underscores)
while ($text =~ /(\w+)/g) {
    print "Found word: $1\n";
}

# Match whitespace characters, show their positions
while ($text =~ /(\s)/g) {
    printf "Found whitespace at position %d\n", pos($text) - 1;
}
This script demonstrates:
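- Using \d+ to find runs of digits: "123", "456", "12", and "50". The price 12.50 yields two matches because . is not a digit.
- Using \w+ to extract words such as "User123", "lives", "at", "New", "user", "john_doe", "Price", and "USD". Punctuation is not part of \w, so "St." is reported as "St" and "12.50" as "12" and "50".
- Using \s to detect each whitespace character (spaces and newlines here) and print its zero-based position in the string.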
Run this code as-is (for example, by piping it to perl -, which reads the program from standard input) and you will see the character classes in action.
Summary
\d, \w, and \s are extremely handy Perl regex shortcuts for matching digits, word characters, and whitespace, respectively. As you advance with Perl regex, you’ll appreciate how these built-in classes keep your expressions clean and readable. Just watch out for Unicode and escaping issues when embedding these in strings.
Verified Code
Executed in a sandbox to capture real output. • v5.34.1 • 10ms
Original text:
User123 lives at 456 Elm St.
New user: john_doe
Price: 12.50 USD
Found number: 123
Found number: 456
Found number: 12
Found number: 50
Found word: User123
Found word: lives
Found word: at
Found word: 456
Found word: Elm
Found word: St
Found word: New
Found word: user
Found word: john_doe
Found word: Price
Found word: 12
Found word: 50
Found word: USD
Found whitespace at position 7
Found whitespace at position 13
Found whitespace at position 16
Found whitespace at position 20
Found whitespace at position 24
Found whitespace at position 28
Found whitespace at position 32
Found whitespace at position 38
Found whitespace at position 47
Found whitespace at position 54
Found whitespace at position 60
Found whitespace at position 64