How to match word boundaries with \b in Perl regex?
Understanding Word Boundaries (\b) in Perl Regex
In Perl regular expressions, \b is a zero-width assertion that matches a word boundary: a position between a word character and a non-word character. The start and end of the string count as non-word positions, so \b also matches at the start of a string that begins with a word character and at the end of one that ends with a word character. Because it matches a position rather than an actual character, \b is very useful when you want to find whole words or make sure a pattern appears at word edges.
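To see that \b matches positions rather than characters, a minimal sketch can record the offset of every boundary it finds. The sample string and variable names are illustrative. There is no boundary at the very end because "!" is a non-word character.

```perl
use strict;
use warnings;

# Illustrative sample string.
my $s = "Hi, you!";

# \b consumes no characters, so each match is an empty string.
# $-[0] holds the offset where the current match starts.
my @boundaries;
push @boundaries, $-[0] while $s =~ /\b/g;

print "boundaries at: @boundaries\n";   # boundaries at: 0 2 4 7
```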
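In Perl, "word characters" are whatever \w matches. For plain ASCII text that is letters, digits, and underscore ([A-Za-z0-9_]); under Unicode rules \w also matches letters and digits from other scripts (for example é). The /a modifier restricts \w, and therefore \b, to the ASCII set. A \b matches the transition point between word characters and anything else (spaces, punctuation, start/end of string).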
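The character rules in effect change where boundaries fall. A minimal sketch, assuming Perl 5.14 or later (which provides the /u and /a modifiers); the variable names are illustrative:

```perl
use strict;
use warnings;

# Illustrative word "café", written with an escape so the source stays ASCII.
my $word = "caf\x{e9}";

# Under /u (Unicode rules) é is a word character, so there is no
# boundary between "f" and "é" and the pattern fails.
my $unicode_hit = $word =~ /\bcaf\b/u ? 1 : 0;

# Under /a (ASCII rules) é is a non-word character, so a boundary
# exists right after "caf" and the pattern succeeds.
my $ascii_hit = $word =~ /\bcaf\b/a ? 1 : 0;

print "unicode: $unicode_hit\n";   # unicode: 0
print "ascii:   $ascii_hit\n";     # ascii:   1
```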
Key points about \b in Perl Regex:
- \b matches a zero-width position, not a character.
- It matches at either the start or the end of a "word".
- It is useful for matching whole words without matching them inside other words.
- Its opposite is \B, which matches "non-boundary" positions.
- \b itself behaves the same in scalar and list context; context only changes what the match operator returns (true/false versus the list of matches).
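The \B counterpart and the effect of context can be shown on the same sentence used in the example that follows. This is a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "The cat sat on the scatter plot near the category.";

# \B asserts a NON-boundary: word characters on both sides (or
# non-word characters on both sides). Here "cat" must sit strictly
# inside a word, so only "scatter" qualifies.
my @inner = $text =~ /(\w*\Bcat\B\w*)/g;
print "inside a word: @inner\n";       # inside a word: scatter

# In list context a //g match returns every match; assigning that
# list to () in scalar context yields the number of matches.
my @whole = $text =~ /\bcat\b/g;
my $count = () = $text =~ /\bcat\b/g;
print "whole-word matches: ", scalar(@whole), " (count: $count)\n";
```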
Example: Matching Whole Words with \b
Suppose you want to find the word cat in a string but avoid matching category or scatter. Using \bcat\b will only match standalone cat.
#!/usr/bin/perl
use strict;
use warnings;
my $text = "The cat sat on the scatter plot near the category.";
# Match whole word "cat" using \b
while ($text =~ /\bcat\b/g) {
print "Found 'cat' at position ", pos($text) - length("cat"), "\n";
}
# Anchor only one side: /\bcat/ requires a word start, /cat\b/ a word end
if ($text =~ /\bcat/) {
print "Matches start of a word 'cat' in '$&'\n";
}
if ($text =~ /cat\b/) {
print "Matches end of a word 'cat' in '$&'\n";
}
This script prints:
Found 'cat' at position 4
Matches start of a word 'cat' in 'cat'
Matches end of a word 'cat' in 'cat'
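Here the first pattern, /\bcat\b/, matches only the standalone "cat". The other two examples put \b on just one side: /\bcat/ accepts any word that starts with cat (including "category"), and /cat\b/ accepts any word that ends with cat. Without /g each returns only its first match, which in this string is the standalone "cat" at position 4, so $& is 'cat' in both cases.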
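To make the difference between one-sided and two-sided anchors visible, a sketch can collect every whole word each pattern accepts. It reuses the sample sentence; the array names are illustrative.

```perl
use strict;
use warnings;

my $text = "The cat sat on the scatter plot near the category.";

# Anchoring only the start: words that BEGIN with "cat".
my @starts = $text =~ /\b(cat\w*)/g;

# Anchoring only the end: words that END with "cat".
my @ends = $text =~ /(\w*cat)\b/g;

# Anchoring both sides: the standalone word only.
my @whole = $text =~ /\b(cat)\b/g;

print "starts with cat: @starts\n";  # starts with cat: cat category
print "ends with cat:   @ends\n";    # ends with cat:   cat
print "exactly cat:     @whole\n";   # exactly cat:     cat
```

"scatter" appears in none of the lists, because the "cat" inside it has word characters on both sides.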
Common Pitfalls and Gotchas
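- Unicode and word boundaries: \b is defined in terms of \w, so it follows the same character rules. Older Perl versions (before 5.14) may not handle Unicode word boundaries consistently, because the meaning of \w could depend on a string's internal encoding. Perl 5.14 added the /u and /a modifiers so you can request Unicode or ASCII rules explicitly.
- Using \b inside character classes: never put \b inside []. For example, [a-z\b] matches a lowercase letter or the backspace character (ASCII 0x08); it does not test for a word boundary.
- Escaping \b in double-quoted strings: in a double-quoted string such as "\bcat\b", \b becomes a backspace character unless written as "\\bcat\\b". Use single quotes, doubled backslashes, or qr/\bcat\b/ when you store a pattern in a variable.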
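The last two pitfalls can be checked directly. A minimal sketch; the sample strings and variable names are illustrative:

```perl
use strict;
use warnings;

my $text = "The cat sat.";

# Pitfall: in a double-quoted string "\b" is a backspace (\x08),
# so this pattern looks for backspace + "cat" + backspace.
my $broken = "\bcat\b";
my $broken_hit = $text =~ /$broken/ ? 1 : 0;

# Fixes: single quotes keep the backslash, and qr// compiles the
# pattern with regex (not string) escape rules.
my $single   = '\bcat\b';
my $compiled = qr/\bcat\b/;
my $single_hit = $text =~ /$single/ ? 1 : 0;
my $qr_hit     = $text =~ $compiled ? 1 : 0;

# Inside a character class \b means backspace, not a boundary.
my $class_hit = "a\x08" =~ /^[a-z\b]+$/ ? 1 : 0;

print "double-quoted: $broken_hit\n";  # double-quoted: 0
print "single-quoted: $single_hit\n";  # single-quoted: 1
print "qr//:          $qr_hit\n";      # qr//:          1
print "class [\\b]:   $class_hit\n";   # class [\b]:   1
```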
Summary
The \b anchor is a powerful tool for matching word boundaries in Perl regex. It helps you match whole words by checking the position where a word character meets a non-word character (or the start or end of the string). Proper usage avoids partial-word matches and makes your regex more precise and readable.
Verified Code
Executed in a sandbox with Perl v5.34.1 to capture real output:
Found 'cat' at position 4
Matches start of a word 'cat' in 'cat'
Matches end of a word 'cat' in 'cat'