How to match word boundaries with \b in Perl regex?

Understanding Word Boundaries (\b) in Perl Regex

In Perl regular expressions, \b is a zero-width assertion that matches a word boundary: a position with a word character on one side and either a non-word character or the start or end of the string on the other. It matches a position rather than a character, which makes it useful for finding whole words or for making sure a pattern starts or ends at a word edge.

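A "word character" is anything matched by \w. For plain ASCII text that means letters, digits, and underscore ([A-Za-z0-9_]); for decoded Unicode strings, \w also matches letters and digits from other scripts. \b therefore matches wherever a word character meets a non-word character (a space, punctuation, and so on) or the start or end of the string.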
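One way to see which characters count as word characters is to glue the same word, bar, to different neighbors and test it with \bbar\b. The sample strings below are made up for illustration, and the sketch assumes plain ASCII input, where \w is [A-Za-z0-9_].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Digits and underscore are word characters, so \bbar\b finds no boundary
# before "bar" in "foo_bar" or "2bar", or after it in "bar2".
my @samples = ("foo_bar", "foo-bar", "foo bar", "2bar", "bar2");

my %is_whole_word;
for my $s (@samples) {
    $is_whole_word{$s} = $s =~ /\bbar\b/ ? 1 : 0;
    printf "%-8s %s\n", $s, $is_whole_word{$s} ? "whole word" : "no match";
}
```

Only the hyphen and the space create a boundary next to "bar"; the underscore and the digits do not.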

Key points about \b in Perl Regex:

  • \b matches a zero-width position, not a character.
  • It matches either the start or end boundary of a "word".
  • Useful to match whole words without matching them inside other words.
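  • Its opposite is \B, which matches any position that is not a word boundary.
  • Context does not change where \b matches; it only changes what the match operator returns (true or false in scalar context; with /g in list context, every match).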
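The zero-width behavior, the \B complement, and the effect of context can each be checked directly. This is a small sketch on a made-up string that relies only on core Perl features (pos() and the @- array).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "cat scatter category";

# \b is zero-width: after each match, pos() is the offset of the boundary itself.
my @boundaries;
while ($text =~ /\b/g) {
    push @boundaries, pos($text);
}
print "Boundaries at: @boundaries\n";           # 0 3 4 11 12 20

# \B is the complement: here "cat" must have word characters on both sides.
my @inner;
while ($text =~ /\Bcat\B/g) {
    push @inner, $-[0];                          # start offset of this match
}
print "'cat' inside a word at: @inner\n";       # 5 (inside "scatter")

# Context changes what the match operator returns, not where \b matches.
my $found = $text =~ /\bcat\b/ ? 1 : 0;         # boolean (scalar) test: 1
my @all   = $text =~ /\bcat\w*/g;               # list context with /g: every match
print "Scalar test: $found; list result: @all\n";   # list result: cat category
```

Three words produce six boundaries, one at each edge of every word, and none of them consumes a character.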

Example: Matching Whole Words with \b

Suppose you want to find the word cat in a string but avoid matching category or scatter. Using \bcat\b will only match standalone cat.


#!/usr/bin/perl
use strict;
use warnings;

my $text = "The cat sat on the scatter plot near the category.";

# Match the whole word "cat": \b is required on both sides
while ($text =~ /\bcat\b/g) {
    # $-[0] is the offset where the current match started
    print "Found 'cat' at position $-[0]\n";
}

# \b on one side only: /\bcat/ requires "cat" to start a word
if ($text =~ /\bcat/) {
    print "Matches start of a word 'cat' in '$&'\n";
}

# /cat\b/ requires "cat" to end a word
if ($text =~ /cat\b/) {
    print "Matches end of a word 'cat' in '$&'\n";
}

This script prints:


Found 'cat' at position 4
Matches start of a word 'cat' in 'cat'
Matches end of a word 'cat' in 'cat'

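Here /\bcat\b/ matches only the standalone "cat" at offset 4; the "cat" inside "scatter" and the one at the start of "category" are skipped. The other two tests put \b on one side only: /\bcat/ requires "cat" to begin a word (so "category" would also qualify), and /cat\b/ requires it to end a word. Without /g, each test reports only the leftmost match, which in this sentence is the standalone "cat" both times, so $& prints cat in both lines.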
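To see everything each placement of \b accepts, the sketch below collects the start offset of every match (using /g and $-[0]) for several variants of the pattern against the same sentence. The helper name match_offsets is hypothetical and exists only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the start offset of every match of $re in $text.
# $-[0] holds the offset where the most recent successful match began.
sub match_offsets {
    my ($text, $re) = @_;
    my @offsets;
    while ($text =~ /$re/g) {
        push @offsets, $-[0];
    }
    return @offsets;
}

my $text = "The cat sat on the scatter plot near the category.";

# Each pair is [label, compiled pattern]; single quotes keep \b literal in the label.
for my $pair (
    [ 'cat'     => qr/cat/     ],   # any occurrence
    [ '\bcat'   => qr/\bcat/   ],   # "cat" at the start of a word
    [ 'cat\b'   => qr/cat\b/   ],   # "cat" at the end of a word
    [ '\bcat\b' => qr/\bcat\b/ ],   # "cat" as a whole word
    [ '\Bcat'   => qr/\Bcat/   ],   # "cat" preceded by a word character
) {
    my ($label, $re) = @$pair;
    printf "%-8s matches at: %s\n", $label, join(', ', match_offsets($text, $re));
}
```

Offsets 4, 20, and 41 are the standalone "cat", the "cat" in "scatter", and the start of "category". Only offset 4 satisfies both anchors; "category" passes the start-of-word test, and "scatter" passes only the \B test.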

Common Pitfalls and Gotchas

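  • Unicode and word boundaries: \b is defined in terms of \w, so it is only as Unicode-aware as \w. If UTF-8 input is never decoded (no use utf8 for source literals, no :encoding(UTF-8) layer on file handles), accented letters arrive as raw bytes that \w may not match, and \b can then match in the middle of a word such as "café". Perl 5.14 added the /u and /a modifiers for choosing Unicode or ASCII rules, and Perl 5.22 added \b{wb}, a boundary that follows the Unicode word-break rules (UAX #29).
  • Using \b inside character classes: inside brackets, \b means the backspace character (ASCII 0x08), not a word boundary. [a-z\b] therefore matches a lowercase letter or a backspace; a zero-width assertion cannot appear inside [].
  • \b in double-quoted strings: if you build a pattern as a string, such as my $pat = "\bcat\b";, Perl turns each \b into a backspace before the regex engine sees it. Write the string with single quotes ('\bcat\b'), double the backslashes ("\\bcat\\b"), or, preferably, compile the pattern with qr/\bcat\b/.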
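The three pitfalls can be reproduced in one short script. This is a sketch that assumes Perl's default matching rules: no use v5.12-or-later feature bundle (which turns on unicode_strings) and no use locale. Under those features the raw-bytes result can differ. The decoded string comes from the core Encode module's decode function.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Pitfall 1: \b is only as Unicode-aware as \w.
# "café" as raw UTF-8 bytes (as read from a file with no :encoding layer),
# and the same text decoded into characters.
my $raw     = "caf\xC3\xA9";           # 5 bytes: c a f \xC3 \xA9
my $decoded = decode('UTF-8', $raw);   # 4 characters: c a f U+00E9

# Under the default rules, byte \xC3 is not \w, so a bogus boundary appears
# after "caf"; in the decoded string é is a word character, so none does.
my $raw_hit     = $raw     =~ /\bcaf\b/ ? 1 : 0;
my $decoded_hit = $decoded =~ /\bcaf\b/ ? 1 : 0;
print "raw bytes:    /\\bcaf\\b/ ", ($raw_hit     ? "matches" : "no match"), "\n";
print "decoded text: /\\bcaf\\b/ ", ($decoded_hit ? "matches" : "no match"), "\n";

# Pitfall 2: inside [], \b means backspace (\x08), not a word boundary.
my $bs_hit    = "back\x08space" =~ /k[\b]s/ ? 1 : 0;   # backspace between k and s
my $space_hit = "back space"    =~ /k[\b]s/ ? 1 : 0;   # a space is not a backspace
print "[\\b] vs backspace: $bs_hit, vs space: $space_hit\n";

# Pitfall 3: in a double-quoted string, \b becomes a backspace before the
# regex engine ever sees it.
my $text = "The cat sat";
my $dq   = "\bcat\b";     # backspace, c, a, t, backspace
my $sq   = '\bcat\b';     # backslash-b kept as written: a real boundary pattern
my $qr   = qr/\bcat\b/;   # compiled pattern, no string escaping involved
my @hits = map { $text =~ $_ ? 1 : 0 } ($dq, $sq, $qr);
print "double-quoted: $hits[0], single-quoted: $hits[1], qr//: $hits[2]\n";
```

Using qr// sidesteps the quoting problem entirely, because the pattern is parsed by the regex compiler rather than by string interpolation.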

Summary

The \b assertion matches word boundaries in Perl regex: zero-width positions where a word character (\w) meets a non-word character or the start or end of the string, in either direction. Placing \b around a pattern, as in /\bcat\b/, restricts it to whole words and avoids partial matches inside longer words such as "scatter" or "category".

Verified Code

Executed in a sandbox with Perl v5.34.1 (7 ms) to capture real output.

STDOUT
Found 'cat' at position 4
Matches start of a word 'cat' in 'cat'
Matches end of a word 'cat' in 'cat'
STDERR
(empty)
