
How to use lookbehind assertions (?<=...) in Perl regex?

Using Lookbehind Assertions (?<=...) in Perl Regex

Lookbehind assertions in Perl regular expressions allow you to match a pattern only if it is immediately preceded by another pattern, without including that preceding pattern in the match result. This feature is especially useful when you need to perform conditional matching based on the context before the main pattern. Lookbehind assertions come in the form (?<=pattern) for positive lookbehind and (?<!pattern) for negative lookbehind.

Basic Syntax

  • /(?<=foo)bar/ – Matches "bar" only if it is immediately preceded by "foo".
  • /(?<!foo)bar/ – Matches "bar" only if it is not immediately preceded by "foo".

Lookbehinds do not consume characters, meaning the part matched inside the lookbehind is not included in the overall match result.
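A minimal sketch of the two forms side by side. The helper names (bar_after_foo, bar_not_after_foo) and the sample strings are made up for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: true if "bar" occurs directly after "foo".
sub bar_after_foo {
    my ($s) = @_;
    return $s =~ /(?<=foo)bar/ ? 1 : 0;
}

# Hypothetical helper: true if "bar" occurs anywhere NOT directly after "foo".
sub bar_not_after_foo {
    my ($s) = @_;
    return $s =~ /(?<!foo)bar/ ? 1 : 0;
}

# Sample strings chosen only for this illustration.
for my $s ("foobar", "bazbar", "bar") {
    printf "%-7s positive=%d negative=%d\n",
        $s, bar_after_foo($s), bar_not_after_foo($s);
}
```

Note that the negative form also succeeds at the very start of a string, because nothing there precedes "bar".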

Perl Version Requirements

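Perl has supported lookbehind assertions since Perl 5.005. Until Perl 5.30, however, the pattern inside a lookbehind had to be fixed-width, that is, it had to match a fixed number of characters. Perl 5.30 introduced limited variable-width lookbehind as an experimental feature, allowing more flexible patterns inside (?<=...). This means:

  • Before Perl 5.30, lookbehind patterns must have a fixed length. /(?<=foo)bar/ and /(?<=f[oO])bar/ are valid (a character class still matches exactly one character), but /(?<=fo?)bar/ and /(?<=fo+)bar/ are rejected because the lookbehind can match different lengths.
  • From Perl 5.30 onwards, variable-width lookbehinds are allowed as long as they can match at most 255 characters, so /(?<=fo?)bar/ compiles but the unbounded /(?<=fo+)bar/ still does not. Some releases emit a warning in the experimental::vlb category for such patterns.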
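One way to find out what the running perl accepts is to compile the pattern at run time and trap the failure. The sketch below assumes a hypothetical helper named pattern_compiles. The pattern is passed as a string on purpose, because a literal regex with an unsupported lookbehind would stop the whole script from compiling:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if this perl can compile the pattern, else 0.
# Interpolating a string into qr// defers compilation to run time,
# where eval can catch the error.
sub pattern_compiles {
    my ($pattern) = @_;
    local $SIG{__WARN__} = sub { };   # hide "experimental" warnings some releases emit
    my $re = eval { qr/$pattern/ };
    return defined $re ? 1 : 0;
}

printf "perl %vd\n", $^V;
for my $pattern ('(?<=foo)bar', '(?<=fo?)bar', '(?<=fo+)bar') {
    printf "%-12s %s\n", $pattern,
        pattern_compiles($pattern) ? "compiles" : "rejected";
}
```

The fixed-width pattern should compile everywhere, the bounded variable-width one only on 5.30 or later, and the unbounded one should be rejected on every version.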

Common Pitfalls

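  • Variable-width lookbehind restrictions: On Perl versions before 5.30, a lookbehind without a fixed length fails when the regex is compiled, with the error "Variable length lookbehind not implemented in regex".
  • Beware multi-byte characters: Lookbehind width is counted in characters, not bytes. If a UTF-8 string has not been decoded, Perl treats each byte as a separate character, so decode input before matching against non-ASCII text.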
  • Performance: Excessively complex lookbehind patterns can affect regex performance.
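The multi-byte pitfall can be sketched as follows. The same text is matched once as raw UTF-8 bytes and once as decoded characters. The helper name x_after_exactly_one_char and the sample string are assumptions made for this illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if "x" is preceded by exactly one character
# counted from the start of the string.
sub x_after_exactly_one_char {
    my ($s) = @_;
    return $s =~ /(?<=^.)x/ ? 1 : 0;
}

my $bytes = "\xC3\xA9x";                # "e with acute" + "x" as raw UTF-8 bytes
my $chars = decode('UTF-8', $bytes);    # the same text as decoded characters

printf "bytes: length=%d match=%d\n", length($bytes), x_after_exactly_one_char($bytes);
printf "chars: length=%d match=%d\n", length($chars), x_after_exactly_one_char($chars);
```

The undecoded string has two "characters" before the x, so the one-character lookbehind fails. After decoding there is a single character and the match succeeds.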

Example: Match a Word Only if Preceded by "Mr. "

This example uses a positive lookbehind to match names that are preceded by "Mr. ".


use strict;
use warnings;

my $text = "Mr. Smith and Mr. Johnson are here, but Ms. Brown is not.";

# Match words that are preceded by "Mr. "
while ($text =~ /(?<=Mr\. )\w+/g) {
    print "Found a name preceded by 'Mr. ': $&\n";
}

Output:


Found a name preceded by 'Mr. ': Smith
Found a name preceded by 'Mr. ': Johnson

Explanation

  • (?<=Mr\. ) is the positive lookbehind assertion, confirming what immediately precedes the match is "Mr. ".
  • \w+ matches the subsequent word (the name) itself.
  • The match only yields the name, not "Mr. ", because lookbehind does not consume characters.
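Because the lookbehind is not part of the match, the same pattern also works for collecting matches in list context and for substitution, where the "Mr. " prefix is left untouched. A small sketch, with hypothetical helper names (names_after_mr, redact_names_after_mr):

```perl
use strict;
use warnings;

# Hypothetical helper: returns every word that directly follows "Mr. ".
sub names_after_mr {
    my ($text) = @_;
    my @names = $text =~ /(?<=Mr\. )\w+/g;   # list context + /g returns all matches
    return @names;
}

# Hypothetical helper: replaces those words while keeping the "Mr. " prefix,
# which the lookbehind checks but never consumes.
sub redact_names_after_mr {
    my ($text) = @_;
    (my $copy = $text) =~ s/(?<=Mr\. )\w+/[redacted]/g;
    return $copy;
}

my $text = "Mr. Smith and Mr. Johnson are here, but Ms. Brown is not.";
print join(", ", names_after_mr($text)), "\n";
print redact_names_after_mr($text), "\n";
```

This should print "Smith, Johnson" followed by the sentence with both names replaced and "Ms. Brown" unchanged.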

Summary

Lookbehind assertions in Perl regex allow context-sensitive matching without including the preceding text in the match. Fixed-width patterns are required before Perl 5.30, so either keep lookbehind patterns fixed-width or check that your Perl version supports variable-width lookbehinds. Used this way, (?<=...) lets a regex match exactly the text you want while still checking what comes before it.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 6ms

STDOUT
Found a name preceded by 'Mr. ': Smith
Found a name preceded by 'Mr. ': Johnson
STDERR
(empty)
