How to perform regex match and extract all matches in Perl?

In Perl, extracting every match of a regex from a string relies on the /g (global) modifier of the match operator. With /g, the operator finds all non-overlapping matches instead of stopping after the first.

Below, you’ll find an explanation of the key concepts, a clear example, and some common pitfalls to watch out for.

Understanding Regex Matching with /g

Without /g, the match operator =~ /pattern/ stops at the first match, if any, in both scalar and list context. Adding the /g modifier lets the operator continue through the string and find every match. To collect all matches at once, use it in list context.

With /g, the operator behaves differently in scalar and list context:

  • In list context, =~ /pattern/g returns a list of all matches (or captured groups if you use parentheses).
  • In scalar context, it finds the next match each time it is evaluated, allowing iteration.

This is a reflection of Perl’s motto TMTOWTDI (There’s More Than One Way To Do It), giving multiple approaches to extract multiple matches.
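
As a minimal sketch of how context changes what /g returns (the input string and variable names are illustrative assumptions, not part of the original answer):

```perl
use strict;
use warnings;

my $s = "a1 b22 c333";   # illustrative input

# List context, no parentheses: each whole match is returned.
my @whole = $s =~ /\d+/g;            # ("1", "22", "333")

# List context, with parentheses: only the captured part is returned.
my @letters = $s =~ /([a-z])\d+/g;   # ("a", "b", "c")

# Scalar context: each evaluation resumes where the previous match ended.
# pos() reports the offset just past the most recent match.
my @ends;
while ($s =~ /\d+/g) {
    push @ends, pos($s);             # 2, then 6, then 11
}

# Counting idiom: the empty list forces list context, and the outer
# scalar assignment yields the number of matches.
my $count = () = $s =~ /\d+/g;       # 3

print "@whole\n@letters\n@ends\n$count\n";
```

Once a scalar-context /g match fails, the search position is reset, which is why the final count starts again from the beginning of the string.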

Example: Extracting All Words Starting with 'b'

#!/usr/bin/perl
use strict;
use warnings;

my $text = "Baseball is a bat-and-ball game played between two teams.";

# Extract all words starting with 'b' or 'B', case-insensitive
# Using global match in list context to get all matches
my @matches = $text =~ /\b(b\w*)\b/ig;

print "Found ", scalar(@matches), " matches:\n";
print join(", ", @matches), "\n";

In this example:

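  • \b marks word boundaries. Because '-' is not a word character, "bat-and-ball" yields "bat" and "ball" as separate matches.
  • b\w* matches the letter 'b' followed by zero or more word characters.
  • /i makes the match case-insensitive, so "Baseball" is matched as well.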
  • /g specifies global matching to find all occurrences.
  • Since the match is in list context, @matches holds all captured words.

Common Pitfalls and Tips

  • Context matters: Assigning matches to an array ensures list context and extracts all matches at once.
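  • Captured groups: If your regex contains capturing parentheses, the returned list holds the contents of those groups for each match rather than the whole matched string. With several groups, the values from all matches are flattened into a single list. Without parentheses, each whole match is returned.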
  • Scalar context iteration: You can iterate matches one by one in scalar context using a while loop with the /g modifier:
while ($text =~ /\b(b\w*)\b/ig) {
    print "Match: $1\n";
}
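  • Overlapping matches: With /g, each search resumes where the previous match ended, so overlapping matches are not found. A common workaround is to put the capture group inside a zero-width lookahead, as in /(?=(pattern))/g.
  • Perl version: Matching with /g in list and scalar context works in all Perl 5 releases. Later additions such as named captures, (?<name>...), require Perl 5.10 or newer.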
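
The pitfalls above can be sketched as follows. The input strings and variable names are hypothetical and chosen only for illustration; treat this as one possible demonstration, not a definitive recipe:

```perl
use strict;
use warnings;

# Several capture groups: list-context /g returns one flat list,
# so pairs have to be regrouped by the caller.
my $config = "host=alpha port=8080 mode=fast";   # illustrative input
my @flat = $config =~ /(\w+)=(\w+)/g;
# @flat is ("host", "alpha", "port", "8080", "mode", "fast")
my %settings = @flat;   # works because keys and values alternate

# Overlapping matches: a capture group inside a zero-width lookahead
# consumes nothing, so the engine advances one character at a time.
my $digits  = "12345";
my @plain   = $digits =~ /(\d\d)/g;       # ("12", "34")
my @overlap = $digits =~ /(?=(\d\d))/g;   # ("12", "23", "34", "45")

# Named captures (Perl 5.10+): %+ holds the groups of the latest match.
my @keys;
while ($config =~ /(?<key>\w+)=(?<value>\w+)/g) {
    push @keys, $+{key};
}

print "@flat\n@plain\n@overlap\n@keys\n";
```

Assigning the flat list to a hash is convenient only when every match contributes exactly one key and one value; for other group counts, the while loop form keeps each match's groups together.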

Summary

To extract all regex matches in Perl, use the =~ /pattern/g operator in list context to directly get all matches as a list, or in scalar context combined with a while loop to iterate matches one by one. Use capturing parentheses to extract specific parts of matches.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 5ms

STDOUT
Found 4 matches:
Baseball, bat, ball, between
STDERR
(empty)
