How to perform regex match and extract all matches in Perl?
In Perl, performing a regex match to extract all matches from a string is a common task that hinges on using the /g (global) modifier with the matching operator. This modifier allows you to find all non-overlapping matches instead of stopping after the first.
Below, you’ll find an explanation of the key concepts, a clear example, and some common pitfalls to watch out for.
Understanding Regex Matching with /g
Without /g, the match operator =~ /pattern/ stops at the first match, if there is one. Adding the /g modifier lets you keep finding matches across the rest of the string. To capture all matches at once, you typically use it in list context.
The /g modifier behaves differently in list and scalar contexts:
- In list context, =~ /pattern/g returns a list of all matches (or of the captured groups if you use parentheses).
- In scalar context, it finds the next match each time it is evaluated, allowing iteration.
This is a reflection of Perl’s motto TMTOWTDI (There’s More Than One Way To Do It), giving multiple approaches to extract multiple matches.
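The two contexts can be compared side by side. The following is a minimal sketch; the sample string and the variable names are illustrative assumptions, not part of the original example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample data (an assumption for this sketch)
my $line = "id=10 id=20 id=30";

# List context, no parentheses: each element is a whole match
my @whole = $line =~ /id=\d+/g;      # ("id=10", "id=20", "id=30")

# List context, one capture group: each element is the group's content
my @ids = $line =~ /id=(\d+)/g;      # (10, 20, 30)

# Scalar context: each evaluation resumes where the previous match ended;
# pos() reports the offset just past the most recent match
my @seen;
while ($line =~ /id=(\d+)/g) {
    push @seen, "$1 ends at offset " . pos($line);
}

# Counting matches without keeping them: assigning to the empty list
# forces list context, and the scalar assignment yields the element count
my $count = () = $line =~ /id=\d+/g;

print "Whole matches: @whole\n";
print "Captured ids:  @ids\n";
print "$_\n" for @seen;
print "Count: $count\n";
```

The list-context forms are usually the shortest way to collect everything, while the while loop is the better fit when you need per-match details such as the position.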
Example: Extracting All Words Starting with 'b'
#!/usr/bin/perl
use strict;
use warnings;
my $text = "Baseball is a bat-and-ball game played between two teams.";
# Extract all words starting with 'b' or 'B', case-insensitive
# Using global match in list context to get all matches
my @matches = $text =~ /\b(b\w*)\b/ig;
print "Found ", scalar(@matches), " matches:\n";
print join(", ", @matches), "\n";
In this example:
- \b marks word boundaries.
- b\w* matches words starting with 'b' followed by zero or more word characters.
- i makes the match case-insensitive.
- /g specifies global matching to find all occurrences.
- Since the match is in list context, @matches holds all captured words.
Common Pitfalls and Tips
- Context matters: Assigning matches to an array ensures list context and extracts all matches at once.
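- Captured groups: If your regex contains capturing parentheses, the returned list holds the contents of those groups for each match rather than the whole matched text. With several groups, the values come back as one flat list (group 1, group 2, group 1, group 2, and so on). To get the entire match, omit the parentheses or wrap the whole pattern in a group.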
- Scalar context iteration: You can iterate over matches one by one in scalar context using a while loop with the /g modifier:
while ($text =~ /\b(b\w*)\b/ig) {
    print "Match: $1\n";
}
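- Overlapping matches: With /g, each search resumes where the previous match ended, so overlapping matches are not found by default. A capture group inside a lookahead, such as /(?=(pattern))/g, is a common workaround.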
- Perl version: /g matching in list and scalar context works in every Perl 5 release. Perl 5.10 added further regex features, such as named captures.
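Some of the pitfalls above are easier to see in code. This is a hedged sketch with made-up sample strings: it shows how several capture groups come back as one flat list, how named captures (Perl 5.10 or later) read inside a while loop, and how a capture inside a lookahead can collect overlapping matches. All variable names here are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample data (assumptions for this sketch)
my $config = "host=alpha port=8080 mode=fast";

# Two capture groups: list context returns a flat list
# (key1, value1, key2, value2, ...), which maps naturally onto a hash
my %setting = $config =~ /(\w+)=(\w+)/g;
print "$_ => $setting{$_}\n" for sort keys %setting;

# Named captures (Perl 5.10+): read each match through the %+ hash
my @named;
while ($config =~ /(?<key>\w+)=(?<value>\w+)/g) {
    push @named, $+{key} . " is " . $+{value};
}
print "$_\n" for @named;

# Overlapping matches: a plain /g match consumes what it matched
my $digits = "12345";
my @plain = $digits =~ /(\d\d)/g;          # ("12", "34")

# A capture inside a zero-width lookahead consumes nothing, so the
# engine advances one character at a time and overlaps are kept
my @overlap = $digits =~ /(?=(\d\d))/g;    # ("12", "23", "34", "45")

print "Plain:       @plain\n";
print "Overlapping: @overlap\n";
```

Assigning the flat list to a hash only makes sense when the pattern has exactly two groups and the keys are unique; otherwise, iterating with the while loop keeps each match's groups together.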
Summary
To extract all regex matches in Perl, use the =~ /pattern/g operator in list context to directly get all matches as a list, or in scalar context combined with a while loop to iterate matches one by one. Use capturing parentheses to extract specific parts of matches.
Verified Code
Executed in a sandbox to capture real output. • v5.34.1 • 5ms
Found 4 matches:
Baseball, bat, ball, between