How to use capture groups in Perl regex?

In Perl, capture groups are a powerful feature of regular expressions that allow you to extract portions of a matched string. By enclosing part of a regex pattern in parentheses (…), Perl remembers the substring matched by that part of the pattern. You can then access these captured groups using special variables or in list context.

How Capture Groups Work in Perl Regex

Capture groups are denoted by parentheses in the regex. For example, /(\d{3})-(\d{2})-(\d{4})/ matches a string like 123-45-6789 and captures its three parts (123, 45, and 6789) as three separate groups.

  • Numbered variables: When a regex matches successfully, the captured groups are available in the special variables $1, $2, $3, etc., numbered by the position of each opening parenthesis from left to right.
  • List context: A successful match evaluated in list context returns the captured substrings as a list, so you can assign them directly to variables. A failed match returns an empty list.
  • Non-capturing groups (?:…) let you group parts of a pattern without capturing them.
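To see how (?:…) affects group numbering, here is a minimal sketch. The "LEVEL: message" log format and the helper names with_capture and without_capture are made up for illustration; they are not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical helpers: both match lines of the form "LEVEL: message".

# Capturing alternation: the level becomes group 1 and the message group 2.
sub with_capture {
    my ($line) = @_;
    return $line =~ /^(ERROR|WARN): (.*)$/;
}

# Non-capturing alternation: the level is only grouped, so the message is group 1.
sub without_capture {
    my ($line) = @_;
    return $line =~ /^(?:ERROR|WARN): (.*)$/;
}

my $log = "ERROR: disk full";

my @both = with_capture($log);     # two captures: level and message
my @msg  = without_capture($log);  # one capture: message only

print "with (...):   @both\n";
print "with (?:...): @msg\n";
```

Both patterns accept the same lines. The only difference is how many groups are captured and which number the message ends up with.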

Example: Extracting Capture Groups

#!/usr/bin/perl
use strict;
use warnings;

my $string = "My phone number is 123-456-7890";

if ($string =~ /(\d{3})-(\d{3})-(\d{4})/) {
    print "Captured groups:\n";
    print "Area code: $1\n";     # First captured group
    print "Exchange: $2\n";      # Second captured group
    print "Line number: $3\n";  # Third captured group
} else {
    print "No match found\n";
}

Output when you run the script:

Captured groups:
Area code: 123
Exchange: 456
Line number: 7890

Using Capture Groups in List Context

You can also extract all captures at once by assigning a regex match in list context:

my ($area, $exchange, $line) = $string =~ /(\d{3})-(\d{3})-(\d{4})/;
print "Area code: $area, Exchange: $exchange, Line: $line\n";
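The one-liner above assumes the match succeeds. When it fails, the match returns an empty list and the variables are left undefined, so it is usually worth testing the assignment itself. A minimal sketch, where parse_phone is a hypothetical helper name and the second input string is invented to show the failure path:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the three captures, or an empty list on failure.
sub parse_phone {
    my ($text) = @_;
    my @parts = $text =~ /(\d{3})-(\d{3})-(\d{4})/;
    return @parts;
}

for my $input ("My phone number is 123-456-7890", "no number here") {
    # A list assignment in a condition is true only if the right side
    # produced at least one element, i.e. only if the pattern matched.
    if (my ($area, $exchange, $line) = parse_phone($input)) {
        print "Area code: $area, Exchange: $exchange, Line: $line\n";
    } else {
        print "No match in: $input\n";
    }
}
```

Testing the assignment this way avoids "uninitialized value" warnings from printing variables that were never set.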

Things to Keep in Mind

  • Capture variables persist: $1, $2, etc. hold the values from the most recent successful pattern match in the current scope. A failed match does not clear them, so always check that the match succeeded before reading them.
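  • Greedy vs Non-Greedy: Be cautious with quantifiers inside capture groups. By default * and + are greedy and match as much as possible, which may capture more than expected. Append ? (as in *? or +?) to make them match as little as possible.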
  • Non-capturing groups: Use (?:…) when grouping without capturing to avoid unwanted captures.
  • Named Capture Groups (Perl 5.10+): You can use named captures like /(?<name>pattern)/ and access them through the %+ hash, for example $+{name}.
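Two of the points above, greediness and capture variables surviving a failed match, are easy to check with a short script. This is only an illustrative sketch; the sample strings and variable names are invented for the demonstration.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .* extends to the last ">" it can still match.
my ($greedy) = $html =~ /<(.*)>/;

# Non-greedy: .*? stops at the first ">".
my ($lazy) = $html =~ /<(.*?)>/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";

# A failed match leaves $1 from the previous successful match untouched.
my $first_ok  = "2024-06-01" =~ /(\d{4})/;   # succeeds, $1 is "2024"
my $second_ok = "no digits"  =~ /(\d{4})/;   # fails, $1 is not cleared
my $stale     = $1;

print "second match ", ($second_ok ? "succeeded" : "failed"),
      ", but \$1 is still '$stale'\n";
```

This is why reading $1 without first checking the result of the match can silently pick up data from an earlier, unrelated match.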

Example: Named Capture Groups (Perl 5.10+)

my $date = "2024-06-01";
if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    print "Year: $+{year}, Month: $+{month}, Day: $+{day}\n";
}

This outputs:

Year: 2024, Month: 06, Day: 01

Summary

Capture groups in Perl regex provide a flexible way to extract the portions of a string matched by parenthesized parts of a pattern. You access numbered groups via $1, $2, … or by assigning the match in list context, and named groups (Perl 5.10+) via the %+ hash. Check that the match succeeded before reading captures, watch for greedy quantifiers, and use non-capturing groups when you don't need the data saved.
