How to use greedy vs non-greedy quantifiers in Perl regex?

In Perl regular expressions, quantifiers control how many times the preceding element (a character, character class, or group) may repeat. You need to understand greedy versus non-greedy (lazy) quantifiers to extract the substring you want without overmatching or undermatching.

Greedy vs Non-Greedy Quantifiers

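Greedy quantifiers try to match as much of the input string as possible while still allowing the overall regex to succeed. The engine first takes as many repetitions as it can. Only if the rest of the pattern then fails does it backtrack, giving characters back one at a time.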

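Non-greedy quantifiers (sometimes called lazy) try to match as few characters as possible while still allowing the regex to succeed. They start with the minimum number of repetitions and take one more only when the next part of the regex cannot match yet. Laziness does not change where a match starts, though. Perl still reports the leftmost match, so a non-greedy pattern is not guaranteed to find the shortest match anywhere in the string.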
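A minimal sketch of both behaviours. The sample strings and variable names are illustrative and do not come from the original answer. The last match shows the leftmost rule: the lazy pattern starts at the first `<` and captures `<a`, not the shorter `a`.

```perl
use strict;
use warnings;

my $html = '<b>bold</b>';

# Greedy: .+ first runs to the end of the string, then backtracks
# until the final > can match, so the capture spans both tags.
my ($greedy) = $html =~ /<(.+)>/;

# Lazy: .+? takes one character, then tries > after each extra
# character, so it stops at the first > it reaches.
my ($lazy) = $html =~ /<(.+?)>/;

# Leftmost wins: the match starts at the first '<', so the lazy
# capture is '<a' even though 'a' alone would be shorter.
my ($leftmost) = '<<a>' =~ /<(.*?)>/;

print "greedy:   $greedy\n";    # b>bold</b
print "lazy:     $lazy\n";      # b
print "leftmost: $leftmost\n";  # <a
```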

Common Quantifiers

  • * — zero or more (greedy)
  • + — one or more (greedy)
  • ? — zero or one (greedy)
  • {n,m} — between n and m repetitions (greedy)
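To see each greedy quantifier take as much as it is allowed, here is a small illustrative sketch. The test string `aaaa` is an arbitrary choice.

```perl
use strict;
use warnings;

my $s = 'aaaa';

# Each greedy quantifier takes the maximum its bounds allow.
my ($star)  = $s =~ /(a*)/;      # zero or more -> 'aaaa'
my ($plus)  = $s =~ /(a+)/;      # one or more  -> 'aaaa'
my ($opt)   = $s =~ /(a?)/;      # zero or one  -> 'a'
my ($range) = $s =~ /(a{2,3})/;  # two to three -> 'aaa'

printf "%-7s '%s'\n", $_->[0], $_->[1]
    for ['*', $star], ['+', $plus], ['?', $opt], ['{2,3}', $range];
```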

Adding ? for Non-Greedy

To make any of the above quantifiers non-greedy, append a ? immediately after it:

  • *? — zero or more (non-greedy)
  • +? — one or more (non-greedy)
  • ?? — zero or one (non-greedy)
  • {n,m}? — between n and m repetitions (non-greedy)
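The same illustrative string with the lazy forms shows each one taking its minimum. The anchored last case shows that a lazy quantifier still grows when the rest of the pattern requires it. This is a sketch with example data, not taken from the original answer.

```perl
use strict;
use warnings;

my $s = 'aaaa';

# Each lazy quantifier takes the minimum its bounds allow.
my ($star)  = $s =~ /(a*?)/;      # ''   (zero is enough)
my ($plus)  = $s =~ /(a+?)/;      # 'a'
my ($opt)   = $s =~ /(a??)/;      # ''
my ($range) = $s =~ /(a{2,3}?)/;  # 'aa'

# The $ anchor forces the lazy quantifier to expand to the end.
my ($anchored) = $s =~ /^(a*?)$/;  # 'aaaa'

printf "%-8s '%s'\n", $_->[0], $_->[1]
    for ['*?', $star], ['+?', $plus], ['??', $opt],
        ['{2,3}?', $range], ['^*?$', $anchored];
```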

Example: Matching Text Between Tags

Suppose you want to extract the text inside HTML-like tags. Greedy quantifiers will often capture too much, whereas non-greedy quantifiers narrow the match.

use strict;
use warnings;

my $string = "<tag>Here is <some> text</tag>";

print "Original string: $string\n";

# Greedy match - captures everything from first <tag> to last </tag>
if ($string =~ /<tag>(.*)<\/tag>/) {
    print "Greedy match: $1\n";
}

# Non-greedy match - stops at the first </tag> that follows <tag>
if ($string =~ /<tag>(.*?)<\/tag>/) {
    print "Non-greedy match: $1\n";
}

Output Explanation

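This string contains only one </tag>, so both patterns end at the same place and capture the same text:

Greedy match: Here is <some> text
Non-greedy match: Here is <some> text

The greedy .* first runs to the end of the string, then backtracks until </tag> can match. Because there is only one closing tag, it stops right before it. The non-greedy .*? grows one character at a time and stops as soon as </tag> matches, which here is the same position.

Both captures include the intermediate <some>. Non-greedy matching stops at the first occurrence of whatever follows the quantifier (</tag>), not at the first < character.

The two quantifiers produce different results only when the string offers more than one possible end point, such as two </tag> closing tags. In that case the greedy version captures up to the last one and the non-greedy version stops at the first.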
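A sketch with two tag pairs makes the difference visible. The input string is a hypothetical example. The m{...} delimiters are used so the / in </tag> needs no escaping. The /g variant in list context collects the contents of every tag.

```perl
use strict;
use warnings;

my $string = '<tag>first</tag> and <tag>second</tag>';

# Greedy: runs to the LAST </tag>, swallowing the text in between.
my ($greedy) = $string =~ m{<tag>(.*)</tag>};

# Non-greedy: stops at the FIRST </tag>.
my ($lazy) = $string =~ m{<tag>(.*?)</tag>};

# With /g in list context, the lazy pattern returns every capture.
my @all = $string =~ m{<tag>(.*?)</tag>}g;

print "Greedy match:     $greedy\n";  # first</tag> and <tag>second
print "Non-greedy match: $lazy\n";    # first
print "All (lazy, /g):   @all\n";     # first second
```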

Perl-Specific Notes

  • Non-greedy quantifiers were introduced with Perl 5.000, so every Perl 5 release supports them.
  • The .*? pattern is useful when matching inside tags, parentheses, quotes, or any delimiters.
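  • Context does not change how a quantifier behaves; it changes what the match operator returns. In scalar context a match returns true or false and sets $1, $2, and so on. In list context it returns the captured groups, and with /g it returns the captures from every match.
  • The binding operator =~ applies the pattern to the scalar on its left (here $string). Without =~, m// matches against $_.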
  • This is a great example of the Perl motto TMTOWTDI ("There's More Than One Way To Do It") where you can choose greedy or non-greedy matching depending on your needs.
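The notes above can be tried on quoted values. This sketch uses an illustrative input line and shows a lazy .*? next to a negated character class [^"]*, which is another common way to stop at a delimiter. It also contrasts list and scalar context and shows a match against $_ without =~.

```perl
use strict;
use warnings;

my $line = 'name="Alice" city="Paris"';

# Lazy quantifier: each match stops at the next closing quote.
my @lazy = $line =~ /"(.*?)"/g;

# Alternative: a negated class can never run past a quote.
my @class = $line =~ /"([^"]*)"/g;

# Scalar context: returns true/false and sets $1 for the first match.
my $found = $line =~ /"(.*?)"/;
my $first = $1;

# Without =~, the match operator works on $_ (aliased to $line here).
my @keys;
for ($line) {
    @keys = /(\w+)=/g;
}

print "lazy:  @lazy\n";    # Alice Paris
print "class: @class\n";   # Alice Paris
print "first: $first\n";   # Alice
print "keys:  @keys\n";    # name city
```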

Common Pitfalls

  • Using greedy quantifiers when you really want a minimal match can cause overmatching and unintended results.
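  • A lazy quantifier that nothing forces to grow matches as little as it can, often a single character or the empty string. A following delimiter or an anchor such as $ is what forces it to grow. Lazy quantifiers can also be slow on long inputs, because the engine retries the rest of the pattern after each extra character.
  • For nested structures (like nested tags or parentheses), .*? stops at the first closing delimiter, which may belong to an inner level. Matching balanced nesting requires recursive patterns such as (?R) and (?1), available since Perl 5.10.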
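Both pitfalls in a short sketch with illustrative inputs. The balanced-parentheses pattern is one common way to use (?1). Its [^()]++ is a possessive quantifier (also Perl 5.10+), used here to avoid needless backtracking.

```perl
use strict;
use warnings;
use 5.010;  # recursion (?1) and possessive ++ need Perl 5.10+

# Pitfall 1: nothing after \d+? forces it to grow.
my $digits = '12345';
my ($short) = $digits =~ /(\d+?)/;    # '1'
my ($whole) = $digits =~ /^(\d+?)$/;  # '12345' (anchors force expansion)

# Pitfall 2: .*? stops at the first ')', which closes the inner group.
my $expr = 'f(a(b)c)';
my ($lazy_paren) = $expr =~ /(\(.*?\))/;  # '(a(b)'

# (?1) re-enters capture group 1, so nested parens are balanced.
my ($balanced) = $expr =~ /(\((?:[^()]++|(?1))*\))/;  # '(a(b)c)'

print "short: $short\nwhole: $whole\n";
print "lazy:  $lazy_paren\nbalanced: $balanced\n";
```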

In summary, append ? immediately after a greedy quantifier to make it non-greedy, and test your regex thoroughly to ensure you get the desired substring. Understanding this difference unlocks powerful, precise text processing in Perl.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 9ms

STDOUT
Original string: <tag>Here is <some> text</tag>
Greedy match: Here is <some> text
Non-greedy match: Here is <some> text
STDERR
(empty)