How to extract numbers from mixed text in Perl?

Extracting numbers from mixed text in Perl is a common task, often solved using regular expressions (regex). Perl’s powerful and flexible regex engine makes it easy to find and extract numbers embedded anywhere in a string, whether they are integers, decimals, positive, or negative.

Basic Concepts

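  • Perl regex patterns are usually written between / delimiters and applied to a string with the =~ binding operator.
  • \d matches any digit (0–9), and the + quantifier means "one or more". On Unicode strings \d can also match digits from other scripts; add the /a modifier to restrict it to ASCII 0–9.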
  • The context (scalar vs list) affects what regex matching operators return.
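  • You can capture matches using parentheses (...) and retrieve them via special variables like $1, or as the list that a match returns in list context. The @- and @+ arrays hold the start and end offsets of each match and group, not the matched text.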
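The concepts above can be seen side by side in a minimal sketch. The sample string and the variable names ($first, $order, $boxes, @all) are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Order 66 shipped 7 boxes";

# Scalar (boolean) context: the match returns true or false,
# and $1 holds the text of the first capture group.
my $first;
if ($text =~ /(\d+)/) {
    $first = $1;
    print "First number: $first\n";
}

# List context without /g: returns the capture groups of the first match.
my ($order, $boxes) = $text =~ /(\d+)\D+(\d+)/;
print "Order: $order, boxes: $boxes\n";

# List context with /g: returns every match in the string.
my @all = $text =~ /\d+/g;
print "All: @all\n";
```

The same pattern syntax is used in all three cases; only the context of the match changes what comes back.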

Extracting All Numbers from Text

To find all numbers appearing in a string, use a global regex match m/.../g in list context. For example, to extract integers:


#!/usr/bin/perl
use strict;
use warnings;

my $text = "There are 12 apples, 3 oranges, and 42 bananas.";

# Extract all integers
my @numbers = $text =~ /(\d+)/g;

print "Numbers found: @numbers\n";

This prints:

Numbers found: 12 3 42

Extracting More Complex Number Formats

If you want to extract:

  • Signed numbers: with + or - signs
  • Decimals: numbers including decimal points
  • Scientific notation: e.g. 1.2e-3 (optional/advanced)

You can expand your regex accordingly. For example, this pattern handles signed integers and decimals (but not scientific notation):


my @numbers = $text =~ /([+-]?\d*\.?\d+)/g;

This matches an optional sign [+-]?, optional leading digits \d*, an optional decimal point \.?, and one or more digits \d+. Because the final \d+ is required, every match contains at least one digit.
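Scientific notation from the list above is not covered by that pattern. One possible extension, written with the /x modifier so it can be commented, is sketched below. The pattern name $number and the sample text are assumptions for illustration, and this pattern is stricter than the one above because it requires digits after any decimal point.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Values: 1.2e-3, -4.5E+10, 7, .5 and 3e8.";

# Hypothetical pattern: sign, mantissa, optional exponent.
my $number = qr/
    [+-]?                    # optional sign
    (?: \d+ (?: \.\d+ )?     # digits, optionally followed by a fraction
      | \.\d+                # or a bare fraction such as .5
    )
    (?: [eE] [+-]? \d+ )?    # optional exponent
/x;

my @numbers = $text =~ /($number)/g;

print "$_\n" for @numbers;
```

The matches are still strings; Perl converts them to numbers when you use them in arithmetic.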

Putting It All Together: Runnable Example


#!/usr/bin/perl
use strict;
use warnings;

my $text = "Temp readings: -3.5, +4, 0.002, and 123 mixed with words and symbols.";

# Extract signed decimal and integer numbers
my @numbers = $text =~ /([+-]?\d*\.?\d+)/g;

print "Extracted numbers:\n";
foreach my $num (@numbers) {
    print "$num\n";
}

Output:

Extracted numbers:
-3.5
+4
0.002
123

Perl-Specific Notes

  • Sigils: Scalars start with $, arrays with @. A regex match in list context returns a list, so assign it to an array such as @numbers.
  • TMTOWTDI: Perl offers many ways to do this ("There's More Than One Way To Do It"). For instance, you could use a while loop with global regex matching instead of list assignment.
  • Context: In scalar context, regex m//g allows iterating matches one-by-one. In list context, it returns all matches at once.
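The while-loop alternative mentioned above might look like the following sketch. It reuses the pattern from the runnable example; @found and $start are illustrative names, and $-[1] is the built-in offset at which capture group 1 begins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Temp readings: -3.5, +4, 0.002, and 123";

my @found;

# In scalar context, each /g match resumes where the previous one ended,
# so the loop body runs once per number.
while ($text =~ /([+-]?\d*\.?\d+)/g) {
    my $num   = $1;
    my $start = $-[1];    # offset where capture group 1 begins
    push @found, $num;
    print "Found $num at offset $start\n";
}
```

This form is useful when you need per-match details such as the position, or when you want to stop early without collecting every match.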

Common Pitfalls

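  • Using just \d+ drops signs and splits decimals: 3.5 is returned as two separate matches, 3 and 5.
  • A pattern like [+-]?\d*\.?\d+ is permissive: it reads the hyphen in a range such as 10-20 as a minus sign (yielding 10 and -20), and it splits 1.2.3 into 1.2 and .3. Consider a more precise pattern for real-world data.
  • Be careful if your decimal separator is not '.'. The patterns above only recognize '.', and Perl expects '.' by default when it converts a string to a number, so match ',' explicitly and convert it if your data uses it.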
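The sketch below illustrates these pitfalls on a made-up string. The "strict" pattern is only one possible tightening: it accepts a sign only when no digit precedes it. Whether 1.2.3 should yield any numbers at all depends on your data. The decimal-comma sample is likewise hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Range 10-20, version 1.2.3, price 5.";

# Permissive pattern from the examples above.
my @loose = $text =~ /([+-]?\d*\.?\d+)/g;

# Stricter variant: a sign counts only if it does not follow a digit,
# and a decimal point must be followed by digits.
my @strict = $text =~ /((?:(?<!\d)[+-])?\d+(?:\.\d+)?)/g;

print "Loose:  @loose\n";
print "Strict: @strict\n";

# Decimal comma: match it explicitly, then convert it to '.'.
my $german = "Preis: 3,75 EUR";
my ($raw) = $german =~ /(\d+(?:,\d+)?)/;
(my $normalized = $raw) =~ tr/,/./;
print "Normalized: $normalized\n";
```

With this input the permissive pattern reports -20 and .3, while the stricter one reports 20 and 3.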

With these techniques you can extract numbers from most mixed text in Perl and adapt the pattern to the number formats your data actually contains.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 11ms

STDOUT
Numbers found: 12 3 42
STDERR
(empty)