How to extract numbers from mixed text in Perl?
Extracting numbers from mixed text in Perl is a common task, usually solved with regular expressions (regex). Perl's regex engine makes it easy to find and extract numbers embedded anywhere in a string, whether they are integers or decimals, positive or negative.
Basic Concepts
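- Perl uses // delimiters for regex patterns, which you usually combine with the =~ binding operator.
- \d matches a digit, and the + quantifier means "one or more". On strings with Unicode semantics, \d also matches non-ASCII decimal digits (for example, Arabic-Indic digits); the /a modifier restricts it to 0–9.
- Context (scalar vs. list) affects what regex matching operators return.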
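- You can capture matches with parentheses (...) and retrieve them through the numbered variables $1, $2, ..., through the @{^CAPTURE} array (Perl 5.26 and later), or as the return value of the match in list context. The arrays @- and @+ hold the start and end offsets of the match and of each group, not the captured text.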
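A minimal sketch of the retrieval routes listed above, assuming Perl 5.26 or later for @{^CAPTURE}; the sample string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Order 42 shipped on day 7";

# Two capture groups: the first and second runs of digits.
if ($text =~ /(\d+)\D+(\d+)/) {
    # Numbered capture variables
    print "First: $1, second: $2\n";

    # @{^CAPTURE} holds all captures as a list (Perl 5.26 and later)
    my @caps = @{^CAPTURE};
    print "All captures: @caps\n";

    # @- holds start offsets, not captured text:
    # $-[0] is where the whole match starts, $-[1] where group 1 starts
    print "Group 1 starts at offset " . $-[1] . "\n";
}

# In list context, the match itself returns the captures
my ($first, $second) = $text =~ /(\d+)\D+(\d+)/;
print "List context: $first and $second\n";
```

In practice the list-context form at the end is the most common way to grab captures, because it avoids relying on $1 and friends staying valid after later matches.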
Extracting All Numbers from Text
To find all numbers appearing in a string, use a global regex match m/.../g in list context. For example, to extract integers:
#!/usr/bin/perl
use strict;
use warnings;
my $text = "There are 12 apples, 3 oranges, and 42 bananas.";
# Extract all integers
my @numbers = $text =~ /(\d+)/g;
print "Numbers found: @numbers\n";
This prints:
Numbers found: 12 3 42
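Because \d follows Unicode rules on strings containing characters above U+00FF, the pattern above can also pick up non-ASCII digits. A small sketch, assuming a made-up string that contains ARABIC-INDIC DIGIT THREE and FOUR (written as \x{...} escapes so the script stays ASCII), compares the default behavior with the /a modifier:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "\x{0663}\x{0664}" is ARABIC-INDIC DIGIT THREE followed by FOUR.
my $text = "Order 42, ref \x{0663}\x{0664}";

# Default: \d follows Unicode rules here, so both digit runs match
my @any_digits = $text =~ /(\d+)/g;

# /a restricts \d to ASCII 0-9
my @ascii_only = $text =~ /(\d+)/ag;

# Print counts rather than the Unicode digits themselves,
# to avoid "Wide character" warnings on a non-UTF-8 STDOUT.
print "Default \\d matches: ", scalar(@any_digits), "\n";
print "With /a: ", scalar(@ascii_only), " (@ascii_only)\n";
```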
Extracting More Complex Number Formats
If you want to extract:
- Signed numbers: with + or - signs
- Decimals: numbers including decimal points
- Scientific notation: e.g. 1.2e-3 (optional/advanced)
You can expand your regex accordingly. For example, to handle signs and decimals:
my @numbers = $text =~ /([+-]?\d*\.?\d+)/g;
This matches an optional sign [+-]?, optional digits before the decimal point \d*, an optional decimal point \.?, and one or more digits \d+ (the digits after the point, or the whole integer when there is no point).
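The pattern above does not cover scientific notation. A minimal sketch that extends it, assuming exponents are written as e or E followed by an optional sign and digits. Like the pattern above, it requires at least one digit after a decimal point, so a sentence-ending "3." yields just 3.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Readings: 1.2e-3, -4.5E+6, 7 and .25 units.";

# Optional sign, then either digits with an optional fractional part
# or a bare fractional part (.25), then an optional exponent.
my @numbers = $text =~ /([+-]?(?:\d+(?:\.\d+)?|\.\d+)(?:[eE][+-]?\d+)?)/g;

print "Matched: @numbers\n";

# Adding 0 converts each matched string to a number.
print "As numbers: ", join(" ", map { $_ + 0 } @numbers), "\n";
```

The (?:...) groups are non-capturing, so the match in list context still returns only the outer capture, one string per number.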
Putting It All Together: Runnable Example
#!/usr/bin/perl
use strict;
use warnings;
my $text = "Temp readings: -3.5, +4, 0.002, and 123 mixed with words and symbols.";
# Extract signed decimal and integer numbers
my @numbers = $text =~ /([+-]?\d*\.?\d+)/g;
print "Extracted numbers:\n";
foreach my $num (@numbers) {
    print "$num\n";
}
Output:
Extracted numbers:
-3.5
+4
0.002
123
Perl-Specific Notes
- Sigils: scalars start with $, arrays with @. A regex match in list context returns a list, so assign it to an array such as @numbers.
- TMTOWTDI: Perl offers many ways to do a task ("There's More Than One Way To Do It"); for instance, you could use a while loop with global regex matching instead of list assignment.
- Context: in scalar context, m//g returns one match per call and remembers where it stopped (via pos()), so you can iterate over matches one by one. In list context, it returns all matches at once.
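The while-loop alternative mentioned above can be sketched as follows, reusing the text and pattern from the runnable example; pos() reports the offset just past each match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Temp readings: -3.5, +4, 0.002, and 123 mixed with words and symbols.";

my @numbers;
# In scalar context, m//g finds one match per call and remembers
# where it stopped via pos($text).
while ($text =~ /([+-]?\d*\.?\d+)/g) {
    my $num = $1;
    my $end = pos($text);    # offset just past the current match
    print "Found $num (match ends at offset $end)\n";
    push @numbers, $num;
}

print "Collected: @numbers\n";
```

This form is handy when you want to process each number as it is found, or need its position in the string, rather than collecting everything first.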
Common Pitfalls
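- Using just \d+ will miss decimals and signs: "-3.5" comes back as 3 and 5.
- A pattern like [+-]?\d*\.?\d+ cannot match a lone '.', since \d+ requires at least one digit, but it can split or misread real-world tokens: "1.2.3" yields 1.2 and .3, "1,000" yields 1 and 000, and a range such as "10-20" yields 10 and -20. Consider lookarounds or more precise patterns for real-world data.
- If your text uses a decimal separator other than '.' (for example ',' in many European locales), these patterns will not recognize it, and Perl's ordinary string-to-number conversion also expects '.', so normalize the separator before doing arithmetic.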
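One possible stricter variant, sketched here as an illustration rather than a universal number grammar: it treats + or - as a sign only when it does not follow a word character or '.', and rejects numbers that run straight into another digit or '.digit', so version strings like 1.2.3 are skipped entirely. Thousands separators and comma decimals are still not handled, and numbers glued to a preceding letter (such as the 4 in "A4") are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Pages 10-20, delta -3.5, version 1.2.3, cost .25 total.";

# Loose pattern from the examples above.
my @loose = $text =~ /([+-]?\d*\.?\d+)/g;

# Stricter variant:
#   (?<![\w.])  the number (or its sign) must not follow a letter, digit, '_' or '.'
#   (?!\.?\d)   the number must not continue with a digit or '.digit'
my @strict = $text =~ /(?<![\w.])([+-]?(?:\d+(?:\.\d+)?|\.\d+))(?!\.?\d)/g;

print "Loose:  @loose\n";
print "Strict: @strict\n";
```

The loose pattern turns the range into 10 and -20 and splits the version string; the stricter one returns 10, 20, -3.5 and .25.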
With these patterns, you can extract numbers from most mixed text in Perl and adapt the regex to the number formats your data actually uses.
Verified Code
Executed in a sandbox (Perl v5.34.1) to capture real output:
Numbers found: 12 3 42