How to extract email addresses from a text file using Perl?

Extracting Email Addresses From Text Using Perl Without Files

When the code must run in a sandbox without filesystem access, it cannot open a text file directly. Instead, you can supply the input through the DATA filehandle or standard input (STDIN). The example below extracts email addresses from multiline text placed after the __DATA__ token, so the whole script can be piped to perl - and run as is.

Key Concepts

  • DATA handle: Allows you to embed input data inside the script itself without external files.
  • Regex matching with /g: Extracts all emails per line.
  • Hashes for uniqueness: Avoids printing duplicates.
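
The last two concepts can be sketched on a single string. This is a minimal illustration, not part of the original script: the sample line and the variable names ($line, @found, %seen, @unique) are made up, and the pattern is the same one used in the script in this answer. In list context a /g match returns every capture at once, and assigning to a hash slice collapses duplicates.

```perl
use strict;
use warnings;

# Hypothetical sample line; the addresses are invented for illustration.
my $line = 'Contact alice@example.com or bob@example.org, cc alice@example.com';

# In list context, a /g match returns every captured substring at once.
my @found = $line =~ /([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)/g;

# Hash keys are unique, so a hash slice drops the repeated address.
my %seen;
@seen{@found} = ();
my @unique = sort keys %seen;

print scalar(@found), " matches, ", scalar(@unique), " unique\n";  # 3 matches, 2 unique
print "$_\n" for @unique;
```

The while loop used in the full script does the same work one match at a time, which is convenient when each match needs extra handling.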

Example Perl Script (Runnable Anywhere)

use strict;
use warnings;

# Hash to store unique emails
my %emails;

# Read lines from embedded DATA section
while (my $line = <DATA>) {
    chomp $line;
    # Simple regex to match typical emails
    while ($line =~ /([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)/g) {
        $emails{$1} = 1;
    }
}

print "Extracted email addresses:\n";
print "$_\n" for sort keys %emails;

__DATA__
Here are some emails:
alice@example.com
bob.smith@domain.co.uk
invalid-email@domain
jane-doe123@sub.domain.org

How This Works

  • No external files: The input text is embedded after the __DATA__ token.
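  • <DATA> reads the embedded text line by line. A bare <> reads from files named on the command line, or from STDIN when none are given; it sees the embedded lines only when the script itself is piped to perl -, because the rest of the script then arrives on STDIN.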
  • The regex /([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)/g captures email-like patterns.
  • Storing matches in a hash ensures uniqueness.
  • Finally, all unique emails are printed sorted alphabetically.
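
The steps above can also be wrapped in a routine that accepts any readable filehandle, so the same logic serves DATA, STDIN, or a handle returned by open on a real file. This is a sketch under assumptions: extract_emails and $EMAIL_RE are hypothetical names, and an in-memory filehandle stands in for a file so the example needs no filesystem access.

```perl
use strict;
use warnings;

# Pattern copied from the script in this answer.
my $EMAIL_RE = qr/([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)/;

# extract_emails is a hypothetical helper name, not part of the original script.
# It reads any filehandle line by line and returns the unique matches, sorted.
sub extract_emails {
    my ($fh) = @_;
    my %emails;
    while (my $line = <$fh>) {
        # /g in scalar context resumes after the previous match on this line.
        while ($line =~ /$EMAIL_RE/g) {
            $emails{$1} = 1;
        }
    }
    my @sorted = sort keys %emails;
    return @sorted;
}

# Sample text (invented); opening a reference to a scalar gives an in-memory handle.
my $text = join "\n",
    'alice@example.com',
    'bob.smith@domain.co.uk and alice@example.com',
    'invalid-email@domain',
    '';
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @emails = extract_emails($fh);
close $fh;

print "$_\n" for @emails;
# alice@example.com
# bob.smith@domain.co.uk
```

Where a filesystem is available, passing a handle opened on a file path (for example a hypothetical input.txt) or \*STDIN to the same routine should behave the same way.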

Notes and Common Pitfalls

  • This regex is a practical approximation but does not capture all valid RFC-compliant emails.
  • Reading from DATA makes the script self-contained and sandbox-compatible.
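  • Inside a character class a dot is already literal and needs no escaping, while a hyphen is safest placed first or last so it cannot be read as a range operator. The class [a-zA-Z0-9-.] used here works because its hyphen follows a completed range, but [a-zA-Z0-9.-] is less ambiguous.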
  • Using /g inside a while loop over a line ensures all matches on that line are found.
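
One concrete case of the approximation noted above: because the final character class includes a dot, an address at the end of a sentence is captured together with the sentence's period. The sketch below compares the pattern from this answer with a tighter variant. The variant, the names $loose_re and $tight_re, and the sample sentence are assumptions added for illustration, not the original author's code, and the variant is still far from RFC-complete.

```perl
use strict;
use warnings;

# Invented sample sentence: the first address is followed by a period.
my $sentence = 'Write to alice@example.com. Or try bob.smith@domain.co.uk, please.';

# Pattern from this answer: the last class contains '.', so it can swallow a trailing period.
my $loose_re = qr/([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)/;

# Illustrative variant: each dot in the domain must be followed by at least one label character.
my $tight_re = qr/([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+)/;

my @loose = $sentence =~ /$loose_re/g;
my @tight = $sentence =~ /$tight_re/g;

print "loose: $_\n" for @loose;  # alice@example.com. (with the period), bob.smith@domain.co.uk
print "tight: $_\n" for @tight;  # alice@example.com, bob.smith@domain.co.uk
```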

Summary

When you cannot read from external files, using the __DATA__ section lets you embed sample text directly within your Perl script. This approach, combined with core Perl regex capabilities and hashes, allows extraction of email addresses efficiently and portably without external dependencies or file access.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 7ms

STDOUT
Extracted email addresses:
alice@example.com
bob.smith@domain.co.uk
jane-doe123@sub.domain.org
STDERR
(empty)
