How to validate email addresses using Perl regex?

Validating email addresses using Perl regex is a classic but challenging problem due to the complexity of email formats defined in RFC 5322. While fully RFC-compliant validation with pure regex is nearly impossible without extremely complex patterns, you can achieve a practical and robust solution for most common email formats.

In Perl, you test a string against a pattern with the =~ binding operator. For email addresses, you want a regex that:

Checks for a valid local part (before the @) using alphanumeric characters and allowed special symbols.
Validates the domain part with labels separated by dots, allowing hyphens but enforcing alphanumeric starts and ends.
Supports Top-Level Domains (TLDs) with at least 2 characters.

Here is one commonly used Perl regex pattern that balances simplicity and robustness, suitable for practical validation:

#!/usr/bin/perl
use strict;
use warnings;

sub is_valid_email {
    my ($email) = @_;
    return 0 unless defined $email;

    # Simple but practical regex for typical email validation
    # Explanation:
    # ^                      : start of string
    # [a-zA-Z0-9_.+-]+       : local part - one or more allowed chars (alnum plus some symbols)
    # @                      : literal at symbol (written \@ so Perl does not interpolate @$label)
    # $label                 : domain label - alnum at both ends, hyphens allowed inside
    # (?:\.$label)*          : zero or more additional domain labels prefixed by a dot
    # \.[a-zA-Z]{2,}         : TLD must be at least 2 alphabetic chars
    # \z                     : end of string (unlike $, does not allow a trailing newline)
    my $label = qr/[a-zA-Z0-9](?:[a-zA-Z0-9\-]*[a-zA-Z0-9])?/;
    return $email =~ /^[a-zA-Z0-9_.+\-]+\@$label(?:\.$label)*\.[a-zA-Z]{2,}\z/ ? 1 : 0;
}

my @emails = (
    'john.doe@example.com',
    'jane_doe+123@sub.domain.co.uk',
    'invalid-email@.com',
    'noatsymbol.com',
    'bad-char!@example.com',
    'user@local',
);

foreach my $email (@emails) {
    if (is_valid_email($email)) {
        print "'$email' is valid\n";
    } else {
        print "'$email' is invalid\n";
    }
}
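
The commented explanation can also live inside the pattern itself by using the /x modifier. The following is a minimal sketch of that layout, not a replacement for the function above. It assumes each domain label must begin and end with an alphanumeric character and anchors with \A and \z. The names $LABEL, $EMAIL_RE, and matches_email are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One domain label: alphanumeric at both ends, hyphens allowed inside.
my $LABEL = qr/[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?/;

# With /x, whitespace and comments inside the pattern are ignored.
my $EMAIL_RE = qr/
    \A
    [a-zA-Z0-9_.+-]+          # local part
    \@                        # literal at sign, escaped to prevent interpolation
    $LABEL (?: \. $LABEL )*   # one or more domain labels
    \. [a-zA-Z]{2,}           # alphabetic TLD of two or more letters
    \z
/x;

# Returns 1 if the string matches the pattern, 0 otherwise.
sub matches_email {
    my ($candidate) = @_;
    return (defined($candidate) && $candidate =~ $EMAIL_RE) ? 1 : 0;
}

for my $addr ('john.doe@example.com', 'user@-bad-.com') {
    print "'$addr': ", (matches_email($addr) ? 'valid' : 'invalid'), "\n";
}
```

Compiling the pattern once with qr// also keeps the regex in one named place if several subs need it.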

Key Points and Perl Concepts

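=~ is Perl's binding operator, which applies a regex to a scalar.
^ anchors the match to the start of the string and an end anchor ($ or the stricter \z) to the end. $ also matches just before a trailing newline, so \z is the safer choice for full-string validation.
Inside a character class like [a-zA-Z0-9_.+\-], + and . are literal characters. Only - is special there (it denotes a range), so it is escaped or placed last.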
Non-capturing groups (?:...) avoid unnecessary capture of subpatterns.
The function returns a boolean value indicating if the string matches the pattern.
The example illustrates iterating over multiple test email addresses, printing validation results.
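
The choice of end anchor matters when input arrives with a trailing newline, for example a line read from STDIN without chomp. The sketch below uses a deliberately tiny pattern and two hypothetical helpers, loose_match and strict_match, only to show how the anchors differ.

```perl
use strict;
use warnings;

# $ matches at the very end of the string OR just before a final newline.
sub loose_match {
    my ($s) = @_;
    return $s =~ /^[a-z]+\@[a-z]+\.[a-z]{2,}$/ ? 1 : 0;
}

# \z matches only at the very end of the string.
sub strict_match {
    my ($s) = @_;
    return $s =~ /^[a-z]+\@[a-z]+\.[a-z]{2,}\z/ ? 1 : 0;
}

my $line = "user\@example.com\n";    # newline still attached
print 'with $:  ', loose_match($line),  "\n";
print 'with \z: ', strict_match($line), "\n";
```

Calling chomp on the input before validating is the other common way to avoid the surprise.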

Common Pitfalls When Using Regex for Email Validation

Patterns that are too strict can exclude valid emails (e.g., addresses with quoted local parts, or TLDs that are not purely alphabetic such as Punycode-encoded internationalized TLDs).
Too loose patterns may accept invalid email formats.
This regex rejects domain literals such as user@[192.0.2.1] and addresses containing comments, both of which RFC 5322 permits.
Unicode/UTF-8 characters are not supported in this pattern, which accepts only ASCII alphanumerics and a few symbols.
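
These pitfalls are easy to reproduce. The sketch below runs a few edge cases through an illustrative copy of an ASCII-only practical pattern. The name practical_ok and the sample addresses are assumptions chosen for the demonstration.

```perl
use strict;
use warnings;
use utf8;                              # this source file contains a non-ASCII literal
binmode STDOUT, ':encoding(UTF-8)';

# Illustrative copy of a practical, ASCII-only pattern.
my $PRACTICAL_RE =
    qr/\A[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}\z/;

sub practical_ok {
    my ($s) = @_;
    return $s =~ $PRACTICAL_RE ? 1 : 0;
}

my @cases = (
    [ '"john doe"@example.com', 'quoted local part, permitted by RFC 5322' ],
    [ 'user@[192.0.2.1]',       'domain literal, permitted by RFC 5322' ],
    [ 'josé@example.com',       'non-ASCII local part' ],
    [ '.john..doe@example.com', 'leading and doubled dots, not a valid dot-atom' ],
);

for my $case (@cases) {
    my ($addr, $note) = @$case;
    my $verdict = practical_ok($addr) ? 'accepted' : 'rejected';
    print "$addr: $verdict ($note)\n";
}
```

The first three addresses are rejected even though they can be legitimate, while the last one is accepted even though its local part is malformed.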

Advanced Alternatives

For advanced email validation, consider:

Using a CPAN module such as Email::Valid, which checks addresses against the RFC 822 family of address syntax rather than a hand-written pattern.
Sending a confirmation email to verify the actual existence of the mailbox.

However, for many applications, the above regex provides a good balance between simplicity and correctness. Always test your regex with your expected input set.
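
For the CPAN route, a minimal sketch with Email::Valid might look like the following. It assumes the module has been installed from CPAN (it is not part of core Perl). The wrapper name checked_address is hypothetical, and it returns undef both when the address is rejected and when the module cannot be loaded.

```perl
use strict;
use warnings;

# Returns the address as reported by Email::Valid, or undef if the address
# is rejected or the module is not installed.
sub checked_address {
    my ($candidate) = @_;
    return unless defined $candidate;
    return unless eval { require Email::Valid; 1 };
    return scalar Email::Valid->address($candidate);
}

for my $candidate ('john.doe@example.com', 'noatsymbol.com') {
    my $result = checked_address($candidate);
    print "'$candidate' => ",
        (defined $result ? "ok ($result)" : 'rejected or module missing'), "\n";
}
```

Email::Valid also documents an -mxcheck option that looks up the domain's mail host. It needs network access, and even then only a confirmation email shows that the mailbox really exists.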
