How to validate email addresses using Perl regex?

Validating email addresses with a Perl regex is a classic but challenging problem, because the address syntax defined in RFC 5322 is complex. A single fully RFC-compliant regex is impractical to write and maintain, since the pattern becomes extremely large. A much simpler pattern can still reliably handle the formats you will see in most applications.

In Perl, you test a string against a pattern with the =~ binding operator. When validating email addresses, you want a regex that:

  • Checks for a valid local part (before the @) using alphanumeric characters and allowed special symbols.
  • Validates the domain part with labels separated by dots, allowing hyphens but enforcing alphanumeric starts and ends.
  • Requires a top-level domain (TLD) of at least two letters.
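
Each of these requirements can be written as its own qr// fragment, and the fragments can then be combined into one anchored pattern. The sketch below is our own and is not a standard pattern. The names $local_part, $label, $tld and is_valid_email_strict are hypothetical. It assumes ASCII-only input, treats the local part as dot-separated runs (so dots cannot lead, trail, or repeat), and requires every domain label to start and end with a letter or digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical building blocks, one per requirement (ASCII only).
# Local part: runs of allowed characters separated by single dots.
my $local_part = qr/[A-Za-z0-9_+\-]+(?:\.[A-Za-z0-9_+\-]+)*/;

# Domain label: starts and ends with a letter or digit; hyphens only inside.
my $label = qr/[A-Za-z0-9](?:[A-Za-z0-9\-]*[A-Za-z0-9])?/;

# TLD: at least two letters.
my $tld = qr/[A-Za-z]{2,}/;

# Compose the pieces. /x lets us space them out; \A and \z anchor the whole string.
my $email_re = qr/\A $local_part \@ (?:$label\.)+ $tld \z/x;

sub is_valid_email_strict {
    my ($email) = @_;
    return 0 unless defined $email;
    return $email =~ $email_re ? 1 : 0;
}

for my $email ('john.doe@example.com', 'john..doe@example.com',
               'user@-example.com', 'user@sub.example-site.org') {
    printf "%-26s %s\n", $email,
        is_valid_email_strict($email) ? 'valid' : 'invalid';
}
```

Naming each fragment keeps every requirement readable on its own. With these inputs, the sketch accepts john.doe@example.com and user@sub.example-site.org and rejects john..doe@example.com and user@-example.com.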

Here is one commonly used Perl regex pattern that balances simplicity and robustness, suitable for practical validation:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub is_valid_email {
    my ($email) = @_;
    return 0 unless defined $email;

    # Simple but practical regex for typical email validation
    # Explanation:
    # ^                      : start of string
    # [a-zA-Z0-9_.+-]+       : local part - one or more allowed chars (alnum plus some symbols)
    # @                      : literal at symbol
    # [a-zA-Z0-9-]+          : domain label - one or more alnum or hyphen chars
    # (?:\.[a-zA-Z0-9-]+)*   : zero or more additional domain labels prefixed by a dot
    # \.[a-zA-Z]{2,}         : TLD must be at least 2 alphabetic chars
    # $                      : end of string (also matches just before a trailing newline)
    # The ternary returns 1 or 0 explicitly, so the result is the same in list context.
    return $email =~ /^[a-zA-Z0-9_.+\-]+@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/ ? 1 : 0;
}

my @emails = (
    'john.doe@example.com',
    'jane_doe+123@sub.domain.co.uk',
    'invalid-email@.com',
    'noatsymbol.com',
    'bad-char!@example.com',
    'user@local',
);

foreach my $email (@emails) {
    if (is_valid_email($email)) {
        print "'$email' is valid\n";
    } else {
        print "'$email' is invalid\n";
    }
}
```

Key Points and Perl Concepts

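  • =~ is Perl’s binding operator: it applies a regex to a scalar (here $email).
  • ^ and $ anchor the match to the start and end of the string, which is essential for whole-string validation. Note that $ also matches just before a trailing newline, so "user@example.com\n" passes; use \z instead of $ when the input may not have been chomped.
  • Inside a character class such as [a-zA-Z0-9_.+\-], the . and + characters are literal. A - between two characters denotes a range, so it is escaped here (placing it first or last in the class also works).
  • Non-capturing groups (?:...) group subpatterns for quantifiers without creating capture variables.
  • The function returns a true value when the string matches and a false value otherwise, including when $email is undef.
  • The example loops over several test addresses and prints the result for each one.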
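
Two of these points are easy to check directly: how $ differs from \z, and what a match returns. The sketch below is illustrative, and the variable names $clean, $chomped, $with_dollar and $with_z are our own. It matches the same address with and without a trailing newline. The $ version accepts both, because $ may match just before a final "\n"; the \z version accepts only the clean string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative inputs: the second one still ends in "\n", e.g. a line read without chomp.
my $clean   = 'john.doe@example.com';
my $chomped = "john.doe\@example.com\n";

# The same pattern with two different end anchors.
my $with_dollar = qr/^[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/;
my $with_z      = qr/\A[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}\z/;

for my $input ($clean, $chomped) {
    my $name = $input eq $clean ? 'clean' : 'trailing \n';
    printf "%-12s \$: %-9s \\z: %s\n", $name,
        ($input =~ $with_dollar ? 'match' : 'no match'),
        ($input =~ $with_z      ? 'match' : 'no match');
}

# In scalar context a successful match returns 1 and a failed one returns ''
# (false but defined), which is why callers usually just test it for truth.
my $ok  = ($clean   =~ $with_z);
my $bad = ($chomped =~ $with_z);
printf "success: '%s', failure: '%s'\n", $ok, $bad;

# !~ is the negated form of =~.
print "rejected by \\z\n" if $chomped !~ $with_z;
```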

Common Pitfalls When Using Regex for Email Validation

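  • Overly strict patterns reject valid addresses. This one rejects quoted local parts such as "john doe"@example.com, internationalized TLDs in punycode form such as xn--p1ai, and local-part characters like ! that the standard permits. For example, bad-char!@example.com in the test list is syntactically valid even though the pattern reports it as invalid.
  • Overly loose patterns accept malformed addresses. This one accepts leading or consecutive dots in the local part (.john@example.com, john..doe@example.com) and domain labels that begin or end with a hyphen (user@-example.com), so it does not fully meet the domain-label requirement listed earlier.
  • It does not handle domain literals such as user@[192.0.2.1], or the comments allowed by RFC 5322.
  • It accepts ASCII only: internationalized addresses containing UTF-8 characters are rejected.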
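
To see these trade-offs concretely, the sketch below runs the simple pattern over a few addresses and compares the result with the expected answer. The pattern is copied from is_valid_email, with the @ written as \@. The test list, the names $simple and @cases, and the expected values are our own; the expected values are our reading of the SMTP address grammar in RFC 5321, where labels must start and end with a letter or digit and the local part is a dot-string or a quoted string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The simple pattern from is_valid_email.
my $simple = qr/^[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/;

# Hypothetical test table: [address, 1 if syntactically valid, else 0].
my @cases = (
    ['john..doe@example.com',  0],  # consecutive dots in the local part
    ['.john@example.com',      0],  # leading dot in the local part
    ['user@-example.com',      0],  # domain label starts with a hyphen
    ['bad-char!@example.com',  1],  # '!' is allowed in the local part
    ['"john doe"@example.com', 1],  # quoted local part
    ['user@example.xn--p1ai',  1],  # punycode TLD contains digits and hyphens
);

for my $case (@cases) {
    my ($address, $expected) = @$case;
    my $accepted = $address =~ $simple ? 1 : 0;
    my $verdict  = $accepted == $expected ? 'ok'
                 : $accepted              ? 'false positive'
                 :                          'false negative';
    printf "%-24s %s\n", $address, $verdict;
}
```

Every row in this table is misclassified by the simple pattern: the first three are false positives and the last three are false negatives. A table like this is a cheap way to test any pattern against the inputs your application actually receives.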

Advanced Alternatives

For advanced email validation, consider:

  • Using CPAN modules such as Email::Valid, which checks addresses against the RFC 822 grammar and can optionally check the domain's DNS records.
  • Sending a confirmation email to verify the actual existence of the mailbox.

However, for many applications, the above regex provides a good balance between simplicity and correctness. Always test your regex with your expected input set.

Output

Running the is_valid_email script with Perl v5.34.1 prints:

```
'john.doe@example.com' is valid
'jane_doe+123@sub.domain.co.uk' is valid
'invalid-email@.com' is invalid
'noatsymbol.com' is invalid
'bad-char!@example.com' is invalid
'user@local' is invalid
```