How to validate email addresses using Perl regex?
Validating email addresses using Perl regex is a classic but challenging problem due to the complexity of the address formats defined in RFC 5322. A fully RFC-compliant regex is possible only as a very long, hard-to-maintain pattern, but a much shorter one gives a practical and robust check for most common email formats.
In Perl, you apply a pattern to a string with the =~ binding operator. When validating email addresses, you want a regex that:
- Checks for a valid local part (before the @) using alphanumeric characters and allowed special symbols.
- Validates the domain part as labels separated by dots, allowing hyphens. Note that the simple pattern below does not require each label to start and end with an alphanumeric character.
- Requires a Top-Level Domain (TLD) of at least 2 alphabetic characters.
Here is one commonly used Perl regex pattern that balances simplicity and robustness, suitable for practical validation:
#!/usr/bin/perl
use strict;
use warnings;

sub is_valid_email {
    my ($email) = @_;
    return 0 unless defined $email;

    # Simple but practical regex for typical email validation
    # Explanation:
    # ^                      : start of string
    # [a-zA-Z0-9_.+-]+       : local part - one or more allowed chars (alnum plus some symbols)
    # @                      : literal at symbol
    # [a-zA-Z0-9-]+          : domain label - one or more alnum or hyphen chars
    # (?:\.[a-zA-Z0-9-]+)*   : zero or more additional domain labels prefixed by a dot
    # \.[a-zA-Z]{2,}         : TLD must be at least 2 alphabetic chars
    # $                      : end of string
    return $email =~ /^[a-zA-Z0-9_.+\-]+@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/;
}

my @emails = (
    'john.doe@example.com',
    'jane_doe+123@sub.domain.co.uk',
    'invalid-email@.com',
    'noatsymbol.com',
    'bad-char!@example.com',
    'user@local',
);

foreach my $email (@emails) {
    if (is_valid_email($email)) {
        print "'$email' is valid\n";
    } else {
        print "'$email' is invalid\n";
    }
}
Key Points and Perl Concepts
- =~ is Perl's binding operator; it applies a regex to a scalar.
- ^ and $ anchor the regex to the start and end of the string (important for full-string validation). Note that $ also matches just before a trailing newline, whereas \z matches only at the very end.
- Character classes like [a-zA-Z0-9_.+\-] treat most metacharacters literally: + needs no escaping there, while - is escaped (or placed last) so that it is not read as a range.
- Non-capturing groups (?:...) group subpatterns without capturing them.
- The function returns a true or false value indicating whether the string matches the pattern.
- The example iterates over several test email addresses and prints the validation result for each.
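The anchoring and grouping points above are easier to see when the pattern is spread over several lines. The following is a minimal sketch rather than part of the original answer: it compiles the same character classes with qr// and the /x flag and, as an assumption of this sketch, uses \A and \z in place of ^ and $. The names $ORIGINAL_RE, $EMAIL_RE and is_valid_email_strict are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The original pattern, unchanged apart from \@ (escaped for clarity).
my $ORIGINAL_RE = qr/^[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/;

# Same character classes, laid out with the x flag so that whitespace
# and comments inside the pattern are ignored.
my $EMAIL_RE = qr/
    \A                          # very start of the string
    [a-zA-Z0-9_.+\-]+           # local part
    \@                          # literal at sign
    [a-zA-Z0-9\-]+              # first domain label
    (?: \. [a-zA-Z0-9\-]+ )*    # zero or more further labels
    \. [a-zA-Z]{2,}             # TLD of two or more letters
    \z                          # very end of the string
/x;

# Hypothetical stricter variant of is_valid_email.
sub is_valid_email_strict {
    my ($email) = @_;
    return 0 unless defined $email;
    return $email =~ $EMAIL_RE ? 1 : 0;    # always a plain 1 or 0
}

# A trailing newline, for example from input that was never chomped.
my $line = "john.doe\@example.com\n";

print "original pattern accepts trailing newline: ",
    ( $line =~ $ORIGINAL_RE ? "yes" : "no" ), "\n";
print "strict pattern accepts trailing newline:   ",
    ( is_valid_email_strict($line) ? "yes" : "no" ), "\n";
```

The first line prints "yes" and the second "no", because $ is allowed to match before a final newline while \z is not. Returning an explicit 1 or 0 also keeps the result predictable when the function is called in list context.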
Common Pitfalls When Using Regex for Email Validation
- Too strict patterns can exclude valid emails (e.g., addresses with quoted strings or unusual TLDs).
- Too loose patterns may accept invalid email formats.
- This regex does not validate IP address literals in domains or comments per RFC 5322.
- Unicode/UTF-8 characters are not supported in this pattern; it accepts only ASCII alphanumerics and a few symbols.
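The "too strict" and "too loose" pitfalls can be checked directly against the pattern. The sketch below is illustrative only: the sample addresses were chosen for this example and do not come from the original test set, and pattern_accepts is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from is_valid_email (\@ escaped for clarity).
my $EMAIL_RE = qr/^[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/;

# Hypothetical helper: 1 if the pattern matches, 0 otherwise.
sub pattern_accepts {
    my ($email) = @_;
    return $email =~ $EMAIL_RE ? 1 : 0;
}

my @cases = (
    # Too loose: accepted although not well-formed
    [ '.john@example.com',      'local part starts with a dot' ],
    [ 'john..doe@example.com',  'consecutive dots in local part' ],
    [ 'user@-example.com',      'domain label starts with a hyphen' ],
    # Too strict: rejected although permitted by RFC 5322 or seen in real use
    [ q{o'brien@example.com},   'apostrophe is allowed in the local part' ],
    [ '"john doe"@example.com', 'quoted-string local part' ],
    [ 'user@[192.0.2.1]',       'IP address literal as domain' ],
    [ 'user@example.xn--p1ai',  'internationalized TLD in Punycode form' ],
);

for my $case (@cases) {
    my ($email, $note) = @$case;
    printf "%-26s %-8s (%s)\n", $email,
        ( pattern_accepts($email) ? 'accepted' : 'rejected' ), $note;
}
```

The first three addresses are reported as accepted and the last four as rejected. Whether either group matters depends on the addresses your application actually expects to see.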
Advanced Alternatives
For advanced email validation, consider:
- Using CPAN modules like Email::Valid that implement RFC-aware validation.
- Sending a confirmation email to verify the actual existence of the mailbox.
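As a rough sketch of the module route, the snippet below wraps Email::Valid's address method, which returns the address on success and undef on failure. Email::Valid is not part of core Perl, so the sketch loads it at run time and reports when it is missing. The wrapper name check_with_email_valid and its 1/0/undef return convention are assumptions made for this example, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Email::Valid comes from CPAN, so load it lazily and remember
# whether it is available on this system.
my $HAVE_EMAIL_VALID = eval { require Email::Valid; 1 } ? 1 : 0;

# Hypothetical wrapper:
#   1     = address accepted
#   0     = address rejected
#   undef = Email::Valid is not installed
sub check_with_email_valid {
    my ($email) = @_;
    return undef unless $HAVE_EMAIL_VALID;
    return 0 unless defined $email;
    # address() returns the address on success and undef on failure.
    return defined( Email::Valid->address($email) ) ? 1 : 0;
}

for my $email ( 'john.doe@example.com', 'noatsymbol.com' ) {
    my $ok = check_with_email_valid($email);
    my $verdict = !defined $ok ? 'Email::Valid not installed'
                : $ok          ? 'valid'
                :                'invalid';
    printf "%s: %s\n", $email, $verdict;
}
```

The module also offers optional checks such as DNS lookups; consult its documentation before relying on any behavior beyond the basic syntax check shown here.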
However, for many applications, the regex in is_valid_email provides a good balance between simplicity and correctness. Always test your regex with your expected input set.
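One way to test against an expected input set is a small table of addresses and verdicts run through the core Test::More module. This is a sketch under stated assumptions: the table reuses the six addresses from the example above, and matches_email_re is a hypothetical helper that wraps the same pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Pattern from is_valid_email (\@ escaped for clarity).
my $EMAIL_RE = qr/^[a-zA-Z0-9_.+\-]+\@[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*\.[a-zA-Z]{2,}$/;

# Hypothetical helper returning a plain 1 or 0.
sub matches_email_re {
    my ($email) = @_;
    return ( defined $email && $email =~ $EMAIL_RE ) ? 1 : 0;
}

# Each entry pairs an input with the verdict the application expects.
my @cases = (
    [ 'john.doe@example.com',          1 ],
    [ 'jane_doe+123@sub.domain.co.uk', 1 ],
    [ 'invalid-email@.com',            0 ],
    [ 'noatsymbol.com',                0 ],
    [ 'bad-char!@example.com',         0 ],
    [ 'user@local',                    0 ],
);

for my $case (@cases) {
    my ( $email, $want ) = @$case;
    is( matches_email_re($email), $want, "verdict for $email" );
}

done_testing();
```

Extending the table with addresses taken from your own data makes it clear when a later change to the pattern alters a verdict.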
Verified Code
Executed in a sandbox to capture real output. • v5.34.1 • 17ms
'john.doe@example.com' is valid
'jane_doe+123@sub.domain.co.uk' is valid
'invalid-email@.com' is invalid
'noatsymbol.com' is invalid
'bad-char!@example.com' is invalid
'user@local' is invalid