How to Compare Strings in Perl (eq/ne, lt/gt, cmp, case-insensitive, Unicode)

Question

how to compare strings in perl

Category: Perl
Difficulty: Beginner
Tags: perl, strings, comparison, operators, cmp, case-insensitive, unicode, sorting, best-practices, pitfalls

In Perl, comparing strings correctly is mostly about choosing the string operators (like eq and cmp) instead of the numeric operators (like == and <=>). Perl is flexible about converting values between strings and numbers, which is powerful—but it also means that using the wrong operator can silently give you the wrong result. This guide explains all the standard ways to compare strings in Perl, when to use each one, and the common pitfalls (numeric-vs-string mixups, case sensitivity, whitespace, undef warnings, Unicode, and locale issues).

1) The core string comparison operators

Perl has a parallel set of operators for string comparisons. Use these when you want lexicographic (dictionary-like) comparison based on character ordering:

What you want       Use    Meaning
Equal               eq     Strings are exactly identical
Not equal           ne     Strings differ
Less than           lt     Lexicographically smaller
Greater than        gt     Lexicographically larger
Less or equal       le     Lexicographically ≤
Greater or equal    ge     Lexicographically ≥
3-way compare       cmp    Returns -1, 0, or 1 (similar to C's strcmp)

Key idea: eq/ne/lt/gt/le/ge/cmp compare based on character order, not numeric value. That means "10" lt "2" is true because "1" comes before "2".

What does cmp return?

$a cmp $b returns:

  • -1 if $a is lexicographically less than $b
  • 0 if they are equal
  • 1 if $a is greater than $b

These three values are what perlop documents for cmp, so a sort block can return the result directly. A custom comparison routine passed to sort only needs to return a negative, zero, or positive value.
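
To see the three results side by side, here is a minimal sketch that applies cmp to a few pairs. The sample words are arbitrary and the variable names are only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Each pair is [left, right]; the sample strings are arbitrary.
my @pairs = (["apple", "banana"], ["pear", "pear"], ["zebra", "apple"]);

my @results;
for my $pair (@pairs) {
    my ($left, $right) = @$pair;
    my $result = $left cmp $right;    # -1, 0, or 1
    push @results, $result;
    print "$left cmp $right = $result\n";
}
```

With these inputs the three lines should end in -1, 0, and 1 respectively.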

2) Runnable Example 1: Basic string comparisons

#!/usr/bin/env perl
use strict;
use warnings;
use v5.16;

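# $a and $b are also the special variables used by sort blocks; declaring
# them with my is fine in this short script but best avoided in larger code.
my ($a, $b) = ("Perl", "perl");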

print "a eq b? ", ($a eq $b ? "yes" : "no"), "\n";
print "a ne b? ", ($a ne $b ? "yes" : "no"), "\n";
print "a cmp b = ", ($a cmp $b), "\n";

print "Case-insensitive (lc): ", (lc($a) eq lc($b) ? "equal" : "different"), "\n";

my $x = "apple";
my $y = "banana";
print "$x lt $y? ", ($x lt $y ? "yes" : "no"), "\n";
print "$x gt $y? ", ($x gt $y ? "yes" : "no"), "\n";
print "$x le $y? ", ($x le $y ? "yes" : "no"), "\n";
print "$x ge $y? ", ($x ge $y ? "yes" : "no"), "\n";

Expected output:

a eq b? no
a ne b? yes
a cmp b = -1
Case-insensitive (lc): equal
apple lt banana? yes
apple gt banana? no
apple le banana? yes
apple ge banana? no

3) Sorting strings: cmp and custom comparisons

Perl’s sort uses string comparison by default (it behaves like sorting by cmp). If you want to control ordering (for example, case-insensitive sorting), you pass a block that returns a cmp-style result.

Runnable Example 2: Sort strings and show numeric-vs-string pitfalls

#!/usr/bin/env perl
use strict;
use warnings;
use v5.16;

my @words = qw(pear Apple banana);
my @sorted_ci = sort { lc($a) cmp lc($b) } @words;
print "Sorted case-insensitive: @sorted_ci\n";

my @nums_as_strings = qw(2 10 1);
my @lex = sort @nums_as_strings;              # string sort
my @num = sort { $a <=> $b } @nums_as_strings; # numeric sort

print "Default sort (string cmp): @lex\n";
print "Numeric sort (<=>): @num\n";

Expected output:

Sorted case-insensitive: Apple banana pear
Default sort (string cmp): 1 10 2
Numeric sort (<=>): 1 2 10

Best practice: if values are conceptually numeric, compare/sort them numerically with ==, <=>, <, >, etc. If they are conceptually strings (IDs, filenames, tokens), compare them with eq / cmp.
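
When values arrive as text but are meant as numbers, one option is to validate them first and only then compare numerically. The sketch below uses looks_like_number from the core Scalar::Util module; compare_numeric is a hypothetical helper name, not something Perl provides.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: numeric three-way compare that refuses non-numeric input.
sub compare_numeric {
    my ($left, $right) = @_;
    for my $value ($left, $right) {
        die "Not a number: '$value'\n" unless looks_like_number($value);
    }
    return $left <=> $right;
}

print '"10" lt "2" (string): ', ("10" lt "2" ? "true" : "false"), "\n";
print '"10" <  "2" (numeric): ', ("10" < "2" ? "true" : "false"), "\n";
print "compare_numeric(10, 2) = ", compare_numeric("10", "2"), "\n";

# eval catches the die so the script can report the rejected value.
my $ok = eval { compare_numeric("10", "two"); 1 };
print "Rejected non-numeric input: $@" unless $ok;
```

Whether to die, skip the value, or fall back to a string comparison on bad input is a policy choice; dying is just the simplest option to show here.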

4) Case-insensitive comparisons

Many real-world comparisons should ignore case: usernames, HTTP header names, command keywords, etc. The simplest approach is to normalize both sides and then compare:

  • lc($a) eq lc($b) for ASCII-ish case-insensitive comparisons
  • fc($a) eq fc($b) for Unicode-aware “case folding” (recommended when Unicode matters); fc requires Perl 5.16 or later and is enabled by use v5.16; or use feature 'fc';

Why not always lc? Unicode has special casing rules. For example, fc folds German ß to ss, while lc leaves ß unchanged, so lc("straße") and lc("STRASSE") never compare equal. fc is designed for exactly this kind of caseless matching.

Runnable Example 3: Unicode case folding with fc

#!/usr/bin/env perl
use strict;
use warnings;
use v5.16;
use utf8;

binmode STDOUT, ":encoding(UTF-8)";

my $s1 = "straße";
my $s2 = "STRASSE";

print "lc equal? ", (lc($s1) eq lc($s2) ? "yes" : "no"), "\n";
print "fc equal? ", (fc($s1) eq fc($s2) ? "yes" : "no"), "\n";

Expected output:

lc equal? no
fc equal? yes

5) Comparing against patterns (regex) vs comparing strings

Sometimes you don’t want to know whether two strings are equal—you want to know whether a string matches a pattern. That is not string comparison; that is regular expression matching:

  • $s =~ /pattern/ tests whether $s matches
  • $s !~ /pattern/ tests whether it does not match

For example, “is this input exactly "yes"?” is a string comparison. “does this input look like an email?” is pattern matching.
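
The difference can be sketched with a single input that fails an exact comparison but passes a pattern match. The input string and the pattern are arbitrary illustrations.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $input = "yes, please";

# String comparison: the whole string must be exactly "yes".
my $exact = ($input eq "yes") ? "yes" : "no";

# Pattern match: the string only has to start with "yes".
my $starts_with_yes = ($input =~ /\Ayes/) ? "yes" : "no";

print "Exactly 'yes'? $exact\n";
print "Starts with 'yes'? $starts_with_yes\n";
```

Here eq reports no while the regex reports yes, because the two operators answer different questions.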

6) Best practices

  • Enable use strict; and use warnings; so that mistakes such as comparing undef or applying a numeric operator to a non-numeric string produce a diagnostic instead of passing silently.
  • Pick operators by meaning: numeric operators for numbers, string operators for strings.
  • Normalize before comparing when you need “logical” equality:
    • Case-insensitive: fc (Unicode) or lc (basic)
    • Trim whitespace: consider removing leading/trailing whitespace if input is user-provided
    • Line endings: use chomp when reading lines from files
  • Be explicit with undef: comparing undef with eq can warn (“Use of uninitialized value…”). Decide your policy:
    • Require defined: die or handle missing values
    • Coerce: ($a // "") eq ($b // "") if treating undef as empty string is acceptable
  • Use cmp for sorting keys and combine comparisons with || for multi-key sorts, e.g. lc($a) cmp lc($b) || $a cmp $b (case-insensitive primary key, with exact string order as a deterministic tie-breaker).
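
Several of these practices can be combined in one small sketch. The helper names normalize and same_text are hypothetical, and treating undef as an empty string is an assumption that may not suit every program.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use v5.16;    # enables fc

# Hypothetical helper: undef becomes "", surrounding whitespace is removed,
# and case is folded before the comparison.
sub normalize {
    my ($text) = @_;
    $text //= "";
    $text =~ s/\A\s+|\s+\z//g;
    return fc($text);
}

# Hypothetical helper: "logical" equality after normalization.
sub same_text {
    my ($left, $right) = @_;
    return normalize($left) eq normalize($right);
}

print "Padded vs plain: ", (same_text("  Hello\n", "hello") ? "same" : "different"), "\n";
print "undef vs empty: ",  (same_text(undef, "")            ? "same" : "different"), "\n";

# Multi-key sort: case-insensitive first, exact string order as tie-breaker.
my @names  = qw(bob Alice alice Bob);
my @sorted = sort { lc($a) cmp lc($b) || $a cmp $b } @names;
print "Sorted: @sorted\n";
```

The sort should print Alice alice Bob bob: the lowercase keys group the names, and the second comparison orders each group by plain string order.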

7) Common pitfalls (and how to avoid them)

Pitfall A: Using numeric operators on strings

If you write:

  • "foo" == "bar" (numeric compare)

Perl converts both sides to numbers. A string with no leading numeric part converts to 0, so "foo" == "bar" is true; under use warnings each operand also triggers an "Argument ... isn't numeric" warning. Use eq for exact string equality.
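
A small sketch makes the problem visible. It collects warnings through a __WARN__ handler so they can be counted rather than sent to STDERR; the variable names are only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my ($left, $right) = ("foo", "bar");

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $numeric_equal = ($left == $right) ? "yes" : "no";    # both sides become 0
my $string_equal  = ($left eq $right) ? "yes" : "no";    # compares the text

print "== says equal? $numeric_equal\n";
print "eq says equal? $string_equal\n";
print "Warnings raised by ==: ", scalar(@warnings), "\n";
```

The numeric comparison reports the two different words as equal, which is exactly the silent wrong answer this pitfall describes.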

Pitfall B: Lexicographic ordering is not numeric ordering

"10" lt "2" is true because it compares character by character. If you mean numeric, use < or <=> after validating the strings are numeric.

Pitfall C: Hidden whitespace and newlines

Input from files often includes trailing newlines. Comparing a line read from a file directly against a literal can fail unexpectedly:

  • Read: my $line = <STDIN>; (includes newline)
  • Fix: chomp($line); then compare

Similarly, user input can include leading/trailing spaces; decide whether to treat those as significant.
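
The sketch below reproduces both problems. An in-memory filehandle stands in for a real file or STDIN, and the sample values are arbitrary.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file or STDIN.
my $data = "yes\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $line = <$fh>;
close $fh;

my $before = ($line eq "yes") ? "equal" : "different";    # newline still attached
chomp $line;
my $after  = ($line eq "yes") ? "equal" : "different";

# Trimming leading and trailing whitespace from a copy of the input.
my $padded = "  yes  ";
(my $trimmed = $padded) =~ s/\A\s+|\s+\z//g;

print "Before chomp: $before\n";
print "After chomp: $after\n";
print "Trimmed equals 'yes'? ", ($trimmed eq "yes" ? "yes" : "no"), "\n";
```

Note that chomp only removes the trailing line ending; other surrounding whitespace needs an explicit trim like the substitution shown.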

Pitfall D: Locale and “alphabetical order”

By default, string comparisons use Perl’s internal character ordering (code point order for decoded character strings), which is not necessarily what readers of a particular language consider alphabetical. For example, every uppercase ASCII letter sorts before every lowercase one. If you need language-aware collation (e.g., Swedish ordering), use locale makes lt, gt, cmp, and sort follow the current locale’s collation rules, but the result depends on how the environment is configured and can be surprising. For robust human-facing sorting, consider a dedicated collation module such as Unicode::Collate, which ships with Perl, or an ICU-based solution, rather than relying on incidental locale settings.
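
As a rough illustration, the sketch below contrasts the default sort with the core Unicode::Collate module using its default settings. Language-specific tailoring (for example through Unicode::Collate::Locale) is not shown, and the word list is arbitrary.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::Collate;

my @words = qw(banana apple Cherry);

# Default sort compares code points, so the uppercase "C" comes first.
my @by_code_point = sort @words;

# Unicode::Collate compares letters first and treats case as a minor difference.
my @collated = Unicode::Collate->new->sort(@words);

print "Code point order: @by_code_point\n";
print "Collated order: @collated\n";
```

The first line should list Cherry before apple, while the collated order should read apple banana Cherry.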

Pitfall E: Unicode encoding mismatches

String comparison assumes both strings represent the same sequence of characters. If one string is decoded properly and the other is raw bytes (or decoded with a different encoding), comparisons can fail. In real programs, standardize: decode input to Perl’s internal character strings (e.g., read with an encoding layer or use Encode), and write output with an explicit encoding layer.
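
A mismatch of this kind can be sketched with the word "café" held once as characters and once as UTF-8 encoded bytes. The example uses decode from the core Encode module; the variable names are only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

my $chars = "caf\x{E9}";      # 4 characters: c, a, f, é
my $bytes = "caf\xC3\xA9";    # the same word as 5 UTF-8 encoded bytes

# Comparing raw bytes against characters fails even though both mean "café".
my $raw_equal = ($bytes eq $chars) ? "yes" : "no";

# Decoding the bytes first yields the same character sequence.
my $decoded       = decode('UTF-8', $bytes);
my $decoded_equal = ($decoded eq $chars) ? "yes" : "no";

print "Raw bytes equal characters? $raw_equal\n";
print "Decoded equal characters? $decoded_equal\n";
print "Lengths: ", length($bytes), " bytes vs ", length($decoded), " characters\n";
```

In a real program the decoding would normally happen at the input boundary, for instance with an :encoding(UTF-8) layer on the filehandle, so that every comparison afterwards sees character strings.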

8) Quick “when to use what” cheat sheet

  • Exact equality: $a eq $b
  • Not equal: $a ne $b
  • Lexicographic ordering: lt gt le ge
  • 3-way comparison / sort callback: $a cmp $b
  • Case-insensitive (basic): lc($a) eq lc($b)
  • Case-insensitive (Unicode-correct): fc($a) eq fc($b)
  • Numeric comparison: == and <=>
  • Pattern match (not comparison): $s =~ /.../

If you internalize one rule, make it this: use eq for strings and == for numbers. From there, everything else (sorting, case folding, Unicode) becomes a matter of selecting the right normalization and operator for your intent.
