How to remove duplicates from an array in Perl?

Removing duplicates from an array is a common task in Perl, and there are several ways to do it. The central idea is to use a hash: hash keys are unique, so using the array elements as keys filters out duplicates.

Why use a hash to remove duplicates?

Perl hash keys are unique by definition. When you use the array elements as hash keys, duplicate values map to the same key, so each distinct value is stored only once. Extracting the keys then yields the unique values, although keys returns them in no guaranteed order. Tracking seen values in a hash while filtering the array with grep keeps the original order.
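
As a minimal sketch of the key-uniqueness idea, the snippet below maps each element to a hash key and reads the keys back. The variable names are illustrative, and the sort is only there to make the output predictable, since keys gives no ordering guarantee.

```perl
use strict;
use warnings;

my @array = qw(apple orange banana apple pear orange banana);

# Each element becomes a hash key; repeated elements reuse the same key.
my %unique_keys = map { $_ => 1 } @array;

# keys returns the distinct values in no guaranteed order,
# so sort them to get a predictable result.
my @unique_sorted = sort keys %unique_keys;

print "Unique (sorted): @unique_sorted\n";   # Unique (sorted): apple banana orange pear
```

This form loses the original order of the array, which is acceptable only when order does not matter or when you sort afterwards.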

Basic example to remove duplicates from an array

Here is a simple, runnable example illustrating this approach:

use strict;
use warnings;

# Original array with duplicates
my @array = qw(apple orange banana apple pear orange banana);

# Use a hash to remove duplicates
my %seen;
my @unique = grep { !$seen{$_}++ } @array;

# Print the unique values
print "Unique values: @unique\n";

Explanation:

  • @array contains some repeated fruit names.
  • %seen is a hash that tracks elements seen so far.
  • grep { !$seen{$_}++ } @array works as follows:
    • For each element $_ of @array, the post-increment $seen{$_}++ returns the current count and then adds one to it.
    • The first time a value is encountered, its count is still zero (false), so !$seen{$_}++ is true and the element is kept.
    • For later duplicates the count is already 1 or more, so the expression is false and the element is filtered out.
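
The same logic can be written as an explicit loop, which may make the role of the post-increment easier to see. This is an illustrative sketch equivalent to the grep one-liner; $count_before is a name introduced here for clarity.

```perl
use strict;
use warnings;

my @array = qw(apple orange banana apple pear orange banana);

my %seen;
my @unique;
for my $item (@array) {
    # Post-increment returns the old count, then adds one to the stored value.
    my $count_before = $seen{$item}++;

    # An old count of 0 means this is the first occurrence, so keep it.
    push @unique, $item unless $count_before;
}

print "Unique values: @unique\n";               # Unique values: apple orange banana pear
print "apple was seen $seen{apple} times\n";    # apple was seen 2 times
```

As a side effect, %seen ends up holding the number of occurrences of each value, which is sometimes useful in its own right.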

Perl Concepts Highlighted

  • Sigils: @ for arrays, % for hashes, and $ for scalar elements.
  • Context: in list context, grep returns the elements for which the block is true; in scalar context, it returns how many elements matched.
  • TMTOWTDI ("there's more than one way to do it"): Other approaches include:
    • Using keys %hash after mapping array items to hash keys
    • Using uniq from List::Util (a core module; uniq was added in List::Util 1.45, which is bundled with Perl 5.26 and later) or from the CPAN module List::MoreUtils
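
For the module-based alternative, a short sketch using List::Util's uniq might look like this. It assumes List::Util 1.45 or newer is installed; the version number on the use line makes Perl fail early with a clear message otherwise.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq was added in List::Util 1.45

my @array  = qw(apple orange banana apple pear orange banana);

# uniq compares elements as strings and keeps the first occurrence of each.
my @unique = uniq @array;

print "Unique values: @unique\n";   # Unique values: apple orange banana pear
```

On older installations, the grep-with-hash idiom gives the same result without any module dependency.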

Important Gotchas

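  • Order: The grep-with-hash method keeps elements in order of first occurrence because grep processes the array left to right. Taking keys %hash instead returns the values in no guaranteed order.
  • Complex data: This method compares elements by their string form, so it works for plain strings and numbers. References stringify to their memory address, so two different references to identical data are not treated as duplicates; for those, deduplicate on a key derived from the content.
  • Duplicates of undef: An undef element is stringified to the empty string when used as a hash key, so undef and '' are treated as the same value, and use warnings reports a "Use of uninitialized value" warning. Handle undef separately if your array may contain it.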
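
The last two gotchas can be handled with small variations on the same idiom. The sketch below is illustrative only: the records and the choice of id as the field that defines a duplicate are hypothetical, and $seen_undef is a helper counter introduced here.

```perl
use strict;
use warnings;

# Hypothetical records; 'id' is the field chosen here to define "duplicate".
my @records = (
    { id => 1, name => 'apple'  },
    { id => 2, name => 'orange' },
    { id => 1, name => 'apple'  },   # same content, but a different reference
);

# Key the %seen_id hash on a value derived from the content, not on the reference.
my %seen_id;
my @unique_records = grep { !$seen_id{ $_->{id} }++ } @records;
print scalar(@unique_records), " unique records\n";   # 2 unique records

# Track undef with its own counter so it does not collide with ''.
my @values = ('a', undef, '', 'a', undef);
my (%seen, $seen_undef);
my @unique_values = grep { defined($_) ? !$seen{$_}++ : !$seen_undef++ } @values;
print scalar(@unique_values), " unique values\n";     # 3 unique values
```

Because undef never reaches the hash lookup, this version also avoids the uninitialized-value warning.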

Complete runnable script

use strict;
use warnings;

my @array = qw(red blue red green blue yellow red);

my %seen;
my @unique = grep { !$seen{$_}++ } @array;

print "Original array: @array\n";
print "Unique array:   @unique\n";

This prints:


Original array: red blue red green blue yellow red
Unique array:   red blue green yellow

In summary, the idiomatic way to remove duplicates from an array in Perl is to track seen values in a hash and filter the array with grep. It makes a single pass over the array and preserves the order of first occurrences.

Verified Code

Executed in a sandbox to capture real output. • v5.34.1 • 5ms

STDOUT
Unique values: apple orange banana pear
STDERR
(empty)