How to remove duplicates from an array in Perl?
Removing duplicates from an array is a common task in Perl, and luckily Perl's flexibility offers several ways to achieve this efficiently. The central idea is to leverage a hash, which stores only unique keys, thus naturally filtering duplicates out.
Why use a hash to remove duplicates?
Perl hashes use keys that are unique by definition. When you use the array elements as keys in a hash, duplicate values automatically map to the same key, effectively removing duplicates. Once the hash is built, extracting its keys recreates an array of unique values.
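A minimal sketch of that idea, using a hypothetical @fruits array for illustration: a hash slice turns every element into a key in one statement, and keys then returns each distinct value once. Because keys returns them in no guaranteed order, the sketch sorts them to get a predictable result.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration
my @fruits = qw(apple orange apple pear orange);

# Hash slice: every element becomes a key; duplicates collapse onto one key
my %unique_keys;
@unique_keys{@fruits} = ();

# keys returns each distinct element once, in no guaranteed order,
# so sort to get a predictable result
my @unique = sort keys %unique_keys;

print "Unique (sorted): @unique\n";
```

This variant loses the original order of the array, which is why the grep-based approach is usually preferred when order matters.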
Basic example to remove duplicates from an array
Here is a simple, runnable example illustrating this approach:
use strict;
use warnings;
# Original array with duplicates
my @array = qw(apple orange banana apple pear orange banana);
# Use a hash to remove duplicates
my %seen;
my @unique = grep { !$seen{$_}++ } @array;
# Print the unique values
print "Unique values: @unique\n";
Explanation:
- @array contains some repeated fruit names.
- %seen is a hash that tracks the elements seen so far.
- grep { !$seen{$_}++ } @array works as follows. For each element $_ of @array:
  - The first time $_ is encountered, $seen{$_} is undefined (false), so !$seen{$_}++ is true, and the element is kept.
  - $seen{$_}++ increments the count.
  - Subsequent duplicates find $seen{$_} already incremented, so the expression becomes false and the duplicate is filtered out.
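To make the steps above concrete, here is a sketch of the same logic written as an explicit loop. It is intended to behave like the grep one-liner; the variable name $count_before is introduced here only for illustration.

```perl
use strict;
use warnings;

my @array = qw(apple orange banana apple pear orange banana);

my %seen;      # element => number of times seen so far
my @unique;

for my $item (@array) {
    # Post-increment returns the old count (0 on first sight), then adds one
    my $count_before = $seen{$item}++;

    # Keep the element only if it had not been seen before
    push @unique, $item unless $count_before;
}

print "Unique values: @unique\n";
print "apple was seen $seen{apple} times\n";
```

A side effect of this technique is that %seen ends up holding the number of occurrences of each element, which can be useful in its own right.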
Perl Concepts Highlighted
- Sigils: @ for arrays, % for hashes, and $ for scalar elements.
- Context: in list context, grep returns the list of elements for which the block evaluates to true.
- TMTOWTDI ("there's more than one way to do it"): Other approaches include:
  - Using keys %hash after mapping array items to hash keys (this does not preserve the original order)
  - Using uniq from the CPAN module List::MoreUtils, or from the core module List::Util (uniq was added in List::Util 1.45; the version bundled with Perl 5.26 and later includes it)
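As a sketch of the module-based alternative, the following uses uniq from List::Util. It assumes List::Util 1.45 or newer is installed; the version number in the use line makes Perl fail early if it is not.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # assumption: List::Util 1.45+ is available

my @array = qw(apple orange banana apple pear orange banana);

# uniq keeps the first occurrence of each value, in the original order
my @unique = uniq @array;

print "Unique values: @unique\n";
```

On older installations, the hash-and-grep idiom needs no modules and gives the same result for plain strings.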
Important Gotchas
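- Order: The grep method preserves the order of first occurrences because grep processes the list left to right.
- Complex data: This simple method works for strings and numbers. A reference used as a hash key is stringified to its address, so two distinct references with identical contents are not treated as duplicates. For arrays of references or complex structures, you'd need a more advanced approach.
- Duplicates of undef: If your array contains undefined values, treat them carefully. As a hash key, undef is stringified to an empty string, so undef and "" count as the same value, and use warnings reports an uninitialized-value warning. Filtering undef separately may be necessary.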
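The last two gotchas can be handled along the following lines. This is a sketch, not the only option: undef is tracked in a separate counter instead of being used as a hash key, and hash references are deduplicated by a field that is assumed to identify a record (the id field and the sample data are hypothetical).

```perl
use strict;
use warnings;

# Hypothetical input containing undef and an empty string
my @values = ('a', undef, '', 'a', undef, 'b', '');

my %seen;
my $seen_undef = 0;

my @unique = grep {
    defined $_
        ? !$seen{$_}++       # defined values: the usual hash test
        : !$seen_undef++     # undef: counted separately, never used as a key
} @values;

# Render undef visibly when printing
print join(', ', map { defined $_ ? "'$_'" : 'undef' } @unique), "\n";

# Hypothetical records: deduplicate hash references by their id field
my @records = (
    { id => 1, name => 'x' },
    { id => 2, name => 'y' },
    { id => 1, name => 'x' },
);

my %seen_id;
my @unique_records = grep { !$seen_id{ $_->{id} }++ } @records;

print scalar(@unique_records), " unique records\n";
```

Here undef and the empty string stay distinct, and no uninitialized-value warning is raised because undef never reaches the hash.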
Complete runnable script
use strict;
use warnings;
my @array = qw(red blue red green blue yellow red);
my %seen;
my @unique = grep { !$seen{$_}++ } @array;
print "Original array: @array\n";
print "Unique array: @unique\n";
This prints:
Original array: red blue red green blue yellow red
Unique array: red blue green yellow
In summary, a concise and idiomatic way to remove duplicates from an array in Perl, while preserving the order of first occurrences, is to track seen values in a hash and filter the array with grep. It needs only a single pass over the array and no extra modules.
Verified Code
Output of the basic example, executed with Perl v5.34.1:
Unique values: apple orange banana pear