How to parse CSV file and filter rows in Perl?
Parsing CSV Data and Filtering Rows in Perl Without External Modules
While Text::CSV is the recommended module for robust CSV parsing in Perl, it is not part of the core distribution and may not be available in constrained environments such as sandboxes or minimal installs.
In such cases, if the CSV data is simple (no embedded commas, quotes, or line breaks inside fields), you can parse each line manually by splitting on commas with the built-in split function. This needs nothing beyond core Perl, but it is not a general CSV parser: it breaks as soon as a field is quoted or contains a comma.
Below is a simple example showing how to parse CSV data from a multiline string, extract headers, and filter rows where the city field equals New York without any external module.
Example: Simple CSV Parsing and Filtering in Pure Perl
use strict;
use warnings;
# Sample CSV data as a multiline string
my $csv_data = <<'END_CSV';
id,name,age,city
1,Alice,30,New York
2,Bob,25,Los Angeles
3,Charlie,35,Chicago
4,Diana,28,New York
END_CSV
# Split CSV data into lines
my @lines = split /\n/, $csv_data;
# Extract and split header line to get column names
my $header_line = shift @lines;
my @headers = split /,/, $header_line;
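# Find index of 'city' column; stop early if the header has no such column
my ($city_idx) = grep { $headers[$_] eq 'city' } 0 .. $#headers;
die "No 'city' column in header\n" unless defined $city_idx;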
# Print header line as-is for clarity
print join(", ", @headers), "\n";
# Process each line, split by comma and filter rows with city == 'New York'
for my $line (@lines) {
    # Cap the split at the header count; any extra commas stay in the last field
    my @fields = split /,/, $line, scalar @headers;

    # Defensive check: skip lines with fewer fields than the header
    next unless @fields == @headers;

    if (defined $fields[$city_idx] && $fields[$city_idx] eq 'New York') {
        print join(", ", @fields), "\n";
    }
}
Explanation
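- split /,/, $line separates fields on commas. This works safely only if fields do not contain commas or quotes.
- The header line is extracted first, providing column names and letting us find the index of the column named city.
- The filter matches rows where the city field equals New York.
- Splitting with a limit equal to the number of headers caps the number of fields. A trailing comma does not create an extra field; the surplus text, commas included, stays in the last field, and trailing empty fields are kept rather than dropped.
- The field-count check skips malformed lines that have fewer fields than the header. Because of the limit, a line with too many commas still passes this check, with the extra text folded into its last field.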
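The effect of the limit argument is easy to check in isolation. The sketch below uses a made-up row with one trailing comma (an assumption for illustration, not part of the sample data) and compares three calls to the built-in split: no limit, a limit of 4, and a limit of -1.

```perl
use strict;
use warnings;

# Hypothetical row with a trailing comma (not part of the sample data)
my $line = '1,Alice,30,New York,';

# No limit: trailing empty fields are dropped
my @no_limit = split /,/, $line;

# Positive limit: at most 4 fields; the remainder stays in the last one
my @limit_4 = split /,/, $line, 4;

# Negative limit: every field is kept, including trailing empty ones
my @limit_neg = split /,/, $line, -1;

printf "no limit: %d fields, last = '%s'\n", scalar @no_limit,  $no_limit[-1];
printf "limit 4:  %d fields, last = '%s'\n", scalar @limit_4,   $limit_4[-1];
printf "limit -1: %d fields, last = '%s'\n", scalar @limit_neg, $limit_neg[-1];
```

With a limit of 4 the row still has four fields, but the last one is "New York," (with the comma), so a comparison against 'New York' will not match. A limit of -1 is one way to make the extra field visible to a field-count check.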
Common Pitfalls
- Naive splitting fails with quoted fields or commas inside fields. Avoid splitting CSV this way if your data contains quotes or embedded commas.
- Field counts may vary, so always verify the parsed line matches the header field count.
- Whitespace around fields may need trimming depending on your data.
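The last two pitfalls can be handled together. The following is a minimal sketch, assuming the data is still simple (no quotes, no embedded commas): a hypothetical helper split_simple_csv trims each field, rows with the wrong field count are skipped and counted, and each good row becomes a hash keyed by header name. The input lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one simple CSV line into trimmed fields.
# Assumes no quoted fields and no embedded commas.
sub split_simple_csv {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;          # trim whitespace in place
    return @fields;
}

# Made-up input with stray spaces and one short row
my @lines = (
    'id, name, age, city',
    '1, Alice , 30, New York',
    '2,Bob,25',
    '4,Diana,28,  New York ',
);

my @headers = split_simple_csv(shift @lines);

my @rows;
my $skipped = 0;
for my $line (@lines) {
    my @fields = split_simple_csv($line);
    if (@fields != @headers) {
        $skipped++;                      # wrong field count: ignore the line
        next;
    }
    my %row;
    @row{@headers} = @fields;            # hash slice: header => value
    push @rows, \%row;
}

my @new_yorkers = grep { $_->{city} eq 'New York' } @rows;
print "$_->{id}: $_->{name}\n" for @new_yorkers;
print "Skipped $skipped malformed line(s)\n";
```

Storing rows as hashes lets the filter refer to $_->{city} by name instead of by column index, at the cost of a little more memory per row.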
Summary
This pure Perl approach is useful in environments where you cannot install or use external modules like Text::CSV. TMTOWTDI ("There's More Than One Way To Do It") applies in Perl, but when you can install Text::CSV, prefer it: it handles quoted fields, embedded commas, and (with its binary option) embedded line breaks, none of which the split-based approach supports.
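If the same script has to run both where Text::CSV is installed and where it is not, one possible pattern is to load the module at run time and fall back to split only when loading fails. This is a sketch under that assumption; parse_csv_line is a hypothetical helper name, and the fallback branch keeps all the limitations described above.

```perl
use strict;
use warnings;

# Try to load Text::CSV at run time; the eval yields false if it is missing
my $have_text_csv = eval { require Text::CSV; 1 };
my $csv = $have_text_csv ? Text::CSV->new({ binary => 1 }) : undef;

# Hypothetical helper: returns the fields of one line, or an empty list
# if Text::CSV is in use and rejects the line.
sub parse_csv_line {
    my ($line) = @_;
    if ($csv) {
        return $csv->parse($line) ? $csv->fields : ();
    }
    # Fallback: only safe for simple data without quotes or embedded commas
    return split /,/, $line, -1;
}

my @fields = parse_csv_line('4,Diana,28,New York');
print join(' | ', @fields), "\n";
print "Parser: ", ($have_text_csv ? 'Text::CSV' : 'split fallback'), "\n";
```

With Text::CSV available, a line such as 2,"Smith, Bob",25,Chicago parses into four fields; the fallback would split it into five, which is why the fallback should be reserved for data known to be simple.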
Verified Code
Executed in a sandbox to capture real output. • v5.34.1 • 11ms
id, name, age, city
1, Alice, 30, New York
4, Diana, 28, New York