
How to generate a report from log file data in Perl?

Generating a report from log file data in Perl is a common text-processing task that involves reading the log, parsing its entries, and summarizing or formatting the information in a meaningful way. Perl is particularly well-suited for this due to its powerful regular expressions, flexible data structures, and concise syntax. This answer will demonstrate a practical approach along with an example script you can run immediately.

Key Concepts

  • Reading and parsing: Use Perl’s built-in file handling to read logs line by line.
  • Regular expressions: Extract relevant parts of each log entry using regex capture groups.
  • Data structures: Store parsed data in hashes or arrays for aggregation (like counts or grouping).
  • Formatting output: Use formatted printing (e.g., printf or sprintf) to create aligned, readable reports.
  • Context and sigils: Understanding scalar $, array @, and hash % variables is key to managing structured data.

Example: Generating a Simple Error Frequency Report

Suppose you have a log file with entries like this (example lines):

2024-04-01 12:00:01 ERROR User failed login
2024-04-01 12:00:02 INFO User logged in
2024-04-01 12:05:10 ERROR Disk space low
2024-04-01 12:06:00 WARNING High CPU usage
2024-04-01 12:07:00 ERROR User failed login

The goal is to count how often each distinct ERROR message occurs and print a report sorted by frequency, most frequent first.

Here is a runnable Perl script that reads such log lines from standard input and prints a formatted report:

#!/usr/bin/perl
use strict;
use warnings;

# Hash to store error messages and their counts
my %error_counts;

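# Prompt on STDERR, and only for interactive use, so the report on STDOUT stays clean
print STDERR "Enter log lines (Ctrl+D to end):\n" if -t STDIN && !@ARGV;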

while (my $line = <>) {
    chomp $line;
    
    # Simple regex to capture ERROR lines and message
    if ($line =~ /\bERROR\b\s+(.*)$/) {
        my $message = $1;
        # Increment count of this error message
        $error_counts{$message}++;
    }
}

# Sort errors by descending frequency; ties fall back to alphabetical order
my @sorted_errors = sort {
    $error_counts{$b} <=> $error_counts{$a} or $a cmp $b
} keys %error_counts;

if (@sorted_errors) {
    # Print report header (30 + 1 + 10 = 41 columns wide)
    printf "\n%-30s %10s\n", "Error Message", "Count";
    print "-" x 41, "\n";

    # Print each error and its count
    for my $err (@sorted_errors) {
        printf "%-30s %10d\n", $err, $error_counts{$err};
    }
}
else {
    # No errors found: notify the user instead of printing an empty table
    print "No ERROR entries found in the provided input.\n";
}

How This Works

  • <> reads lines from standard input (or from files passed as arguments).
  • Regex /\bERROR\b\s+(.*)$/ matches the word ERROR followed by the rest of the line, capturing the error message.
  • Data is stored in a hash %error_counts where keys are error messages and values are counts.
  • Sorting uses the hash values to order error messages by frequency.
  • Formatted printing with printf aligns the report neatly.
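
To try the parsing and sorting steps without typing input, here is a minimal sketch that runs the same pattern over the sample lines from the example log. The helper name count_errors is hypothetical and exists only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines copied from the example log above
my @lines = (
    '2024-04-01 12:00:01 ERROR User failed login',
    '2024-04-01 12:00:02 INFO User logged in',
    '2024-04-01 12:05:10 ERROR Disk space low',
    '2024-04-01 12:06:00 WARNING High CPU usage',
    '2024-04-01 12:07:00 ERROR User failed login',
);

# count_errors is a hypothetical helper: it returns a hash reference
# mapping each ERROR message to the number of times it appears
sub count_errors {
    my @log_lines = @_;
    my %counts;
    for my $line (@log_lines) {
        # Same pattern as the script: capture everything after ERROR
        $counts{$1}++ if $line =~ /\bERROR\b\s+(.*)$/;
    }
    return \%counts;
}

my $counts = count_errors(@lines);

# Highest count first; ties fall back to alphabetical order
for my $msg (sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts) {
    printf "%-30s %10d\n", $msg, $counts->{$msg};
}
```

With the five sample lines, "User failed login" is counted twice and "Disk space low" once, so the login failure is listed first.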

Perl-Specific Notes

  • Sigils: $line is a scalar holding a string, %error_counts is a hash mapping strings to counts.
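  • Context: Testing @sorted_errors in a condition evaluates the array in scalar context, which yields its element count, so an empty array is false. The sort block uses the numeric operator <=> rather than the string operator cmp because the counts are numbers.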
  • TMTOWTDI: Perl allows multiple ways to do the same thing – you could use more complex parsing or different data structures if needed.
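  • Version notes: Perl 5.10+ provides say (enabled with use feature 'say' or use v5.10). Smart matching also arrived in 5.10, but it has been experimental since 5.18 and deprecated since 5.38, so avoid it in new code. The script above uses only core features available in earlier versions.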
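
The notes on sigils and context can be seen in a few lines. This is an illustrative sketch only; the hash contents are the counts the sample log would produce.

```perl
use strict;
use warnings;
use feature 'say';    # say is available from Perl 5.10 onward

# Sample values matching the example log
my %error_counts = ('Disk space low' => 1, 'User failed login' => 2);

my @messages = sort keys %error_counts;    # list context: all the keys
my $distinct = @messages;                  # scalar context: number of elements

my $total = 0;
$total += $_ for values %error_counts;     # sum of all counts

say "distinct messages: $distinct";
say "total errors: $total";
# A single element of @messages is a scalar, hence the $ sigil
say "first message: $messages[0]";
say "its count: $error_counts{$messages[0]}";
```

The same variable name takes a different sigil depending on what is being accessed: @messages is the whole array, while $messages[0] is one scalar element of it.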

Common Pitfalls

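  • Forgetting chomp leaves a trailing newline on each line. The (.*)$ capture here stops before it, but split-based parsing would carry the newline into the hash keys and break the report’s alignment.
  • Unanchored patterns: /\bERROR\b\s+(.*)$/ matches the word ERROR anywhere on the line, so an INFO line whose message contains ERROR would also be counted. Anchor the pattern to the timestamp and level fields if that matters. Greedy capture groups can likewise consume more text than intended.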
  • Empty or malformed lines should be considered if your logs vary.
  • Assuming logs will always have ERROR entries can lead to empty reports; handle this gracefully.
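
One way to guard against the unanchored-match and malformed-line pitfalls is to parse every field explicitly. The sketch below assumes the four-field layout of the sample log (date, time, level, message); parse_line is a hypothetical helper, and the second and fourth sample lines are made up to exercise the edge cases.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper. It assumes the four-field layout
# of the sample log: date, time, level, then a free-text message.
sub parse_line {
    my ($line) = @_;
    return unless defined $line;
    # Anchoring at ^ ties the level to the third field, so the word
    # ERROR inside a message no longer triggers a match.
    if ($line =~ /^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+([A-Z]+)\s+(.*?)\s*$/) {
        return { date => $1, time => $2, level => $3, message => $4 };
    }
    return;    # blank or malformed line
}

my @samples = (
    '2024-04-01 12:00:01 ERROR User failed login',
    '2024-04-01 12:00:02 INFO Recovered from ERROR state',    # invented edge case
    '',
    'not a log line',                                         # invented edge case
);

for my $line (@samples) {
    my $entry = parse_line($line);
    if ($entry) {
        print "$entry->{level}: $entry->{message}\n";
    }
    else {
        # Report skipped input on STDERR so the report itself stays clean
        warn "Skipping malformed line: '$line'\n";
    }
}
```

Returning a hash reference for valid lines and nothing for invalid ones lets the caller decide whether to skip, count, or report bad input.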

This example can be extended with timestamp parsing, multiple severity levels, or grouping by time interval, depending on your needs.
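
As one possible extension, the sketch below counts entries per severity level and buckets ERROR entries by hour. It assumes every valid line starts with "YYYY-MM-DD HH:MM:SS LEVEL" as in the sample log; summarize is a hypothetical helper name.

```perl
use strict;
use warnings;

# summarize is a hypothetical helper. It assumes every valid line starts
# with "YYYY-MM-DD HH:MM:SS LEVEL", as in the sample log.
sub summarize {
    my @log_lines = @_;
    my (%by_level, %errors_by_hour);
    for my $line (@log_lines) {
        # Skip anything that does not start with a timestamp and a level
        next unless $line =~ /^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\s+([A-Z]+)\b/;
        my ($date, $hour, $level) = ($1, $2, $3);
        $by_level{$level}++;
        # Bucket ERROR entries by the hour in which they occurred
        $errors_by_hour{"$date ${hour}:00"}++ if $level eq 'ERROR';
    }
    return (\%by_level, \%errors_by_hour);
}

my ($levels, $hours) = summarize(
    '2024-04-01 12:00:01 ERROR User failed login',
    '2024-04-01 12:00:02 INFO User logged in',
    '2024-04-01 12:05:10 ERROR Disk space low',
    '2024-04-01 12:06:00 WARNING High CPU usage',
    '2024-04-01 12:07:00 ERROR User failed login',
);

print "Entries per level:\n";
printf "  %-10s %5d\n", $_, $levels->{$_} for sort keys %$levels;

print "ERROR entries per hour:\n";
printf "  %-18s %5d\n", $_, $hours->{$_} for sort keys %$hours;
```

Because the hour bucket is built from the date and hour as a string, sorting the keys alphabetically also sorts them chronologically.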

Running this script:

perl generate_report.pl < logfile.txt

Or run it with no arguments, paste log lines, and press Ctrl+D (Linux/macOS) or Ctrl+Z followed by Enter (Windows) to signal end of input and generate the report.
