How to parse HTTP response headers in Perl?

Parsing HTTP response headers in Perl can be done in several ways, depending on whether you want to rely on a core module or parse raw HTTP response data yourself. HTTP headers follow a relatively simple structure: a status line followed by key-value header lines, each terminated by CRLF (carriage return + line feed).

Here’s a comprehensive approach to parsing HTTP response headers in Perl:

1. Understanding HTTP Response Headers

An HTTP response typically looks like this:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 138
Connection: keep-alive

<html>...

It starts with a status line (HTTP/1.1 200 OK) followed by zero or more header lines in Key: Value format. A blank line separates the headers from the body.
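
As a minimal sketch of that layout, the head can be separated from the body at the first blank line before any header parsing happens. The $raw string and the variable names below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical raw response with CRLF line endings, for illustration only
my $raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>";

# Split once at the first blank line; the limit of 2 keeps any blank
# lines inside the body intact
my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;

# The first line of the head is the status line; the rest are header lines
my ($status_line, @header_lines) = split /\r?\n/, $head;

print "Status line: $status_line\n";
print "Header lines: ", scalar(@header_lines), "\n";
print "Body: $body\n";
```

The limit argument to split matters here: without it, a body that itself contains blank lines would be cut into several pieces.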

2. Parsing headers manually

If you receive a raw HTTP response (perhaps from a socket), you can parse headers by:

Splitting by line endings (\r\n or just \n)
Extracting the status line and each header line
Collecting headers in a hash for easy access

Key Perl concepts used here include:

split function to split text into lines and header key/value
Hashes with string keys (header names normalized)
Possibly handling multiple headers with same name (like Set-Cookie) by using arrays
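
One possible way to combine these concepts, sketched under the assumption that the header lines have already been split out, is to store every value in an array reference. This is a simpler alternative to mixing scalars and arrays; the sample lines are hypothetical.

```perl
use strict;
use warnings;

my %headers;

# Hypothetical header lines, already split from the response
my @header_lines = (
    'Content-Type: text/html',
    'Set-Cookie: id=12345',
    'Set-Cookie: token=abcdef',
);

for my $line (@header_lines) {
    # A limit of 2 keeps any colons inside the value intact
    my ($name, $value) = split /:\s*/, $line, 2;

    # Autovivification creates the array reference on first use
    push @{ $headers{ lc($name) } }, $value;
}

print "set-cookie has ", scalar(@{ $headers{'set-cookie'} }), " values\n";
print "content-type: $headers{'content-type'}[0]\n";
```

The trade-off is that single-valued headers must be read with an index ([0]), but callers never need to check whether a value is a scalar or a reference.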

3. Example: Parsing HTTP Headers from a raw HTTP response

use strict;
use warnings;

# Simulated raw HTTP response header block (no actual body)
my $raw_response = <<'END_RESPONSE';
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 138
Connection: keep-alive
Set-Cookie: id=12345; Path=/
Set-Cookie: token=abcdef; Secure

END_RESPONSE

# Split response lines (handle CRLF or LF)
my @lines = split /\r?\n/, $raw_response;

# Parse the status line
my $status_line = shift @lines or die "No response status line";

print "Status line: $status_line\n";

# Hash to hold headers
my %headers;

# To handle multiple headers with same names, store values in arrayref
while (@lines) {
    my $line = shift @lines;
    last if $line eq '';  # blank line indicates end of headers

    # Parse header line "Name: value", trimming optional whitespace around the value
    if ($line =~ /^([^:\s]+):[ \t]*(.*?)[ \t]*$/) {
        my ($field, $value) = (lc $1, $2);  # lowercase keys for uniformity

        # If header seen before, push into array, else store scalar
        if (exists $headers{$field}) {
            # Convert to arrayref if needed
            if (ref $headers{$field} eq 'ARRAY') {
                push @{ $headers{$field} }, $value;
            } else {
                $headers{$field} = [ $headers{$field}, $value ];
            }
        } else {
            $headers{$field} = $value;
        }
    } else {
        warn "Malformed header line: $line\n";
    }
}

# Print parsed headers
print "Parsed headers:\n";
for my $key (sort keys %headers) {
    my $val = $headers{$key};
    if (ref $val eq 'ARRAY') {
        print "  $key => [", join(", ", @$val), "]\n";
    } else {
        print "  $key => $val\n";
    }
}

Output:

Status line: HTTP/1.1 200 OK
Parsed headers:
  connection => keep-alive
  content-length => 138
  content-type => text/html; charset=UTF-8
  set-cookie => [id=12345; Path=/, token=abcdef; Secure]
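
The status line itself can also be broken into its parts. The following is a rough sketch, not a full validator; parse_status_line is a hypothetical helper name, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (version, code, reason), or an empty list
# when the line does not look like a status line
sub parse_status_line {
    my ($line) = @_;
    return unless $line =~ m{^HTTP/(\d+(?:\.\d+)?)\s+(\d{3})(?:\s+(.*))?$};
    return ($1, $2, $3 // '');   # the reason phrase may be absent
}

my ($version, $code, $reason) = parse_status_line('HTTP/1.1 200 OK')
    or die "Malformed status line\n";

print "Version: $version\n";
print "Code: $code\n";
print "Reason: $reason\n";
```

Keeping the status code as its own value makes checks such as "is this a 2xx response" a simple numeric comparison.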

4. Notes and Pitfalls

Line endings: HTTP headers should use \r\n, but sometimes only \n is present. The regex /\r?\n/ handles both.
Multiple headers with same keys: Some headers can appear multiple times (e.g., Set-Cookie). Storing these as array references helps keep all values.
Case insensitivity: HTTP header field names are case-insensitive, so normalizing keys to lowercase is good practice.
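Continued headers: Older HTTP/1.1 messages may fold a long header across several lines (continuation lines start with a space or tab). RFC 7230 deprecated this folding, and this example does not handle it, but a robust parser should still expect it from legacy senders.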
TMTOWTDI: Perl offers multiple ways to parse headers — using regexes, splitting, or core modules.
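
For the folding case mentioned in the notes above, one possible approach is to unfold the header block before splitting it into lines. This is a sketch under the assumption that replacing each fold with a single space is acceptable; the X-Example header is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical header block in which one value continues on the next line
my $head = "X-Example: first part\r\n second part\r\nContent-Length: 0\r\n";

# Replace each line break that is followed by spaces or tabs with one space
(my $unfolded = $head) =~ s/\r?\n[ \t]+/ /g;

my @lines = split /\r?\n/, $unfolded;
print "$_\n" for @lines;
```

After this step the lines can be fed to the same key/value parsing loop shown in the main example.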

5. Using Core Modules

For more robust parsing, you can use the HTTP::Response or HTTP::Headers modules from CPAN, but they are not core. The Perl core does include HTTP::Tiny (since 5.14), which performs HTTP requests and parses the response headers for you.

Here’s a tiny example using HTTP::Tiny to fetch a URL’s headers:

use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new;
my $response = $http->head("http://www.perl.org/");

if ($response->{success}) {
    print "Status: $response->{status} $response->{reason}\n";
    print "Headers:\n";
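    for my $k (sort keys %{ $response->{headers} }) {
        my $v = $response->{headers}{$k};
        # HTTP::Tiny returns an array reference when a header appears more than once
        $v = join(', ', @$v) if ref $v eq 'ARRAY';
        print "  $k: $v\n";
    }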
} else {
    die "Failed to fetch headers: $response->{status} $response->{reason}\n";
}

This method abstracts away the parsing, but if you want to process a raw HTTP response yourself (e.g., from a socket), the first example is the fundamental way to do so.

Summary

HTTP response headers are key-value pairs following a status line.
Manually parse by splitting lines, extracting key/values into a hash (handle duplicates).
Be aware of line endings, case insensitivity, and multi-valued headers.
Use HTTP::Tiny for simpler, higher-level HTTP header fetching/parsing if you do not need to parse raw data.

With these principles, you can parse and use HTTP response headers effectively in Perl.
