How to extract and format data from unstructured text

Question

How can I extract structured data from unstructured text output using Perl?

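Match each line against a regular expression with capture groups, collect the captured fields into a list of hashes, and then serialize that list: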

#!/usr/bin/perl
use strict;
use warnings;
use JSON;    # CPAN module; the core JSON::PP module provides a compatible interface

# Parse unstructured command output
my $output = `netstat -tuln`;

my @connections;
foreach my $line (split /\n/, $output) {
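    # Extract protocol, local and remote address:port, and state.
    # tcp6/udp6 rows, a "*" remote port, and a missing state (UDP) are all allowed.
    if ($line =~ /^((?:tcp|udp)6?)\s+\d+\s+\d+\s+(\S+):(\d+)\s+(\S+):(\d+|\*)(?:\s+(\w+))?/) {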
        push @connections, {
            protocol     => $1,
            local_addr   => $2,
            local_port   => $3,
            remote_addr  => $4,
            remote_port  => $5,
            state        => $6 // 'N/A',
        };
    }
}

# Output as JSON
my $json = JSON->new->pretty->encode(\@connections);
print $json;

# Or output as CSV (both formats go to STDOUT here; keep only the one you need)
print "Protocol,Local,Remote,State\n";
foreach my $conn (@connections) {
    printf "%s,%s:%s,%s:%s,%s\n",
        $conn->{protocol},
        $conn->{local_addr}, $conn->{local_port},
        $conn->{remote_addr}, $conn->{remote_port},
        $conn->{state};
}

This script parses the column-oriented text that netstat prints, stores each socket as a hash of named fields, and emits the result as JSON or CSV so that other tools can consume it.
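The same idea can be tried without running netstat at all. The following is a minimal sketch, not a definitive implementation: the sample text is made up to resemble Linux `netstat -tuln` output, `parse_netstat` is a hypothetical helper name, and the pattern assumes the net-tools column layout. It uses named captures so the regex documents its own fields, and JSON::PP because that module ships with Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # core module, so no CPAN install is needed

# Made-up sample resembling Linux `netstat -tuln` output (an assumption).
my $sample = <<'END';
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp6       0      0 :::80                   :::*                    LISTEN
udp        0      0 127.0.0.1:323           0.0.0.0:*
END

# parse_netstat is an illustrative name, not part of any library.
# It returns a reference to an array of hashes, one per socket line.
sub parse_netstat {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        # Header lines do not start with tcp/udp, so they are skipped.
        next unless $line =~ /
            ^(?<protocol>(?:tcp|udp)6?) \s+ \d+ \s+ \d+ \s+
            (?<local_addr>\S+)  : (?<local_port>\d+|\*)  \s+
            (?<remote_addr>\S+) : (?<remote_port>\d+|\*)
            (?: \s+ (?<state>\w+) )?
        /x;
        # %+ holds only the named groups that matched; default the state.
        push @rows, { %+, state => $+{state} // 'N/A' };
    }
    return \@rows;
}

# canonical() sorts hash keys so the output is stable between runs.
print JSON::PP->new->pretty->canonical->encode(parse_netstat($sample));
```

Keeping the parsing in a function that takes a string makes the pattern easy to check against saved command output before pointing it at a live system.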
