text-processing intermediate
How to extract and format data from unstructured text
Question
How can I extract structured data from unstructured text output using Perl?
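Answer
Use regular expressions to capture each field of a line into a hash, collect those hashes in an array, and then serialize the array in whatever format you need: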
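#!/usr/bin/perl
use strict;
use warnings;
use JSON;    # CPAN module; JSON::PP (core since Perl 5.14) offers the same interface

# Capture the raw command output (netstat is part of net-tools;
# systems without it offer ss -tuln, which uses a different layout)
my $output = `netstat -tuln`;
die "netstat failed (status $?)\n" if $?;

my @connections;
foreach my $line (split /\n/, $output) {
    # Extract protocol, local/remote address and port, and the optional state.
    # Matches tcp6/udp6 too; ports may be '*' (e.g. 0.0.0.0:*), and UDP lines
    # normally have no state column.
    if ($line =~ /^(tcp6?|udp6?)\s+\d+\s+\d+\s+(\S+):(\d+|\*)\s+(\S+):(\d+|\*)(?:\s+(\w+))?/) {
        push @connections, {
            protocol    => $1,
            local_addr  => $2,
            local_port  => $3,
            remote_addr => $4,
            remote_port => $5,
            state       => $6 // 'N/A',
        };
    }
}

# Output as JSON (canonical sorts the keys for stable output)
my $json = JSON->new->pretty->canonical->encode(\@connections);
print $json;

# Or output as CSV
print "Protocol,Local,Remote,State\n";
foreach my $conn (@connections) {
    printf "%s,%s:%s,%s:%s,%s\n",
        $conn->{protocol},
        $conn->{local_addr},  $conn->{local_port},
        $conn->{remote_addr}, $conn->{remote_port},
        $conn->{state};
}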
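This script turns the Linux (net-tools) netstat -tuln output into an array of hashes, one per socket, and then writes it out as JSON or CSV for easier downstream processing. The regex is tied to that column layout: BSD and macOS netstat, for example, separate the port from the address with a dot rather than a colon, so the pattern needs adjusting on those systems.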
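To check the extraction step without running netstat, the parsing can be moved into a function that takes the text as its argument. The following is a minimal sketch under stated assumptions, not part of the original answer: parse_netstat is a hypothetical helper name, the regex uses named captures (Perl 5.10+) with the /x modifier instead of $1..$6, and the sample is a hand-written illustration of the Linux net-tools column layout, not output captured from a real machine. It uses only core Perl, so no JSON module is needed to try it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parses netstat-style text (Linux net-tools layout)
# into a list of hash refs. Taking the text as an argument instead of
# running netstat inside makes the regex easy to test on saved samples.
sub parse_netstat {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        next unless $line =~ m{
            ^ (?<protocol> (?:tcp|udp) 6? ) \s+
            \d+ \s+ \d+ \s+                                  # Recv-Q, Send-Q
            (?<local_addr>  \S+ ) : (?<local_port>  \d+ | \* ) \s+
            (?<remote_addr> \S+ ) : (?<remote_port> \d+ | \* )
            (?: \s+ (?<state> \w+ ) )?                       # usually absent for UDP
        }x;
        push @rows, {
            protocol    => $+{protocol},
            local_addr  => $+{local_addr},
            local_port  => $+{local_port},
            remote_addr => $+{remote_addr},
            remote_port => $+{remote_port},
            state       => $+{state} // 'N/A',
        };
    }
    return @rows;
}

# Hand-written illustrative sample in the net-tools column layout
# (not captured from a real machine).
my $sample = <<'END';
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
udp        0      0 0.0.0.0:68              0.0.0.0:*
END

# Print one summary line per parsed socket
for my $row (parse_netstat($sample)) {
    printf "%-5s %s port %s (%s)\n",
        @{$row}{qw(protocol local_addr local_port state)};
}
```

Running it prints one line per matched socket. The two header lines are skipped because they do not start with a protocol name, the IPv6 address ::: 22 splits into address :: and port 22 because the greedy \S+ backtracks to the last colon, and the UDP entry gets N/A as its state.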