Monday, October 24, 2011

Easier parsing of Microsoft perfmon logs with regex

In this blog post I discussed perl code use to slice and dice perfmon logs by passing the Excel column name (ie, A, B, XC, AAZ, etc) in the command line. That made slicing and dicing perfmon log files a lot easier but it was still missing something.

With a large perfmon capture I was having to write down a lot of column names to slice out. I decided there had to be a better way and adapted my code to use regular expressions for column selection.

For example, I recently had a capacity study of a server and all that I wanted was the the disk IO and processor usage.

With the new code (below) I can now pull that quite easily:

perfMonLogRegEx.pl somePerfMonLog.csv 14:00:00 15:00:00 "disk|proc"

Now I automagically get the columns that contain the keywords "disk" and "proc." That pulls out the following from my perfmon log:


LogicalDisk(C:)\% Disk Time
LogicalDisk(D:)\% Disk Time
LogicalDisk(_Total)\% Disk Time
LogicalDisk(C:)\% Idle Time
LogicalDisk(D:)\% Idle Time
LogicalDisk(_Total)\% Idle Time
LogicalDisk(C:)\Avg. Disk Queue Length
LogicalDisk(D:)\Avg. Disk Queue Length
LogicalDisk(_Total)\Avg. Disk Queue Length
LogicalDisk(C:)\Avg. Disk sec/Read
LogicalDisk(D:)\Avg. Disk sec/Read
LogicalDisk(_Total)\Avg. Disk sec/Read
LogicalDisk(C:)\Avg. Disk sec/Write
LogicalDisk(D:)\Avg. Disk sec/Write
LogicalDisk(_Total)\Avg. Disk sec/Write
LogicalDisk(C:)\Disk Reads/sec
LogicalDisk(D:)\Disk Reads/sec
LogicalDisk(_Total)\Disk Reads/sec
LogicalDisk(C:)\Disk Writes/sec
LogicalDisk(D:)\Disk Writes/sec
LogicalDisk(_Total)\Disk Writes/sec


Want just the LogicalDisk(_Total) columns? Easy enough:

perfMonLogRegEx.pl somePerfMonLog.csv 14:00:00 15:00:00 "LogicalDisk\(_Total\)"

Notice that I have to escape the parentheses as I am not trying to capture anything from the regex.

And I get:


LogicalDisk(_Total)\% Disk Time
LogicalDisk(_Total)\% Idle Time
LogicalDisk(_Total)\Avg. Disk Queue Length
LogicalDisk(_Total)\Avg. Disk sec/Read
LogicalDisk(_Total)\Avg. Disk sec/Write
LogicalDisk(_Total)\Disk Reads/sec
LogicalDisk(_Total)\Disk Writes/sec


Want all columns between the upper and lower time limits? Easy enough:

perfMonLogRegEx.pl somePerfMonLog.csv 14:00:00 15:00:00 ".*"

The modified perl code for this handy-dandy slicing and dicing is found below and uses the same command line params as the previous code replacing the Excel column names with the regex for column selection.

1:  use strict;  
2: my $logFile = shift;
3: my $startTime = &timeIntoSeconds(shift);
4: my $endTime = &timeIntoSeconds(shift);
5: my $keyWordRegEx = shift;
6: my @targetColumns = ();
7: my %columnHash = ();
8: my %columnNames = ();
9: open(LOG, "$logFile") || die "$!";
10: while (my $input = <LOG>) {
11: chomp($input);
12: $input =~ s/\"//g;
13: my @columns = split(/\,/, $input);
14: if ($input =~ /PDH-CSV 4\.0/i) {
15: my $maxColumns = $#columns+1;
16: for (my $i = 0; $i < $maxColumns; $i++) {
17: if ($columns[$i] =~ qr/$keyWordRegEx/i) {
18: push(@targetColumns, ($i+1));
19: };
20: };
21: foreach my $columnNumber (@targetColumns) {
22: my @dataArray = ();
23: $columnHash{$columnNumber} = \@dataArray;
24: };
25: print STDOUT "$columns[0]";
26: foreach my $columnNumber (sort {$a <=> $b} (keys(%columnHash))) {
27: $columnNames{$columnNumber} = $columns[$columnNumber-1];
28: print STDOUT ",$columns[$columnNumber-1]";
29: };
30: print STDOUT "\n";
31: } else {
32: my ($dateStamp, $timeStamp) = split(/ /, $columns[0]);
33: my $metricSeconds = &timeIntoSeconds($timeStamp);
34: if ( $metricSeconds >= $startTime && $metricSeconds <= $endTime) {
35: print STDOUT "$columns[0]";
36: foreach my $columnNumber (sort {$a <=> $b} (keys(%columnHash))) {
37: print STDOUT ",$columns[$columnNumber-1]";
38: };
39: print STDOUT "\n";
40: };
41: };
42: if (($. % 1000) == 0) {
43: print STDERR "Working on line $.\n";
44: };
45: };
46: sub timeIntoSeconds() {
47: my ($timeStamp) = @_;
48: my ($hours, $mins, $sec) = split(/:/, $timeStamp);
49: return (($hours * 3600) + ($mins * 60)) + $sec;
50: };