With a large perfmon capture I was having to write down a lot of column names to slice out. I decided there had to be a better way and adapted my code to use regular expressions for column selection.
For example, I recently had a capacity study of a server and all that I wanted was the the disk IO and processor usage.
With the new code (below) I can now pull that quite easily:
perfMonLogRegEx.pl
somePerfMonLog.csv
14:00:00 15:00:00 "disk|proc"
Now I automagically get the columns that contain the keywords "disk" and "proc." That pulls out the following from my perfmon log:
LogicalDisk(C:)\% Disk Time
LogicalDisk(D:)\% Disk Time
LogicalDisk(_Total)\% Disk Time
LogicalDisk(C:)\% Idle Time
LogicalDisk(D:)\% Idle Time
LogicalDisk(_Total)\% Idle Time
LogicalDisk(C:)\Avg. Disk Queue Length
LogicalDisk(D:)\Avg. Disk Queue Length
LogicalDisk(_Total)\Avg. Disk Queue Length
LogicalDisk(C:)\Avg. Disk sec/Read
LogicalDisk(D:)\Avg. Disk sec/Read
LogicalDisk(_Total)\Avg. Disk sec/Read
LogicalDisk(C:)\Avg. Disk sec/Write
LogicalDisk(D:)\Avg. Disk sec/Write
LogicalDisk(_Total)\Avg. Disk sec/Write
LogicalDisk(C:)\Disk Reads/sec
LogicalDisk(D:)\Disk Reads/sec
LogicalDisk(_Total)\Disk Reads/sec
LogicalDisk(C:)\Disk Writes/sec
LogicalDisk(D:)\Disk Writes/sec
LogicalDisk(_Total)\Disk Writes/sec
Want just the LogicalDisk(_Total) columns? Easy enough:
perfMonLogRegEx.pl
somePerfMonLog.csv
14:00:00 15:00:00 "LogicalDisk\(_Total\)"
Notice that I have to escape the parentheses as I am not trying to capture anything from the regex.
And I get:
LogicalDisk(_Total)\% Disk Time
LogicalDisk(_Total)\% Idle Time
LogicalDisk(_Total)\Avg. Disk Queue Length
LogicalDisk(_Total)\Avg. Disk sec/Read
LogicalDisk(_Total)\Avg. Disk sec/Write
LogicalDisk(_Total)\Disk Reads/sec
LogicalDisk(_Total)\Disk Writes/sec
Want all columns between the upper and lower time limits? Easy enough:
perfMonLogRegEx.pl
somePerfMonLog.csv
14:00:00 15:00:00 ".*"
The modified perl code for this handy-dandy slicing and dicing is found below and uses the same command line params as the previous code replacing the Excel column names with the regex for column selection.
1: use strict;
2: my $logFile = shift;
3: my $startTime = &timeIntoSeconds(shift);
4: my $endTime = &timeIntoSeconds(shift);
5: my $keyWordRegEx = shift;
6: my @targetColumns = ();
7: my %columnHash = ();
8: my %columnNames = ();
9: open(LOG, "$logFile") || die "$!";
10: while (my $input = <LOG>) {
11: chomp($input);
12: $input =~ s/\"//g;
13: my @columns = split(/\,/, $input);
14: if ($input =~ /PDH-CSV 4\.0/i) {
15: my $maxColumns = $#columns+1;
16: for (my $i = 0; $i < $maxColumns; $i++) {
17: if ($columns[$i] =~ qr/$keyWordRegEx/i) {
18: push(@targetColumns, ($i+1));
19: };
20: };
21: foreach my $columnNumber (@targetColumns) {
22: my @dataArray = ();
23: $columnHash{$columnNumber} = \@dataArray;
24: };
25: print STDOUT "$columns[0]";
26: foreach my $columnNumber (sort {$a <=> $b} (keys(%columnHash))) {
27: $columnNames{$columnNumber} = $columns[$columnNumber-1];
28: print STDOUT ",$columns[$columnNumber-1]";
29: };
30: print STDOUT "\n";
31: } else {
32: my ($dateStamp, $timeStamp) = split(/ /, $columns[0]);
33: my $metricSeconds = &timeIntoSeconds($timeStamp);
34: if ( $metricSeconds >= $startTime && $metricSeconds <= $endTime) {
35: print STDOUT "$columns[0]";
36: foreach my $columnNumber (sort {$a <=> $b} (keys(%columnHash))) {
37: print STDOUT ",$columns[$columnNumber-1]";
38: };
39: print STDOUT "\n";
40: };
41: };
42: if (($. % 1000) == 0) {
43: print STDERR "Working on line $.\n";
44: };
45: };
46: sub timeIntoSeconds() {
47: my ($timeStamp) = @_;
48: my ($hours, $mins, $sec) = split(/:/, $timeStamp);
49: return (($hours * 3600) + ($mins * 60)) + $sec;
50: };
No comments:
Post a Comment