Friday, April 29, 2011

Automating gathering statistics from Microsoft Perfmon logs

I found my self in a situation where I needed to be able to slice and dice Microsoft Perfmon files in an automated fashion. Normally I just open up the Perfmon .CSV in Excel and slice and dice as needed.

While in the process of crushing a bunch of production boxes I had 12 Perfmon files and I decided that I was too much of a slackard to open up each file and slice and dice the entire Perfmon file for just a few columns.

I decided to put my perl knowledge to use and automate the process which is easy enough to do. I originally wrote the code to accept the integer positions of the Excel columns that I was interested in but this posed a problem. When looking at the Excel spreadsheet the column names are in the familiar alpha format of A, AA, AB, etc. I decided to add some code so that I could specify the column names of the statistics that I wanted to collect.

I found some code snippets to convert an integer to Excel column format but not vice-a-versa. After some white board thinking I figured out the base 26 format. The basic method is that you have to reverse the column names and then assign a value to the alpha characters (ie, A=1, B=2, ... Z = 26) and then compute the exponent of base 26.

Example:

I want to find the integer value for column "ABC". Reverse "ABC" and get "CBA" and expand the gonkulation in a foreach loop to be:

C * 26^0 + B * 26^1 + A * 26^2

becomes

3 * 26^0 + 2 * 26^1 + 1 * 26^2

simplifies to:

3 + 52 + 676 == 731

The Excel column name is equivalent to the integer value of 731.

I reverse the string of characters so that I can use a simple foreach loop and increment the exponent of base 26 with each iteration of the loop. This makes it really easy to convert the Excel column into an integer.

The reverse, an integer to an Excel column name takes a little bit more work.

The full perl code solution is below. The first param is the .CSV file. The second and third parameters are the start and stop times. My assumption is that the Perfmon .CSV is from the same date and not over multiple days. The last param is the columns with Excel column names. Individual columns can be specified with comma's and a range of columns can be specified with a hyphen.

Example usage:

perfMonLog.pl somePerfMonLog.csv 14:00:00 15:00:00 b-c,bm-bx

In the above example the columns selected would be: B, C, BM, BN, BO, BP, BQ, BR, BS, BT, BU, BV, BW, BX.
1:  use strict;
2: use Statistics::Descriptive;
3:
4: my $logFile = shift;
5: my $startTime = &timeIntoSeconds(shift);
6: my $endTime = &timeIntoSeconds(shift);
7: my @columnParameters = split(/\,/, shift);
8:
9: my @targetColumns = ();
10:
11: foreach my $column (@columnParameters) {
12: if ($column !~ /\-/) {
13: push(@targetColumns, &excelNameToInt($column));
14: } else {
15: my ($lower, $upper) = split(/\-/, $column);
16: for (my $i = &excelNameToInt($lower); $i <= &excelNameToInt($upper); $i++) {
17: push(@targetColumns, $i);
18: };
19: };
20: };
21:
22: my %columnHash = ();
23: my %columnNames = ();
24:
25: foreach my $columnNumber (@targetColumns) {
26: my @dataArray = ();
27: $columnHash{$columnNumber} = \@dataArray;
28: };
29:
30: open(LOG, "$logFile") || die "$!";
31:
32: while (my $input = <LOG>) {
33: chomp($input);
34: $input =~ s/\"//g;
35: my @columns = split(/\,/, $input);
36: if ($input =~ /PDH-CSV 4\.0/i) {
37: foreach my $columnNumber (keys(%columnHash)) {
38: $columnNames{$columnNumber} = $columns[$columnNumber-1];
39: };
40: } else {
41: my ($dateStamp, $timeStamp) = split(/ /, $columns[0]);
42: my $metricSeconds = &timeIntoSeconds($timeStamp);
43:
44: if ( $metricSeconds >= $startTime && $metricSeconds <= $endTime) {
45: foreach my $columnNumber (keys(%columnHash)) {
46: push(@{$columnHash{$columnNumber}}, $columns[$columnNumber-1]);
47: };
48: };
49: };
50:
51: if (($. % 1000) == 0) {
52: print STDERR "Working on line $.\n";
53: };
54:
55: };
56:
57: print STDERR "$logFile:\n";
58: print STDOUT "column,metric,average,stdev,90th,95th\n";
59: foreach my $columnNumber (sort { $a <=> $b } (keys(%columnHash))) {
60: my $stat = Statistics::Descriptive::Full->new();
61: foreach my $metric (@{$columnHash{$columnNumber}}) {
62: if (length($metric) > 0) {
63: $stat->add_data($metric);
64: };
65: };
66: print STDOUT &intToExcelName($columnNumber) . "," .
67: "\"$columnNames{$columnNumber}\"" . "," .
68: $stat->mean() . "," .
69: $stat->standard_deviation() . "," .
70: $stat->percentile(90) . "," .
71: $stat->percentile(95) . "\n";
72: };
73:
74: sub timeIntoSeconds() {
75: my ($timeStamp) = @_;
76: my ($hours, $mins, $sec) = split(/:/, $timeStamp);
77: return (($hours * 3600) + ($mins * 60)) + $sec;
78: };
79:
80: sub intToExcelName() {
81:
82: my ($sum) = @_;
83: $sum--;
84:
85: my $excelName = "";
86:
87: for (my $x = int(log($sum)/log(26)); $x > 0; $x--) {
88: my $k = int($sum/(26**$x));
89: $excelName .= chr($k + 64);
90: $sum -= $k*(26**$x);
91: };
92: $excelName .= chr($sum + 65);
93:
94: return $excelName;
95:
96: };
97:
98: sub excelNameToInt() {
99:
100: my ($excelName) = @_;
101: my @letters = reverse(split(//, uc($excelName)));
102: my $loopCount = 0;
103: my $sum = 0;
104:
105: foreach my $letter (@letters) {
106: my $ord = ord($letter);
107: $sum += (ord($letter) - 64) * 26**$loopCount;
108: $loopCount++;
109: };
110:
111: return $sum;
112: };
113: