Sunday, April 29, 2012

Creating stacked area plots with GD::Graph::area in Perl

I recently came across a need to generate an area plot to show the utilization of a series of virtual machines on an IBM pSeries frame. I had already written this perl utility working with TeamQuest's tqharvc.exe to extract TeamQuest physc data. The routine dumps out both .CSVs and scatter plots for requested metrics that are specified by a saved TeamQuest View .rpt file created with TeamQuest View.

I decided to write another perl utility to stitch together indiviual .CSV files together into a single .CSV so that I could import the single .CSV and create area plots in Excel. That worked like a champ but there was a problem. The problem was that the hostnames of the virtual machines running on the frame are cryptic; they don't make any sense unless you have all of them memorized. This really frustrated me.

I had been creating the single .CSV and then adding an extra tab to the spreadsheet with the cryptic hostname and an simple description of the purpose of the VM so that my stacked area plots would be easier to read.

This worked like a champ, but the extra time to add the lookup information, add the =vlookup() function to the spreadsheet just got to be annoying. Once again, I decided to use perl to solve this problem for me.

First, the routine that I wrote to combine the individual .CSV files into a single .CSV. This routine uses a directory listing of the individual .CSV files that I want to stitch together. This is pretty simple to generate under Windows command line with the `dir *.csv /b > csv.list` command where csv.list is the list of .CSVs that I want to stitch together. All the individual .CSV files need to have the first column be a common date/time stamp, otherwise it won't work correctly. This isn't an issue since my tqharvc perl routine dumps the first column as a date/time stamp.

The code:

1:  use strict;  
2:    
3:  my $csvList = shift;  
4:  open(LIST, "$csvList") || die "$!";  
5:    
6:  my @csvArray = ();  
7:  my %matrixHash = ();  
8:    
9:  while (<LIST>) {  
10:   chomp($_);  
11:   push(@csvArray, $_);  
12:  };  
13:    
14:  foreach my $csvFile (@csvArray) {  
15:   open(CSV, "$csvFile") || die "$!";  
16:   print STDERR "Working on $csvFile\n";  
17:   while(<CSV>) {  
18:    if ($. > 1) {  
19:     chomp($_);  
20:     my @columns = split(/,/, $_);  
21:     $matrixHash{$columns[0]}{$csvFile} = $columns[1];  
22:    };  
23:   };  
24:   close(CSV);  
25:  };  
26:    
27:  # header line  
28:    
29:  print STDOUT "timeStamp";  
30:  foreach my $csvFile (@csvArray) {  
31:   my @columns = split(/_/, $csvFile);  
32:   print STDOUT ",$columns[0]";  
33:  };  
34:  print STDOUT "\n";  
35:    
36:  # actual data into CSV matrix  
37:    
38:  foreach my $timeStamp (sort(keys(%matrixHash))) {  
39:   print STDOUT "$timeStamp";  
40:   foreach my $csvFile (@csvArray) {  
41:    print STDOUT ",$matrixHash{$timeStamp}{$csvFile}"  
42:   };  
43:   print STDOUT "\n";  
44:  };  
45:    

The beauty of this code is that I can take tens, hundreds, even thousands of individual .CSV files and generate a single .CSV that can be imported into Excel for number crunching.


Next I needed the perl code to take the single .CSV file and generate a stacked area plot. For that I turned to the tried and true GD::Graph module, specifically GD::Graph::area. The perl code accepts three command line arguments: The path of the single .CSV, path to the text file that contains the hostnames and simple english description in CSV format and the name of the image to produce. Here is an example of the text file with hostnames and simple descriptions that I used in my example which I simply called "lpar.list:"
 vm1,Virtual Machine Number 1  
 vm2,Virtual Machine Number 2  
 vm3,Virtual Machine Number 3  
 vm4,Virtual Machine Number 4  
 vm5,Virtual Machine Number 5  
 vm6,Virtual Machine Number 6  
 vm7,Virtual Machine Number 7  
 vm8,Virtual Machine Number 8  
 vm9,Virtual Machine Number 9  
 vm10,Virtual Machine Number 10   
 vm11,Virtual Machine Number 11  
 vm12,Virtual Machine Number 12  
 vmvio_pri,Production VIO  
 vmvio_sec,Production VIO  
 vmvio_nonprod_pri,Non-Production VIO  
 vmvio_nonprod_sec,Non-Production VIO  
Fortunately, GD::Graph::area will automagically generate a stacked area plot if the proper option is set in the hash for the area plot on line #89 which is "cumulate => 1,"

And now the code that actually generates the area plot.
1:  use strict;  
2:  use GD::Graph::area;  
3:    
4:  my $csvData  = shift;  
5:  my $lookupData = shift;  
6:  my $graphName = shift;  
7:    
8:  $graphName .= ".gif";  
9:    
10:  open(DATA, "$csvData")   || die "$!";  
11:  open(LOOKUP, "$lookupData") || die "$!";  
12:    
13:  my %lookupHash = ();  
14:  my %areaHash  = ();  
15:  my @columnNames = ();  
16:    
17:  while(<LOOKUP>) {  
18:   chomp($_);  
19:   my ($lpar, $name) = split(/,/, $_);  
20:   $lookupHash{$lpar} = $name;  
21:  };  
22:    
23:  while(<DATA>) {  
24:   chomp($_);  
25:   my @columns = split(/,/, $_);  
26:   if ($. == 1) {  
27:    @columnNames = @columns;  
28:    foreach my $index (1..$#columns) {  
29:     # $columnNames[$index] = $lookupHash{$columnNames[$index]};  
30:     $columnNames[$index] .= " - $lookupHash{$columnNames[$index]}";  
31:    };  
32:   } else {  
33:    foreach my $i (1..$#columns) {  
34:     $areaHash{$columns[0]}{$columnNames[$i]} = $columns[$i];  
35:    };  
36:   };  
37:  };  
38:    
39:  my %timeStampsToDelete = ();  
40:    
41:  foreach my $timeStamp (keys(%areaHash)) {  
42:   foreach my $columnName (keys(%{$areaHash{$timeStamp}})) {  
43:    if ($areaHash{$timeStamp}{$columnName} < 0 || $areaHash{$timeStamp}{$columnName} eq '') {  
44:     $timeStampsToDelete{$timeStamp}++;  
45:    };  
46:   };  
47:  };  
48:    
49:  foreach my $timeStamp (sort(keys(%timeStampsToDelete))) {  
50:   print STDERR "Deleting $timeStamp from areaHash\n";  
51:   delete $areaHash{$timeStamp};  
52:  };  
53:    
54:  my @dataArray = ();  
55:  my @dataLine = ();  
56:    
57:  my $elementCount = 0;  
58:  my @legendArray = ();  
59:  my $i = 0;  
60:    
61:  foreach my $columnName (@columnNames) {  
62:   my @outputArray = ();  
63:   if ($columnName !~ /timeStamp/) {  
64:    my $j= 0;  
65:    foreach my $timeStamp (sort(keys(%areaHash))) {  
66:     $dataLine[$j++] = $areaHash{$timeStamp}{$columnName};  
67:    };  
68:    @outputArray = @dataLine;  
69:    push(@legendArray, $columnName);  
70:   } else {  
71:    foreach my $timeStamp (sort(keys(%areaHash))) {  
72:     push(@outputArray, $timeStamp);  
73:     $elementCount++;  
74:    };  
75:   };  
76:   $dataArray[$i] = \@outputArray;  
77:   $i++;  
78:  };  
79:     
80:  my $mygraph = GD::Graph::area->new(1280, 1024);  
81:    
82:  $mygraph->set(x_label_skip   => int($elementCount/40),  
83:         x_labels_vertical => 1,  
84:         y_label      => "physc",  
85:         y_min_value    => 0,  
86:         y_max_value    => 16,  
87:         y_tick_number   => 16,  
88:         title       => "Stacked physc utilization",  
89:         cumulate     => 1,  
90:  ) or warn $mygraph->error;  
91:    
92:  $mygraph->set_legend(@legendArray);  
93:    
94:  $mygraph->set_legend_font(GD::gdMediumBoldFont);  
95:  $mygraph->set_x_axis_font(GD::gdMediumBoldFont);  
96:  $mygraph->set_y_axis_font(GD::gdMediumBoldFont);  
97:    
98:  my $myimage = $mygraph->plot(\@dataArray) or die $mygraph->error;  
99:    
100:  open(PICTURE, ">$graphName");  
101:  binmode PICTURE;  
102:  print PICTURE $myimage->gif;  
103:  close(PICTURE);  
104:    

The output of the code becomes exactly what I wanted:

I don't believe that the area plot looks as nice as the Excel output, but now it is effortless to generate a stacked area plot of VM physc utilization as part of a daily cron job for reporting purposes.