Sunday, April 29, 2012

Creating stacked area plots with GD::Graph::area in Perl

I recently came across a need to generate an area plot to show the utilization of a series of virtual machines on an IBM pSeries frame. I had already written this perl utility working with TeamQuest's tqharvc.exe to extract TeamQuest physc data. The routine dumps out both .CSVs and scatter plots for requested metrics that are specified by a saved TeamQuest View .rpt file created with TeamQuest View.

I decided to write another perl utility to stitch together indiviual .CSV files together into a single .CSV so that I could import the single .CSV and create area plots in Excel. That worked like a champ but there was a problem. The problem was that the hostnames of the virtual machines running on the frame are cryptic; they don't make any sense unless you have all of them memorized. This really frustrated me.

I had been creating the single .CSV and then adding an extra tab to the spreadsheet with the cryptic hostname and an simple description of the purpose of the VM so that my stacked area plots would be easier to read.

This worked like a champ, but the extra time to add the lookup information, add the =vlookup() function to the spreadsheet just got to be annoying. Once again, I decided to use perl to solve this problem for me.

First, the routine that I wrote to combine the individual .CSV files into a single .CSV. This routine uses a directory listing of the individual .CSV files that I want to stitch together. This is pretty simple to generate under Windows command line with the `dir *.csv /b > csv.list` command where csv.list is the list of .CSVs that I want to stitch together. All the individual .CSV files need to have the first column be a common date/time stamp, otherwise it won't work correctly. This isn't an issue since my tqharvc perl routine dumps the first column as a date/time stamp.

The code:

1:  use strict;  
2:    
3:  my $csvList = shift;  
4:  open(LIST, "$csvList") || die "$!";  
5:    
6:  my @csvArray = ();  
7:  my %matrixHash = ();  
8:    
9:  while (<LIST>) {  
10:   chomp($_);  
11:   push(@csvArray, $_);  
12:  };  
13:    
14:  foreach my $csvFile (@csvArray) {  
15:   open(CSV, "$csvFile") || die "$!";  
16:   print STDERR "Working on $csvFile\n";  
17:   while(<CSV>) {  
18:    if ($. > 1) {  
19:     chomp($_);  
20:     my @columns = split(/,/, $_);  
21:     $matrixHash{$columns[0]}{$csvFile} = $columns[1];  
22:    };  
23:   };  
24:   close(CSV);  
25:  };  
26:    
27:  # header line  
28:    
29:  print STDOUT "timeStamp";  
30:  foreach my $csvFile (@csvArray) {  
31:   my @columns = split(/_/, $csvFile);  
32:   print STDOUT ",$columns[0]";  
33:  };  
34:  print STDOUT "\n";  
35:    
36:  # actual data into CSV matrix  
37:    
38:  foreach my $timeStamp (sort(keys(%matrixHash))) {  
39:   print STDOUT "$timeStamp";  
40:   foreach my $csvFile (@csvArray) {  
41:    print STDOUT ",$matrixHash{$timeStamp}{$csvFile}"  
42:   };  
43:   print STDOUT "\n";  
44:  };  
45:    

The beauty of this code is that I can take tens, hundreds, even thousands of individual .CSV files and generate a single .CSV that can be imported into Excel for number crunching.


Next I needed the perl code to take the single .CSV file and generate a stacked area plot. For that I turned to the tried and true GD::Graph module, specifically GD::Graph::area. The perl code accepts three command line arguments: The path of the single .CSV, path to the text file that contains the hostnames and simple english description in CSV format and the name of the image to produce. Here is an example of the text file with hostnames and simple descriptions that I used in my example which I simply called "lpar.list:"
 vm1,Virtual Machine Number 1  
 vm2,Virtual Machine Number 2  
 vm3,Virtual Machine Number 3  
 vm4,Virtual Machine Number 4  
 vm5,Virtual Machine Number 5  
 vm6,Virtual Machine Number 6  
 vm7,Virtual Machine Number 7  
 vm8,Virtual Machine Number 8  
 vm9,Virtual Machine Number 9  
 vm10,Virtual Machine Number 10   
 vm11,Virtual Machine Number 11  
 vm12,Virtual Machine Number 12  
 vmvio_pri,Production VIO  
 vmvio_sec,Production VIO  
 vmvio_nonprod_pri,Non-Production VIO  
 vmvio_nonprod_sec,Non-Production VIO  
Fortunately, GD::Graph::area will automagically generate a stacked area plot if the proper option is set in the hash for the area plot on line #89 which is "cumulate => 1,"

And now the code that actually generates the area plot.
1:  use strict;  
2:  use GD::Graph::area;  
3:    
4:  my $csvData  = shift;  
5:  my $lookupData = shift;  
6:  my $graphName = shift;  
7:    
8:  $graphName .= ".gif";  
9:    
10:  open(DATA, "$csvData")   || die "$!";  
11:  open(LOOKUP, "$lookupData") || die "$!";  
12:    
13:  my %lookupHash = ();  
14:  my %areaHash  = ();  
15:  my @columnNames = ();  
16:    
17:  while(<LOOKUP>) {  
18:   chomp($_);  
19:   my ($lpar, $name) = split(/,/, $_);  
20:   $lookupHash{$lpar} = $name;  
21:  };  
22:    
23:  while(<DATA>) {  
24:   chomp($_);  
25:   my @columns = split(/,/, $_);  
26:   if ($. == 1) {  
27:    @columnNames = @columns;  
28:    foreach my $index (1..$#columns) {  
29:     # $columnNames[$index] = $lookupHash{$columnNames[$index]};  
30:     $columnNames[$index] .= " - $lookupHash{$columnNames[$index]}";  
31:    };  
32:   } else {  
33:    foreach my $i (1..$#columns) {  
34:     $areaHash{$columns[0]}{$columnNames[$i]} = $columns[$i];  
35:    };  
36:   };  
37:  };  
38:    
39:  my %timeStampsToDelete = ();  
40:    
41:  foreach my $timeStamp (keys(%areaHash)) {  
42:   foreach my $columnName (keys(%{$areaHash{$timeStamp}})) {  
43:    if ($areaHash{$timeStamp}{$columnName} < 0 || $areaHash{$timeStamp}{$columnName} eq '') {  
44:     $timeStampsToDelete{$timeStamp}++;  
45:    };  
46:   };  
47:  };  
48:    
49:  foreach my $timeStamp (sort(keys(%timeStampsToDelete))) {  
50:   print STDERR "Deleting $timeStamp from areaHash\n";  
51:   delete $areaHash{$timeStamp};  
52:  };  
53:    
54:  my @dataArray = ();  
55:  my @dataLine = ();  
56:    
57:  my $elementCount = 0;  
58:  my @legendArray = ();  
59:  my $i = 0;  
60:    
61:  foreach my $columnName (@columnNames) {  
62:   my @outputArray = ();  
63:   if ($columnName !~ /timeStamp/) {  
64:    my $j= 0;  
65:    foreach my $timeStamp (sort(keys(%areaHash))) {  
66:     $dataLine[$j++] = $areaHash{$timeStamp}{$columnName};  
67:    };  
68:    @outputArray = @dataLine;  
69:    push(@legendArray, $columnName);  
70:   } else {  
71:    foreach my $timeStamp (sort(keys(%areaHash))) {  
72:     push(@outputArray, $timeStamp);  
73:     $elementCount++;  
74:    };  
75:   };  
76:   $dataArray[$i] = \@outputArray;  
77:   $i++;  
78:  };  
79:     
80:  my $mygraph = GD::Graph::area->new(1280, 1024);  
81:    
82:  $mygraph->set(x_label_skip   => int($elementCount/40),  
83:         x_labels_vertical => 1,  
84:         y_label      => "physc",  
85:         y_min_value    => 0,  
86:         y_max_value    => 16,  
87:         y_tick_number   => 16,  
88:         title       => "Stacked physc utilization",  
89:         cumulate     => 1,  
90:  ) or warn $mygraph->error;  
91:    
92:  $mygraph->set_legend(@legendArray);  
93:    
94:  $mygraph->set_legend_font(GD::gdMediumBoldFont);  
95:  $mygraph->set_x_axis_font(GD::gdMediumBoldFont);  
96:  $mygraph->set_y_axis_font(GD::gdMediumBoldFont);  
97:    
98:  my $myimage = $mygraph->plot(\@dataArray) or die $mygraph->error;  
99:    
100:  open(PICTURE, ">$graphName");  
101:  binmode PICTURE;  
102:  print PICTURE $myimage->gif;  
103:  close(PICTURE);  
104:    

The output of the code becomes exactly what I wanted:

I don't believe that the area plot looks as nice as the Excel output, but now it is effortless to generate a stacked area plot of VM physc utilization as part of a daily cron job for reporting purposes.

Saturday, January 21, 2012

Extracting data from CA Wily APM via web service API


Found an interesting gotcha with the CA Wily API web services that bit me in the buttocks this week and thought anybody else that might want to access the CA Wily APM via web services might like to know about.

I wrote some sample web service client code in python using the suds library to pull a list of agents against some LPARs and then list out the available metrics for the agents.

8<------------
from suds.client import Client


userName = 'someAccount'
passWord = 'supersupersekrit'


serverName = 'someserver'
serverPort = 1234


metricsDataServiceURL = 'http://%s:%d/introscope-web-services/services/…' % (serverName, serverPort)
agentListURL = 'http://%s:%d/introscope-web-services/services/…' % (serverName, serverPort)


agentListClient = Client(agentListURL, username=userName, password=passWord)
agentList = agentListClient.service.listAgents('.*servername.*')


for agentName in agentList:
  print "Working on Agent %s" % (agentName,)
  metricList = agentListClient.service.listMetrics(agentName, '.*')
  for metricName in metricList:
    print "\t%s" % (metricName,)
------------->8

What I failed to take into account is that the list of agents that is returned from the system have special regex meta characters and when passed back with the call to the listMetrics and these meta characters need to be escaped with a back-whack.

I was able to fix the problem by importing the re regular expression library and using a call to re.escape with the agent name by changing the line

8<--------
metricList = agentListClient.service.listMetrics(agentName, '.*')
--------->8

to

8<--------
metricList = agentListClient.service.listMetrics(re.escape(agentName), '.*')
--------->8

After that, I was able to retrieve a list of all the available metrics for the captured Agents. Any of the API calls that require a resolved Agent name from a regex just need to be escaped and all is well.