I Crush Servers: Discrete Event Simulation of the Three Tier eBiz with SimPy

In this post I blogged on how I wrote a simulation solution in PDQ-R for a case study used in the book, Performance By Design.

I also wanted to expand my horizons and write a Discrete Event Simulation solution to go along with the heuristic solution of PDQ-R. The firm that I was working for at the time when I came up with this solution had a DES package by the name of SimScript II but they weren't willing to cough up a license so that I could learn their product. So, I searched the web and found a python package by the name of SimPy that can be used for Discrete Event Simulation.

There are some major differences between the heuristic solution and the DES solution. With the heuristic solution I was able to use some linear algebra to determine how many many hits per second to send off to the various pages that consumed resources. With my SimPy solution I did not have that luxury as I am simulating individual hits to each page one at a time. To get a proper distribution of hits I came up with this simple routine to dispatch the hits to pages:

1:  class RandomPath:
2:   def RowSum(self, Vector):
3:    rowSum = 0.0
4:    for i in range(len(Vector)):
5:     rowSum += Vector[i]
6:    return rowSum
7:   def NextPage(self, T, i):
8:    rowSum = self.RowSum(T[i])
9:    randomValue = G.Rnd.uniform(0, rowSum)
10:    sumT = 0.0
11:    for j in range(len(T[i])):
12:     sumT += T[i][j]
13:     if randomValue < sumT:
14:      break
15:    return j

With this routine I take the sum of probability from a row of the TransitionMatrix hard coded in the routine and then generate a random value between zero and the sum of the probabilities. I then walk the row and summing the probabilities until I find a condition where the probablities are greater than the random value. The count when this happens determines the next page.

For example, below I have the TransitionMatrix as used in my python code:


88:   #                        0     1     2     3     4     5     6     7
89:   TransitionMatrix = [ [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00], # 0
90:                        [0.00, 0.00, 0.70, 0.00, 0.10, 0.00, 0.00, 0.20], # 1
91:                        [0.00, 0.00, 0.45, 0.15, 0.10, 0.00, 0.00, 0.30], # 2
92:                        [0.00, 0.00, 0.00, 0.00, 0.40, 0.00, 0.00, 0.60], # 3
93:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.30, 0.55, 0.15], # 4
94:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00], # 5
95:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00], # 6
96:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00] ]# 7

And we are currently on the search page, as represented by row #2 of the above stochastic matrix:


                               0     1     2     3     4     5     6     7
91:                        [0.00, 0.00, 0.45, 0.15, 0.10, 0.00, 0.00, 0.30], # 2

Column 0 is 0, my sum is 0.

Column 1 is 0, my sum is 0.

Column 2 is 0.45, my sum is 0.45. 0.45 is less than my random value, 0.67.

Column 3 is 0.15, my sum is 0.60. 0.60 is less than my random value, 0.67.

Column 4 is 0.10, my sum is 0.70. 0.70 is greater than my random value, 0.67. I pass back 4 to my routine and the next row in the random walk will be row 4 (which represents the login page).

What we end up is a Markov random walk of the transition matrix. Multiple comparisons of random walks versus the heuristic analysis was fairly close. Of course, with a random variable it is never the same but comes out fairly close with each iteration.

The great thing about this algorithm is that it can be used for both Discrete Event Simulation and in load testing as well. At an eCommerce site that I worked I had created a perl routine to slice and dice IIS logs and by using unique shopper ID in the logged cookies I had created a Markov transition matrix of actual customer traffic. The resulting matrix wasn't small. For our site, it was over 500 x 500 elements. That's quite a lot of different random paths to take.

The idea was to take this information and modify our LoadRunner HTTP Vusers to make use of this data so that our scripts would properly emulate actual customers as apposed to crude page summary analysis. With this mechanism, our load tests would properly represent customers and we would have been able to run our analysis once a week to always make sure that our scenarios represented how customers browsed the site.

Unfortunately, I never got to put my ideas into action and only got as far as creating the stochastic transition matrix and some crude perl code that demonstrated random walks that would have represented customers. All in all, this was the easy part. The real hard part of the project would have been coding the HTTP Vusers that would have properly handled state transition to page to page for non-happy routes that customers sometimes followed.

Below is the SimPy solution that I came up with for the case study presented:

1:  #!/usr/bin/env python
2:  from SimPy.Simulation import *
3:  from random import Random, expovariate, uniform
4:  class Metrics:
5:   metrics = dict()
6:   def Add(self, metricName, frameNumber, value):
7:    if self.metrics.has_key(metricName):
8:     if self.metrics[metricName].has_key(frameNumber):
9:      self.metrics[metricName][frameNumber].append(value)
10:     else:
11:      self.metrics[metricName][frameNumber] = list()
12:      self.metrics[metricName][frameNumber].append(value)
13:    else:
14:     self.metrics[metricName] = dict()
15:     self.metrics[metricName][frameNumber] = list()
16:     self.metrics[metricName][frameNumber].append(value)
17:   def Keys(self):
18:    return self.metrics.keys()
19:   def Mean(self, metricName):
20:    valueArray = list()
21:    if self.metrics.has_key(metricName):
22:     for frame in self.metrics[metricName].keys():
23:      for values in range(len(self.metrics[metricName][frame])):
24:       valueArray.append(self.metrics[metricName][frame][values])
25:     sum = 0.0
26:     for i in range(len(valueArray)):
27:      sum += valueArray[i]
28:     if len(self.metrics[metricName][frame]) != 0:
29:      return sum/len(self.metrics[metricName])
30:     else:
31:      return 0 # Need to learn python throwing exceptions
32:    else:
33:     return 0
34:  class RandomPath:
35:   def RowSum(self, Vector):
36:    rowSum = 0.0
37:    for i in range(len(Vector)):
38:     rowSum += Vector[i]
39:    return rowSum
40:   def NextPage(self, T, i):
41:    rowSum = self.RowSum(T[i])
42:    randomValue = G.Rnd.uniform(0, rowSum)
43:    sumT = 0.0
44:    for j in range(len(T[i])):
45:     sumT += T[i][j]
46:     if randomValue < sumT:
47:      break
48:    return j
49:  class G:
50:   numWS      = 1
51:   numAS      = 1
52:   numDS      = 2
53:   Rnd        = random.Random(12345)
54:   PageNames  = ["Entry", "Home", "Search", "View", "Login", "Create", "Bid", "Exit" ]
55:   Entry      = 0
56:   Home       = 1
57:   Search     = 2
58:   View       = 3
59:   Login      = 4
60:   Create     = 5
61:   Bid        = 6
62:   Exit       = 7
63:   WS         = 0
64:   AS         = 1
65:   DS         = 2
66:   CPU        = 0
67:   DISK       = 1
68:   WS_CPU     = 0
69:   WS_DISK    = 1
70:   AS_CPU     = 2
71:   AS_DISK    = 3
72:   DS_CPU     = 4
73:   DS_DISK    = 5
74:   metrics    = Metrics()
75:   #            e  h  s  v  l  c  b  e
76:   HitCount  = [0, 0, 0, 0, 0, 0, 0, 0]
77:   Resources = [[ Resource(1), Resource(1) ], # WS CPU and DISK
78:                [ Resource(1), Resource(1) ], # AS CPU and DISK
79:                [ Resource(1), Resource(1) ]] # DS CPU and DISK
80:   #         Enter Home  Search View  Login Create Bid  Exit
81:   ServiceDemand = [ [0.000, 0.008, 0.009, 0.011, 0.060, 0.012, 0.015, 0.000],  # WS_CPU
82:                     [0.000, 0.030, 0.010, 0.010, 0.010, 0.010, 0.010, 0.000],  # WS_DISK
83:                     [0.000, 0.000, 0.030, 0.035, 0.025, 0.045, 0.040, 0.000],  # AS_CPU
84:                     [0.000, 0.000, 0.008, 0.080, 0.009, 0.011, 0.012, 0.000],  # AS_DISK
85:                     [0.000, 0.000, 0.010, 0.009, 0.015, 0.070, 0.045, 0.000],  # DS_CPU
86:                     [0.000, 0.000, 0.035, 0.018, 0.050, 0.080, 0.090, 0.000] ] # DS_DISK
87:   # Type B shopper
88:   #                        0     1     2     3     4     5     6     7
89:   TransitionMatrix = [ [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00], # 0
90:                        [0.00, 0.00, 0.70, 0.00, 0.10, 0.00, 0.00, 0.20], # 1
91:                        [0.00, 0.00, 0.45, 0.15, 0.10, 0.00, 0.00, 0.30], # 2
92:                        [0.00, 0.00, 0.00, 0.00, 0.40, 0.00, 0.00, 0.60], # 3
93:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.30, 0.55, 0.15], # 4
94:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00], # 5
95:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00], # 6
96:                        [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00] ] # 7
97:  class DoWork(Process):
98:   def __init__(self, i, resource, serviceDemand, nodeName, pageName):
99:    Process.__init__(self)
100:    self.frame         = i
101:    self.resource      = resource
102:    self.serviceDemand = serviceDemand
103:    self.nodeName      = nodeName
104:    self.pageName      = pageName
105:   def execute(self):
106:    StartUpTime = now()
107:    yield request, self, self.resource
108:    yield hold, self, self.serviceDemand
109:    yield release, self, self.resource
110:    R = now() - StartUpTime
111:    G.metrics.Add(self.pageName, self.frame, R)
112:  class CallPage(Process):
113:   def __init__(self, i, node, pageName):
114:    Process.__init__(self)
115:    self.frame        = i
116:    self.StartUpTime  = 0.0
117:    self.currentPage  = node
118:    self.pageName     = pageName
119:   def execute(self):
120:    if self.currentPage != G.Exit:
121:     print >> sys.stderr, "Working on Frame # ", self.frame, " @ ", now() , " for page ", self.pageName
122:     self.StartUpTime = now()
123:     if G.ServiceDemand[G.WS_CPU][self.currentPage] > 0.0:
124:      wsCPU = DoWork(self.frame, G.Resources[G.WS][G.CPU], G.ServiceDemand[G.WS_CPU][self.currentPage]/G.numWS, "wsCPU", self.pageName)
125:      activate(wsCPU, wsCPU.execute())
126:     if G.ServiceDemand[G.WS_DISK][self.currentPage] > 0.0:
127:      wsDISK = DoWork(self.frame, G.Resources[G.WS][G.DISK], G.ServiceDemand[G.WS_DISK][self.currentPage]/G.numWS, "wsDISK", self.pageName)
128:      activate(wsDISK, wsDISK.execute())
129:     if G.ServiceDemand[G.AS_CPU][self.currentPage] > 0.0:
130:      asCPU = DoWork(self.frame, G.Resources[G.AS][G.CPU], G.ServiceDemand[G.AS_CPU][self.currentPage]/G.numAS, "asCPU", self.pageName)
131:      activate(asCPU, asCPU.execute())
132:     if G.ServiceDemand[G.AS_DISK][self.currentPage] > 0.0:
133:      asDISK = DoWork(self.frame, G.Resources[G.AS][G.DISK], G.ServiceDemand[G.AS_DISK][self.currentPage]/G.numAS, "asDISK", self.pageName)
134:      activate(asDISK, asDISK.execute())
135:     if G.ServiceDemand[G.DS_CPU][self.currentPage] > 0.0:
136:      dsCPU = DoWork(self.frame, G.Resources[G.DS][G.CPU], G.ServiceDemand[G.DS_CPU][self.currentPage]/G.numDS, "dsCPU", self.pageName)
137:      activate(dsCPU, dsCPU.execute())
138:     if G.ServiceDemand[G.DS_DISK][self.currentPage] > 0.0:
139:      dsDISK = DoWork(self.frame, G.Resources[G.DS][G.DISK], G.ServiceDemand[G.DS_DISK][self.currentPage]/G.numDS, "dsDISK", self.pageName)
140:      activate(dsDISK, dsDISK.execute())
141:     G.HitCount[self.currentPage] += 1
142:     yield hold, self, 0.00001 # Needed to prevent an error. Doesn't add any blocking to the six queues above
143:  class Generator(Process):
144:   def __init__(self, rate, maxT, maxN):
145:     Process.__init__(self)
146:     self.name = "Generator"
147:     self.rate = rate
148:     self.maxN = maxN
149:     self.maxT = maxT
150:     self.g    = Random(11335577)
151:     self.i    = 0
152:     self.currentPage = G.Home
153:   def execute(self):
154:    while (now() < self.maxT):
155:     self.i += 1
156:     p = CallPage(self.i,self.currentPage,G.PageNames[self.currentPage])
157:     activate(p,p.execute())
158:     yield hold,self,self.g.expovariate(self.rate)
159:     randomPath = RandomPath()
160:     if self.currentPage == G.Exit:
161:      self.currentPage = G.Home
162:     else:
163:      self.currentPage = randomPath.NextPage(G.TransitionMatrix, self.currentPage)
164:  def main():
165:    maxWorkLoad = 10000
166:    Lambda      = 4.026*float(sys.argv[1])
167:    maxSimTime  = float(sys.argv[2])
168:    initialize()
169:    g = Generator(Lambda, maxSimTime, maxWorkLoad)
170:    activate(g,g.execute())
171:    simulate(until=maxSimTime)
172:    print >> sys.stderr, "Simulated Seconds  : ", maxSimTime
173:    print >> sys.stderr, "Page Hits      :"
174:    for i in range(len(G.PageNames)):
175:     print >> sys.stderr, "\t", G.PageNames[i], " = ", G.HitCount[i]
176:    print >> sys.stderr, "Throughput      : "
177:    for i in range(len(G.PageNames)):
178:     print >> sys.stderr, "\t", G.PageNames[i], " = ", G.HitCount[i]/maxSimTime
179:    print >> sys.stderr, "Mean Response Times:"
180:    for i in G.metrics.Keys():
181:     print >> sys.stderr, "\t", i, " = ", G.metrics.Mean(i)
182:    print G.HitCount[G.Home]/maxSimTime, ",", G.metrics.Mean("Home"), ",", G.metrics.Mean("View"), ",", G.metrics.Mean("Search"), ",", G.metrics.Mean("Login"), ",", G.metrics.Mean("Create"), ",", G.metrics.Mean("Bid")
183:  if __name__ == '__main__': main()

In a future blog post I will compare the results of the Discrete Event Simulation versus my PDQ-R solution and finally compare both of those against a work up done with TeamQuest Model just for good measure.

I Crush Servers

Sunday, November 20, 2011

Discrete Event Simulation of the Three Tier eBiz with SimPy

No comments:

Post a Comment