Tuesday, November 15, 2011

Using PDQ-R to model a three tier eBiz Application

This post continues my earlier work on Case Study IV: An E-Business Service from the book Performance by Design: Computer Capacity Planning by Example.

In the previous post I discussed how to take the transition probability matrix and work backwards to the original series of linear equations that solve for the number of visits to each page. In this case study there are two types of visitors, so two transition probability matrices must be used: 25% of visitors are Type A and 75% are Type B.
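As a quick sketch of that calculation in base R (using the Type A matrix from the case study), the visit counts come from solving (I - t(P)) V = B, where B marks the Enter page as the single entry point:

```r
# Type A transition probabilities (rows/cols: Enter, HomePage, Search,
# ViewBids, Login, CreateAuction, PlaceBid, Exit)
P <- matrix(c(0, 1, 0,    0,    0,    0,   0,   0,
              0, 0, 0.7,  0,    0.1,  0,   0,   0.2,
              0, 0, 0.4,  0.2,  0.15, 0,   0,   0.25,
              0, 0, 0,    0,    0.65, 0,   0,   0.35,
              0, 0, 0,    0,    0,    0.3, 0.6, 0.1,
              0, 0, 0,    0,    0,    0,   0,   1,
              0, 0, 0,    0,    0,    0,   0,   1,
              0, 0, 0,    0,    0,    0,   0,   0), nrow=8, byrow=TRUE);

B <- c(1, 0, 0, 0, 0, 0, 0, 0);   # all traffic enters at the Enter page
V <- solve(diag(8) - t(P), B);    # average visits per page, per user

V[2];  # HomePage: 1 visit per user (everyone passes through it)
V[3];  # Search: 0.7 / (1 - 0.4) = 1.1667 visits per user
```

Multiplying these visit counts by the arrival rate gamma (weighted by the visitor-type fractions) gives the per-page arrival rates used later in the model.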

Each tier of the hypothetical e-biz service consists of a single CPU and a single disk drive. A matrix supplies the total service demand placed on each component by each page that visitors hit.

While it is simple to write code that analyzes web logs to generate the transition probability matrix from customer traffic, it is very difficult to isolate the total demand on each component under chaotic customer traffic. That is where load testing tools come in: in a pseudo-production environment we can drive simulated traffic at one page at a time and measure the total demand on each component. In this particular case only the CPUs and disk drives are being modeled, but for a real service we would also want to model the memory system, the network, and so on.

After running simulated traffic against each page in isolation, we can generate a similar demand matrix for the components and use it for what-if analysis.
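One standard way to turn those isolated load test measurements into demand numbers is the Service Demand Law, D = U / X: the measured utilization of a component divided by the page throughput during the test. A small sketch with made-up measurements (the throughput and utilization figures here are hypothetical, not from the book):

```r
# Hypothetical isolated load test against a single page:
pageThroughput <- 50;   # X: measured page completions per second
wsCpuBusy     <- 0.40;  # U: measured web server CPU utilization (40%)

# Service Demand Law: D = U / X
wsCpuDemand <- wsCpuBusy / pageThroughput;
wsCpuDemand;  # 0.008 seconds of WS CPU time per page hit
```

Repeating that measurement for each page and each component fills in the demand matrix one cell at a time.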

For this solution I opted to use PDQ-R, the R interface to the PDQ queueing analysis library.

My R code for the solution:

# External library
library("pdq");

# Solution parameters
gamma <- 10.96;  # Rate into system (users/second)
numWS <- 1;      # Number of Web Servers
numAS <- 1;      # Number of Application Servers
numDS <- 1;      # Number of Database Servers
# Constants #
E <- 1;
H <- 2;
S <- 3;
V <- 4;
G <- 5;
C <- 6;
B <- 7;
X <- 8;
PAGE_NAMES <- c("Enter", "HomePage", "Search", "ViewBids", "Login", "CreateAuction", "PlaceBid", "Exit");
COMPONENTS <- c("CPU", "Disk");
SERVER_TYPES <- c("WS", "AS", "DS");
WS_CPU <- 1;
WS_DISK <- 2;
AS_CPU <- 3;
AS_DISK <- 4;
DS_CPU <- 5;
DS_DISK <- 6;
# Functions used in solution

# Solve for the average number of visits to each page: given transition
# matrix M and system input vector B, solves (I - t(M)) V = B.
VisitsByTransitionMatrix <- function(M, B) {
  A <- diag(nrow(M)) - t(M);
  return(solve(A, B));
};

# Aggregate arrival rate for a page across both visitor types:
# lambda = gamma * (f_a * V_a + f_b * V_b)
CalculateLambda <- function(gamma, f_a, f_b, V_a, V_b, index) {
  return(gamma * ((f_a * V_a[index]) + (f_b * V_b[index])));
};
f_a <- 0.25;           # Fraction of Type A users
f_b <- 1 - f_a;        # Fraction of Type B users
lambda <- numeric(X);  # Arrival rate for each page
SystemInput <- matrix(c(1,0,0,0,0,0,0,0), nrow=8, ncol=1);  # 8.3, Figure 8.2, page 208
# Transition probabilities for Type A visitors; 8.4, Table 8.1, page 209
# Rows/columns: Enter, HomePage, Search, ViewBids, Login, CreateAuction, PlaceBid, Exit
TypeA <- matrix(c(0, 1, 0,    0,    0,    0,   0,   0,     # Enter
                  0, 0, 0.7,  0,    0.1,  0,   0,   0.2,   # HomePage
                  0, 0, 0.4,  0.2,  0.15, 0,   0,   0.25,  # Search
                  0, 0, 0,    0,    0.65, 0,   0,   0.35,  # ViewBids
                  0, 0, 0,    0,    0,    0.3, 0.6, 0.1,   # Login
                  0, 0, 0,    0,    0,    0,   0,   1,     # CreateAuction
                  0, 0, 0,    0,    0,    0,   0,   1,     # PlaceBid
                  0, 0, 0,    0,    0,    0,   0,   0),    # Exit
                ncol=8, nrow=8, byrow=TRUE);
# Transition probabilities for Type B visitors; 8.4, Table 8.2, page 210
TypeB <- matrix(c(0, 1, 0,    0,    0,    0,   0,    0,     # Enter
                  0, 0, 0.7,  0,    0.1,  0,   0,    0.2,   # HomePage
                  0, 0, 0.45, 0.15, 0.1,  0,   0,    0.3,   # Search
                  0, 0, 0,    0,    0.4,  0,   0,    0.6,   # ViewBids
                  0, 0, 0,    0,    0,    0.3, 0.55, 0.15,  # Login
                  0, 0, 0,    0,    0,    0,   0,    1,     # CreateAuction
                  0, 0, 0,    0,    0,    0,   0,    1,     # PlaceBid
                  0, 0, 0,    0,    0,    0,   0,    0),    # Exit
                nrow=8, ncol=8, byrow=TRUE);
# Service demand (seconds) at each component per page hit; 8.4, Table 8.4, page 212 (with modifications)
# Columns: Enter, HomePage, Search, ViewBids, Login, CreateAuction, PlaceBid, Exit
DemandTable <- matrix(c(0, 0.008, 0.009, 0.011, 0.06,  0.012, 0.015, 0,   # WS CPU
                        0, 0.03,  0.01,  0.01,  0.01,  0.01,  0.01,  0,   # WS Disk
                        0, 0,     0.03,  0.035, 0.025, 0.045, 0.04,  0,   # AS CPU
                        0, 0,     0.008, 0.08,  0.009, 0.011, 0.012, 0,   # AS Disk
                        0, 0,     0.01,  0.009, 0.015, 0.07,  0.045, 0,   # DS CPU
                        0, 0,     0.035, 0.018, 0.05,  0.08,  0.09,  0),  # DS Disk
                      ncol=8, nrow=6, byrow=TRUE);
VisitsA <- VisitsByTransitionMatrix(TypeA, SystemInput);
VisitsB <- VisitsByTransitionMatrix(TypeB, SystemInput);
lambda[E] <- 0; # Not used in calculations
lambda[H] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, H);
lambda[S] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, S);
lambda[V] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, V);
lambda[G] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, G);
lambda[C] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, C);
lambda[B] <- CalculateLambda(gamma, f_a, f_b, VisitsA, VisitsB, B);
lambda[X] <- 0; # Not used in calculations
Init("e_biz_service");
# Define workstreams (one open class per page)
for (n in H:B) {
  workStreamName <- PAGE_NAMES[n];
  CreateOpen(workStreamName, lambda[n]);
};
# Define Web Server Queues
for (i in 1:numWS) {
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
    CreateNode(nodeName, CEN, FCFS);
  };
};

# Define Application Server Queues
for (i in 1:numAS) {
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
    CreateNode(nodeName, CEN, FCFS);
  };
};

# Define Database Server Queues
for (i in 1:numDS) {
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
    CreateNode(nodeName, CEN, FCFS);
  };
};
# Set Demand for the Web Servers (demand divided evenly across numWS servers)
for (i in 1:numWS) {
  demandIndex <- WS_CPU;
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- PAGE_NAMES[k];
      SetDemand(nodeName, workStreamName, DemandTable[demandIndex + (j-1), k] / numWS);
    };
  };
};

# Set Demand for the App Servers
for (i in 1:numAS) {
  demandIndex <- AS_CPU;
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- PAGE_NAMES[k];
      SetDemand(nodeName, workStreamName, DemandTable[demandIndex + (j-1), k] / numAS);
    };
  };
};

# Set Demand for the Database Servers
for (i in 1:numDS) {
  demandIndex <- DS_CPU;
  for (j in 1:length(COMPONENTS)) {
    nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- PAGE_NAMES[k];
      SetDemand(nodeName, workStreamName, DemandTable[demandIndex + (j-1), k] / numDS);
    };
  };
};
SetWUnit("Trans");
SetTUnit("Second");
Solve(CANON);
print("Arrival Rates for each page:");
for (i in H:B) {
  print(sprintf("%s = %f", PAGE_NAMES[i], lambda[i]));
};

print("[-------------------------------------------------]");
print("Page Response Times");
for (i in H:B) {
  workStreamName <- PAGE_NAMES[i];
  print(sprintf("%s = %f seconds.", PAGE_NAMES[i], GetResponse(TRANS, workStreamName)));
};
print("[-------------------------------------------------]");
print("Component Utilizations");
for (i in 1:numWS) {
  for (j in 1:length(COMPONENTS)) {
    totalUtilization <- 0;
    nodeName <- sprintf("WS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- PAGE_NAMES[k];
      totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
    };
    print(sprintf("%s = %3.2f %%", nodeName, totalUtilization * 100));
  };
};
for (i in 1:numAS) {
  for (j in 1:length(COMPONENTS)) {
    totalUtilization <- 0;
    nodeName <- sprintf("AS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- PAGE_NAMES[k];
      totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
    };
    print(sprintf("%s = %3.2f %%", nodeName, totalUtilization * 100));
  };
};
for (i in 1:numDS) {
  for (j in 1:length(COMPONENTS)) {
    totalUtilization <- 0;
    nodeName <- sprintf("DS_%d_%s", i, COMPONENTS[j]);
    for (k in H:B) {
      workStreamName <- PAGE_NAMES[k];
      totalUtilization <- totalUtilization + GetUtilization(nodeName, workStreamName, TRANS);
    };
    print(sprintf("%s = %3.2f %%", nodeName, totalUtilization * 100));
  };
};


Here is some sample output from the R code, with 10.96 users entering the system per second:
[1] "Arrival Rates for each page:"
[1] "HomePage = 10.960000"
[1] "Search = 13.658485"
[1] "ViewBids = 2.208606"
[1] "Login = 3.664958"
[1] "CreateAuction = 1.099487"
[1] "PlaceBid = 2.074180"
[1] "[-------------------------------------------------]"
[1] "Page Response Times"
[1] "HomePage = 0.083517 seconds."
[1] "Search = 1.612366 seconds."
[1] "ViewBids = 1.044683 seconds."
[1] "Login = 2.323417 seconds."
[1] "CreateAuction = 3.622690 seconds."
[1] "PlaceBid = 3.983755 seconds."
[1] "[-------------------------------------------------]"
[1] "Component Utilizations"
[1] "WS_1_CPU = 49.91 %"
[1] "WS_1_Disk = 55.59 %"
[1] "AS_1_CPU = 71.11 %"
[1] "AS_1_Disk = 35.59 %"
[1] "DS_1_CPU = 38.17 %"
[1] "DS_1_Disk = 97.57 %"


In this analysis we can see that the database disk utilization is approaching 100%. A simple fix (for this analysis, at least) is to add a second database server to spread the load evenly.
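Because each database node's per-page demand is divided by numDS in the SetDemand calls, spreading the load across two identical database servers should cut each node's utilization roughly in half. A quick sanity check against the numbers above:

```r
dsDiskBusy <- 0.9757;  # DS disk utilization with one database server
dsDiskBusy / 2;        # expected per-node utilization with numDS = 2
# 0.48785, i.e. the 48.78% reported in the second run below
```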

I modify the line that reads "numDS <- 1; # Number of Database Servers" to read "numDS <- 2; # Number of Database Servers" and re-run the analysis:

 [1] "Arrival Rates for each page:"  
[1] "HomePage = 10.960000"
[1] "Search = 13.658485"
[1] "ViewBids = 2.208606"
[1] "Login = 3.664958"
[1] "CreateAuction = 1.099487"
[1] "PlaceBid = 2.074180"
[1] "[-------------------------------------------------]"
[1] "Page Response Times"
[1] "HomePage = 0.083517 seconds."
[1] "Search = 0.237452 seconds."
[1] "ViewBids = 0.336113 seconds."
[1] "Login = 0.358981 seconds."
[1] "CreateAuction = 0.462042 seconds."
[1] "PlaceBid = 0.440903 seconds."
[1] "[-------------------------------------------------]"
[1] "Component Utilizations"
[1] "WS_1_CPU = 49.91 %"
[1] "WS_1_Disk = 55.59 %"
[1] "AS_1_CPU = 71.11 %"
[1] "AS_1_Disk = 35.59 %"
[1] "DS_1_CPU = 19.09 %"
[1] "DS_1_Disk = 48.78 %"
[1] "DS_2_CPU = 19.09 %"
[1] "DS_2_Disk = 48.78 %"


Not only does the second server alleviate the database disk utilization, we also get an appreciable decrease in page response time for the end user. The Search page went from 1.61 seconds to 0.24 seconds. Not too shabby. The biggest difference was the Create Auction page, which went from 3.62 seconds to 0.46 seconds, an improvement of over three seconds.
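The reason halving the load buys far more than a 2x improvement is the nonlinearity of queueing: for an open M/M/1 queue (the standard model for FCFS nodes like these), residence time is R = D / (1 - U), which blows up as utilization approaches 1. A rough sketch of that effect at the database disk, using the PlaceBid demand from the table above:

```r
D <- 0.09;                          # DS disk demand for PlaceBid, one server
mm1 <- function(D, U) D / (1 - U);  # residence time at an open M/M/1 queue

rOne <- mm1(D, 0.9757);      # one server:  0.09 / 0.0243  = ~3.70 seconds
rTwo <- mm1(D / 2, 0.4878);  # per node:    0.045 / 0.5122 = ~0.088 seconds
rOne / rTwo;                 # roughly a 42x reduction at this one queue
```

Pulling a nearly saturated queue back to moderate utilization is where almost all of the response time improvement comes from.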

In a future post I will go over a modified PDQ-R solution that also generates graphs showing changes in resource utilization and page response times under increasing load. I also have a discrete event simulation of the aforementioned e-biz application, written in Python with the SimPy library, to compare the output of analytic modeling against event simulation.
