Sunday, May 29, 2011

Converting a Probablity Matrix into a set of Linear Equations in R

This entry was from another blog entry of mine but I have migrated it over to this blog as I find that it is relevant to something that I am working on for eCommerce capacity analysis where probability matrices might work out very nicely.

The case study that I have started working on is in Chapter 8: Case Study IV: An E-Business Service.

The author provides us with a Customer Behavior Model Graph of a website:



From this graph we can create a set of linear equations that describes the number of visits to each page of the system presented in the CBMG (seen above). Taking a closer look at the graph we can see that there are probabilities of transitioning between one page to another. For example, the probability of moving from the Home Page (h) to the Login (g) is Phg. The probability of moving from the Login (g) page to the Search (s) page is Psg.

Using these probabilities we can create a set of linear equations to describe the number of visits to each page in the system based upon the number of visits into the entry of the system. In this case, there is only one way into the system and that is via the Entry (e) page.

We can look at the graph and manually derive the linear equations. For example, we can easily see that Vh = Peh*Ve. We can also see that Vg = Phg*Vh + Psg*Vs + Pvg*Vv. Doing this we eventually come up with eight linear equations that we can then solve. (The equations are listed on page 208 of Performance by Design: Computer Capacity Planning By Example).

The authors of the book provide us with all the probabilities in the form of a Matrix of Transition Probabilities on page 209. Here is that matrix:


                       (e)     (h)   (s)   (v)   (g)   (c)   (b)   (x)

Entry (e) 0 1 0 0 0 0 0 0
Home (h) 0 0 0.7 0 0.1 0 0 0.2
Search (s) 0 0 0.4 0.2 0.15 0 0 0.25
View Bids (v) 0 0 0 0 0.65 0 0 0.35
Login (g) 0 0 0 0 0 0.3 0.6 0.1
Create Auction (c) 0 0 0 0 0 0 0 1
Place Bid (b) 0 0 0 0 0 0 0 1
Exit (x) 0 0 0 0 0 0 0 0


So we can take those values found in the transition matrix and eventually gonkulate all the visits to each page. Although the transition matrix is provided by the author it is easy enough to re-create this matrix from web logs for a site.

There's bound to be a way of backing the transition matrix into the set of linear equations that can easily be solved in R (or any other method that can solve linear equations).

This is how I backed out the equations by hand. I know that the transition matrix has from "from" pages down the rows and the "to" along the columns. So for example, if I wanted to figure out the equation for the Vs I would work down the column and use the coefficients of the transition matrix as multipliers for the V variables.

For example, with the 3rd column with solving for Vs:

The third column of the transition matrix is:


   (s)

(e) 0
(h) 0.7
(s) 0.4
(v) 0
(g) 0
(c) 0
(b) 0
(x) 0


So to solve for Vs:

Vs = 0*Ve + 0.7*Vh + 0.4Vs + 0*Vv + 0*Vg + 0*Vc + 0*Vb + 0*Vx

Notice that we have a Vs on both sides of the equal sign. This corresponds to the loop on the Search node.

We can drop out the variables that are multiplied by zero.

Vs = 0.7*Vh + 0.4Vs

Repeat the steps for all the columns of the transition matrix and you end up with the equations to solve for. But, most packages want the equations setup in matrix form. So, let's get all the variables on the left side of the equal sign.

Vs - 0.4Vs - 0.7*Vh

Simplifies into:

0.6Vs - 0.7*Vh

If we do this with all the equations we can finally put it into matrix form. But we can't solve it yet. We need another vector to solve with. This vector will contain the input into the system. In this case the entry into the system is into the Entry page so the vertical vector is:

(e) 1
(h) 0
(s) 0
(v) 0
(g) 0
(c) 0
(b) 0
(x) 0

If we call our first matrix that we created from the transition matrix A and our input vertical vector B, we have the classic A*B = 0 that we can solve for.

Now we get into the R solution for the above. As I was working out the algorithm to setup the problem for calling the R solve() routine it hit me how simple the solution was!

Let's call our transition matrix supplied by the authors as M.

We can take transpose of M and let's call it A.

Take A and multiply by the scalar value of -1. This is the act of moving all the variables to the left side of the equals sign.

Now we combine variables we are solving for in each row of A by adding 1 to the trace of matrix A.

And there we have our final form of A. We can now solve for A*B = 0!

Here is my solution in R
# Define the column information for easy access by name


e <- 1;
h <- 2;
s <- 3;
v <- 4;
g <- 5;
c <- 6;
b <- 7;
x <- 8;

# M will be the supplied transition matrix

M <- matrix(c(0,1,0,0,0,0,0,0,0,0,0.7,0,0.1,0,0,0.2,0,0,0.4,0.2,0.15,0,0,0.25,0,0,0,0,0.65,0,0,0.35,0,0,0,0,0,0.3,0.6,0.1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0), ncol=8, nrow=8, byrow=TRUE);

# B is the vector that describes entry into the system
B <- matrix(c(1,0,0,0,0,0,0,0),nrow=8,ncol=1);

VisitsByTransitionMatrix <- function(M, B) {
A <- t(M);
A <- -1 * A;
for (i in 1:sqrt(length(A))) {
j <- i;
A[i,j] <- A[i,j] + 1;
};
return(solve(A,B));
};

Visits <- VisitsByTransitionMatrix(M, B);

> Visits
[,1]
[1,] 1.0000000
[2,] 1.0000000
[3,] 1.1666667
[4,] 0.2333333
[5,] 0.4266667
[6,] 0.1280000
[7,] 0.2560000
[8,] 1.0000000

> Visits[e]
[1] 1
> Visits[h]
[1] 1
> Visits[s]
[1] 1.166667
> Visits[v]
[1] 0.2333333
> Visits[g]
[1] 0.4266667
> Visits[c]
[1] 0.128
> Visits[b]
[1] 0.256
> Visits[x]
[1] 1