Thursday, August 25, 2011

Low Power Consumption Server Gotcha

Something interesting happened this week. I've been fighting an issue where newer 12- and 16-core servers couldn't compete with older 8-core Windows 2003 servers. Turns out there were two gotchas.

The first one is fairly straightforward. A recently installed Microsoft security patch affected the way that .NET handles garbage collection. By default the .NET 2.0 CLR can handle up to 8 cores for garbage collection, and subsequent updates increased that number. But the patch in question hobbled .NET's ability to properly handle garbage collection with more than 8 cores: the GC was being passed around to the (n-8) extra cores in the box, resulting in % Time in GC shooting up and negatively affecting server capacity. The hotfix for our problem can be found here.

We put the hotfix in place and voilà, the service demand measurements dropped by about 40%.

But the 12- and 16-core machines were still having capacity issues compared to the original 8-core machines. We suspected that the .NET stack needed further tuning, but then the second gotcha turned up.

It was found that our servers were shipped from the factory in power saving mode. Under low core utilization the core clocks were running anywhere from 800 MHz to 1.1 GHz. It's not until the cores are pushed past 60% utilization that the clocks are increased up to the maximum of 2.4 GHz. Most of my capacity studies of production traffic were in the 40 to 50% CPU range: not quite enough to force the cores to start jacking up the clock rate.

The result? It appeared that the capacity of the larger machines was less than that of the 8-core machines. After the core clock was set at the environmentally less friendly rate of 2.4 GHz, the measured service demand under production traffic dropped a significant 56.5%. Not too shabby. However, it would have been nice to know about the server issue to begin with to eliminate a lot of confusion and wasted fundage.
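As a rough sanity check on that 56.5% figure, here is a minimal sketch. It assumes the work is purely CPU-bound, so that service demand scales inversely with clock rate. That assumption and the function name are mine, for illustration only; the clock rates are the ones observed above.

```python
def demand_reduction(low_clock_ghz: float, high_clock_ghz: float) -> float:
    """Fractional drop in service demand when the clock is raised.

    Assumes service demand is inversely proportional to clock rate,
    which only holds for purely CPU-bound work.
    """
    if low_clock_ghz <= 0 or high_clock_ghz <= 0:
        raise ValueError("clock rates must be positive")
    return 1.0 - low_clock_ghz / high_clock_ghz


MAX_CLOCK_GHZ = 2.4  # full clock rate from the post

# The two ends of the throttled range observed in power saving mode.
for throttled_ghz in (0.8, 1.1):
    drop = demand_reduction(throttled_ghz, MAX_CLOCK_GHZ)
    print(f"{throttled_ghz:.1f} GHz -> {MAX_CLOCK_GHZ:.1f} GHz: "
          f"{drop:.1%} lower service demand")
```

Under that simplification the expected drop lands somewhere between roughly 54% and 67%, so a measured 56.5% is about what you would expect if the cores were mostly idling near 1.1 GHz.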

I had been calculating capacity requirements based upon the low power consumption clock speed, and a lot of extra boxes were ordered as a result. Now that we know about the power savings issue, we can configure the boxes for maximum performance. And the extra boxes? They'll just add a bunch of extra capacity to the application.
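To see how an inflated service demand turns into extra boxes, here is a hedged sketch of the sizing arithmetic using the utilization law (busy cores = throughput x service demand). The peak throughput, the throttled service demand, and the 50% utilization target are made-up illustration values, not our production numbers; only the 56.5% drop comes from the measurements above.

```python
import math


def servers_needed(peak_tps: float, service_demand_s: float,
                   cores_per_server: int, target_utilization: float) -> int:
    """Servers required to keep per-core utilization at or below the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Utilization law: core-seconds of work arriving per second.
    busy_cores = peak_tps * service_demand_s
    usable_cores_per_server = cores_per_server * target_utilization
    return math.ceil(busy_cores / usable_cores_per_server)


# Hypothetical inputs for illustration only.
PEAK_TPS = 5000.0             # assumed peak transactions per second
THROTTLED_DEMAND_S = 0.020    # assumed CPU seconds per transaction, power saving mode
CORES_PER_SERVER = 16
TARGET_UTILIZATION = 0.5      # assumed headroom target

# Apply the measured 56.5% drop in service demand at full clock.
FULL_CLOCK_DEMAND_S = THROTTLED_DEMAND_S * (1 - 0.565)

before = servers_needed(PEAK_TPS, THROTTLED_DEMAND_S,
                        CORES_PER_SERVER, TARGET_UTILIZATION)
after = servers_needed(PEAK_TPS, FULL_CLOCK_DEMAND_S,
                       CORES_PER_SERVER, TARGET_UTILIZATION)
print(f"sized on throttled clocks: {before} servers")
print(f"sized on full clocks:      {after} servers")
```

With these made-up inputs, sizing on the throttled measurements calls for roughly twice as many servers as sizing on the full-clock measurements, which is the same kind of over-ordering described above.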

Good times!