Wednesday, June 06, 2007

testing the Sun X4600M2 and ESX Server 3.0, part I

Aside from a few obstacles, we had a useful and interesting session testing a Sun X4600M2. The plan was to use three virtual machines on ESX Server 3.0 to simulate our eCommerce infrastructure:
- one Win2K3, IIS 6.0 web server running our website application
- one RHEL 3.0 AS server running Oracle 10g
- one Win2K server running MS Web Stress Application Tool (MS WSAT) to generate HTTP traffic load against the Win2K3 web server

The X4600M2 we tested had eight dual-core 2.4GHz Opteron RevE cpus and was connected to a Sun 5310 fiber storage array. The 4600 ran VMware ESX Server 3.0 on top of a customized version of Linux built for VMware. I provided the vendor with the three preconfigured virtual machines. The vms were zipped on dual-layer DVDs and took a while to copy and unzip, roughly an hour each. Also, the virtual machines were built on an Intel box and, as such, needed to be converted specifically to the AMD Opteron architecture of the ESX server (the 4600). This was news to us. It took about twenty minutes to convert the 8GB Windows vms and about an hour to convert the 33GB database file.

We started all three vms, did some Windows configuration and verified connectivity between the servers. TNSnames and an ODBC driver needed to be set up on the web server. The first large hurdle we encountered was that, unlike on our test system, the RHEL3 vm was not able to obtain its IP address via DHCP. After trying a few things, we assigned the address statically and the server then became available on the network. Once all three boxes were talking, we verified that the website could pull data from the database. It could; however, we saw that the database sequences were not created when we added an item to our cart. I got on the phone with our programmers and, after about 45 minutes, we resolved the problem using a public synonym. With that solved, we spent half an hour using WSAT's recorder function to navigate the website and create the test cases. We were then able to start testing.

As our vendor did not have an Enterprise license for the ESX Server installation, we were limited to assigning up to four cpus per vm. So we assigned each vm the maximum available:
- Oracle vm: four cpus
- IIS vm: four cpus
- MS WSAT vm: four cpus

Since one core on the 4600 is dedicated to VMware overhead, this left three of its sixteen cores unused.
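The arithmetic behind that count can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the figures from this post, and it assumes each virtual cpu assigned to a vm occupies one physical core; the variable names are mine, not anything from VMware.

```python
# Core budget for the test host, using the figures quoted in the post.
SOCKETS = 8            # eight Opteron cpus
CORES_PER_SOCKET = 2   # each one is dual-core
OVERHEAD_CORES = 1     # one core reserved for VMware overhead

# Assumption: one virtual cpu maps to one physical core.
vcpus_per_vm = {"oracle": 4, "iis": 4, "wsat": 4}

total_cores = SOCKETS * CORES_PER_SOCKET
assigned_cores = sum(vcpus_per_vm.values())
unused_cores = total_cores - OVERHEAD_CORES - assigned_cores

print(f"total={total_cores} assigned={assigned_cores} unused={unused_cores}")
```

With an Enterprise license the same sum would change only in the `vcpus_per_vm` line.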

We used MS WSAT to apply load to the Web server instance, slowly increasing load from one session to ten to one hundred virtual users in order to verify that:
1) the stress tool was working correctly,
2) the website was responding appropriately, and
3) we could see data via the VMware Virtual Infrastructure Client management app

We verified that these conditions were met.
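For anyone who wants to reproduce the idea of a stepped ramp without MS WSAT, here is a minimal Python sketch. It is not how WSAT works internally; `run_step`, `ramp` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in for a real HTTP request against the web server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(fetch, users):
    """Run `users` concurrent sessions once and count the successful ones."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(lambda _: fetch(), range(users)))
    return sum(1 for ok in results if ok)


def ramp(fetch, steps=(1, 10, 100)):
    """Apply load in increasing steps, recording successes per step."""
    return {users: run_step(fetch, users) for users in steps}


def fake_fetch():
    # Stand-in for an HTTP GET against the web server; always "succeeds".
    return True


results = ramp(fake_fetch)
for users, succeeded in results.items():
    print(f"{users} virtual users: {succeeded} successful sessions")
```

Swapping `fake_fetch` for a function that really requests a page and returns True on a good response would turn this into a crude load generator, with each step run only after the previous one finishes.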

It was interesting to view the VMware instrumentation. The VMware Infrastructure management app is a lot like Performance Monitor in Windows. You can view CPU/disk/memory and network stats. We toggled between the three vms and checked out performance stats for each. The most stressed vm was the IIS webserver, as it was serving data to the testing client (the Win2K server running MS WSAT), as well as pulling content from the database.

One interesting metric we saw in the management interface was called Megahertz Used, which is basically how much of the total megahertz available to a vm is in use. For example, if a vm has one 2.4GHz cpu dedicated to it and that cpu is 10% busy, you're using 240MHz of the available CPU power. On our Win2K3 web server vm, we had four cpus available at 2.4GHz each. This gave us a total of 9,600MHz (about 10,000) available to the vm. When we increased the load on the Win2K3 web server, we saw that it was using about 80-90% of the total CPU available, or about 8,800MHz. This load was more or less equally divided among the four CPUs assigned to the vm:
cpu0: 2300 mhz used
cpu1: 2200 mhz used
cpu2: 2200 mhz used
cpu3: 2100 mhz used
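To make the numbers concrete, here is a small Python sketch of the same calculation, using only the readings listed above. The variable names are my own and this is just arithmetic, not a call into any VMware interface.

```python
CLOCK_MHZ = 2400                              # each cpu runs at 2.4GHz
per_cpu_mhz_used = [2300, 2200, 2200, 2100]   # cpu0..cpu3 readings from above

capacity_mhz = CLOCK_MHZ * len(per_cpu_mhz_used)   # MHz available to the vm
used_mhz = sum(per_cpu_mhz_used)                   # MHz actually in use
utilization = used_mhz / capacity_mhz              # fraction of capacity used

print(f"capacity: {capacity_mhz} MHz, used: {used_mhz} MHz")
print(f"utilization: {utilization:.1%}")
for i, mhz in enumerate(per_cpu_mhz_used):
    print(f"cpu{i}: {mhz / CLOCK_MHZ:.1%} busy")
```

Measured against the exact 9,600MHz capacity, 8,800MHz works out to a little under 92%; against the rounded 10,000MHz figure it is 88%.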

Using the megahertz available to a vm, VMware is able to balance load across the cpus within a vm as well as between vms. ESX Server 3.0 can dynamically provision new vms by analyzing this statistic.

Another interesting thing we did was to clone our testing server, the Win2K server with MS WSAT installed on it. As the clone is essentially a file copy, the process is I/O intensive and took about 10 minutes for the 8GB vm. With a configuration tweak and a quick start of the server, the cloned testing server was up and applying load against the website 15 minutes after we started the clone. Nice!

While testing, we found that the MS stress tool applies load but has a nagging inability to capture enough information about a user's session for an order to be completed through the test website. Also, the stress tool seems to quiesce after about 7-10 minutes. This may have been due to some caching at the database and web server layers, but is more likely due to a limitation of MS WSAT. So we are looking to replace this testing tool with one that doesn't have these limitations and can do interesting things like parameterize order and sku numbers in the requested URL. Compuware QALoad is a top candidate and one we're already licensed for. We are currently researching tools for round 2 and, hopefully, we'll have a substitute in the next couple of weeks.
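By "parameterize" I mean something like the following Python sketch, where each request gets its own order and sku values instead of replaying one recorded URL. The endpoint, the `order` and `sku` parameter names and the sample values are all made up for illustration; they are not our real site's URLs and this is not how QALoad does it.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://webserver.example/cart/add"


def build_urls(skus, first_order_no):
    """Build one request URL per sku, each with its own order number."""
    urls = []
    for offset, sku in enumerate(skus):
        query = urlencode({"order": first_order_no + offset, "sku": sku})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = build_urls(["A100", "B200"], 5000)
for url in urls:
    print(url)
```

A load tool that can do this avoids hammering the same cached page and the same cart over and over.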

To get a fuller day of testing on the 4600, we will schedule a second visit to our vendor, with the caveat that I will bring a fully configured database, unzipped on an external USB drive, to expedite the setup. Also, I hope to persuade the vendor to get an Enterprise license for ESX Server so that we can assign more than four CPUs to an individual vm. Finally, at the end of the day, I will try to provide some screenshots or scripts of the evaluation session for the blog.

More to come. I'll keep you posted!

