Friday, July 06, 2007

measuring performance while using VMware Server

My first performance post about VMware Server’s love of large, fast drives (/2007/06/performance-note-for-vms-they-love-fast.html) covered the basics but wasn't detailed enough for some readers. So I thought I’d give you guys a bit more technical information on measuring performance while using VMware Server.

Configuration
I recently created a rather large vm. It is a 45GB XP Professional guest OS built using VMware Converter 3.0.0 build 39557, and it runs under the latest VMware Server, version 1.0.3 build-44356. VMware Server runs on my desktop: XP Professional on a Dell Precision 670 workstation with dual 3GHz Xeons and 3.2GB RAM.

Before I started the test, I already had a couple of virtual machines running:
- a Windows 2000 Advanced Server vm
- a Fedora Core 6 vm

Monitoring Performance
On my host OS, I started Performance Monitor with the following counters:
* Physical Disk Object -> Avg. Disk Write Queue Length
* Physical Disk Object -> Avg. Disk Read Queue Length
* Physical Disk Object -> Avg. Disk Queue Length
* Processor Object -> % Processor Time
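Performance Monitor's graph is handy for eyeballing, but the same counters can also be logged to a file for later analysis. The sketch below assumes Windows' built-in `typeperf` command-line tool and the standard English-locale counter paths (`\PhysicalDisk(_Total)\...`, `\Processor(_Total)\...`). The 1-second interval, 1,200-sample count (20 minutes), the `vm_boot.csv` file name and the `build_typeperf_command` helper are all hypothetical choices for illustration.

```python
import subprocess
import sys

# Standard (English-locale) counter paths for the four counters listed above.
# "_Total" aggregates all instances; use a specific disk instance if needed.
COUNTERS = [
    r"\PhysicalDisk(_Total)\Avg. Disk Write Queue Length",
    r"\PhysicalDisk(_Total)\Avg. Disk Read Queue Length",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Processor(_Total)\% Processor Time",
]


def build_typeperf_command(interval_s=1, samples=1200, out_file="vm_boot.csv"):
    """Hypothetical helper: build a typeperf argument list that logs COUNTERS to CSV.

    -si = seconds between samples, -sc = number of samples to collect,
    -f CSV = output format, -o = output file, -y = answer yes to prompts
    (such as overwriting an existing output file).
    """
    return ["typeperf", *COUNTERS,
            "-si", str(interval_s), "-sc", str(samples),
            "-f", "CSV", "-o", out_file, "-y"]


if __name__ == "__main__":
    cmd = build_typeperf_command()
    if sys.platform == "win32":
        # Blocks for roughly interval_s * samples seconds while logging.
        subprocess.run(cmd, check=True)
    else:
        print("typeperf is Windows-only; the command would be:", cmd)
```

Logging to CSV also keeps collecting even when the graph display stutters, which makes it easier to line up disk, CPU and timing data afterwards.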


Avg. Disk Queue Length is the average number of read and write requests queued for the selected disk during the sample interval; it is the sum of Avg. Disk Read Queue Length (the number of read requests queued) and Avg. Disk Write Queue Length (the number of write requests queued). Disk speed, disk cache size, I/O bus speed and RAID configuration (if any) all affect disk throughput, which is the rate at which the disk subsystem can service read and write requests.
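The relationships in the paragraph above can be written down directly. The first function is the read-plus-write sum the post describes. The second applies Little's Law (average number of requests in a system = arrival rate times average time each request spends there), a general queuing result; treating Disk Transfers/sec and Avg. Disk sec/Transfer as its inputs is an approximation assumed here, and the numbers are made up.

```python
def total_queue_length(read_queue: float, write_queue: float) -> float:
    """Avg. Disk Queue Length = Avg. Disk Read Queue Length + Avg. Disk Write Queue Length."""
    return read_queue + write_queue


def littles_law_queue_length(transfers_per_sec: float, sec_per_transfer: float) -> float:
    """Little's Law: mean requests queued or in service = arrival rate * mean time in system.

    Assumed mapping: transfers_per_sec ~ Disk Transfers/sec,
    sec_per_transfer ~ Avg. Disk sec/Transfer.
    """
    return transfers_per_sec * sec_per_transfer


# Made-up numbers: 120 I/Os per second, each taking 250 ms on average.
print(littles_law_queue_length(120, 0.25))  # 30.0
print(total_queue_length(12.5, 17.5))       # 30.0
```

In other words, a large queue does not require a huge I/O rate: a modest number of requests per second that each take a long time to complete produces the same backlog.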

I also use the memory counters available in Task Manager:


The Test
I saw that my system slowed down considerably while using my big XP virtual machine, so I started Performance Monitor and selected the counters listed above, including “% Processor Time”, a measure of CPU utilization. After I started my XP vm, it booted to the logon prompt and I logged in. Immediately, my system slowed to a crawl: any application I tried to open took five minutes to start. So where’s the problem?

CPU Not a Problem
For this time period, I first looked at my CPU data in Performance Monitor. CPU utilization stayed low, between 5% and 35%, so CPU is not the cause of the slowdown:
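Reading the CPU range off the graph works, but if the counters were logged to CSV (for example with Windows' `typeperf` tool, whose CSV output is a header row followed by one timestamped row per sample) the range can be computed instead. A minimal standard-library sketch; `summarize_counter` is a hypothetical helper, and the exact CSV layout is an assumption based on typical typeperf output.

```python
import csv
import io
import statistics


def summarize_counter(csv_text, counter_suffix):
    """Return (min, max, mean) of the first column whose header ends with counter_suffix.

    Assumes typeperf-style CSV: one header row, timestamp in column 0, one
    column per counter. Blank or non-numeric cells (typeperf may write " " for
    the first sample of a rate counter) are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    col = next((i for i, name in enumerate(header) if name.endswith(counter_suffix)), None)
    if col is None:
        raise ValueError(f"no column ending with {counter_suffix!r}")
    values = []
    for row in data:
        try:
            values.append(float(row[col]))
        except (ValueError, IndexError):
            continue  # skip blank or malformed samples
    return min(values), max(values), statistics.mean(values)


# Hypothetical log excerpt (made-up values, not the data from this post).
SAMPLE = r'''"(PDH-CSV 4.0)","\\HOST\Processor(_Total)\% Processor Time"
"07/06/2007 10:00:01.000"," "
"07/06/2007 10:00:02.000","5.0"
"07/06/2007 10:00:03.000","20.0"
"07/06/2007 10:00:04.000","35.0"
'''

print(summarize_counter(SAMPLE, r"\% Processor Time"))  # (5.0, 35.0, 20.0)
```

The same function works for the disk queue columns; only `counter_suffix` changes.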


Is Memory the Problem?
Next, I took a look at the stats in Task Manager. I notice that my 1GB pagefile is being utilized:


However, Total Commit Charge does not exceed Total Physical Memory (shown in the above graphic). If it did, that would indicate insufficient system RAM. From these stats, my system does not appear to be memory bound.
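The Task Manager check above boils down to one comparison. A sketch with a hypothetical `memory_verdict` helper and made-up figures: a commit charge above physical RAM means not all committed memory can be resident at once, so some of it has to live in the pagefile.

```python
def memory_verdict(commit_charge_kb: int, physical_kb: int) -> str:
    """Compare Task Manager's Total Commit Charge with Total Physical Memory.

    Both arguments must use the same unit (Task Manager reports K).
    A ratio above 1.0 means committed memory cannot all fit in RAM.
    """
    ratio = commit_charge_kb / physical_kb
    if ratio > 1.0:
        return f"commit charge is {ratio:.0%} of physical RAM: likely memory bound"
    return f"commit charge is {ratio:.0%} of physical RAM: not memory bound by this test"


# Made-up figures for a host with roughly 3.2GB of RAM (3,200,000 K as a round number).
print(memory_verdict(1_600_000, 3_200_000))
print(memory_verdict(4_000_000, 3_200_000))
```

This is only a coarse first check; a system can still page heavily below the threshold if many processes are actively touching their memory.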

Disk Queuing Through the Roof!
Looking at the Disk Queue stats, I noticed that disk queuing went through the roof:


In the Performance Monitor graph shown above, the black line is Avg. Disk Queue Length. You’ll notice that if you add the values of Avg. Disk Write Queue Length and Avg. Disk Read Queue Length together at any one point in time, the sum equals Avg. Disk Queue Length.

In a system performing optimally, there should be little disk queuing. According to the Performance Monitor help, Avg. Disk Queue Length should be no more than the number of spindles plus 2. In this case, I have a stripe set (RAID0) of two drives, each with its own spindle, so there are two spindles and my system's Avg. Disk Queue Length should not exceed four.
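The spindles-plus-two guideline is easy to apply to logged samples. A sketch with hypothetical helper names and made-up sample values; the rule is only the guideline quoted above, not a hard limit.

```python
def queue_limit(spindles: int) -> int:
    """Rule of thumb quoted from Performance Monitor's help: spindles + 2."""
    return spindles + 2


def samples_over_limit(samples, spindles):
    """Return the queue-length samples that exceed the rule-of-thumb limit."""
    limit = queue_limit(spindles)
    return [q for q in samples if q > limit]


# Two-disk RAID0 stripe set: 2 spindles, so the limit is 4.
# The sample values are made up for illustration.
print(queue_limit(2))                                  # 4
print(samples_over_limit([1.5, 3.0, 25.0, 40.0], 2))   # [25.0, 40.0]
```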

In the image above, you can see Avg. Disk Queue Length values between 25 and 40 (note the scale). There is some pretty heavy queuing going on here! My machine is still barely responsive when I try to open any applications. You'll notice that there are breaks in the Performance Monitor graph: my system is so bogged down that Performance Monitor itself is freezing. Youch! This screen was captured about ten minutes after I started the VM. After about 20 minutes, the disk queuing stops and I am able to open applications normally:


You can see in the above graphic that the high disk queue condition stops after I’ve logged into my XP vm and the guest OS fully loads. For the 45GB XP guest OS running on my dual proc Dell Precision workstation, this took a full twenty minutes! Wow. That is a LOT of disk queuing. Again, CPU and memory stayed fairly constant during this experiment. CPU was roughly 30% and there was a little less than 2GB of RAM available during the process.
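Measuring how long the high-queue condition lasted from a logged series, rather than by watching the graph, could look like the sketch below. `longest_run_seconds` is a hypothetical helper, and the one-minute interval and sample values are made up.

```python
def longest_run_seconds(samples, limit, interval_s):
    """Duration of the longest unbroken run of samples above limit.

    Assumes evenly spaced samples. Gaps where the monitor itself froze are
    not detected, so the real episode may be longer than reported.
    """
    best = current = 0
    for q in samples:
        current = current + 1 if q > limit else 0
        best = max(best, current)
    return best * interval_s


# Made-up one-minute samples of Avg. Disk Queue Length, with a limit of 4.
queue = [1.0, 30.0, 35.0, 40.0, 2.0, 25.0, 3.0]
print(longest_run_seconds(queue, limit=4, interval_s=60))  # 180
```

Because Performance Monitor froze for stretches during this test, real logs may have missing samples; comparing sample timestamps rather than assuming a fixed interval would be more robust.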

One note: if you don't like the default scale of 100:1 on the Performance Monitor chart, you can change it:
1) right-click the statistic in the legend of the chart
2) select Properties
3) click the Data tab
4) choose another scale under the Default Scale dropdown menu
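The scale setting changes how a counter is drawn, not its value. Assuming the chart multiplies each counter value by its scale factor and plots the result against a vertical axis of 0 to 100, reading a raw value off the graph means dividing by the scale. A small sketch with hypothetical helper names:

```python
def plotted_value(raw, scale, vertical_max=100.0):
    """Height at which a counter is drawn: raw * scale, pinned at the top of the axis."""
    return min(raw * scale, vertical_max)


def raw_from_plot(plotted, scale):
    """Recover the counter value from a (non-clipped) point on the graph."""
    return plotted / scale


print(plotted_value(0.25, 100))  # 25.0  (a queue of 0.25 at scale 100)
print(plotted_value(40, 100))    # 100.0 (a queue of 40 at scale 100 is off the chart)
print(plotted_value(40, 1.0))    # 40.0
print(raw_from_plot(25.0, 100))  # 0.25
```

Under that assumption, at a scale of 100 any queue length above 1 is pinned at the top of a 0 to 100 chart, which is one reason to lower the scale when queues get large.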

Conclusions
Perhaps the VMware engineers can explain why the VM took so long to free up my disk resources, but I suspect the cause is simply the virtual machine's size (45GB) and that VMware Server cannot handle a vm that large efficiently. I’d really need a server-class machine with very fast disk I/O to handle the intense read/write activity.

Perhaps you have an interesting VMware performance story? If so, drop me a line at cacasododom@gmail.com or just comment below.

Have a good weekend everybody!
'sodo

Addendum:
A reader asked whether there are any good programs for correlating process ID with I/O. For Windows, Mark Russinovich's Process Explorer is an excellent choice. Just add the I/O Read Bytes and I/O Write Bytes columns from the View -> Select Columns -> Process I/O tab:
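Once per-process I/O byte counts are visible, finding the culprit is a matter of sorting. A minimal sketch, assuming the numbers have already been collected (it does not read them from Windows itself); `ProcIO`, `top_io`, the PIDs and the byte counts are hypothetical, and `vmware-vmx.exe` is used as a process name purely for illustration.

```python
from typing import NamedTuple


class ProcIO(NamedTuple):
    pid: int
    name: str
    read_bytes: int   # analogous to Process Explorer's "I/O Read Bytes" column
    write_bytes: int  # analogous to Process Explorer's "I/O Write Bytes" column


def top_io(procs, n=3):
    """Return the n processes with the most combined read + write bytes."""
    return sorted(procs, key=lambda p: p.read_bytes + p.write_bytes, reverse=True)[:n]


# Made-up snapshot; real numbers would come from Process Explorer or a similar tool.
snapshot = [
    ProcIO(1234, "vmware-vmx.exe", 9_000_000_000, 4_000_000_000),
    ProcIO(2468, "explorer.exe", 50_000_000, 10_000_000),
    ProcIO(3690, "outlook.exe", 300_000_000, 200_000_000),
]
for p in top_io(snapshot, n=2):
    print(p.pid, p.name, p.read_bytes + p.write_bytes)
```

Because these byte counts are cumulative since each process started, comparing two snapshots taken a few seconds apart gives a better picture of current I/O than a single snapshot.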

1 comment:

Someone said...

I am able to confirm this, and it is at least host-OS independent: e.g. an MS Server 2003 guest under an Ubuntu Server host (no GUI and tickless; Ubuntu desktop is not installed), and Win2003, Vista OEM and XP Pro hosts running a Win 2003 Server guest or an XP Pro guest. The same issue occurs with a basic write test run on either the Linux host or on Windows with Cygwin: "dd if=/dev/zero of=./raw bs=1024k count=100" gives 300MB/s on the host (HP DL380 G5, SCSI SAS, RAID 5) but only 9MB/s in the Win2003 guest. I have seen posts saying RAID 5 is slow, which doesn't make sense here. I also already checked partition alignment: performance is better after fixing it, but it didn't cure the problem and is a red herring. So I tried VirtualBox to see if I was going mad. As much as software developers like to get you crawling all over your hardware config, I am sceptical, as I have seen this on IDE, RAID, SATA etc. Amazingly, VirtualBox compiled on an Ubuntu 8.04 (Hardy Heron) host actually massively outstripped my host OS on disk write performance. How is that possible? And not in one test, but in 6+ tests! This was to confirm whether or not it was a VMware problem, which so far it clearly is. Ironically, as I don't have a support contract, I am unable to troubleshoot with VMware.
