ESXi - Troubleshooting VM performance

HCHTech · Aug 28, 2019

Above. I am troubleshooting some performance issues on an ESXi host. It's getting a little long in the tooth, but not due for replacement until the end of 2020.

This is a Dell R720, with Dual Xeon E52670s - so 20 cores @ 2.5GHz, 40 allocable cores with hyperthreading. 128GB of RAM. 8TB of storage - one big RAID array. There are a lot of VMs so it's possible it's just overburdened, but I'll wait for some opinions before jumping to that conclusion. The symptom I'm trying to address is complaints about performance on the workstation VMs named R1, R1_Win10 & R2_Win10.

There are about 35 physical workstations and 3 VM workstations. The VMs are accessed from a remote office over a site-to-site VPN, so that is a variable as well - the whole problem could be the speed of that VPN, but I've accessed the workstation VMs from the main site and they are unquestionably slow, so the VPN isn't my biggest suspect.

Here is a spreadsheet showing how the resources have been allocated and a snapshot of current usage. This is just a screen snip, so I hope it's readable when zoomed.

Processors have been over allocated, so that is one thought. Also, I wonder about the allocation of video memory. This is the only ESXi setup I have (inherited when I got the client), so I don't want to muck about too much without a little guidance.

Note that R2, R3, R4, and MetroXP are all VMs that are normally powered off. The XP VM runs an legacy app, usually accessed only once or twice each year.

Client is an Actuary. Only business apps are used, primarily Office, of course. 1 LOB apps uses a SQL database running on the APP-SERVER. The other LOB app uses an access database on the APP-SERVER. 7 Quickbooks users (max 5 simultaneous) opening a shared file on the APP-SERVER, too. 35 users running a shared Timeslips paradox database on the APP-SERVER as well. Email is O365.

No complaints from folks running physical workstations, only from those on the VM workstations. No crashing, just slowness.

There is a ESXi second host holding a backup DC and a workstation VM running Quest Rapid Recovery, doing backups once per hour. That process takes about 8 minutes and doesn't coincide with the complaints, timewise.

What do you think?

RCD_Technology_Solutions · Aug 29, 2019

Good Morning.

In my opinion you need shared storage. I have a pair of dell r710s and after 4 vms the storage cant deliver the iops.

go to http://www.yellow-bricks.com/esxtop/

look at CPY %RDY and the DISK parameters.

A simple Synology or Qnap iSCSI box could help. Except for the $$$

Thanks
Bob

mikeroq · Aug 29, 2019

HCHTech said:
8TB of storage - one big RAID array.

Could you be more specific?
How many drives? What sizes? Array type?
What controller?

RCHavok797 · Aug 29, 2019

Open up Resource Manager on the virtual machine, go to the Disk tab, and expand the Storage section. See what the value of the disk queue length reads. This will give an indication if the disk subsystem on the physical server can't keep up with the requests.

YeOldeStonecat · Aug 30, 2019

"One big RAID"...on what for drives? And what RAID controller?

I don't count hyperthread core for hypervisor CPUs...I only count physical cores. Careful about oversubscribing...although I don't think oversubscribing cores is your issue if some of the guests are powered off.

What about the NICs? Do all these share a NIC or two? Or do you have an additional quad core NIC in there. Spread the load.

But back to the disks....what disks...and how is the LUN connected? By what media and speed? Internal? Fiber SAN? iSCSI?

HCHTech · Aug 30, 2019

Valid points, all. - more details on the hardware:

Single RAID card is a PERC H710 Mini
5 x 3TB 6Gbps SAS drives in a RAID5 with 1 hot spare, so 8.3TB usable capacity - don't flame me about RAID 5 - this server was built in 12/14 -

Broadcom 4-port NIC - I can't figure out how to tell which ports are assigned to which machines from the vSphere web console, but esxtop reports:

vmnic0 - Maintenance WS, APP-SERVER, MetroXP - all on vswitch0
vmnic1 - DC, R1_Win10 - all on vswitch0
vmnic2 - R2_Win10, Powerchute, R1 - all on vswitch0
vmnic3 - DC, Maintenance WS, APP-SERVER, MetroXP, R1 - all on vswitch1

I tried to get some results for the workstations in remote background using winsat, but no joy, so I will need to wait until after hours to measure the disk queues on the workstation VMs, but after watching the servers for a few minutes, I see:

DC - flutuates between 0 to 0.05, no spikes
APP-SERVER - fluctuates between 0 and .6, spikes to 1.23

I also imagine I need to set some performance counters to get a log of the disk queue for the problem machines to get useful data - I'll look into that this weekend.

Edit: Also - I have 3 empty drive bays, I might setup a new RAID 1 with a hot spare with SSDs, then move the workstation VMs there - just a thought.

HCHTech · Aug 30, 2019

YeOldeStonecat said:
I don't count hyperthread core for hypervisor CPUs...I only count physical cores.

Yes, but both HyperV and ESXi do (count HT). When you assign cores, you are assigning based the total including hyperthreading, yes?

YeOldeStonecat · Aug 30, 2019

3TB drives? Makes me thing "nearline SAS"..which is a 7,200 rpm SATA disk on a SAS bridge.
Performance loss.
RAID 5...performance loss..esp for write intensive apps (SQL anyone?) Virtual desktops used by end users "thrash" the hard drives a lot. So with a single R5 volume on 7,200 spinners....with a SQL server in the mix....I know where I'm focusing the spotlight for low performance.

YeOldeStonecat · Aug 30, 2019

HCHTech said:
Yes, but both HyperV and ESXi do (count HT). When you assign cores, you are assigning based the total including hyperthreading, yes?

They count them..but I don't. soft cores aren't as good as true hardware cores. Not equal in performance. I distribute cores counting just physical and I do my best never to let the ratio go to oversubscribing.

HCHTech · Aug 30, 2019

YeOldeStonecat said:
They count them..but I don't. soft cores aren't as good as true hardware cores. Not equal in performance. I distribute cores counting just physical and I do my best never to let the ratio go to oversubscribing.

This may just be semantics, then - so what I think you are saying is that you wouldn't allocate an odd number of cores to anything. If I have 20 physical cores, HyperV reports 40 (because it counts HT). If I then I assign 10 cores to a VM in HyperV, I'm actually assigning 5 physical cores. By assigning 11 cores in HyperV, I would actually be assigning 5.5 physical cores, which is a no-no according to your rule. Do I have it right?

YeOldeStonecat · Aug 30, 2019

If I have a Xeon like the E7-8890v4...she has 24 cores, but counting H/T she has 48 cores.
I count and use 24 cores, I ignore the 24 extra "soft cores".

So, with that CPU, I allocate/distribute 24x cores...and try not to go above that. I actually like to leave at least 2x cores free/avail for the host.

Sky-Knight · Aug 30, 2019

See, and I do the utter opposite...

Every VM has access to all the cores, if I have 48 cores, each VM has 48 cores assigned to it.

Why? Because its the job of the hypervisor to schedule tasks queued by the operating systems it manages. You use weights to control CPU access priority. This affords 100% utilization of the platform.

Think traffic shaping, instead of bandwidth limits. Who cares if that lower priority system is sucking up all 48 cores as long as they aren't needed at that moment for something more important?

Honestly, I think the performance problems you're experiencing are purely the fact they've outgrown their drive array. But, be mindful of what I just said about CPU, because if you insist on limiting each VM to 1 or 2 cores, you can create the situation where the VMs are thread bound, and they simply cannot generate the load required to make the CPU actually work. So you'll see CPU as a non issue, and it isn't... because your scheduling is crap. Your reported load is screaming this to me, the VMs aren't using your CPUs!

BTW this is even more true on HyperV than it is on vSphere. Allocate more cores to your stuff, you'll get more performance.

vCPU configuration simply tells the hypervisor that this VM is allowed to have X number of threads scheduled at a time. It does not reserve a core for the VM, and while I know that vSphere does allow you to reserve cores, you should consider not doing that as much as possible. That's basically destroying a substantial portion of the value of using a hypervisor to begin with.

HCHTech · Aug 30, 2019

I know what's missing from this discussion......whiskey. @YeOldeStonecat , I absolutely believe your system "works" because you have probably deployed about 500X more servers than I have and you've developed this mindset because, well, it works. But ignoring your experience for a moment and just doing this as a thought experiment, it seems like you would have to end up with underutilized resources. Only running 4 cylinders of your V8. Maybe that's the goal. It almost seems like the next step that would follow from this mindset it to just turn off Hyperthreading altogether for your builds?

@Sky-Knight , your ideas intrigue me and I would like to subscribe to your newsletter.

Assuming that the Hypervisor's programming allows it to do the resource-allocation job correctly (which might be a big assumption, IDK), it seems that your method would have to get the best all-around performance for all VMs. I know know about what overhead is generated by forcing the system to do more allocation juggling, for example. Is there an arbitrage to be had between machines, then? How does the Hypervisor know that an applications call for more cores on a workstation vm is less important than an applications call for more cores on the application server, for example. I thought this was handled "backwards" by reserving cores by machine, but your approach goes the other way. Again, I have to assume that you developed your system guided by your own experience of placing servers in service with this approach, and that you wouldn't do it that way if it didn't work.

I don't think there is any downside to trying your method and seeing what happens. If that doesn't help, then I can focus my efforts on the disk subsystem, maybe make that new array just for the workstation VMs. I don't know if I can put a 2nd RAID card in this rig or not - that might be another idea to give more horsepower to those workstations.

freedomit · Aug 30, 2019

I would definitely say the bottleneck is disk IO. In a HyperV environment I would login to the host, open resource manager and check disk usage time and queue length. Does ESXI have a similar metric?

Sky-Knight · Aug 31, 2019

@HCHTech, you're actually over complicating things!

So another technical name for an Operating System is a Supervisor. That is, someone that's responsible for managing something. In this case, what's being managed are threads. Now, I'm not talking about a process, I'm talking about a thread! A process might use a single thread, it might use 80, it depends on what the process was programmed to do.

But the key point here is, the operating system is managing all these CPU threads, controlling who gets the CPU when so we can do this wonderful multitasking thing.

A hypervisor, is a supervisor of supervisors... The boss of the boss. And now you know just how unimaginative Microsoft was when they named their product HyperV!

So, this process is very corporate. You're trying to make sure the rank and file are working on the same page as the board, because when they are the entire machine meshes more efficiently. If it's not, you wind up with an overworked middle manager somewhere sucking up all the productivity, and in this case that middle manager is the guest OS kernel telling everyone to wait because he's only got these 2 lines to shove everything through.

Another way to look at it is to consider the CPU a freeway, if you've got 48 lanes to put traffic on, but all the on ramps are limited to 2 lanes, can the traffic ever get strong enough to fill the freeway? No... it cannot!

As for my process, and how I do things. I USED to configure things the way Stonecat does. Then I found a news letter from Microsoft somewhere along the way that beat me over the head and I changed gears. I've noted a massive performance improvement across the board because even the lowest priority system can saturate that CPU if it's available. This means my SQL loads get priority, as does my Domain Controllers, but everyone else gets to play in the margins. But the key is, each individual guest OS can scheduled their threads at near native speeds, and the hypervisor sorts them out. I don't have that middle manager (Supervisor, or Guest OS) telling its own workers NO!

You'll need to understand my thinking here, because the way I'm talking is how Azure works! If you want to maximize Azure scaling benefits, you need to understand this! But the same ideas apply to any hypervisor, but it becomes more and more important the larger the environment gets. For small environments your worst case is some longer install times on Windows updates, but in big ones you can have hundreds of CPU cores sitting idle while user facing work loads wait... that's bonkers! That's how you piss off your users for no good reason! You've basically slapped a Comcast style data cap on your CPU time... not good!

But again, honestly I think most of what you're feeling is the drive IO. The techniques I just mentioned to you simply won't matter if the drives cannot keep up, and if you're on platters at all... I can tell you... they cannot keep up. Not unless you've got a HUGE RAID 6 with SSD caching in a SAN somewhere. But you reported RAID 5 on NLSAS and only 5 drives to boot. So that's four spindles, four DESKTOPS worth of speed! Two servers on that and it's full, you have more... If this was a six drive RAID 10, with three spindles teaming up, things would be a bit better... but honestly you just can't not have servers without SSD. I know that platform is young, but I'd be looking to migrate off of it, if not to a new platform then to Azure. You'll be much happier when you do.

But for now, it seems to me that you have a stable platform that's doing what it can. And your users are complaining because they've probably got an SSD equipped desktop / laptop and yeah... in comparison the server is "slow".

HCHTech · Aug 31, 2019

@Sky-Knight , does this methodology apply to RAM allocation as well? I'm already filing this thread under "everything you know is wrong", so what is your approach to that?

Sky-Knight · Aug 31, 2019

@HCHTech, yes... and no...

RAM you usually want to dedicate, because you can't "share" it unless it's empty. However, that being said HyperV has a feature called dynamic memory, which allows the hypervisor to dynamically turn RAM up and down based on use of the VM, which can open some doors here. I'm pretty sure vSphere has an equivalent but I've never used it.

But, in my experience, this results in substantial performance loss and in some cases instability. RAM is also rather cheap, so I tend to dedicate RAM to guests, especially any work loads that have performance requirements like SQL.

CPU is just scheduling, sharing RAM is more like sharing drive space. Yes, you can have your guests have more storage on their partitions than you're actually using thanks to thin provisioning, but that comes at a performance cost, and the risk of boom when the storage runs out. RAM is very similar in that regard.

Note: I have enabled and used dynamic memory as a profiler of sorts, and successfully identified VMs with too much RAM provisioned. It's handy for that! Domain Controllers for example use far less RAM than I tend to want to think they will.

YeOldeStonecat · Sep 3, 2019

HCHTech said:
... and just doing this as a thought experiment, it seems like you would have to end up with underutilized resources. Only running 4 cylinders of your V8. Maybe that's the goal. .

That is usually our goal. We have always leaned towards "more server than necessary"...versus "just enough" or "not enough". I like to raise the bar, not see how low I can place the bar. I never want to get that call from a client that "the server (my software) is too slow!"

When it comes to "cloud hosting"...you see lower price brackets which are "shared resources"...and frequent complaints of them being too slow. And you see the higher priced brackets where you have "dedicated resources"...because you need higher performance. Those are widespread for a reason!

I've never tried the opposite extreme for cores that Rob (Skye) mentions above...but I have taken over clients where the prior tech tended to oversubscribe a handful of cores and I dialed them back. For larger virtualization projects I prefer VMWare anyways, and I tend to keep MS HV for smaller setups (I'm not fond of having high SLA clients/24 hour clients on a host that should have occasional updates/reboots like MS-HV). But anyways..I may try that approach on the next build that comes in...as a test. But I'll probably still fall back to carefully controlling things. I can see a guest instance get some runaway process that kills performance for everyone else for a while.

TurricanII · Sep 3, 2019

Sky-Knight said:
See, and I do the utter opposite...
Every VM has access to all the cores, if I have 48 cores, each VM has 48 cores assigned to it.

This sounds insane to me. I always leave one or two cores for the Hypervisor and work to the vendor best practice of e.g. 2:1 vCPU:CPU ratio.

Over allocating CPU's does in my experience kill performance. This article mentions something I have seen mentioned previously; the scheduler having to wait for all allocated CPU's to be available at almost the same time: https://ma.ttias.be/virtual-environments-less-cpus/

If there is any shred of truth to this or any reason for the vendors recommendations of 2:1 ratios then allocating all CPU's to all VM's is surely a crazy idea!!!

Sky-Knight · Sep 4, 2019

If over allocation of CPU resources is slowing you down, that's because you didn't configure the CPU priorities properly.

You can limit that freeway access, and yes it will work. But you'll be putting your hands around the throat of that CPU and choking the life out of it. Yes, it works... but it could work better.

If you have four VMs on a host, with 48 cores. And you limit each VM to 2 cores, you're only ever going to have 8 threads running, on a system that can handle 48. But you can't limit yourself to 48 vCPUs at one per VM either because unless you've got 48 VMs you're never going to have 48 threads to use the host's CPUs.

A vCPU isn't a CPU, it's the permission to execute one thread.

The key, is to have just enough vCPUs such that you can have all 48 cores running, while not having the scheduler say no too often, and for the times when it does, you have your priorities set. This is far more art than science because exactly how you balance it changes based on the work loads in question. By all means, reserve CPU resources for that huge SQL VM! You'd be insane not to. But other systems such as Domain Controllers are far more compliant about sharing, VDI situations share really well as well.

This is just like QoS on an Internet connection. CPU time, like bandwidth are functionally limitless resources. The limits are set instead on how much work you can do in a given unit of time. You see my vCPU configuration as over allocation, but that over allocation just means my CPUs actually get used. If your CPU graphs are all flat lined at the bottom, you aren't using your hardware.

So you can allow every VM access to every core and define rules on what VM gets things first. Or you can limit less important VMs to fewer cores so that more important ones have greater access on top of the priority settings. You end up doing both honestly... and I didn't mean to imply you shouldn't. All I wanted to point out was that you're not letting your CPUs work if you aren't scheduling time on them, and if you aren't letting the VMs pile up a bit you're rather missing the entire point of virtualization.

ESXi - Troubleshooting VM performance

Well-Known Member

Active Member

Well-Known Member

Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Similar threads