Can't contact Hyper-V Host, Guest is fine

HCHTech

Location: Pittsburgh, PA - USA
I've got a Hyper-V host we put in about a year ago at a client's. Currently it has only a single guest VM, which is a domain controller. Server 2019 Standard. The host was installed with the full desktop experience, not Core.

A couple of days ago, the host stopped reporting in to our monitoring. The guest VM is working fine and reporting in fine. All workstations in the office are functioning, and the domain is "up" with no complaints. Also, up until it stopped getting data, the monitoring showed no sign of problems: no critical events in the event logs, plenty of disk space, no performance warnings, no failed patches, no failed logins, etc.

I remoted into the DC VM and tried to RDP into the host - no dice. I tried to ping the host - no dice. I VPN-ed into the firewall and tried to ping the host - unsuccessful. I am able to ping workstations and the DC from the firewall. I called my contact at the client's and had them go to the server and tell me if they saw any error messages or other notifications out of place - nope. I had them pull up services: all automatic services were running except for Remote Registry and Software Protection (I think - it was one of the normal ones that only run when needed). Since the DC is the only thing on this box so far, I had them reboot the physical server. Everything came back up correctly on the DC, but I still cannot contact the host: I still can't ping or RDP into it from inside the network, and, no surprise, it still isn't reporting in to the monitoring dashboard.

I have had times when I couldn't get into a guest VM for one reason or another and was able to control it by adding it to Server Manager on one of the working server VMs. But I'm pretty sure you cannot add a host to the Server Manager of a guest VM. I tried that with my shop setup, and the host isn't even detected - no surprise.

I've never seen this behavior before, so I'm trying to gauge how much of an emergency this is. Do I make the 30-mile drive today or can it wait until the weekend?
 
Out of band management exists... use it.

The host networking configuration has been FUBAR'd; this is utterly independent of the guests.

If the guest is a DC, the host MUST NOT provide time services, and if the host is a member of the domain... this presents a specific challenge. DCs need to be specially configured to keep their own time rather than take it from the host. This creates a nasty loop if the host is a member of the domain... break it!

Also, RDP is available by default in the Domain and Private Windows Firewall profiles. But if the host reboots for updates, the guest DC isn't online yet, so the host flops into the Public firewall profile. I solve this with a script that runs about 10 minutes after reboot and restarts the Network Location Awareness service. Alternatively, you can configure the Windows Firewall on the host with matching settings for both the Domain and Public profiles.
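The actual NLA script isn't shown in the thread, so here is a hypothetical sketch of the idea in Python. It assumes the script is launched on the host at startup (for example by a scheduled task), that `NlaSvc` is the Network Location Awareness service name, and that restarting it is enough for Windows to re-detect the Domain profile once the DC is up. The 10-minute delay mirrors the post; the `--dry-run` flag is only there to check the command without waiting or touching the service.

```python
import subprocess
import sys
import time

# Roughly the "about 10 minutes" delay described above (assumption: long
# enough for the guest DC to finish booting).
DELAY_SECONDS = 10 * 60


def build_restart_command(service_name: str = "NlaSvc") -> list[str]:
    """Build the PowerShell command that restarts the given service.

    -Force lets Restart-Service proceed even though other services
    depend on Network Location Awareness.
    """
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Restart-Service -Name {service_name} -Force",
    ]


def main(dry_run: bool = False) -> int:
    cmd = build_restart_command()
    if dry_run:
        # Show what would run, without sleeping or restarting anything.
        print(" ".join(cmd))
        return 0
    time.sleep(DELAY_SECONDS)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    sys.exit(main(dry_run="--dry-run" in sys.argv))
```

Waiting a fixed time is the simplest approach; a sturdier variant would poll until the DC answers before restarting the service, but that adds moving parts the original description doesn't mention.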

But to sort this out remotely? You need to use iDRAC, iLO, or whatever out-of-band management the hardware has. RMM tools on the host usually work too, but that assumes the host's IP configuration is good enough to get online.

This specific circumstance is why I always grab a used workstation, however crappy, to sit on the network for admin tasks. Its sole job is to get me into the out-of-band management. If you don't have that available, you get to forward ports and pray it works over the Internet, which... I've had issues with.

I wouldn't consider this a system down... it's just annoying. The host is doing its job, or the guest would be toast. It looks to me like one of the common setup goofs that blows up down the road. I detailed the two bigger ones above.
 
If the guest is a DC, the host MUST NOT provide time services, and if the host is a member of the domain... this presents a specific challenge. DCs need to be specially configured to their own clock. This creates a nasty loop if the host is a member of the domain... break it!

We have an SOP on this, so I know that the time stuff is correct.
Also RDP is available by default in the Domain and Private Windows Firewall profiles. But, if the host reboots for updates, the guest DC isn't online yet so the host flops into the public firewall profile. I solve this with a script that runs about 10min after reboot that restarts the network location awareness service. Alternately you can configure the Windows Firewall on the host to have matching configurations for both domain and public profiles.

Hmm, this I'll have to check - good idea on the NLA script, I'm stealing that.

Out of band management exists... use it.

Yup, this is the one that should have saved me. This is an Intel box, so it's called the BMC (Baseboard Management Controller) and... it's not configured. This one's definitely on me. We bought the chip & license, but I didn't get it done before we delivered the thing, and of course, we just ran out of time on the day we did the install. It's been on my list for one of the maintenance visits for months, but I just never got around to it. This is a client that is open 12 hours per day, 6 days a week, so it's damned difficult to find time to do anything that requires taking things offline. I hate working Sundays, so this is the price I'm paying. Live & learn, man.
 
But I'm pretty sure you cannot add a HOST to the server manager of a guest VM, though. I tried that with my shop setup, and the host isn't even detected - no surprise.
You can; I do this all the time. I have a few multi-server installations that have what I refer to as general 'Admin VMs', which I use solely to access and monitor all the physical and virtual servers on the network. The host machines must be domain-joined of course, which they should be anyway to enable use of features such as DFS-R and Hyper-V replica.

Agreed on out-of-band management - it's the first thing I configure, and I wouldn't install a server without it. I exclusively use HP rack-mounts, so I'd be logging into iLO right now to fix this.

In addition to the suggestions so far, are you able to scan the network/subnet for devices? I'm wondering if the server's IP has changed. And/or do you perhaps have any MAC/IP reservations that might be conflicting?
 
The host machines must be domain-joined of course, which they should be anyway to enable use of features such as DFS-R and Hyper-V replica.
Well, that explains why it didn't work on my lab setup. I have never joined a host machine to a domain, whose DC is one of the guests of said host. That just seems wrong to me, creating a loop - like badly-written time-travel movies I've seen. I could understand this in a setup with more than one physical host, and each host having a DC guest, I suppose, but with only one box, it just seems like a recipe for trouble.

I thought about the possibility that the host flipped to the public network profile, but that shouldn't have stopped the RMM agent from reporting in, so I don't think that's it.

The host has a static IP, which is within an exclusion range in DHCP. There shouldn't be any conflict. I did an IP scan from the DC and the host machine is detected with the correct IP and MAC address, but no communication works. It doesn't respond to attempts to browse, ping or RDP. I'm going there this afternoon, so I'll be able to get eyes on the bad actor and no doubt figure out the problem. I'm sure it's something dumb, but I can't gather enough evidence to troubleshoot it without being there. The other possibility I suppose is another truism: "It's always DNS". We'll see.
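The "shouldn't be any conflict" reasoning above (static IP inside a DHCP exclusion, no clashing reservation) can be checked mechanically. Below is a minimal Python sketch, assuming you copy the scope, exclusion ranges, and reservations out of the DHCP console by hand; all addresses and MACs are hypothetical, and nothing here queries a real DHCP server.

```python
import ipaddress


def in_range(ip: str, start: str, end: str) -> bool:
    """True if ip lies in the inclusive range start..end."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


def normalize_mac(mac: str) -> str:
    """Treat 00-15-5D-AA-BB-CC and 00:15:5d:aa:bb:cc as the same MAC."""
    return mac.replace("-", ":").lower()


def static_ip_problems(
    static_ip: str,
    host_mac: str,
    scope: tuple[str, str],
    exclusions: list[tuple[str, str]],
    reservations: dict[str, str],
) -> list[str]:
    """List reasons the static IP could collide with DHCP (empty = looks OK)."""
    problems = []
    # A static address inside the lease pool must be covered by an exclusion.
    if in_range(static_ip, *scope) and not any(
        in_range(static_ip, *ex) for ex in exclusions
    ):
        problems.append("inside the DHCP pool and not excluded")
    # A reservation for this IP must point at the host's own MAC.
    reserved_mac = reservations.get(static_ip)
    if reserved_mac is not None and normalize_mac(reserved_mac) != normalize_mac(host_mac):
        problems.append(f"reserved for a different MAC ({reserved_mac})")
    return problems


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    scope = ("192.168.1.1", "192.168.1.254")
    exclusions = [("192.168.1.1", "192.168.1.49")]
    reservations = {"192.168.1.60": "00-15-5D-00-00-01"}
    print(static_ip_problems("192.168.1.10", "00:15:5d:aa:bb:cc",
                             scope, exclusions, reservations))
```

An empty list only says the addressing plan is consistent; it says nothing about the NIC's other settings, such as the DNS servers configured on it.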
 
Well, that explains why it didn't work on my lab setup. I have never joined a host machine to a domain, whose DC is one of the guests of said host. That just seems wrong to me, creating a loop - like badly-written time-travel movies I've seen. I could understand this in a setup with more than one physical host, and each host having a DC guest, I suppose, but with only one box, it just seems like a recipe for trouble.
Yeah, you'd think it wouldn't work, but it does, perfectly in fact. Most of my customers' setups are like that (numerous domain-joined hosts and virtual DCs). It's how it's supposed to be -- how else would you properly manage numerous host server configurations and perform live VM migrations, Hyper-V Replica, DFS-R, etc. (without multiple DCs or resorting to the backward step of deploying a separate physical DC)? It's a well-documented configuration, and I believe it's discussed in the 'Mastering Windows Server' series of books too, in part due to the fact that it seems like a somewhat contradictory configuration at first glance.
 
I no longer join hosts to domains in single server environments. Too many problems... largely due to Windows Firewall.

However, as soon as a client has two hosts, they get joined to the domain and we configure the hosts such that they don't reboot at the same time. The domain is always online, thanks to each host having a DC and ALL ISSUES vanish. I don't even need my delay script for network location awareness.

Moltuae is right though, so much of Hyper-V doesn't work if you don't have a domain. And you cannot use the tools mentioned without it either. Microsoft really wants those machines to be on the domain too. So strange things happen on either path; I've just learned to manage things in specific ways to avoid most of it.
 
Well, I just got back from the client's. The problem was that the DNS field in the NIC properties on the host was blank (so my guess that "It's always DNS" was spot on!). Obviously it was not set up this way originally, and I just can't imagine that the one person there who knows where the server password lives would be capable of finding this setting, have any inclination to go poking around the server, or just plain have any time for such pursuits.

I looked back at the event logs to when it stopped working, but didn't find anything damning. I have no idea how this would have happened, but thank goodness it was an easy fix.

Oh, and RDP was disabled - I'm sure we never enabled it since we use our monitoring software for remote access. That one is my fault.
 
So you had your out of band offline AND RDP offline.

I generally want all three: out of band, RDP, and the RMM tool. One of them always works, but it's bonkers how often two of them just won't work for whatever reason. RDP loves to disable itself when Windows Firewall goes nuts anyway, so more often than not my fallback is the out of band, because the RMM and RDP are both toast.
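To see which of those paths is still open from another machine on the LAN (the admin workstation or the DC), a TCP connect test is quick to script. This Python sketch swaps TCP connects in for ping, since ICMP needs raw sockets and admin rights. The IP addresses are hypothetical, 3389 is the standard RDP port, and 443 assumes the BMC/iLO/iDRAC serves its web UI over HTTPS. RMM agents normally connect outbound to the vendor, so there is nothing on the host to probe for that path.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_access_paths(
    paths: dict[str, tuple[str, int]], timeout: float = 2.0
) -> dict[str, bool]:
    """Probe each named (host, port) pair and report reachability."""
    return {name: tcp_reachable(host, port, timeout) for name, (host, port) in paths.items()}


if __name__ == "__main__":
    # Hypothetical addresses; replace with the host's and the BMC's real IPs.
    paths = {
        "RDP on host": ("192.168.1.10", 3389),
        "BMC web UI": ("192.168.1.11", 443),
    }
    for name, ok in check_access_paths(paths).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If RDP fails but the BMC answers, the out-of-band console is the way in. If both fail while the host's MAC still shows up in an IP scan (as in this thread), the host's own network or firewall configuration is the likely suspect.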
 