Can't log off, reboot, or shutdown. Server 2003

nlinecomputers

I have a client with a Windows Server 2003 SE Service Pack 2 box running on an HP ProLiant ML350 G5. The system has two volumes on a single 4-drive RAID 5 SAS stack.

Sometime on Sunday the 12th the system locked up. Monday morning they rebooted the server, only to have drive 4 drop. The RAID rebuilt but flagged the 4th drive as failure imminent. I ordered a replacement and hot-swapped the bad drive for the new one. The RAID rebuilt and everything seemed fine. Monday of this week I got a call that they can't access any of their scanned docs in Fortis Document Imaging. I know nothing about how Fortis works. But the system is running very sluggishly and freezing up. I try to log off the admin and can't. I try to reboot and can't. I try to shut down and can't. The system simply does nothing except post an event 1073 error in the logs.

I was forced to do a hard shutdown. OUCH. (I logged everyone out and stopped the SQL servers before doing it.) On reboot, a message from the array said that data buffered in the RAID array was successfully written to disk in spite of the reboot(???)

Fortis is still not working correctly. After several calls to idiots at their help desk I finally got the one good tech they have, and he tells me the archive is missing? WTF? I checked the last backup; the files are there and I was able to restore them. But the server is still running slow. I can't find any errors and I still cannot reboot.

Also, none of my scheduled tasks will run.

And my monitoring software, GFI, is having issues. It will no longer scan the system to check for updates. It says the database is not up to date.

I want to check the system with diagnostics but the inability to shut down has me greatly concerned.

How can I check on what is aborting the shutdown? :confused:
 
I'd uninstall AV, and see what you can do to kill services in task mangler...and then bounce the server and get into safe mode.

I'd confirm a recent backup, and then run chkdsk on both partitions.
Speaking of partitions..."ouch"....both of them on a single RAID 5 volume.....ugh. However..just a cause of poor performance, not a cause of your issue.

What kind of backup was there? Image? Or just data backup? If image....might consider a full image restore from just prior to the drive tankin'. Copy what current data you can first to restore on top of that.
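
For the chkdsk step, here is a minimal Python sketch of a read-only pass over both volumes. The drive letters C: and D: are assumptions (the thread never names them), and it assumes a Python interpreter is available on the server. Running chkdsk without /F only reports problems and does not try to repair them, which is the safer first look on a box that can't be rebooted cleanly.

```python
import subprocess
import sys

# Assumption: the two volumes are C: and D:. Substitute the real drive letters.
VOLUMES = ["C:", "D:"]


def chkdsk_command(volume, fix=False):
    """Build a chkdsk command line. Without /F, chkdsk only reports."""
    cmd = ["chkdsk", volume]
    if fix:
        cmd.append("/F")
    return cmd


def run_read_only_checks(volumes):
    """Run a report-only chkdsk on each volume and collect the exit codes."""
    results = {}
    for volume in volumes:
        # subprocess.call returns the exit code of the finished process.
        results[volume] = subprocess.call(chkdsk_command(volume))
    return results


if __name__ == "__main__" and sys.platform == "win32":
    for volume, code in run_read_only_checks(VOLUMES).items():
        print(volume, "chkdsk exit code:", code)
```

Add /F only after the backup is confirmed. On a volume that is in use, chkdsk /F will offer to schedule itself for the next restart instead of running right away.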
 
It uses Backup Exec 12.5. UGH.

The performance problems have been sudden. They go through lots of documents that they scan into a SQL-driven database (Fortis Document Management). Can't really do a rollback here.

Trying to arrange after hours physical access to the unit so that I can do some diagnostics.

Need to run SFC on it and remove the AV and others.
 
Does shutdown actually abort? Meaning did you get a message saying it can't shut down? Or does it just do nothing when you try to shut down? Either way, I have seen that a few times. But usually a power down, a wait of a few minutes, and a power up again, and all is well.

Did you test the drive you removed with another tool to see what it says?
 
Except for logging the abort in event logs it does nothing. I've not tried a command prompt abort. The end users are not letting me work on it right now as it is working and shutting it down shuts the business down. I may not really get a crack at it until Saturday.

I don't have anything with a SAS interface to really check the drive with. But I trust the result. It's not the first time the drive has dropped. It dropped before about 3 months ago but it rebuilt on reboot and passed diagnostics then.
 
Does shutdown actually abort? Meaning did you get a message saying it can't shutdown?

I was just about to say the same thing. Sounds more like something is preventing/delaying shutdown rather than aborting it, especially taking into account the sluggish performance. I'd second Stonecat's suggestion about running a chkdsk on both partitions. So far, I'd say everything points towards NTFS inconsistencies.
 
The only reaction from the server is this in the system log:

Event Type: Warning
Event Source: USER32
Event Category: None
Event ID: 1073
Date: 10/23/2014
Time: 9:49:37 AM
User: MCCANN\Administrator
Computer: MS-SRV1
Description:
The attempt by user MCCANN\Administrator to restart/shutdown computer MS-SRV1 failed

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 80 48 3e 77 €H>w

Note the results are the same for LOG OFF, REBOOT, or SHUTDOWN request.

I'm stuck logged in as Administrator.
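
To see how often this happens and under which account, the event text above can be parsed mechanically. This is a minimal Python sketch that assumes events are copied out of Event Viewer as plain text in the same layout as the entry quoted above. The function names are made up for illustration.

```python
import re

# Sample entry, copied from the event quoted in this post.
SAMPLE = r"""Event Type: Warning
Event Source: USER32
Event Category: None
Event ID: 1073
Date: 10/23/2014
Time: 9:49:37 AM
User: MCCANN\Administrator
Computer: MS-SRV1
Description:
The attempt by user MCCANN\Administrator to restart/shutdown computer MS-SRV1 failed
"""

# Header fields that appear as "Name: value" on a single line.
FIELD_RE = re.compile(
    r"^(Event Type|Event Source|Event Category|Event ID|Date|Time|User|Computer):\s*(.*)$"
)


def parse_event(text):
    """Return the header fields of one event as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def is_failed_shutdown(fields):
    """True for the USER32 / 1073 entry logged when a shutdown request fails."""
    return fields.get("Event Source") == "USER32" and fields.get("Event ID") == "1073"


if __name__ == "__main__":
    event = parse_event(SAMPLE)
    if is_failed_shutdown(event):
        print(event["Date"], event["Time"], event["User"])
```

Lining up the date and time of each 1073 entry against when and how the request was made (console or remote session) is one way to narrow down what the failures have in common.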
 
Well, a business does need to run. To be honest it really sounds very similar to a server I worked on some time ago. They started having problems, including reported bad drives, and it eventually stopped working. Turns out it was a RAID controller issue. But the file system damage was permanent by then. Obviously you need to do all of the backup, research, etc. due diligence stuff. But I would look at maybe resetting the RAID card and then having it read the config from the drives. That's what we did to get that other server to a somewhat workable state.
 
Never had to replace a RAID card. Or reset one either. The configuration is stored on the drives, I assume? So I can just nuke the card, reboot, and it will re-setup itself automagically? (After I sacrifice a goat to Ba'al...)
 
Need to have several goats as well as lambs to be on the safe side. LOL!!! Yes, real RAID cards store the config on the card as well as writing it to each drive. On the call I was on we were getting some kind of brief error during boot. After some research on my part and discussion with all parties involved, that is what we did. The machine booted to a login and they were able to get back in. But as I mentioned, the damage had been done. The fax program and database were corrupted, so they set it up on a VM for the time being.

That was a Dell with a PERC. My guess is that, as long as the card is a real RAID card, you could do the same. But it is not automagically done. You will get a mismatch error from the controller when booting. And you then tell it to read the config from the drives.
 
It's an HP E200i RAID card.

Given the price points I am seeing for that card I seriously doubt it's a real RAID card. Most likely it's a software RAID where the OS and CPU do most of the heavy lifting. If you boot into a Linux distro, that should tell you. With a real RAID card, Linux will see the RAID volume as created by the card. With a software RAID card, Linux will show the individual drives. I know that is the case with the low-priced 3rd-party RAID cards I have messed around with.
 
HP RAID cards store the config file on the HDDs.....so you can take a server...with existing RAID, say the RAID card blows out...you can swap the card, it will pull the RIS files from the HDDs and put humpty dumpty back together. All automated...BAM done!

I believe even the entry level HP RAID cards do that, I know the better "SmartArray" ones do. The e200 was entry level.
 
Is this an AD machine? Try logging in with another account to see if you can shutdown/reboot.

As StoneCat said, I think it would be worth opening up Task Manager, closing a program/task, then trying to reboot. Keep doing this until it either reboots or you've closed everything and it still won't. Maybe even extend this to services? Maybe it's just a process/service that is borked and causing the issue.

Although that dropped drive does raise suspicions, and I think that the guys are onto something here with the RAID card. Can you use any tools to create an image and scan the image for corruption, or is it too far gone at this point?
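
The stop-one-thing-then-retry approach above is a simple elimination loop. Here is a minimal Python sketch of the logic only. The `stop` and `shutdown_works` callables are hypothetical stand-ins for whatever actually stops a process or service and whatever tells you the shutdown request went through. On a real box a successful attempt reboots the machine, so this is a way to plan the order of attempts rather than a tool to run as-is.

```python
def find_blocker(candidates, stop, shutdown_works):
    """Stop candidates one at a time, retrying shutdown after each.

    Returns the name of the candidate whose removal let shutdown go
    through, or None if shutdown still fails with everything stopped.
    """
    for name in candidates:
        stop(name)
        if shutdown_works():
            return name
    return None


if __name__ == "__main__":
    # Simulated run. The names here are placeholders, not real services.
    stopped = set()
    culprit = find_blocker(
        ["ExampleServiceA", "ExampleServiceB", "ExampleServiceC"],
        stop=stopped.add,
        shutdown_works=lambda: "ExampleServiceB" in stopped,
    )
    print("first change that unblocked shutdown:", culprit)
```

Ordering the candidates so that recently installed or recently updated software goes first keeps the number of retries down.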
 
I was able to successfully reboot the server by using the GFI reboot command from the Dashboard. I just realized that every failed shutdown/logoff attempt has been over TeamViewer and not when I've been physically in front of the unit. TV is part of the GFI Hounddog monitor. Time to just wipe GFI off the system by hand and leave it off for a few days. I'll use something else to remote manage it and see if that corrects the problems. Can't remove it remotely, so hopefully I can do this Saturday.
 
It is the only server. It runs AD and a SQL server for a 3-PC stock broker firm.

Yes, I know it is not best to combine roles, but its workload is small. Until now it has run well and without any issues.
 