Isolating performance issues on biz lan

occsean

Morning everyone...

I have a veterinary practice I am currently working with that is suffering intermittent and maddening performance issues when using their practice management software (Avimark). Specifically, a task as simple as opening a chart, or moving between modules or screens in the software, can take 30-45 seconds at a time. Basic stuff that should be, and used to be, much faster. I know speed is a matter of perception, but the customer says this has been occurring for over 8 months.

They initially called me in a few months ago as their current tech had not been able to solve the issue over the course of 8 months and they wanted a fresh set of eyes. I identified one of the major culprits as an 8 year old AMD FX desktop hackjob of a machine acting as their "server" for the Avimark software and database, and recommended it be replaced with a proper server as suggested by the manufacturer.

They purchased a server direct from the software company: Xeon E3-1225v6, 32GB ECC, 4x1TB SAS HDD in RAID 10 on a MegaRAID SAS 9361-4i card. So the machine is most definitely a million times better than the previous junk. It's essentially being used as a file server, and it also hosts AD, DNS, and DHCP.

Here's the problem: they are still experiencing the same issues as before. The server needed to be replaced regardless, and while I didn't expect it to be a one-size-fits-all solution, I did expect to see vast improvement if not near nirvana. That's on me for being tunnel visioned.

In any event, I'm trying to determine the most effective way to completely isolate and resolve the issues and not leave any stone unturned. My ideas so far are:

1) Their ancient SonicWall has already been pulled. Unfortunately they stuck an 8 year old Netgear firewall/VPN in its place, with a static address that I'm not even sure is theirs and hard-coded DNS from Comcast. DHCP is handled by the server. I want to swap that out in order to troubleshoot further.

2) I'm going to bring in a certified low-voltage contractor to test and certify every cable in the wall as well as double-check the jacks. There is no patch panel, and if warranted I will advise getting one. Currently it's a hodgepodge of small unmanaged gigabit switches.

3) Most of the workstation machines are C2D or 1st/2nd gen Core CPUs with 4GB of RAM. No GPOs are being pushed, and all users share the same login and have local admin rights. I've advised replacing and locking down the machines; so far that has fallen on deaf ears. I've proposed bringing in a box that has just the OS and the LOB app and testing its performance compared to the existing machines.

4) I've installed Spiceworks Network Monitor on the server and am gathering data to analyze as well.


What other steps can I take to isolate and solve these performance issues? According to the customer, the slowdowns tend to be site-wide when they happen (16 workstations), so whereas I suspected the server before, I am now really leaning towards the network infrastructure as the cause. But I want to be able to make my case with evidence prior to advising any drastic steps.
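One way to put hard numbers behind the "intermittent and site-wide" report is to leave a timed file-read loop running on a couple of workstations against a test file on the server share, then line the timestamps up with the complaints and the Spiceworks data. A minimal sketch (the UNC path mentioned in the comment is a hypothetical placeholder; the demo just uses a local temp file):

```python
import os
import tempfile
import time

def time_read(path, block=1 << 20):
    """Read a file end-to-end and return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block):
            pass
    return time.perf_counter() - start

def log_reads(path, samples=3, interval=0.0):
    """Collect (timestamp, duration) pairs. Left looping during
    business hours, the timestamps can be correlated with the
    moments users report the 30-45 second hangs."""
    results = []
    for _ in range(samples):
        results.append((time.time(), time_read(path)))
        time.sleep(interval)
    return results

# Demo against a local temp file; on site, point it at a test file
# on the server share, e.g. r"\\SERVER\share\testfile.bin"
# (hypothetical path), with a longer interval between samples.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(os.urandom(1 << 20))  # 1 MiB of test data
tmp.close()
samples = log_reads(tmp.name)
print(len(samples))
os.unlink(tmp.name)
```

If the read times spike on every workstation at once, that points at shared infrastructure (server disk or network core) rather than individual machines.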

Appreciate any and all insight from you all

Thanks
 
..... hodgepodge of small unmanaged switches, and workstations using C2D machines?


Sounds like the whole setup is suspect. Why the 8 year old Netgear firewall/VPN? Why a bunch of unmanaged switches? Some of those machines may have two, three or more switches between them and the "server".

Those old "C2D" machines probably have 10+ year old "spinners" in them that are near death.


Sounds like this whole setup is a "nuke and pave".
 
I'd start with brand new workstations, a proper server setup (meaning people not using local accounts with full admin rights), rewire everything, and get rid of all the (most likely Walmart) switches. I'd wager performance would take a huge leap forward.

If they are unwilling, you might as well walk away.

As a last resort, you can sell hard the idea of replacing just one machine with a brand new one and set it up properly. Ensure it doesn't have multiple switches between it and the server, and try removing / replacing any other 8 to 10 year old networking hardware.
 
IMO, it sounds suspiciously like there is a network loop somewhere.. that is to say, two layer-2 switches hooked up together causing a network storm.

Some Spanning Tree and/or storm control on the switches could help, if available. If not, you can identify the loop issue in Wireshark.
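The capture itself is Wireshark/tshark work, but once you export per-second broadcast frame counts (e.g. Statistics → I/O Graph with the display filter `eth.dst == ff:ff:ff:ff:ff:ff`), flagging storm intervals is trivial. A sketch with a made-up threshold and synthetic counts:

```python
def flag_storms(counts, threshold=5000):
    """Given per-second broadcast frame counts, return the seconds
    whose count exceeds the threshold. On a healthy 16-workstation
    LAN, broadcasts should sit in the tens per second; a loop-driven
    storm pushes them into the thousands."""
    return [i for i, c in enumerate(counts) if c > threshold]

# Synthetic example: quiet traffic with a storm between t=3 and t=5.
counts = [40, 55, 38, 90000, 120000, 88000, 60, 45]
print(flag_storms(counts))  # → [3, 4, 5]
```

If the flagged intervals line up with the users' complaints, the daisy-chained unmanaged switches are almost certainly where to look for the loop.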

 
As a last resort, you can sell hard the idea of replacing just one machine with a brand new one and set it up properly. Ensure it doesn't have multiple switches between it and the server, and try removing / replacing any other 8 to 10 year old networking hardware.

Yes, I've already told them I am bringing in a brand new box with just Windows and access to the LOB software and placing on the same switch as the server. I'm hoping that will help open their eyes to the ENTIRE issue.
 
IMO, it sounds suspiciously like there is a network loop somewhere.. that is to say, two layer-2 switches hooked up together causing a network storm.

Some Spanning Tree and/or storm control on the switches could help, if available. If not, you can identify the loop issue in Wireshark.


Yep. Or even a patch cord with both ends plugged into the same switch. Seen that more times than I care to mention.
 
I'm agreeing with Brandon... guessing there's a bunch of daisy-chained 5 or 8 port SOHO-grade switches. Time to work them towards every workstation having a home run to a central switch, and get a good switch in there. Some workstations probably have multiple hops on their way to the server.

The old Nutgear router... while I hated the old ProSafes and most Netgear routers, I doubt it's in the mix here, unless some of its LAN ports are being used as a switch for some nodes.

4 gigs of RAM on old C2Ds... and machines of that vintage likely have old rotating HDDs that are well worn, so performance is suffering... ugh...

I read those server specs and I see "1TB" SAS, which makes me think 7,200rpm nearline SAS (really just SATA disks with a SAS bridge), not true server hard drives like 10k rpm, 15k rpm, or SSD.
A single RAID 10 volume... not fond of those either; I prefer separate RAID volumes: 1 for the C, 1 for the D. But the nearline SAS... eh.
 
Yes, I've already told them I am bringing in a brand new box with just Windows and access to the LOB software and placing on the same switch as the server. I'm hoping that will help open their eyes to the ENTIRE issue.

That'd be a good "Demo"..bring in a rig with an i5, 8 gigs, and an SSD. Plug right into the same gigabit (hopefully) switch that the server is in.
 
Yes, I've already told them I am bringing in a brand new box with just Windows and access to the LOB software and placing on the same switch as the server. I'm hoping that will help open their eyes to the ENTIRE issue.
When doing this, it would be helpful if you can disconnect everything else from the switch except what is needed to do the test. That will help isolate the issue. Then plug in more equipment and test again. It will take some time, but it will help narrow down the issue.
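The unplug-everything-then-reconnect process above is effectively a bisection, and being deliberate about it saves a lot of trips: reconnect half the segments, re-run the benchmark, and keep the half where the slowdown reappears. A toy sketch (segment names are made up, and it assumes a single misbehaving segment):

```python
def find_bad_segment(segments, is_slow):
    """Bisect over network segments. is_slow(connected) re-runs the
    benchmark and returns True when the slowdown reappears with that
    set of segments plugged in. Halves the candidate list each pass,
    so 16 drops is at most 4 test rounds instead of 16."""
    candidates = list(segments)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        candidates = half if is_slow(half) else candidates[len(half):]
    return candidates[0]

# Toy example: pretend the "exam-room-switch" segment is the culprit.
segs = ["front-desk", "exam-room-switch", "lab", "kennel"]
bad = find_bad_segment(segs, lambda s: "exam-room-switch" in s)
print(bad)  # → exam-room-switch
```

The catch with intermittent problems is that `is_slow` has to run long enough to catch a bad period, so each round may mean leaving the benchmark looping for a while.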
 
I read those server specs and I see "1TB" SAS, which makes me think 7,200rpm nearline SAS (really just SATA disks with a SAS bridge), not true server hard drives like 10k rpm, 15k rpm, or SSD.
A single RAID 10 volume... not fond of those either; I prefer separate RAID volumes: 1 for the C, 1 for the D. But the nearline SAS... eh.

Yup... exactly what they are. Thanks for the clarification; verified that they are 7200rpm. The server shipped pre-configured with a single RAID 5 volume. I pushed and persisted with blowing away that RAID volume and replacing it with RAID 10 instead. I hadn't really considered 2 volumes due to only having 1.81TB of disk capacity. Would 2 volumes gain any performance or data protection over 1 volume?
 
While a pile of rinky-dink 5 port switches all hooked up every which way can cause a lot of issues... if I were you, I would want to be 100% sure I knew what the issue was.
  • Can you install the LOB app on the server and open/run it directly? TRY THIS FIRST.
  • Bring a freshly imaged machine, install LOB app, plug into switch right next to the server. Test performance.
  • Run Wireshark while you do this and look for lots of RED.
  • Run Procmon / Process Explorer on the server to see if there's anything suspicious
Very important here: when they replaced the server, did they do a fresh OS installation, or was this a virtual machine that was moved over (along with all of its problems)? This could be purely software related.
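On the server-side check, one concrete counter worth watching is disk read latency: if "Avg. Disk sec/Read" regularly spikes past ~20 ms, the 7,200rpm nearline disks (or the RAID card) are suspect rather than the network. You can capture it with perfmon or typeperf and sift the CSV afterwards; a sketch against a synthetic typeperf-style capture (the header, times, and values here are made up):

```python
import csv
import io

def parse_typeperf(csv_text):
    """Pull the counter column (Avg. Disk sec/Read, in seconds)
    out of a typeperf-style CSV capture: header row first, then
    one quoted (timestamp, value) row per sample."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [float(r[1]) for r in rows[1:] if r]

def flag_slow_io(samples, threshold=0.020):
    """Reads averaging past ~20 ms point at the disks or the
    controller rather than the network."""
    return [s for s in samples if s > threshold]

# Synthetic sample resembling typeperf CSV output:
data = ('"Time","Avg. Disk sec/Read"\n'
        '"10:00:01","0.004"\n'
        '"10:00:02","0.145"\n'
        '"10:00:03","0.006"\n')
vals = parse_typeperf(data)
print(flag_slow_io(vals))  # → [0.145]
```

If the flagged spikes line up with the Avimark hangs, you have your evidence before recommending faster disks or SSDs.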
 