I have a mid-sized network of around 80 users. Most of them use Windows clients, with a few Macs. There are 3 Linux servers: 2 for file services (Samba and netatalk), and one acting as domain controller (Samba) and mail server (Cyrus/Postfix). There is also a Windows server used for Sage.
We have recently started to notice periodic delays in network traffic. I am not aware of any changes to the network around the time the problem started; it had been fine for 18 months or so. The symptoms are:
1. Remote SSH sessions to a Linux server: typing stalls for about 5-10 seconds, then resumes. This happens at random, but typically about once a minute.
2. Remote Desktop sessions to Windows clients freeze, and occasionally drop out.
3. File access to Samba shares sometimes takes a very long time or times out; a retry then works almost instantly.
4. A DOS-based database (run in an XP command window) keeps its files on a Samba share. When scrolling through records, it occasionally freezes, then resumes.
The delays are all fairly consistent, i.e. each of the above is affected for roughly the same length of time. I have run tcpdump on the Linux server while accessing the database, and can see Samba responses from the server to the DOS client immediately before a delay occurs.
I am certain the servers are not responsible, since the problem affects services on different machines, and I can see Samba packets leaving the server at the right time. I have also monitored the broadband traffic and can see nothing unusual.
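To line the stalls up against each other, it can help to have the capture report exactly when traffic stopped. The sketch below is one way to do that, assuming the capture was written with something like `tcpdump -i eth0 -w dos.pcap host 192.0.2.50` (the interface name, file name and client address are placeholders). It reads a classic libpcap file (not pcapng) using only the Python standard library and lists every silence longer than a threshold; the 3-second default is an arbitrary choice meant to sit below the 5-10 second stalls described above. Run the capture while the client is actively scrolling, otherwise ordinary idle time will show up as gaps too.

```python
import struct
import sys

# Classic libpcap files (what `tcpdump -w` writes) start with a 24-byte
# global header; each packet is preceded by a 16-byte record header.
MAGIC_MICROSECONDS = 0xA1B2C3D4
MAGIC_NANOSECONDS = 0xA1B23C4D


def packet_times(path):
    """Yield the capture timestamp (float seconds) of every packet in a pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("file too short for a pcap header")
        # The magic number tells us the byte order and timestamp resolution.
        for endian in ("<", ">"):
            (magic,) = struct.unpack(endian + "I", header[:4])
            if magic in (MAGIC_MICROSECONDS, MAGIC_NANOSECONDS):
                break
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled)")
        scale = 1e-6 if magic == MAGIC_MICROSECONDS else 1e-9
        record = struct.Struct(endian + "IIII")
        while True:
            raw = f.read(record.size)
            if len(raw) < record.size:
                return  # end of file (or a truncated final record)
            ts_sec, ts_frac, incl_len, _orig_len = record.unpack(raw)
            f.seek(incl_len, 1)  # skip over the packet data itself
            yield ts_sec + ts_frac * scale


def find_gaps(times, threshold=3.0):
    """Return (start, length) for each silence longer than `threshold` seconds."""
    gaps = []
    prev = None
    for t in times:
        if prev is not None and t - prev > threshold:
            gaps.append((prev, t - prev))
        prev = t
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    for start, length in find_gaps(packet_times(sys.argv[1])):
        print(f"{start:.3f}: no packets for {length:.1f} s")
```

Saved as, say, `find_gaps.py` (a hypothetical name), it runs as `python3 find_gaps.py dos.pcap`. The printed timestamps are Unix epoch seconds, so gaps seen from different machines can be compared directly, provided their clocks are synchronised.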
The servers have gigabit connections to three stacked switches. Buildings are linked by fibre, but the problem affects every building. Clients are typically on 100 Mbit/s connections.
The Windows clients all run NOD32 AV. On the servers, rkhunter and chkrootkit run overnight and show nothing. I've also monitored the servers with Wireshark, and they look normal. I'm not sure what else to look for.
Any ideas would be welcome, please.
Stuart Bailey BSc (hons) CEng CITP MBCS
LinuSoft (Managing Director)
Linux Specialist & Software Developer
~~~~~~~~~~~~~~~~~~~~~~~
Phone: (0845) 658 3563
Direct: +44 (0) 1953 878162
Fax: +44 (0) 1603 858583
~~~~~~~~~~~~~~~~~~~~~~~
http://www.linusoft.co.uk
On 20/07/11 10:04, Stuart Bailey wrote:
Bit of a wild guess here. I used to have problems with Windows machines insisting that they wanted to be the domain master and periodically forcing an election over who was in control. Until the election was over, I'd get a slight delay. To cure it, I had to tell every Windows machine explicitly that it was a client and never a master. This was quite a while ago, so I no longer remember how I did it.
On another occasion, I installed a Samba update that prompted me to run the Samba configuration tool, and I got the configuration wrong. Samba was trying to synchronise with a Windows domain server, but couldn't find one. The symptoms were that initial logins took an age; once the synchronisation timed out, logins would be instant for a while, but then Samba would try to synchronise again and logins would have to wait for that time-out.
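Both failures described above recur on a timer, so one cheap test is whether the stalls are evenly spaced. A rough heuristic, assuming you have noted the start time of each stall in seconds (for example from a packet capture or from users' notes): compute the intervals between stalls and check whether they all sit close to the median. The 20% tolerance and the three-interval minimum are arbitrary choices, not established thresholds.

```python
import statistics


def stall_intervals(timestamps):
    """Seconds between consecutive stall start times, in time order."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def looks_periodic(timestamps, tolerance=0.2):
    """True if every interval is within `tolerance` (a fraction) of the median.

    Heuristic only: a fixed retry timer or a recurring election tends to
    produce evenly spaced stalls, whereas random congestion usually does not.
    """
    gaps = stall_intervals(timestamps)
    if len(gaps) < 3:
        return False  # too few samples to say anything
    mid = statistics.median(gaps)
    return all(abs(g - mid) <= tolerance * mid for g in gaps)
```

Evenly spaced stalls would point towards something timer-driven, such as an election or a synchronisation retry; irregular ones fit congestion or a misbehaving device better. Neither result proves the cause on its own.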
Good luck; I hope you find the problem.
On 20/07/2011 11:29, steve-ALUG@hst.me.uk wrote:
The above is worth checking; I've seen that happen as well. Also check whether DNS is an issue; there may be a problem with lookups.
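One quick way to test the DNS theory is to time lookups of the names clients actually use and report any that are slow or fail. A minimal sketch using whatever resolver the operating system is configured with (via `socket.getaddrinfo`); the one-second limit and the host names in the usage line are placeholders. The resolver is passed in as a parameter so the logic can be exercised without a network.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (seconds taken, error or None) for one name lookup."""
    start = time.monotonic()
    try:
        resolver(name, None)
        error = None
    except socket.gaierror as exc:  # name could not be resolved
        error = exc
    return time.monotonic() - start, error


def slow_lookups(names, limit=1.0, resolver=socket.getaddrinfo):
    """List (name, seconds, error) for lookups that failed or took over `limit`."""
    results = []
    for name in names:
        elapsed, error = time_lookup(name, resolver)
        if error is not None or elapsed > limit:
            results.append((name, elapsed, error))
    return results
```

For example, `print(slow_lookups(["dc1.example.lan", "files1.example.lan"]))` (hypothetical names), run repeatedly over a few minutes from an affected client. Bear in mind that a local cache may answer repeat lookups without consulting the DNS server at all.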
Cheers, Laurie.
On 20/07/11 12:04, Laurie Brown wrote:
DNS wouldn't really explain all the mid-session freezes.
Are your switches managed? If so, have a poke around the management interface and see if anything strange is going on.
Are pings delayed? Are packets being dropped? mtr might help here: you can leave it running for a while against a local address, then check back on the statistics later.
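mtr itself (for example `mtr --report`) does this properly. Where it isn't available, or ICMP is filtered, a rough stand-in is to time TCP connections to a service on the server at regular intervals and summarise loss and latency the way mtr does. This sketch is a substitute, not mtr: it measures the TCP handshake rather than ICMP echo, it counts a refused or timed-out connection as a lost probe, and the host, port and timings in the usage line are placeholders.

```python
import socket
import statistics
import time


def probe(host, port, timeout=2.0):
    """Seconds to complete a TCP handshake, or None on timeout/refusal."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers timeouts, refusals and resolution failures
        return None


def summarise(samples):
    """mtr-style summary: loss percentage, mean and worst reply time."""
    replies = [s for s in samples if s is not None]
    loss = (100.0 * (len(samples) - len(replies)) / len(samples)) if samples else 0.0
    if not replies:
        return {"loss_pct": loss, "avg": None, "worst": None}
    return {"loss_pct": loss, "avg": statistics.mean(replies), "worst": max(replies)}


def monitor(host, port, count=600, interval=1.0):
    """Probe `count` times, pausing `interval` seconds between probes."""
    samples = []
    for _ in range(count):
        samples.append(probe(host, port))
        time.sleep(interval)
    return summarise(samples)
```

`print(monitor("192.0.2.10", 22, count=600))` would probe a hypothetical server's SSH port roughly once a second for at least ten minutes. With a 2-second timeout, a 5-10 second stall should show up as one or more lost probes.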
Can you replicate the problem between any two hosts on the network, or does it only happen from client to server? If the latter, are all your servers on one switch?
Did you notice any heavy broadcast traffic when you were using Wireshark? In the past I've had silly things like Ethernet-enabled printers, with every network protocol known to man enabled in their control panels, shouting all over the broadcast address trying to announce themselves to clients.
Or two UPnP-enabled routers on the same subnet, both of which think they are the default gateway. That drives the zeroconf service in Windows a bit mad, but you would have seen it in Wireshark, since it runs over broadcast traffic. The last time I saw that, someone had repurposed an old router as a Wi-Fi AP: they turned off DHCP but left it still thinking it was a router.
I've also had the runaround with a confused stack of switches (particularly cheap ones, or a mixture of makes); if you can afford the downtime, reboot them. I've also seen a rogue network card on a single host play havoc, although again you would usually see broadcast traffic or something similar in Wireshark.
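To put numbers on "heavy broadcast traffic", a capture can be summarised by sender. A minimal sketch, assuming an Ethernet capture in classic libpcap format, for example one written by `tcpdump -i eth0 -w bcast.pcap broadcast or multicast` (interface and file name are placeholders). It counts frames whose destination MAC has the group bit set, which covers both broadcast and multicast; multicast is included because UPnP announcements go to a multicast address rather than the broadcast address.

```python
import collections
import struct

LINKTYPE_ETHERNET = 1


def noisy_senders(path):
    """Count broadcast/multicast Ethernet frames per source MAC in a pcap file."""
    counts = collections.Counter()
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("file too short for a pcap header")
        # Classic pcap header as written by tcpdump -w; pcapng is not handled.
        for endian in ("<", ">"):
            (magic,) = struct.unpack(endian + "I", header[:4])
            if magic in (0xA1B2C3D4, 0xA1B23C4D):
                break
        else:
            raise ValueError("not a classic pcap file")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype & 0xFFFF != LINKTYPE_ETHERNET:
            raise ValueError(f"expected an Ethernet capture, got link type {linktype}")
        record = struct.Struct(endian + "IIII")
        while True:
            raw = f.read(record.size)
            if len(raw) < record.size:
                break
            _sec, _frac, incl_len, _orig_len = record.unpack(raw)
            frame = f.read(incl_len)
            # Bytes 0-5 are the destination MAC, 6-11 the source MAC.
            # The lowest bit of the first byte marks a group address:
            # set for broadcast (ff:ff:ff:ff:ff:ff) and for all multicast.
            if len(frame) >= 12 and frame[0] & 1:
                counts[frame[6:12].hex(":")] += 1
    return counts
```

`print(noisy_senders("bcast.pcap").most_common(10))` lists the ten chattiest MAC addresses. The first half of a MAC address identifies the manufacturer, which is often enough to spot a printer or a stray router.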