On Mon, 15 Jul 2002 07:56:57 Ian Douglas wrote:
Looking at eth0 there is a transmit queue of 100 so it looks like the kernel is routing packets to the right interface but the interface is not sending them.
Confusingly the Transmit Queue always seems to show 100 even when I check it immediately after the machine has just booted. Also it does not appear to increment any higher if I do a ping. Also the "TX packets" and "RX packets" counts remain at zero. In fact none of the numbers in the "eth0" part of the ifconfig output change at all even after I do repeated pings from or to my new PC.
You could have a point here, but I think there are other explanations for what you describe. The queue length is probably not zero when you first look at it because something has already tried transmitting, e.g. to resolve a DNS name. It probably doesn't go over 100 because that is the limit on how long the queue can grow; after that, the sending program gets blocked until there is space on the queue.
I did however make an interesting discovery last night. I ran tcpdump on the remote machine (rather than on my new PC as I had been doing to that point) and saw:
22:44:16.990442 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.990734 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
22:44:16.990494 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.990842 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
22:44:16.990558 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.990936 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
22:44:16.990623 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.991030 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
So it looks like my new PC is in fact transmitting the ping (and is being sent a reply) even though it thinks it is not.
Yes, if the problem is a missed interrupt this is exactly what I would expect to see.
Does this explain the following repeating error message I mentioned in a previous post which is filling up /var/log/messages?
Jul 13 22:59:52 chocchip kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 13 22:59:52 chocchip kernel: eth0: Tx queue start entry 4 dirty entry 0.
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 0 is 00002000. (queue head)
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 1 is 00002000.
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 2 is 00002000.
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 3 is 00002000.
Jul 13 22:59:52 chocchip kernel: eth0: Setting half-duplex based on auto-negotiated partner ability 0000.
Yes. To avoid problems with cards locking up, most drivers implement a timer. If the card does not raise an interrupt to signal that it has finished transmitting within a certain period, the watchdog timer goes off and the driver treats the transmit attempt as a failure. It may retry that transmit or it may simply try to transmit the next packet.
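The watchdog behaviour described above can be sketched as a toy model in Python. This is not real driver code: the class TxWatchdog, the simulate helper and the tick-based clock are all made up for illustration, and the timeout of 5 ticks is an arbitrary assumption. It only shows why a card whose "transmit done" interrupt never reaches the driver produces a timeout for every packet, even if the packet actually went out on the wire.

```python
class TxWatchdog:
    """Toy model of a driver's transmit watchdog (illustration only)."""

    def __init__(self, timeout):
        self.timeout = timeout    # ticks to wait for the "done" interrupt
        self.started_at = None    # tick at which the pending transmit began
        self.completed = 0        # transmits confirmed by an interrupt
        self.timeouts = 0         # transmits given up on by the watchdog

    def start_transmit(self, now):
        # The driver hands a packet to the card and arms the watchdog.
        self.started_at = now

    def tx_done_interrupt(self):
        # The card signalled completion, so the watchdog is disarmed.
        if self.started_at is not None:
            self.completed += 1
            self.started_at = None

    def tick(self, now):
        # Called periodically; returns True if the watchdog fired.
        if self.started_at is not None and now - self.started_at >= self.timeout:
            self.timeouts += 1
            self.started_at = None
            return True
        return False


def simulate(interrupt_delivered, packets=3, timeout=5):
    """Send a few packets and return (completed, timeouts)."""
    wd = TxWatchdog(timeout)
    now = 0
    for _ in range(packets):
        wd.start_transmit(now)
        if interrupt_delivered:
            now += 1
            wd.tx_done_interrupt()
        else:
            # No interrupt ever arrives: wait until the watchdog fires.
            while not wd.tick(now):
                now += 1
    return wd.completed, wd.timeouts


if __name__ == "__main__":
    print("interrupt delivered:", simulate(True))
    print("interrupt missed:   ", simulate(False))
```

With the interrupt delivered every transmit completes; with it missed, every transmit ends in a watchdog timeout, which matches the repeating "transmit timed out" messages in the log.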
My output from cat /proc/interrupts is:
           CPU0
  0:      39797    IO-APIC-edge  timer
  1:        339    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
 12:       2107    IO-APIC-edge  PS/2 Mouse
 14:       6366    IO-APIC-edge  ide0
 15:        124    IO-APIC-edge  ide1
 18:          0   IO-APIC-level  eth0
 20:          0   IO-APIC-level  usb-ohci
 23:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:      39750
ERR:          0
MIS:          0
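A quick way to spot devices whose interrupt count is stuck at zero is to parse the /proc/interrupts text. What follows is a minimal Python sketch, assuming the layout quoted above (a header naming the CPUs, then one count column per CPU, the controller type, and the device name). The function name zero_count_irqs is made up here, and SAMPLE is simply the output from this message.

```python
SAMPLE = """\
           CPU0
  0:      39797    IO-APIC-edge  timer
  1:        339    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
 12:       2107    IO-APIC-edge  PS/2 Mouse
 14:       6366    IO-APIC-edge  ide0
 15:        124    IO-APIC-edge  ide1
 18:          0   IO-APIC-level  eth0
 20:          0   IO-APIC-level  usb-ohci
 23:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:      39750
ERR:          0
MIS:          0
"""


def zero_count_irqs(text):
    """Return (irq, device) pairs whose interrupt count is zero on every CPU."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())          # header row: one name per CPU
    silent = []
    for line in lines[1:]:
        parts = line.split()
        if not parts or not parts[0].endswith(":"):
            continue
        label = parts[0][:-1]
        # Skip summary rows (NMI, LOC, ...) and rows too short to parse.
        if not label.isdigit() or len(parts) < ncpu + 3:
            continue
        counts = [int(p) for p in parts[1:1 + ncpu]]
        device = " ".join(parts[2 + ncpu:])   # text after the controller type
        if sum(counts) == 0:
            silent.append((int(label), device))
    return silent


if __name__ == "__main__":
    for irq, device in zero_count_irqs(SAMPLE):
        print(irq, device)
```

On a live machine the same function could be fed open("/proc/interrupts").read() instead of SAMPLE. A zero count for the cascade line is normal; a zero count for eth0 after repeated pings is the suspicious one.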
If I dual-boot into Microsoft Windows (which can transmit / receive to and from the network ok) and check Control Panel I notice the network card is assigned interrupt 11 rather than the interrupt 18 that seems to have been assigned to it under Linux. I am confused. I did not realise PC interrupts went as high as 18. Is there some way I can force Linux to assign it interrupt 11 so I can give it a go with this and see if it makes a difference?
This is pretty much at the limit of my knowledge. Like you, I had only previously seen interrupt numbers up to 15. It is also interesting to note that 18, 20 and 23 are level triggered whilst the others are edge triggered, so this may cause problems too.
I think you may be able to set the interrupt with the setpci command. If so, you would need the card driver to be a module, and you would need to run setpci in the boot sequence before the driver gets loaded.
Steve.