On Monday, July 15, 2002 2:39 AM, Steve Fosdick wrote:
On Sat, 13 Jul 2002 23:35:56 Ian Douglas, having no luck pinging from the ethernet card in his new PC, wrote:
If I run ifconfig I get:
eth0      Link encap:Ethernet  HWaddr 00:10:DC:1F:95:78
          inet addr:192.168.3.70  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::210:dcff:fe1f:9578/10 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18 Base address:0x6000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:503 errors:0 dropped:0 overruns:0 frame:0
          TX packets:503 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:49204 (48.0 Kb)  TX bytes:49204 (48.0 Kb)
The thing that immediately strikes me is that it is the RX and TX packet count on lo rather than eth0 that is incrementing after each ping.
Unless I could see a 1:1 correspondence between the packets counted for interface 'lo' and the pings you do, I would assume that this incrementing is just coincidence.
The "lo" interface RX/TX count definitely increases after each ping but not 1:1 so, as you say, I guess this may be a red herring.
Looking at eth0 there is a transmit queue of 100 so it looks like the kernel is routing packets to the right interface but the interface is not sending them.
Confusingly the Transmit Queue always seems to show 100 even when I check it immediately after the machine has just booted. Also it does not appear to increment any higher if I do a ping. Also the "TX packets" and "RX packets" count remain at zero. In fact none of the numbers in the "eth0" part of the ifconfig output change at all even after I do repeated pings from or to my new PC.
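One way to confirm that the eth0 counters really stay frozen is to compare two snapshots of /proc/net/dev, which reports the same per-interface packet counters that ifconfig prints. The following is only a minimal Python sketch, assuming the usual column layout (RX packets is the second field after the colon, TX packets the tenth). The function names are made up for illustration, and SAMPLE is rebuilt by hand from the ifconfig numbers above rather than captured from the machine.

```python
# Sketch: compare RX/TX packet counters between two /proc/net/dev snapshots.
# SAMPLE is a hand-built illustration using the numbers from the ifconfig
# output in this thread; it is not real captured output.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   49204     503    0    0    0     0          0         0    49204     503    0    0    0     0       0          0
  eth0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
"""


def parse_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # After the colon: field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def counter_delta(before, after):
    """Return {interface: (rx_increase, tx_increase)} between two snapshots."""
    return {
        name: (after[name][0] - before[name][0],
               after[name][1] - before[name][1])
        for name in after
        if name in before
    }
```

On the real machine the idea would be to call parse_net_dev(open("/proc/net/dev").read()) once before a ping and once after, then pass both results to counter_delta. A delta of (0, 0) for eth0 would match what ifconfig shows.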
I did however make an interesting discovery last night. I ran tcpdump on the remote machine (rather than on my new PC as I had been doing to that point) and saw:
22:44:16.990442 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.990734 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
22:44:16.990494 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.990842 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
22:44:16.990558 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.990936 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
22:44:16.990623 arp who-has suzy.suseland.net tell chocchip.suseland.net
22:44:16.991030 arp reply suzy.suseland.net is-at 0:1:2:d7:f8:80
So it looks like my new PC is in fact transmitting the ping (and is being sent a reply) even though it thinks it is not.
One possibility is that the driver for the card never receives interrupts from the card. Normally an interrupt is used to signal that a packet has arrived, but also to signal that a transmit has finished. So if the interrupt never arrives, the driver will never believe the first packet in the queue was sent and will never send the second one.
Does this explain the following repeating error message I mentioned in a previous post which is filling up /var/log/messages?
Jul 13 22:59:52 chocchip kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 13 22:59:52 chocchip kernel: eth0: Tx queue start entry 4 dirty entry 0.
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 0 is 00002000. (queue head)
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 1 is 00002000.
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 2 is 00002000.
Jul 13 22:59:52 chocchip kernel: eth0: Tx descriptor 3 is 00002000.
Jul 13 22:59:52 chocchip kernel: eth0: Setting half-duplex based on auto-negotiated partner ability 0000.
One thing to check is whether anything else is sharing the interrupt. Some drivers can't do interrupt sharing (though most can). Try running cat /proc/interrupts and maybe post the output from that too.
My output from cat /proc/interrupts is:
           CPU0
  0:      39797    IO-APIC-edge  timer
  1:        339    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
 12:       2107    IO-APIC-edge  PS/2 Mouse
 14:       6366    IO-APIC-edge  ide0
 15:        124    IO-APIC-edge  ide1
 18:          0   IO-APIC-level  eth0
 20:          0   IO-APIC-level  usb-ohci
 23:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:      39750
ERR:          0
MIS:          0
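The two checks suggested above (is anything sharing the interrupt, and has the card raised any interrupts at all) can be read off this listing mechanically. Below is a rough Python sketch, assuming the single-CPU layout shown here (one count column) and assuming that devices sharing a line are listed comma-separated after the controller name. The function names are hypothetical, and SAMPLE is simply the output above pasted in.

```python
# Sketch: find IRQ lines with a zero count or with more than one device.
# Assumes the single-CPU /proc/interrupts layout quoted in this thread.
SAMPLE = """\
           CPU0
  0:      39797    IO-APIC-edge  timer
  1:        339    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  8:          2    IO-APIC-edge  rtc
 12:       2107    IO-APIC-edge  PS/2 Mouse
 14:       6366    IO-APIC-edge  ide0
 15:        124    IO-APIC-edge  ide1
 18:          0   IO-APIC-level  eth0
 20:          0   IO-APIC-level  usb-ohci
 23:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:      39750
ERR:          0
MIS:          0
"""


def parse_interrupts(text):
    """Return {irq_number: (count, [device names])} for numbered IRQ lines."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and the NMI/LOC/ERR/MIS summary lines
        fields = rest.split(None, 2)  # count, controller type, device list
        if len(fields) < 3:
            continue
        devices = [name.strip() for name in fields[2].split(",")]
        table[int(label)] = (int(fields[0]), devices)
    return table


def silent_irqs(table):
    """IRQs whose handlers have never been called (count is zero)."""
    return {irq: devs for irq, (count, devs) in table.items() if count == 0}


def shared_irqs(table):
    """IRQs that have more than one device attached."""
    return {irq: devs for irq, (count, devs) in table.items() if len(devs) > 1}


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print("silent:", silent_irqs(table))
    print("shared:", shared_irqs(table))
```

For the listing above this reports no shared lines, and it lists IRQ 18 (eth0) among the lines whose count is still zero, which fits the theory that the driver is never seeing an interrupt from the card.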
If I dual-boot into Microsoft Windows (which can transmit / receive to and from the network ok) and check Control Panel I notice the network card is assigned interrupt 11 rather than the interrupt 18 that seems to have been assigned to it under Linux. I am confused. I did not realise PC interrupts went as high as 18. Is there some way I can force Linux to assign it interrupt 11 so I can give it a go with this and see if it makes a difference?
Please excuse my ignorant ramblings in this post, but I freely admit I haven't much of a clue what I am talking about!
Ian.