Tons of transmit and receive discards on guest nics

Started by Brent Wiese , 13 April 2010 - 11:16 PM
21 replies to this topic

Brent Wiese Members

Brent Wiese
  • 104 posts

Posted 13 April 2010 - 11:16 PM

Xenserver 5.5 U2 on Dell m600/610 blades (Broadcom nics).

My snmp monitor is showing tons of transmit and receive discards on one of my Windows 2008 guests - over 46000 today. I have a few others over 25k and some around 1k.

All the guests are Windows 2008 sp2 - 2 of them are x64, but the rest are x32. All have the latest tool updates that came with U2.

The one with the largest number of errors didn't yet have the "DisableLargeSendOffload" or "DisableTaskOffload" reg entries. The others all did. Applied/rebooted, no change.

None of the physical switch ports are showing any errors/drops. None of the ethX/bondX interfaces on the blade are showing any drops/errors.

Some of the vifX.0 interfaces show some drops, but the worst is 342 on transmit and it's not changing.

The guest with the 46k is barely pushing any traffic - the MAX today was 20kbps receive and 19.5 transmit, so certainly not a utilization problem.
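To put a daily discard total next to the traffic figures, it can help to turn two polled counter values into a rate. The sketch below assumes the monitor polls the standard IF-MIB discard counters (ifInDiscards / ifOutDiscards), which are 32-bit and wrap around. The function names and the sample values in the usage note are illustrative, not taken from any monitoring product.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two samples of a wrapping SNMP counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def per_second(previous, current, interval_seconds, bits=32):
    """Average counter increase per second over one polling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current, bits) / interval_seconds
```

For example, `per_second(0, 46000, 86400)` averages 46000 discards over a full day, and `counter_delta(4294967290, 10)` handles a counter that wrapped between two polls.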

I haven't disabled the rx/tx offload on the physical nics, but based on there not being any errors on those interfaces, I'm not sure that matters.

Any ideas what could be causing this?

---ADDED:
I've realized that every one of my Windows 2008 VMs is doing this. None of my 2003 VMs do.

Edited by: Brent Wiese on Apr 14, 2010 12:32 PM



James Cannon Members

James Cannon
  • 3,857 posts

Posted 14 April 2010 - 10:32 PM

Hi Brent,

How about running Wireshark inside the Win2k8 VMs? It would be nice to know what is going on. Disabling tx or rx on the PIF would turn flow control off or on. Please compare against your switch port configuration before modifying anything.

Regards,
James Cannon



Brent Wiese Members

Brent Wiese
  • 104 posts

Posted 15 April 2010 - 08:28 PM

But if flow control on the pif was the problem, wouldn't I see errors/discards on the switch port? Or errors on the pif's ifconfig?

I'll fire up wireshark and see if I see anything useful to report.



James Cannon Members

James Cannon
  • 3,857 posts

Posted 16 April 2010 - 05:27 PM

Hi Brent,

You probably won't see errors or discards on the PIF on the XenServer unless XenServer itself is discarding the packets. What you will probably see is a lot of retransmits.

I've heard that a reg edit to Win2k8 to turn off Large Send Offload is helpful. May want to test on a non-production VM.

Regards,
James Cannon



Brent Wiese Members

Brent Wiese
  • 104 posts

Posted 16 April 2010 - 09:15 PM

I've disabled the LSO (was buried in my original post) inside Windows.

Haven't had a chance to fire up wireshark yet.



Karl Fallon Members

Karl Fallon
  • 6 posts

Posted 19 April 2010 - 01:44 PM

I get the same issue on my Windows 2008 VMs. A physical box I built has no discards, and the Windows 2003 servers I am replacing have none either. It only happens on Windows 2008 VMs.

C:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                    3279485874      1076014923
Unicast packets           938705193       902923889
Non-unicast packets         2541866          144982
Discards                    2308074        12308074
Errors                            0               0
Unknown protocols                 0
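
For anyone who wants to track these counters over time, here is a minimal Python sketch that parses `netstat -e` output and relates received discards to received packets. The names `parse_netstat_e` and `discard_ratio` are made up for illustration, and the ratio is only a rough indicator, not an official metric.

```python
import re

# Counter rows copied from the netstat -e output in the post above.
SAMPLE = """\
Interface Statistics

Received Sent

Bytes 3279485874 1076014923
Unicast packets 938705193 902923889
Non-unicast packets 2541866 144982
Discards 2308074 12308074
Errors 0 0
Unknown protocols 0
"""

# A counter row is a label followed by one or two integers.
ROW = re.compile(r"^\s*([A-Za-z][A-Za-z\- ]*?)\s+(\d+)(?:\s+(\d+))?\s*$")


def parse_netstat_e(text):
    """Return {row_label: (received, sent)}; sent is None for one-column rows."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            label, received, sent = match.groups()
            stats[label] = (int(received), int(sent) if sent is not None else None)
    return stats


def discard_ratio(stats):
    """Received discards divided by all received packets (unicast + non-unicast)."""
    received_packets = stats["Unicast packets"][0] + stats["Non-unicast packets"][0]
    if received_packets == 0:
        return 0.0
    return stats["Discards"][0] / received_packets


if __name__ == "__main__":
    print(discard_ratio(parse_netstat_e(SAMPLE)))
```

The parser only relies on whitespace between columns, so it should work on both space-aligned and single-space output.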

Edited by: Karl Fallon on 20-Apr-2010 07:13



James Cannon Members

James Cannon
  • 3,857 posts

Posted 22 April 2010 - 04:29 PM

Hi Karl,

For Windows 2008 please set the DisableTaskOffload key.

Regards,
James Cannon



Karl Fallon Members

Karl Fallon
  • 6 posts

Posted 23 April 2010 - 02:28 PM

Hi James,

Thanks for your tip. I have seen that information causing issues on another board. I tried that suggestion - The Key is set on HKLM\....\Services\TCPIP\Parameters\DisableTaskOffload = 1

I'm not using Hyper-V so maybe this is different to the other users who had the same issue.

Any other suggestion?

Regards,
Karl.

This is what I did.

* After a reboot you can see the discards appearing
* Using netsh you can see the taskoffload is enabled
* After the reg setting it's disabled.

C:\Users\Administrator>netstat -e
Interface Statistics

                           Received            Sent

Bytes                        220614          693708
Unicast packets                1622            2496
Non-unicast packets             312              72
Discards                       1192            1192
Errors                            0               0
Unknown protocols                 0

BEFORE

* The netsh shows the IP offload when it is enabled on Interface 11

C:\Users\Administrator>netsh int ip show offload

Interface 1: Loopback Pseudo-Interface 1

Interface 11: Local Area Connection 2

ipv4 transmit checksum supported.
udp transmit checksum supported.
tcp transmit checksum supported.
tcp large send offload supported.
udp receive checksum supported.
tcp receive checksum supported.

AFTER

* The netsh shows the IP offload is not enabled on Interface 11, normally there is info there.

C:\Users\Administrator>netsh int ip show offload

Interface 1: Loopback Pseudo-Interface 1

Interface 11: Local Area Connection 2

- The Discards are still appearing -

C:\Users\Administrator>netstat -e
Interface Statistics

                           Received            Sent

Bytes                        118880          420646
Unicast packets                 634            1272
Non-unicast packets             232              74
Discards                        642             642
Errors                            0               0
Unknown protocols                 0
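
The BEFORE/AFTER check on the `netsh int ip show offload` output can also be done programmatically. The sketch below is illustrative only: `parse_offload` is a made-up helper, and it assumes the output has the same shape as the two listings in this post (an "Interface N: name" header followed by lines ending in "supported.").

```python
import re

# The two listings from this post, before and after setting DisableTaskOffload.
BEFORE = """\
Interface 1: Loopback Pseudo-Interface 1

Interface 11: Local Area Connection 2

ipv4 transmit checksum supported.
udp transmit checksum supported.
tcp transmit checksum supported.
tcp large send offload supported.
udp receive checksum supported.
tcp receive checksum supported.
"""

AFTER = """\
Interface 1: Loopback Pseudo-Interface 1

Interface 11: Local Area Connection 2
"""

HEADER = re.compile(r"^Interface\s+\d+:\s*(.+)$")


def parse_offload(text):
    """Map each interface name to the offload capability lines listed under it."""
    interfaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            interfaces[current] = []
        elif current is not None and line.endswith("supported."):
            interfaces[current].append(line)
    return interfaces


if __name__ == "__main__":
    for label, text in (("BEFORE", BEFORE), ("AFTER", AFTER)):
        print(label, parse_offload(text)["Local Area Connection 2"])
```

An empty list for an interface corresponds to the AFTER state shown above, where no offload capabilities are listed.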

Edited by: Karl Fallon on 23-Apr-2010 10:29

Edited by: Karl Fallon on 23-Apr-2010 10:31



James Cannon Members

James Cannon
  • 3,857 posts

Posted 23 April 2010 - 05:02 PM

Hi Karl,

FYI - you shouldn't use hyper-V in a VM. Having a hypervisor on a hypervisor is bad. Maybe you can provide details on the switch port? Please enable portfast if using Spanning Tree Protocol. Perhaps turn off STP completely ... for testing purposes. If you are not using VLAN tagging on the XenServer, please turn off trunk port mode.

Regards,
James Cannon



Karl Fallon Members
  • #10

Karl Fallon
  • 6 posts

Posted 27 April 2010 - 02:29 PM

James,

I'm not using Hyper-V at all. I was just mentioning that some people cured their packet discards on Hyper-V. I'm using Xen with a Windows 2008 guest OS. I get your point though.

This is the information I got from our network team..

Please find the configuration.
spanning-tree portfast is configured on port.

interface GigabitEthernet3/10
 description *** To KDDI-FJ-BL02-SW1 Port3 ***
 switchport
 switchport trunk native vlan 102
 switchport trunk allowed vlan 101,102
 switchport mode trunk
 speed 1000
 duplex full
 spanning-tree portfast edge trunk
 channel-protocol lacp
 channel-group 13 mode active

We are using a VLAN tag



Thomas Wieckhorst Members
  • #11

Thomas Wieckhorst
  • 553 posts

Posted 27 April 2010 - 04:01 PM

Hello Karl

> kfallon wrote:
> James,
> ...
> channel-protocol lacp
> ...

I see LACP here. Are you using bonded NICs on your XenServer?
That could be part of your problem, as XenServer does not support LACP as a bonding mode.
It can be configured manually on XenServer, but until you have done that, you had better disable LACP on your switch port. I'm not sure this is part of your actual problem, but I think it's worth keeping an eye on too.
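
As a rough aid for checking a port against the points raised in this thread (LACP, trunk mode, portfast), the following Python sketch scans a pasted interface stanza for those keywords. It is plain string matching, not a Cisco configuration parser, and `review_port_config` is a hypothetical helper name.

```python
# Interface stanza as posted earlier in the thread.
SAMPLE_CONFIG = """\
interface GigabitEthernet3/10
description *** To KDDI-FJ-BL02-SW1 Port3 ***
switchport
switchport trunk native vlan 102
switchport trunk allowed vlan 101,102
switchport mode trunk
speed 1000
duplex full
spanning-tree portfast edge trunk
channel-protocol lacp
channel-group 13 mode active
"""


def review_port_config(config_text):
    """Return notes about settings that were questioned in this thread."""
    lines = [ln.strip().lower() for ln in config_text.splitlines() if ln.strip()]
    findings = []
    if any(ln.startswith("channel-protocol lacp") for ln in lines):
        findings.append("LACP is enabled on this port")
    if any(ln.startswith("channel-group") and ln.endswith("mode active") for ln in lines):
        findings.append("port actively negotiates a channel group")
    if "switchport mode trunk" in lines:
        findings.append("port is in trunk mode")
    if not any(ln.startswith("spanning-tree portfast") for ln in lines):
        findings.append("portfast is not configured")
    return findings


if __name__ == "__main__":
    for note in review_port_config(SAMPLE_CONFIG):
        print(note)
```

Whether any of these findings matters depends on how the XenServer bond and VLANs are set up; the sketch only surfaces the settings for review.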

Regards
Thomas Wieckhorst
Roehrs AG
Germany



Karl Fallon Members
  • #12

Karl Fallon
  • 6 posts

Posted 28 April 2010 - 03:01 PM

Hi Thomas,

We are using bonded NICs and I will have a look at your suggestion.

Thanks
Karl.



Brent Wiese Members
  • #13

Brent Wiese
  • 104 posts

Posted 30 April 2010 - 10:55 PM

We are using bonded nics also attached to Cisco switches.

I had to put the bonds in active-passive mode because we were losing traffic in active-active. Not sure how the cisco switch ports need to be configured to allow active-active, but doesn't look like they're happy "out of the box" in that respect.

In any case, I see on the physical side that traffic is only traversing 1 link.

We have a few different pools, some are set for trunking/hypervisor tagging, others only belong to a single vlan and "access mode" on the switch port with no hypervisor tagging.

Edited by: Brent Wiese on Apr 30, 2010 6:56 PM



Clay Schomburg Members
  • #14

Clay Schomburg
  • 11 posts

Posted 05 May 2010 - 03:54 PM

I am having the same issues. Did you ever fix yours?



Donald Jenkins Members
  • #15

Donald Jenkins
  • 5 posts

Posted 06 May 2010 - 06:52 PM

We are seeing the same exact behavior. Anyone successful in identifying a fix for this yet?



Brent Wiese Members
  • #16

Brent Wiese
  • 104 posts

Posted 06 May 2010 - 11:17 PM

No solution yet.

I have a pool that is NOT using bonded nics and is exhibiting the same behavior. Similar setup tho - Dell blade, broadcom nic, cisco switch.

Have we grabbed Citrix's attention yet? This is obviously bigger than one person...



Edgar Somosierra Members
  • #17

Edgar Somosierra
  • 2 posts

Posted 25 June 2010 - 01:52 AM

I'm also having this issue right now. Were there any updates or a fix?

Thanks.



SOKUM KEO Members
  • #18

SOKUM KEO
  • 43 posts

Posted 20 September 2010 - 07:43 PM

I'm adding my name to this list also...I'm experiencing the exact same issue with the Windows 2008 R2 x64 guest running on XS5.6.

The first Win2008 R2 server ran without a problem; once I had the second one fired up, these issues started appearing. The VM keeps dropping off and on, making it seemingly impossible to use in production. My Win2003 VMs, however, have not shown any problem.

All VM have the latest XS_Tools.

I have not tried the offload reg entries yet, as we are using these VMs. I will try it once I get a chance. I also tried the License Server VM (downloaded from Citrix), and it is having the exact same problem. As it is a Linux VM, I'm not too familiar with the product other than the web page.

Help...



Leslie Nicholson Citrix Employees
  • #19

Leslie Nicholson
  • 47 posts

Posted 30 September 2010 - 04:40 PM

Just wanted to update this thread with a response to the problem posted.

First, WinXP and Win2K3 have different network stacks from WinVista, Win7 and Win2K8 xx. With the XenTools, the PV driver is different between the two network stacks. With the former, discarded packets are not even reported to the OS, which is why the counter remains at zero. With the latter, discards are reported to the OS and are thus detected by monitoring systems if the agents are configured to do so.

With that said, keep in mind that discards are simply that: discarded packets. Discards are different from drops. When a packet is discarded on a TCP/IP wire, it is retransmitted if needed. The only impact is bandwidth degradation, and that is only noticed if retransmits occur VERY frequently. With a drop, by contrast, the packet is dropped for any of a number of reasons and is never retransmitted. A drop is what you see when you run a ping and a packet is lost. Discards usually do not show up as a dropped packet.

In the case of XenServer and Windows 2K8, Win7 (I know it was never mentioned, but it applies as well) and 2K8 R2 VMs, these discarded packets are caused by the busy network the VM is connected to. What the VM is doing is discarding broadcast traffic on the wire (or on the Dom0 bridge between VMs) that isn't meant for it, to prevent unnecessary traffic from utilizing the hypervisor network stack. To prove that out, I suggest borrowing a VM with Win2K8 xx or Win7/WinVista, or creating a new one, and using a network on the hypervisor that is empty. In my case I used a storage NIC and added a virtual network on VLAN 1, which isn't used per Cisco best practices. On VM boot-up and for several minutes afterwards, there were no discards reported by netstat -e.



Randal Potratz Members
  • #20

Randal Potratz
  • 1 posts

Posted 13 October 2010 - 03:44 PM

The fix we had for this was to disable TCP/IPv6 in the adapter properties in Server 2008 R2, as well as Win 7. Hope this helps someone.


