Xen VLAN problem - possibly a bug?

Started by Nikolay Popov , 24 April 2009 - 12:20 AM
20 replies to this topic

Nikolay Popov Members

Nikolay Popov
  • 32 posts

Posted 24 April 2009 - 12:20 AM

Hello

I'm trying to make some VLANs available to my domUs, but without success.
Here is the configuration:

dom0: XenServer 5.0.0-13192p Product SKU Citrix XenServer (it's XenServer 5.0.0 hotfix 3)
kernel 2.6.18-92.1.10.el5.xs5.0.0.426.647xen

I have 1 NIC (Intel Corporation 82557/8/9 Ethernet Pro 100 (rev 10)), using the e100 driver.

The NIC is connected to a Cisco Catalyst 2924.
Here is the port configuration:
!
interface FastEthernet0/17
switchport trunk encapsulation dot1q
switchport trunk native vlan 26
switchport mode trunk
spanning-tree portfast
no cdp enable
!

I created 1 extra network interface in XenCenter, assigned VLAN ID 32 to it, and created a VM with this interface.
VLAN ID 32 is up and running on the Catalyst as well.

As a result, I am unable to ping the newly created VM from hosts in VLAN 32: just 0 packets RX, no ARPs, nothing at all.

at dom0:

[root@xen02 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
xapi1           8000.010e0c6ee299       no              vif1.0
                                                        eth0.32
xenbr0          8000.000e0c6ee299       no              eth0

What confuses me is that I can see traffic in VLAN 32 with ~# tcpdump -npi eth0 vlan 32
but no traffic on eth0.32 at all!
I have checked everything 10 times and can't find what I'm doing wrong.
The tagged packets are on the wire, but XenServer just doesn't hear them.

If I delete xenbr0, leaving eth0 alone without bridging, I can hear traffic on eth0.32, but the VM is still not accessible until I put eth0 into promiscuous mode (just by running tcpdump -ni eth0). Once I stop tcpdump, the VM becomes inaccessible again.

Looks like something is broken here, but I don't know what.. Any suggestions are welcome!

Regards, Nikolay



Simon Dyer Members

Simon Dyer
  • 7 posts

Posted 24 April 2009 - 08:42 AM

Hi Nikolay,

This sounds suspiciously similar to my issue under the thread "Bridge problems with VLANs on second NIC". I have 2 NICs, not one, and the VLANs are on the second one. Unless I manually remove eth1 from xenbr1 nothing works. When I do that, it does.

To see if this is the same issue, can you do a tcpdump -ni xenbr0 -XX and post the result? If it is the same, you should see a frame that goes:

ff ff ff ff ff ff nn nn nn nn nn nn 81 00 00 20
08 06 bla bla bla ...

where nn is the MAC of the router or server outside the xenserver. This means that the VLAN tagged frame (81 00 00 20 for vlan 32) is going to xenbr0 instead of going to the eth0.32 interface. The frame I have used as an example is an ARP request frame which is the most probable if you do a ping from outside towards the VM.
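
To make the byte layout above concrete, here is a small sketch (not from the original post) that pulls the VLAN ID out of the first 16 bytes of a dumped frame. The function name vlan_id_from_header is hypothetical, and the source MAC 00 11 22 33 44 55 is only a placeholder for the nn bytes. It assumes lowercase hex as printed by tcpdump.

```shell
#!/bin/sh
# Hypothetical helper: print the 802.1Q VLAN ID of a frame header given as hex.
# Layout: 6 bytes destination MAC, 6 bytes source MAC, 2 bytes TPID (8100),
# 2 bytes TCI (3 bits priority, 1 bit DEI/CFI, 12 bits VLAN ID).
vlan_id_from_header() {
    # Strip spaces and newlines so positions can be counted in hex digits
    hex=$(printf '%s' "$1" | tr -d ' \n')
    tpid=$(printf '%s' "$hex" | cut -c25-28)   # bytes 13-14
    tci=$(printf '%s' "$hex" | cut -c29-32)    # bytes 15-16
    case "$tpid" in
        8100) ;;
        *) echo "untagged"; return 1 ;;
    esac
    # Mask off the top 4 bits to keep only the 12-bit VLAN ID
    echo $(( 0x$tci & 0x0FFF ))
}

# Placeholder source MAC; the tag 81 00 00 20 decodes to VLAN 32
vlan_id_from_header "ff ff ff ff ff ff 00 11 22 33 44 55 81 00 00 20"
```

If this prints 32 for a frame captured on xenbr0, the tag was still on the frame when the bridge saw it, which is the symptom described above.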

I am running exactly the same code as you, XenServer 5.0.0-13192p, kernel 2.6.18-92.1.10.el5.xs5.0.0.426.647xen.

Regards,

Simon.



Nikolay Popov Members

Nikolay Popov
  • 32 posts

Posted 24 April 2009 - 08:58 AM

Yes, I can confirm that tagged frames are seen on xenbr0. Since all VLANs are allowed on this trunk, I can see a lot of:

11:49:50.272969 arp who-has 10.5.13.117 tell 10.5.1.36
0x0000: ffff ffff ffff 000e 2e9e 3bbf 8100 000a
11:49:50.317898 arp who-has 10.3.26.1 tell 10.3.26.145
0x0000: ffff ffff ffff 001b 3821 e175 8100 0012
11:49:50.346405 arp who-has 10.4.147.17 tell 10.4.145.104
0x0000: ffff ffff ffff 001b fc18 bf6d 8100 0004

and so on

But why not? If it's a bridge, eth0 is in promiscuous mode and captures these frames, and the bridge is just doing its job by trying to forward them to its other ports.
The problem is that the frames are not untagged while the bridge is active on eth0. Why? Maybe it's a NIC driver issue?
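
For anyone who wants a quick summary of which VLANs are leaking onto the bridge, the dump can be filtered roughly like this. It is only a sketch: list_vlan_ids is a made-up name, and it assumes the text format shown in this post, where the first hex row of each packet starts with 0x0000: and carries the 8100 tag in its seventh hex word.

```shell
#!/bin/sh
# Hypothetical helper: read tcpdump -XX text on stdin and print the distinct
# VLAN IDs of the 802.1Q-tagged frames found in it, one per line.
list_vlan_ids() {
    while read -r offset w1 w2 w3 w4 w5 w6 tpid tci rest; do
        # Only the first hex row of each packet holds the Ethernet header
        [ "$offset" = "0x0000:" ] || continue
        # 8100 is the 802.1Q tag protocol identifier
        [ "$tpid" = "8100" ] || continue
        # The VLAN ID is the low 12 bits of the tag control field
        echo $(( 0x$tci & 0x0FFF ))
    done | sort -n -u
}

# Example with two header rows copied from the dump in this post
list_vlan_ids <<'EOF'
11:49:50.272969 arp who-has 10.5.13.117 tell 10.5.1.36
0x0000: ffff ffff ffff 000e 2e9e 3bbf 8100 000a
11:49:50.317898 arp who-has 10.3.26.1 tell 10.3.26.145
0x0000: ffff ffff ffff 001b 3821 e175 8100 0012
EOF
```

On a live host the input would come from something like tcpdump -ni xenbr0 -XX piped into the function.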



Nikolay Popov Members

Nikolay Popov
  • 32 posts

Posted 24 April 2009 - 09:48 AM

Also /proc/sys/net/bridge doesn't exist. I have this directory on all hosts running bridges, but not on this xen one. Maybe it's related somehow?



Nikolay Popov Members

Nikolay Popov
  • 32 posts

Posted 24 April 2009 - 04:12 PM

Also tried with linux domU - same result ;(



Nikolay Popov Members

Nikolay Popov
  • 32 posts

Posted 24 April 2009 - 05:01 PM

It looks like the VLAN/bridging code is totally broken here.
I created another Linux VM, connected it to the management interface, then ran tcpdump inside it, and I can hear all broadcast frames from all VLANs on the trunk! xenbr0 just passes tagged traffic to all its ports without filtering, and it takes no notice that VLAN interfaces have already been created on eth0.

As far as I understand, bridge.ko has been heavily patched by the Citrix team. Possibly they broke something?

Can somebody from Citrix say whether VLANs are supported on XenServer Free Edition or not? If not, why is it possible to set them up in XenCenter? If yes, why don't they work as expected at all?



Simon Dyer Members

Simon Dyer
  • 7 posts

Posted 24 April 2009 - 09:46 PM

It sure looks like we are hitting the same problem here, although I have 2 NICs, one for management with no VLANs and the second one with all the VLANs. XenCenter creates the "untagged" bridge xenbr1 for this one, though I have no use for it, but that's OK. My VLAN NIC is a Realtek RTL-8139C using the 8139too module.

Re: your comments:

If there is no use of VLANs, then 802.1Q is just another protocol ID, so agreed, they would bridge just fine in theory unless the bridge explicitly blocks that protocol ID. A real switch will block tagged frames on an untagged port for example.

Ha! Looking at your next post answers the last question. On another box I am doing bridging on, that dir exists:
root@not-a-xen-box:~# cat /proc/sys/net/bridge/bridge-nf-filter-vlan-tagged
1

though that looks more related to netfilter as-applies-to-bridging rather than strictly to bridging. The xen machine doesn't have it. Seems the default bridge behaviour is to filter out tagged frames.

I'm convinced it's in the dom0 code, so Windows vs. Linux VMs doesn't matter. If I can't dump it from the xapi bridge, it won't make it to the VM.

I hadn't tried putting the VM onto the xenbr'x' and dumping from inside the VM, though it confirms what we are seeing. I agree that it is bridging the tagged frames instead of detagging them, so apparently completely broken. In my case the workaround is easy: brctl delif xenbr1 eth1, which has no side effects for me, though you would lose the management interface. My problems are 1) It's wrong. 2) I can fix it but it comes back at boot. I could fix that with a startup script, but 3) It's wrong.

I would ask the same question to Citrix. Should this work? The fact I can fix it seems to indicate it's more a bug than a feature restriction, but hey.



Nikolay Popov Members

Nikolay Popov
  • 32 posts

Posted 24 April 2009 - 11:18 PM

Simon, we are going the same way, but it looks like I have found a solution. At least, it works for me.

Install ebtables on dom0 and run

ebtables -t broute -A BROUTING -p 802_1Q -i eth0 -j DROP

This should prevent the bridge code from processing tagged frames, so that they are passed to the 802.1Q code instead (in the broute table, DROP means the frame is routed rather than bridged).
Special thanks to Citrix: it looks like those guys have never tested VLANs with their bridges ;)

P.S. If you don't know how to install ebtables, keep in mind that XenSource is based on CentOS 5, so fix /etc/yum.repos.d/CentOS-Base.repo and use yum as well.
After you get things working with ebtables, don't forget to run chkconfig --add ebtables; chkconfig ebtables on; to keep this voodoo magic across reboots.

Regards, Nikolay



Stefan Hagl Members

Stefan Hagl
  • 21 posts

Posted 04 May 2009 - 08:16 AM

Hmm, this seems to be my problem too:
http://forums.citrix.com/thread.jspa?threadID=245577&tstart=0

Could you please post how to install ebtables? I have never worked on CentOS and I don't know yum.



Doreen Nigbur Members
  • #10

Doreen Nigbur
  • 3 posts

Posted 05 June 2009 - 02:30 PM

Hey,

I was wondering how to install ebtables without any available package on centos5 repositories. Could you guys give me a hint?

Regards, Rick



Nikolay Popov Members
  • #11

Nikolay Popov
  • 32 posts

Posted 05 June 2009 - 02:43 PM

You need to use the EPEL repository. A howto is searchable via Google in 5 min ;)



Kevin Brooks Members
  • #12

Kevin Brooks
  • 29 posts

Posted 02 November 2009 - 11:54 PM

I wanted to save anyone else who runs into this problem some time. Here are the final steps needed to resolve this issue:

Enable the epel repo:
rpm -ivh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm

Install ebtables:
yum install ebtables

Install the following filter (replace eth1 with the interface you're using for your VM VLANs):
ebtables -t broute -A BROUTING -p 802.1Q -i eth1 -j DROP

Enable ebtables at startup:
chkconfig --levels 2345 ebtables on

Check the current "broute" table rules:
ebtables -t broute --list

Save the tables for next start
/etc/init.d/ebtables save
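
If the rule is needed on more than one trunk interface, the rule and save steps above could be wrapped roughly as follows. This is a sketch under assumptions, not a tested procedure: apply_vlan_broute, run and the DRY_RUN variable are made-up names, and it uses the 802_1Q protocol name from Nikolay's post. With DRY_RUN=1 it only prints the commands, so it can be checked before touching dom0.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add the broute rule for every interface named on the command line,
# then save the tables so the init script restores them at boot.
apply_vlan_broute() {
    for iface in "$@"; do
        run ebtables -t broute -A BROUTING -p 802_1Q -i "$iface" -j DROP
    done
    run /etc/init.d/ebtables save
}

# Dry run for the interface used in this post; nothing is changed
DRY_RUN=1
apply_vlan_broute eth1
```

Note that -A appends, so running it twice for the same interface leaves a duplicate rule; check with ebtables -t broute --list first.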



Kevin Brooks Members
  • #13

Kevin Brooks
  • 29 posts

Posted 04 November 2009 - 07:51 AM

As a follow-up, this doesn't seem to be a problem when using bonded interfaces for the VLANs.



Danny Wannagat Citrix Employees
  • #14

Danny Wannagat
  • 54 posts

Posted 04 November 2009 - 10:51 AM

Experimental only! Not Supported!

Hi.

You could also replace the network / bridging drivers on xenserver with:
http://openvswitch.org/

This integrates perfectly into xenserver. It's not final, but I tested the 0.90.6 and it looks good so far.

If you have problems compiling the stuff, send me an email and I will send you the RPM.

Danny



Ben Getsug Members
  • #15

Ben Getsug
  • 1 posts

Posted 13 November 2009 - 06:08 PM

I just wanted to confirm that these steps worked for me. Thank you!
Had I not found this post, I would have had to go with ESXi. I hope Citrix will fix this in the next version.



Serguei Fedotov Members
  • #16

Serguei Fedotov
  • 17 posts

Posted 27 November 2009 - 11:05 AM

I have the same problem (http://forums.citrix.com/thread.jspa?messageID=1420838&)
and after applying this solution it is (I hope ;) fixed.



Mark Soentges Members
  • #17

Mark Soentges
  • 2 posts

Posted 20 April 2010 - 12:04 AM

Hi there,

*many thanks* to Kevin Brooks and all others for researching the solution to this nasty VLAN problem, you saved my day (or rather: my week)!

I had the same problem on 3 fresh XenServer 5.5.0 hosts where eth0 is iSCSI only and eth1 is for the management interface and (VLAN tagged) VM communication: absolutely no VM network communication over the tagged IF.

I extensively tested these hosts at home before setting them up in the data center for production use, but before I added the additional NIC in order to separate iSCSI network traffic. I thought I was well-prepared, I'd never have expected that adding the extra NIC can cause that much trouble.

I hope there are not more surprises waiting for me, as I'm setting up these hosts in the data center tomorrow.

Mark



citrix visor Members
  • #18

citrix visor
  • 112 posts

Posted 23 April 2010 - 05:48 PM

Danny,

Can you elaborate on how to replace the Xen bridge networking driver with Open vSwitch?

thanks



Serguei Fedotov Members
  • #19

Serguei Fedotov
  • 17 posts

Posted 03 June 2010 - 02:05 PM

This method does not work with the new XenServer 5.6:

[root@xenserver-56 ~]# ebtables -t broute -A BROUTING -p 802.1Q -i eth1 -j DROP
FATAL: Module ebtables not found.
The kernel doesn't support the ebtables 'broute' table.

[root@xenserver-56 ~]# ebtables -L
FATAL: Module ebtables not found.
The kernel doesn't support the ebtables 'filter' table.

Any help appreciated.



Piete Brooks Members
  • #20

Piete Brooks
  • 14 posts

Posted 26 March 2011 - 02:02 PM

I get similar problems for 5.6.1-fp1 (AKA 5.6 FP1 AKA 5.6.100-fp1) on two AMD dom0s (although all the other AMD and Intel dom0s are fine)