One VLAN between pool members not working

Started by Marcus Peterson , 20 April 2017 - 07:08 PM
10 replies to this topic

Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 20 April 2017 - 07:08 PM

I am having a strange networking issue between pool members. I have a XenServer 7 pool with a few networks. Currently all of my VMs are running on xen1.

 

I noticed today that VMs running on xen2 cannot communicate with VMs on xen1. After further investigation I found that this issue only affects one specific VLAN. All other networks function without issue between pool members.

 

The gateway for all networks is a VM running on xen1.

 

Testing from a VM on xen2, I can see ARP broadcasts destined for its gateway on the troublesome VLAN leave the host, show up on both ingress and egress on xen2, and show up on ingress on xen1, but never on egress. Nothing reaches the firewall VM on xen1.

 

The same is true in reverse: ARP broadcasts from the firewall VM on xen1 leave the host, show up on both ingress and egress on xen1, and show up on ingress on xen2, but never on egress. Nothing reaches the VM on xen2.

 

What's odd is that communication on all other networks between pool members seems to work without issue.

 

Can someone please help me with this situation?

 

Please let me know if you'd like more information.

 

Thanks!

--Marcus



Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 20 April 2017 - 10:43 PM

UPDATE: I mucked around with the network interfaces on the test VM on xen2 and swapped which network each interface is on. Now the VLAN network that wasn't able to communicate works without issue, and another network that had no issue before is having the problem. This is very weird.

 

--Marcus



Alan Lantz Members

Alan Lantz
  • 6,989 posts

Posted 21 April 2017 - 01:57 AM

Some sort of looping going on? Since it's hopping around, that is what it sounds like.

 

--Alan--



Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 24 April 2017 - 06:47 PM

Hello Alan,

 

Thank you for your response.

 

What do you mean by looping? Do you mean loops in my switch configuration between the servers?

 

I look forward to your response.

 

Thanks,

--Marcus



Alan Lantz Members

Alan Lantz
  • 6,989 posts

Posted 24 April 2017 - 06:55 PM

When I posted that, I was thinking of some type of VLAN looping that was causing the VLAN to get shut down, since it appeared that only one VLAN was having the issue.

 

--Alan--



Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 24 April 2017 - 07:00 PM

I see. I will need to look into that. Any tips on troubleshooting that?

 

I just noticed this:

 

Start a test VM on xen2 and verify the network problem is happening. Migrate the VM to xen1: the ARP entries look good and it can ping the gateways without issue on both interfaces. Migrate the test VM back to xen2 and the issue is resolved, as the ARP entries are still current.

 

Any thoughts on this? I am concerned about this issue.

 

Thanks again,

--Marcus



Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 25 April 2017 - 06:16 PM

I'm really at a loss here. The only thing I can think to try would be to power cycle the xen pool. Does anyone have any idea what may be causing this issue? I am skeptical that this is related to VLAN looping, but I am open to testing it.

 

Can someone please assist me with this?

 

Thanks,

--Marcus



Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 25 April 2017 - 06:33 PM

I found the following documentation: https://support.citrix.com/article/CTX132559

 

According to this document, an active-active XenServer configuration requires stacked switches. My switches are not stacked; I have two daisy-chained L2 switches. With that being said, I now understand that an active-active configuration will not work properly. Is this why I am having this issue?

 

My question now is: will an active-standby configuration work properly with my switch configuration?

 

Outside of the communication issue between VMs across hypervisors, everything works fine.

 

I look forward to your response.

 

Thanks,

--Marcus



Alan Lantz Members

Alan Lantz
  • 6,989 posts

Posted 25 April 2017 - 11:54 PM

Active/active can be tricky, and active/passive usually gives better results in that case. That being said, there are reports that XenServer 7 with certain NICs actually works better in active/active than in active/passive. I discounted that earlier because I understood that only one VLAN had issues and the other VLANs worked fine. I would expect all VLANs to have issues if it were an active/active or active/passive problem.

 

--Alan--



Marcus Peterson Members

Marcus Peterson
  • 8 posts

Posted 26 April 2017 - 06:46 PM

Hello Alan,

 

Thank you for your thoughtful response. To clarify, are you saying that this issue is related to my xen pool being active/passive? Or do you think something else is going on here?

 

Thanks again,

--Marcus



Alan Lantz Members

Alan Lantz
  • 6,989 posts

Posted 26 April 2017 - 07:34 PM

It could very well be the mode. You can switch it to see if it makes any difference.

 

--Alan--