NetScaler HA Sync Failures and Split VIPs Caused by Proxy ARP on Upstream Devices
Issue overview
In a Citrix NetScaler / Citrix ADC High Availability (HA) pair, failover and sync can become unstable when Proxy ARP is enabled on upstream switches or firewalls (for example Cisco devices).
The result is incorrect ARP resolution, failed HA synchronization and “split” VIP behaviour where both nodes appear to own the same virtual IPs after failover.
Symptoms
When this problem occurs, you may see:
HA synchronization state shows FAILED.
Intermittent or broken communication between HA nodes.
After failover, some VIPs remain active on the secondary node and traffic is inconsistently routed between nodes.
ARP anomalies:
NetScaler nodes resolve peer or VIP IPs to the MAC address of the upstream Cisco device.
Both HA nodes learn the same switch / firewall MAC instead of each other’s MAC.
The upstream device effectively “owns” MAC addresses for NetScaler‑related IPs.
For clients this shows up as intermittent connectivity, session drops during failover and generally unstable traffic paths.
Why HA depends on clean ARP
NetScaler HA relies on:
Correct ARP resolution between both nodes.
Proper propagation of Gratuitous ARP (GARP) during failover.
Direct Layer‑2 adjacency between the appliances.
If ARP learning is corrupted, node‑to‑node communication and VIP ownership become unreliable, which directly impacts HA stability and traffic flow.
Root cause: Proxy ARP on the upstream
The issue appears when Proxy ARP is enabled on upstream network devices.
Instead of allowing the peer NetScaler to respond to ARP requests, the Cisco switch or firewall answers on its behalf and returns its own MAC address.
Observed ARP behaviour
In the affected setup:
On Node 1 (primary), the ARP table shows the Cisco MAC address for the peer or VIP IPs.
On Node 2 (secondary), the ARP table shows the same Cisco MAC.
Neither node learns the actual MAC address of its HA peer.
Expected vs. actual
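Expected ARP entries: on each node, the HA peer's IP maps to the peer NetScaler's own MAC address.
Actual (with Proxy ARP enabled): on both nodes, the HA peer's IP maps to the MAC address of the Cisco device.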
What happens internally
A NetScaler node sends an ARP request for its HA peer.
The peer should answer with its own MAC address.
Instead, the Cisco device with Proxy ARP enabled responds on behalf of that IP and returns its own MAC.
Both NetScaler nodes cache the Cisco MAC instead of each other’s MAC.
This effectively breaks direct Layer‑2 adjacency; the Cisco device becomes an ARP man‑in‑the‑middle between HA nodes.
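The sequence above can be modelled in a few lines of Python. This is only a toy sketch: the Host and Segment classes, the 192.0.2.x addresses and the MAC values are made up for illustration, and it assumes that the proxy's reply is the one that ends up cached (on a real network the outcome depends on which reply arrives and is accepted).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    """A hypothetical HA node with its own ARP cache."""
    name: str
    ip: str
    mac: str
    arp_cache: Dict[str, str] = field(default_factory=dict)


@dataclass
class Segment:
    """A hypothetical L2 segment, optionally with a Proxy ARP responder."""
    hosts: List[Host]
    proxy_mac: Optional[str] = None  # MAC of the upstream device, if it proxies ARP

    def resolve(self, requester: Host, target_ip: str) -> Optional[str]:
        """Return the MAC the requester caches for target_ip."""
        if self.proxy_mac is not None:
            # Assumption: the proxy answers and its reply is the one cached.
            answer = self.proxy_mac
        else:
            # Clean case: the real owner of the IP answers with its own MAC.
            owner = next((h for h in self.hosts if h.ip == target_ip), None)
            answer = owner.mac if owner else None
        if answer is not None:
            requester.arp_cache[target_ip] = answer
        return answer


if __name__ == "__main__":
    ns1 = Host("ns1", "192.0.2.11", "02:00:00:00:00:11")
    ns2 = Host("ns2", "192.0.2.12", "02:00:00:00:00:12")
    segment = Segment([ns1, ns2], proxy_mac="00:11:22:33:44:55")
    segment.resolve(ns1, ns2.ip)
    segment.resolve(ns2, ns1.ip)
    # Both caches now hold the proxy MAC instead of the peer MAC.
    print(ns1.arp_cache, ns2.arp_cache)
```

With proxy_mac left as None, the same calls leave each node holding its peer's real MAC, which is the behaviour HA expects.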
Impact on NetScaler HA
HA communication failure – heartbeats and sync traffic are misrouted, dropped or delayed between nodes.
HA sync failure – configuration and state updates cannot be exchanged; show ha node reports Sync State: FAILED.
Failover disruption – during failover, the new primary sends GARPs, but the Cisco device continues replying with its own MAC, so upstream devices do not update MAC ownership correctly.
Split VIP ownership – some flows are sent to the new primary, others still to the old primary (now the secondary); VIPs appear active on both nodes and traffic is split.
How to diagnose the problem
1. Check HA status on NetScaler
Run show ha node on both nodes and look for:
Sync State: FAILED.
Node communication warnings or heartbeat issues.
2. Inspect ARP tables on NetScaler
Run show arp on each node and check whether:
The peer node IP or VIPs resolve to the MAC address of the Cisco switch or firewall instead of the other NetScaler’s MAC.
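When there are many VIPs, this comparison can be scripted. The sketch below is one possible approach, not a Citrix tool: it does not assume any particular ARP table layout and simply picks the first IPv4 address and first MAC address out of each line of pasted output. The function names, the watched IP list and the upstream MAC are all hypothetical inputs you would supply yourself.

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Accepts aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff and Cisco-style aabb.ccdd.eeff.
MAC_RE = re.compile(
    r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"
    r"|\b(?:[0-9A-Fa-f]{4}\.){2}[0-9A-Fa-f]{4}\b"
)


def normalize_mac(mac):
    """Convert any supported MAC notation to lowercase aa:bb:cc:dd:ee:ff."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def parse_arp_lines(text):
    """Map each IP found in the pasted ARP output to its normalized MAC."""
    table = {}
    for line in text.splitlines():
        ip = IP_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:
            table[ip.group()] = normalize_mac(mac.group())
    return table


def find_proxied_entries(table, watched_ips, upstream_mac):
    """Return watched IPs (peer NSIP, SNIPs, VIPs) that resolve to the upstream MAC."""
    upstream = normalize_mac(upstream_mac)
    return sorted(ip for ip in watched_ips if table.get(ip) == upstream)
```

Any IP returned by find_proxied_entries is a candidate for Proxy ARP interference and is worth confirming by hand against the upstream device's interface MAC.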
3. Inspect ARP on the Cisco device
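Review the device's ARP table (for example with show ip arp) and check whether Proxy ARP is enabled on the relevant interfaces (for example with show ip interface).
A suspicious entry is one where a NetScaler‑owned IP maps to the switch's own MAC; NetScaler‑owned IPs should normally map to NetScaler MAC addresses, not to the switch.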
4. (Optional) Packet capture
Capture ARP and GARP traffic during a planned failover.
Validate that:
NetScaler sends GARPs when roles change.
The upstream Cisco device responds to or proxies ARP requests instead of letting the peer advertise its own MAC.
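If the capture is post-processed in a script, the two checks above reduce to simple tests on the ARP header fields. The following is a minimal sketch using only the Python standard library. It works on the 28-byte ARP payload (Ethernet header already stripped); the helper names and the owners mapping (IP to the MAC that should answer for it) are hypothetical and would be filled in from your own inventory.

```python
import socket
import struct

# Standard IPv4-over-Ethernet ARP layout: htype, ptype, hlen, plen, oper,
# sender MAC, sender IP, target MAC, target IP.
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FORMAT)  # 28 bytes
OP_REQUEST, OP_REPLY = 1, 2


def _mac_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def _mac_str(raw):
    return ":".join(f"{b:02x}" for b in raw)


def build_arp(op, sender_mac, sender_ip, target_mac, target_ip):
    """Build an ARP payload; handy for testing without a live capture."""
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, op,
        _mac_bytes(sender_mac), socket.inet_aton(sender_ip),
        _mac_bytes(target_mac), socket.inet_aton(target_ip),
    )


def parse_arp(payload):
    """Decode the fields needed for classification from an ARP payload."""
    if len(payload) < ARP_LEN:
        raise ValueError("payload shorter than an ARP header")
    _, _, _, _, oper, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, payload[:ARP_LEN])
    return {
        "op": oper,
        "sender_mac": _mac_str(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_ip": socket.inet_ntoa(tpa),
    }


def classify(pkt, owners):
    """Label a parsed ARP packet as 'garp', 'proxied-reply' or 'normal'."""
    # A gratuitous ARP announces the sender's own IP: sender IP == target IP.
    if pkt["sender_ip"] == pkt["target_ip"]:
        return "garp"
    # A reply for a known IP that carries someone else's MAC was answered by proxy.
    expected = owners.get(pkt["sender_ip"])
    if pkt["op"] == OP_REPLY and expected is not None and pkt["sender_mac"] != expected:
        return "proxied-reply"
    return "normal"
```

During a planned failover you would expect to see "garp" packets from the new primary for each VIP and no "proxied-reply" packets for NetScaler‑owned IPs.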
Resolution: disable Proxy ARP
The fix is to disable Proxy ARP on the upstream network devices for all interfaces and VLANs that carry NetScaler management or data IPs.
Cisco example
On each relevant interface or SVI, enter interface configuration mode and apply no ip proxy-arp.
Apply this to:
Interfaces connected directly to NetScaler appliances.
VLAN interfaces that include NSIP, SNIP and VIP subnets.
After the change, clear ARP tables or wait for entries to time out so that correct MAC addresses are relearned.
Verification steps
1. Validate ARP learning
Run show arp on both NetScalers.
The peer node IPs should now resolve to each other’s real MAC addresses, not the Cisco MAC.
2. Test HA failover
Trigger a manual failover, for example by running force ha failover on the primary node.
Confirm that:
All VIPs move to the new primary.
No traffic is still forwarded to the secondary node.
3. Confirm HA sync state
Run show ha node on both nodes.
The expected state is Sync State: SUCCESS with stable heartbeats and no sync errors.
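If this check is repeated after every network change, the sync state can be pulled out of saved command output with a small helper. This is a rough sketch only: it assumes the output contains a "Sync State: <value>" field as quoted in this article, and the function names are hypothetical. The exact layout of the output may differ between firmware versions, so treat a None result as "check manually".

```python
import re


def sync_state(ha_node_output):
    """Return the value after 'Sync State:' in uppercase, or None if absent."""
    match = re.search(r"Sync State:\s*(\S+)", ha_node_output)
    return match.group(1).upper() if match else None


def ha_sync_healthy(ha_node_output):
    """True only when the reported sync state is SUCCESS."""
    return sync_state(ha_node_output) == "SUCCESS"
```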
Best practices
To avoid this class of issue in NetScaler / Citrix ADC HA environments:
Do not enable Proxy ARP on interfaces or VLANs that carry load balancer IP ranges (NSIP, SNIP, VIP).
Ensure proper Layer‑2 adjacency between HA nodes; avoid unnecessary L3 devices between them.
Validate ARP tables on both NetScaler and upstream switches during implementation and after changes.
Test GARP propagation and failover behaviour in staging and after major network changes.
Conclusion
When Proxy ARP is enabled on upstream devices, it can override normal ARP resolution and silently break NetScaler HA.
The HA nodes lose direct visibility of each other, HA sync fails, and VIPs end up “split” across both appliances, causing unstable traffic and user impact.
Disabling Proxy ARP and restoring correct ARP behaviour between the NetScaler pair re‑establishes node‑to‑node communication, stabilizes HA synchronization and ensures that VIP ownership and failover work as designed.