Netscaler and real performance tuning

So yesterday I held a session at the Citrix User Group in Norway on Netscaler and performance tuning. There is only so much you can say about performance tuning in 45 minutes, but I think I managed alright.

The agenda on my list was

* TCP profiles, Multipath TCP, Path MTU
* SSL profiles and tuning
* Autonegotiation and duplex
* Netscaler VPX
* Jumbo frames and LACP
* Last but not least, Mobilestream

Now most of these are core Netscaler optimization features, except Mobilestream, which relates more to features sitting behind the Netscaler. So I wanted to write a blog post about it as well.

First up are the TCP profiles. By default there is a TCP profile which hasn’t changed since 1999. So the default Netscaler profile is there for compatibility and not for the best performance, but of course there are a lot of different factors involved here: what kind of network infrastructure you have, packet loss, bandwidth, jitter, firewalls and so on.

But, the main thing is that the default profile does not:

* Have Window Scaling activated (window scaling lets the TCP window grow beyond the 64 KB limit of the original TCP header, so more data can be in flight before the sender has to wait for an ACK)
* Have Selective Acknowledgement (SACK) activated (means that we don’t need to resend all the data after a packet loss: if we sent packets 1, 2, 3, 4, 5 and the receiver didn’t get packet 3, we only resend 3 and not 4 and 5)
* Have the Nagle algorithm activated (gathers up small writes and holds them back until there is enough data to fill a full-sized packet, and then sends the data)
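
To put some rough numbers on the first two points, here is a small Python sketch. It is my own illustration and not anything taken from the Netscaler: the 100 ms round-trip time and the scale factor are assumed values, and the no-SACK case is a simplified worst-case model of the "resend everything after the loss" behaviour described above.

```python
# Illustrative arithmetic only. The RTT and the scale factor are assumptions.

def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def segments_to_resend(sent: list[int], lost: set[int], sack: bool) -> list[int]:
    """Segments that are sent again after a loss.

    With SACK only the missing segments are resent. Without SACK this models
    the worst case, where everything from the first lost segment onward is
    sent again.
    """
    if not lost:
        return []
    if sack:
        return [s for s in sent if s in lost]
    first_lost = min(lost)
    return [s for s in sent if s >= first_lost]


if __name__ == "__main__":
    rtt = 0.100  # assumed 100 ms round trip, e.g. a mobile user
    unscaled = 65_535        # largest window the 16-bit TCP header field allows
    scaled = 65_535 << 4     # same field with an assumed window scale factor of 4

    print(f"no window scaling: {max_throughput_mbps(unscaled, rtt):.1f} Mbit/s")
    print(f"window scale 4:    {max_throughput_mbps(scaled, rtt):.1f} Mbit/s")

    packets = [1, 2, 3, 4, 5]
    print("resend without SACK:", segments_to_resend(packets, {3}, sack=False))
    print("resend with SACK:   ", segments_to_resend(packets, {3}, sack=True))
```

The point is that without window scaling a single connection is capped by window size divided by round-trip time, no matter how much bandwidth the link has.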

So the ICA protocol, for instance, which is very chatty and uses small packets (which means a lot of overhead), is not a good fit for the regular TCP profile. This is where the TCP profile nstcp_default_XA_XD_profile comes in (it has all the features I mentioned above enabled).

But of course you also have the mobile users, who are jumping back and forth between different WLAN access points or mobile antennas, which means there are moments of total packet loss. The default TCP profile uses TCP Reno, which cuts the congestion window in half when it detects a packet loss, and that is not going to do the mobile users any good :)

Therefore Citrix implemented a variant of TCP congestion control called Westwood+, which tries to estimate the bandwidth currently available to the device and then cuts the congestion window to reflect that bandwidth. This means that mobile users can get back up to higher speeds faster.
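
The difference between the two reactions can be sketched in a few lines of Python. This is a toy model of the general idea and not Citrix's implementation: the function names, the 1460-byte segment size and the example numbers are all my own assumptions.

```python
# Toy comparison of how the congestion window (cwnd, counted in segments)
# is reduced after a loss. Not the Netscaler's actual algorithm.

def reno_cwnd_after_loss(cwnd_segments: float) -> float:
    """Reno-style reaction: halve the window (never below 2 segments)."""
    return max(cwnd_segments / 2, 2.0)


def westwood_cwnd_after_loss(
    cwnd_segments: float,
    est_bandwidth_bps: float,
    rtt_min_seconds: float,
    mss_bytes: int = 1460,  # assumed segment size
) -> float:
    """Westwood-style reaction: shrink the window to the estimated
    bandwidth-delay product instead of blindly halving it."""
    bdp_segments = est_bandwidth_bps * rtt_min_seconds / (8 * mss_bytes)
    return min(cwnd_segments, max(bdp_segments, 2.0))


if __name__ == "__main__":
    cwnd = 100.0  # segments in flight when the loss happens (assumed)
    # Assume the sender estimated 20 Mbit/s and a 50 ms minimum RTT.
    print(f"Reno:     {reno_cwnd_after_loss(cwnd):.1f} segments")
    print(f"Westwood: {westwood_cwnd_after_loss(cwnd, 20e6, 0.050):.1f} segments")
```

With these made-up numbers Reno drops to 50 segments while the Westwood-style estimate keeps about 85, which is why a user with random loss on a wireless link recovers faster.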

Also new with 10.5 (I believe) is the option to enable MPTCP (Multipath TCP). If you have mobile devices with two antennas (one for mobile data and one for Wi-Fi, which can be used at the same time), we can have two TCP connections from the same device used to access content on the Netscaler. It’s just a profile setting and we are good to go.

The problem is that you need applications specifically written to leverage MPTCP (not all are there yet).

So go into System –> Profiles –> TCP Profiles (you can either use an existing one or create a new one)

Check the box for Window Scaling, and then the boxes for MPTCP (if you need it), SACK and Nagle.

Now there is also a downside to Nagle: since it waits until a full packet has been gathered before it sends it across the wire, a mobile user with a lot of packet loss might in theory end up with a lot of data that needs to be resent. So for SQL instances, for example, don’t use Nagle! :)
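
To show why Nagle helps chatty traffic in the first place, here is a rough Python sketch of the header overhead. It assumes 40 bytes of IPv4 + TCP headers per packet and treats Nagle as perfect coalescing, which is the best case and ignores ACK timing, so take the numbers as an illustration only.

```python
import math

HEADER_BYTES = 40  # IPv4 (20) + TCP (20) without options, assumed for illustration


def wire_bytes(writes: list[int], coalesce: bool, mss: int = 1460) -> tuple[int, int]:
    """Return (packets, total bytes on the wire) for a list of application writes.

    coalesce=False: every write goes out in its own packet(s).
    coalesce=True:  writes are packed into full-sized segments (best case for Nagle).
    """
    payload = sum(writes)
    if coalesce:
        packets = math.ceil(payload / mss)
    else:
        packets = sum(math.ceil(w / mss) for w in writes)
    return packets, payload + packets * HEADER_BYTES


if __name__ == "__main__":
    small_writes = [50] * 100  # 100 small 50-byte writes (assumed workload)
    print(wire_bytes(small_writes, coalesce=False))  # (100, 9000)
    print(wire_bytes(small_writes, coalesce=True))   # (4, 5160)
```

The flip side is the delay: the data sits in a buffer while it is being gathered, which is exactly what you don't want for request/response traffic like SQL.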

The cool part is that these profiles can be applied to each vServer and of course to services, so depending on what each one is hosting you can create a different profile.

The other thing is SSL tuning, and there are a few tips here as well. First thing is the quantum size. By default the quantum size is 8 KB, meaning that the Netscaler collects 8 KB of the data that is going to be sent across the wire and then sends it to the SSL chips for encryption. We can also change this quantum size to 16 KB, meaning that more data is allowed inside each encrypted chunk.


So for solutions that serve, for instance, large file downloads, a 16 KB quantum size is preferable. For regular websites with a lot of small data, I recommend sticking to 8 KB.
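
A quick back-of-the-envelope check in Python shows where the gain comes from. I am treating the quantum size simply as the chunk size handed to the SSL chips, as described above; the file and page sizes are assumed examples.

```python
import math

KB = 1024


def encryption_chunks(payload_bytes: int, quantum_bytes: int) -> int:
    """How many chunks of at most quantum_bytes are needed for a payload."""
    return math.ceil(payload_bytes / quantum_bytes)


if __name__ == "__main__":
    big_download = 10 * 1024 * KB  # a 10 MB file (assumed example)
    small_object = 2 * KB          # a small web object (assumed example)

    for quantum in (8 * KB, 16 * KB):
        print(
            f"quantum {quantum // KB:>2} KB:",
            encryption_chunks(big_download, quantum), "chunks for the download,",
            encryption_chunks(small_object, quantum), "for the small object",
        )
```

For the large download the number of trips to the SSL hardware is halved (1280 versus 640 chunks), while the small object needs one chunk either way, so there is nothing to gain from the bigger quantum there.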

And then there is of course autonegotiation and duplex, which is something that everybody expects to work fine these days, but…

I still see some people having issues with this on specific network devices, so you should always try to manually set the speed and duplex on the Netscaler and on the switch/router/firewall it is connected to.

For the VPX a lot of the tuning tips are the same as for the MPX, but…

For instance, the VPX has support for multiple packet engines. A packet engine is the component inside the Netscaler which runs all the different policies, handles encryption and so on. A regular VPX is by default set up with 2 vCPUs (one for management and one for the packet engine). So if you have a VPX 3000, 2 vCPUs and 2 GB RAM might not be enough, and if you are using XenServer or VMware you have the option to add more CPU and RAM to gain additional packet engines. (NOTE: Hyper-V does not support this feature and is capped at 2 vCPUs, 2 GB RAM and 2 vNICs. DON’T add a third vNIC.)

But of course if you are running Netscaler VPX on Hyper-V, make sure you have the newest drivers and make sure that VMQ (Virtual Machine Queue) is working.

VMQ means that a VM has a dedicated queue on the physical network card. If VMQ is not working, the VM has to use the default queue along with all the other VMs, and with a lot of Broadcom drivers VMQ does not work.

And there is also LACP (NIC teaming, port channel, 802.3ad), which allows for aggregation and failover/redundancy across physical NICs. Note that this requires configuration on both the switch(es) and the Netscaler, and that it only works on the MPX and the SDX.

There is also a new feature which came with 10.5: support for jumbo frames. This allows us to send Ethernet frames with an MTU of up to 9000 bytes (the default is 1500), which gives much less overhead, since there is more data in a single frame and fewer ACKs are required.


This only works on MPX/SDX as well, since a VPX is reliant on what the hypervisor provides.
It can be configured per interface. Note that it requires support for jumbo frames on the switch and the server, and that it does not work across the WAN, since it stops at the router or the ISP (they mostly support only the default MTU).
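
To get a feel for how much a bigger MTU saves, here is a small Python calculation. The overhead figures are the usual textbook ones (40 bytes of IPv4 + TCP headers inside the MTU, 38 bytes of Ethernet framing, preamble and inter-frame gap around it) and the 1 MB transfer is an assumed example, so treat it as an estimate and not a benchmark.

```python
import math

IP_TCP_HEADERS = 40     # IPv4 (20) + TCP (20) without options, inside the MTU
ETHERNET_OVERHEAD = 38  # header 14 + FCS 4 + preamble 8 + inter-frame gap 12


def payload_efficiency(mtu: int) -> float:
    """Share of the bytes on the wire that is actual TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-sized frames needed to carry the payload."""
    return math.ceil(payload_bytes / (mtu - IP_TCP_HEADERS))


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: efficiency {payload_efficiency(mtu):.3f},",
            frames_needed(1_000_000, mtu), "frames for 1 MB",
        )
```

The efficiency gain itself is modest (roughly 95% to 99%), but the number of frames, and with it the per-packet work on both ends, drops from 685 to 112 for the same megabyte.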

The Netscaler also has the Path MTU feature, which lets the Netscaler look at the path ahead and find the lowest MTU along it. This feature uses ICMP to learn the MTU of the next-hop devices. The problem is that since it uses ICMP, and the next-hop devices might be firewalls and such, it might not work. This feature is used to avoid IP fragmentation on the network.
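
The idea behind Path MTU discovery can be simulated in a few lines of Python. This is a simplified model of the general mechanism (send with "don't fragment" set, lower the MTU whenever a hop answers with ICMP "fragmentation needed") and not the Netscaler's own code; the function name and the hop MTUs are made up for the example.

```python
from typing import Optional


def discover_path_mtu(
    local_mtu: int,
    hop_mtus: list[int],
    icmp_blocked: bool = False,
) -> Optional[int]:
    """Simulate Path MTU discovery along a list of hop MTUs.

    Returns the discovered path MTU, or None when a hop drops the packet
    and the ICMP reply never makes it back (for example through a firewall).
    """
    mtu = local_mtu
    while True:
        # First hop that cannot forward a packet of the current size.
        bottleneck = next((hop for hop in hop_mtus if hop < mtu), None)
        if bottleneck is None:
            return mtu   # the packet fits through every hop
        if icmp_blocked:
            return None  # packet dropped silently, discovery fails
        mtu = bottleneck  # the ICMP reply tells us the MTU to retry with


if __name__ == "__main__":
    path = [9000, 1500, 1400, 1500]  # assumed hop MTUs
    print(discover_path_mtu(9000, path))                     # 1400
    print(discover_path_mtu(9000, path, icmp_blocked=True))  # None
```

The second call is the firewall case from above: without the ICMP replies the sender never learns that its packets are too big.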

That’s it for now, stay tuned for more Netscaler :)

#netscaler, #performance-tuning