This article is meant as a way to troubleshoot network issues on a NetScaler appliance. Ways to troubleshoot may of course differ, so if you have any comments on what you typically do in this type of scenario, please post a comment below!
So the other day I was tasked with troubleshooting a NetScaler issue, where the customer had some issues with ICA sessions being slow and unreliable. A big problem was that file transfers were not working at all, with bandwidth usage fluctuating between 0 and 200 KBps. When doing an initial assessment I noticed the following:
- Running NetScaler VPX 50
- Running on VMware
- Using LACP on the vDS on VMware
- Firewall between the NetScaler and the external users, where they were using NAT for incoming requests
First, a couple of things worth checking if ICA sessions are going slow:
- Amount of SSL transactions (The CPU performance and compute resources available to the NetScaler affect how many SSL transactions the appliance can handle.) If this number is pretty high, it could be that the resources available to the NetScaler are simply saturated.
- Bandwidth use (Is it consuming so much bandwidth that the appliance cannot handle the number of users trying to access the solution?)
- Packet CPU usage (On NetScaler the packet CPUs are responsible for all packet handling, and there is also one dedicated vCPU for management.) On a VPX 50 you can only have 2 vCPUs (1 for management and 1 for packet handling).
I noticed that the VPX had plenty of resources. The number of SSL transactions was low (which could itself be related to the unreliable connections the customer was seeing), and the packet CPU usage was low (I could see this by using stat cpu in the CLI).
After we had established that there was nothing wrong with the VPX, we took a closer look at the virtual infrastructure. I checked whether the VMware host running the NetScaler was saturated, or whether there were any performance issues on the virtual network that the NetScaler was placed on.
Since the issue was persistent and affected both client drive transfers and plain ICA proxy sessions, we guessed that the problem was with the external traffic and not the internal traffic. We also checked that there were no bandwidth policies set on the XenApp farm which might have affected the file transfers.
Since the bandwidth performance of the NetScaler was going up and down, I was thinking that this might be congestion somewhere. The simplest approach was to capture a trace file from the NetScaler to see what kind of traffic was going back and forth and whether there were any issues.
After using Wireshark for a while you get used to searching for the most common parameters. If there is congestion somewhere you may see a lot of RSTs or retransmissions because of a full buffer. Keep in mind that a file transfer using client drive mapping will try to use as much bandwidth as possible. Another change made before I ran my test was to switch the TCP profile to nstcp_xa_xd_tcp_profile, which enables features such as SACK and Nagle to reduce the number of TCP segments and the need for ACK messages when packets are dropped.
NOTE: A good tip when starting trace files from NetScaler for SSL connections is to enable “Decrypt SSL packets”.
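The common signs of congestion mentioned above can be searched for directly with Wireshark display filters. The following is a minimal sketch that composes such a filter string so it can be pasted into the Wireshark filter bar. The field names (tcp.analysis.retransmission, tcp.analysis.zero_window, tcp.flags.reset) are standard Wireshark display filter fields; the helper name build_filter and the address 192.0.2.10 are just placeholders for illustration, not something from the customer environment.

```python
# Wireshark display-filter expressions for common TCP trouble signs.
FILTERS = {
    "retransmission": "tcp.analysis.retransmission",
    "zero_window": "tcp.analysis.zero_window",
    "reset": "tcp.flags.reset == 1",
}


def build_filter(symptoms, host=None):
    """Combine the chosen symptom filters with OR, optionally limited to one host.

    `host` is a hypothetical IP address (for example the NetScaler SNIP);
    when given, only packets to or from that address are matched.
    """
    expr = " || ".join(FILTERS[name] for name in symptoms)
    if host is not None:
        expr = f"ip.addr == {host} && ({expr})"
    return expr


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-range placeholder address.
    print(build_filter(["retransmission", "zero_window"], host="192.0.2.10"))
```

Filtering in Wireshark after the capture, rather than on the appliance, keeps the full trace available if you later want to look at other traffic.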
From the trace file we noticed a couple of things.
1: A lot of retransmissions from the XenApp server to the NetScaler SNIP
2: TCP ZeroWindow
These two symptoms are often connected.
This meant that the NetScaler was not able to receive more data at that time, so the TCP transmission is halted until it can process the data in its receive buffer. My immediate assumption was that the TCP buffer size had been adjusted or otherwise altered. This was not the case, since it was still using the default size.
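To make the ZeroWindow behaviour a bit more concrete, here is a small toy model of a receive buffer, written as a sketch and not as a model of how NetScaler implements TCP. It assumes a sender that always tries to send a fixed amount per tick and a receiver that drains a fixed amount per tick; all the numbers and the function name are made up for illustration. The advertised window is simply the free space left in the buffer, and once it hits 0 the sender has to wait.

```python
def simulate_receive_window(buffer_size, send_per_tick, drain_per_tick, ticks):
    """Return the window the receiver advertises after accepting data each tick.

    Toy model: the sender never sends more than the free buffer space,
    and the receiving application reads `drain_per_tick` units afterwards.
    """
    buffered = 0
    advertised = []
    for _ in range(ticks):
        free = buffer_size - buffered
        accepted = min(send_per_tick, free)      # sender is limited by the window
        buffered += accepted
        advertised.append(buffer_size - buffered)  # 0 here means ZeroWindow
        buffered -= min(drain_per_tick, buffered)  # receiver processes some data
    return advertised


if __name__ == "__main__":
    # Receiver drains slower than the sender sends: the window collapses to 0.
    print(simulate_receive_window(buffer_size=8, send_per_tick=4, drain_per_tick=1, ticks=5))
    # Receiver keeps up: the window never reaches 0.
    print(simulate_receive_window(buffer_size=8, send_per_tick=4, drain_per_tick=4, ticks=3))
```

In the first case the sender keeps being throttled to whatever the receiver manages to drain, which is the pattern that shows up as ZeroWindow in a trace.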
So why was this happening?
A quick Google search indicated that this was an issue in the NetScaler build, which has since been resolved in a later build: http://support.citrix.com/article/CTX205656
So, some quick tips when troubleshooting a NetScaler VPX:
- Check if the appliance has enough compute resources
- Check if the hypervisor / virtualization layer it is running on has enough resources, or if it is a problem affecting other parts of the virtual network as well
- Draw a topology map of the network and eliminate other possible components in the network path
- Check TCP settings, remember that ICA-proxy is using TCP to the end-users
- Check a trace file, and use display filters in Wireshark to narrow down the traffic (https://wiki.wireshark.org/DisplayFilters). Even though you can set filters on the NetScaler itself, doing so can consume more resources on the appliance and you might not see the whole picture from a networking perspective. For instance, the NetScaler might be flooded with network traffic from another source IP, which would then not be displayed in the trace file.
- Last but not least, check if there are any known bugs in the current build and that the build is supported for the hypervisor that is being used. (http://support.citrix.com/en/products/netscaler)
NOTE: You can read more about TCP Window Scaling in this article –> http://support.citrix.com/article/CTX113656
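As a rough illustration of why the window size matters for file transfers, the sketch below computes the upper bound on throughput for a single TCP connection, which is roughly the window size divided by the round-trip time. It also shows how the window scale option (RFC 7323) shifts the 16-bit window field to allow larger windows. The 50 ms round-trip time and the shift of 7 are example values I picked, not measurements from this case, and the function names are only for illustration.

```python
def max_throughput_bytes_per_sec(window_bytes, rtt_ms):
    """Upper bound for one TCP connection: at most one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 1000 / rtt_ms


def scaled_window(window_field, shift):
    """Effective window when the window scale option is in use (shift 0 to 14)."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


if __name__ == "__main__":
    # Without window scaling the window field tops out at 65535 bytes.
    print(max_throughput_bytes_per_sec(65535, rtt_ms=50))
    # With an example shift of 7 the same field value allows a far larger window.
    print(max_throughput_bytes_per_sec(scaled_window(65535, 7), rtt_ms=50))
```

With a window of 0, as in the ZeroWindow case, the bound is of course 0 regardless of the round-trip time.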