Netscaler with multiple packet engines and WHY you should size properly

When working with Netscaler, I often stumble across people who don’t size the packet engines properly on their Netscaler VPXs.

By default, a newly deployed Netscaler VPX comes with 2 vCPUs and 2 GB of memory. Of those 2 vCPUs, one is used for management (CPU 0 is the management core) and the second is used for packet flow: it handles load balancing, compression, content switching and so on.
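
To make the split concrete, here is a minimal Python sketch of the arithmetic. The function name is my own, and it assumes (as described above) that exactly one vCPU is always reserved for management and every remaining vCPU becomes a packet engine.

```python
def split_vcpus(total_vcpus: int) -> dict:
    """Split a VPX's vCPUs into management and packet engine cores.

    Assumption: one vCPU (CPU 0) is always the management core and
    every remaining vCPU runs a packet engine.
    """
    if total_vcpus < 2:
        raise ValueError("need at least 2 vCPUs: 1 management + 1 packet engine")
    return {"management": 1, "packet_engines": total_vcpus - 1}


print(split_vcpus(2))  # the default VPX
print(split_vcpus(4))  # a VPX sized for 3 packet engines
```

So the default 2 vCPU appliance has a single packet engine doing all the traffic work.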

So how can we see the utilization of these CPUs? (And no, we cannot use regular Unix tools like top. They will not display it properly: since the packet engine core is always looking for work to do, it is reported as utilized even though there isn’t any work for it. That’s why we need to use stat system.)

We can use the commands stat cpu and stat system.
On a default VPX we only see one packet engine CPU, because there are just two vCPUs.

Now, a regular VPX 1000 can have a maximum of 3 packet engines, meaning a total of 4 vCPUs (which also means we need to add more memory to the VM). You can see the chart from Citrix here: http://support.citrix.com/article/CTX139485

So let’s do a quick comparison to see whether these changes improve our performance. The first test is run on a VPX 1000 with 2 vCPUs and 2 GB of memory. The second, further down, is run on a VPX 1000 with 4 vCPUs and 8 GB of memory.

(NOTE: Multiple packet engines are not available on Hyper-V, only VMware and Xen. Also note that this is CPU dependent as well: the better the CPU, the better the SSL performance.)

Now, in order to test this I used a benchmarking tool from Apache called ab (short for ApacheBench).
It sends multiple requests against a load balanced vServer. It is a regular HTTPS vServer that the benchmark is going to run against, so this test exercises regular HTTPS traffic.

ab -n 50000 -c 1000 http://192.168.10.32/index.html
(This runs a benchmark using HTTP GET, sending 50000 requests in total with 1000 concurrent requests against a web address.)
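
If you want to repeat the test with different numbers, a small Python helper can build the same ab command line. This is only a sketch: the function name is my own, and it just assembles the -n (total requests) and -c (concurrency) flags used above without running anything.

```python
import shlex


def build_ab_command(url: str, requests: int, concurrency: int) -> list:
    """Build an ab argument list: -n is total requests, -c is concurrent requests."""
    if requests < 1 or concurrency < 1:
        raise ValueError("requests and concurrency must be positive")
    if concurrency > requests:
        # ab refuses a concurrency level higher than the total request count
        raise ValueError("concurrency cannot exceed total requests")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


cmd = build_ab_command("http://192.168.10.32/index.html", 50000, 1000)
print(shlex.join(cmd))
```

The resulting list can be passed straight to subprocess.run(cmd) on a machine that has ab installed.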

Notice that in this first run the packet engine CPU is over 90%. A few more packets and my Netscaler would be unable to process them.

When I ran the same test against 4 vCPUs (where 3 are PEs), the load was distributed much more evenly. (Here I just used the stat cpu command to see the load on each individual PE.)

So remember, scale your PEs accordingly! If you are unsure whether you need to scale out, take a look at your current environment with stat cpu during the busiest part of the day.
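
As a rough way to turn that check into a rule, here is a Python sketch that takes per-PE utilization readings you have noted down from stat cpu and flags when any packet engine gets too busy. The function names, the 90% threshold (borrowed from the first test run above) and the sample numbers are all my own illustrative assumptions, not measured values or Citrix guidance.

```python
def peak_utilization(samples: list) -> float:
    """Return the highest utilization (%) seen on any packet engine in any sample."""
    if not samples:
        raise ValueError("need at least one sample")
    return max(pct for sample in samples for pct in sample.values())


def should_add_packet_engines(samples: list, threshold: float = 90.0) -> bool:
    """True when any packet engine reached the (assumed) threshold."""
    return peak_utilization(samples) >= threshold


# Hypothetical readings: each dict maps a packet engine CPU id to its utilization in %.
single_pe = [{1: 62.0}, {1: 88.5}, {1: 93.1}]
three_pe = [{1: 30.2, 2: 28.9, 3: 33.4}, {1: 35.0, 2: 31.7, 3: 36.8}]

print(should_add_packet_engines(single_pe))
print(should_add_packet_engines(three_pe))
```

Pick a threshold that leaves headroom for your own traffic peaks; 90% is simply the point where my single PE was close to running out.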