Storage Wars–HCI edition


There is a lot of fuss these days around hyperconverged infrastructure, software-defined storage and so on, especially since VMware announced VSAN 6.2 earlier this week, which triggered a lot of good old brawling on social media. VMware was clearly stating that they are the market leader in the HCI market; whether that is true or not I don't know. So I decided to write this post just to clear up the confusion about what HCI actually is, what the different vendors are delivering in terms of features, and how their architectures differ. Hopefully someone out there is as confused as I was in the beginning.

I have been working with this for quite some time now, so in this post I have decided to focus on three different vendors, looking at their features and what their architecture looks like.

  • VMware
  • Nutanix
  • Microsoft

PS: Things change and features get updated, so if something is wrong or missing, let me know!

The term hyper-converged actually comes from converged infrastructure, where vendors started to provide a pre-configured bundle of software and hardware in a single chassis. This was done to minimize the compatibility issues we had with the traditional way of building infrastructure, and of course to make it easy to set up a new fabric. With hyperconverged we integrate these components even further, so that they cannot be broken down into separate components. By using software-defined storage we can deliver high-performance, highly available storage capacity to our infrastructure without the need for particular/special hardware. So instead of the traditional three-tier architecture, which was the common case in converged systems, we have servers that combine compute and storage, with software on top which aggregates the storage across multiple nodes to create a cluster.

So to conclude this part: you cannot get hyperconverged without using some sort of software-defined storage solution.

Now back to the vendors. We have of course Microsoft and VMware, which are still doing a tug of war with their releases, but their software-defined storage options have one thing in common: they live in the kernel. VMware was the first of the two to release a fully hyperconverged solution, and version 6.2 is their latest release, which added a lot of new features. Microsoft, on the other hand, is playing it safe; with Windows Server 2016 they are releasing a new version of Storage Spaces which now has a hyperconverged deployment option. Believe it or not, Microsoft has had a lot of success with the Storage Spaces feature, since it has been a pretty cheap setup, and they have combined it with some much-needed improvements to the SMB protocol. So let us focus on how VSAN 6.2 and Windows Server 2016 Storage Spaces Direct, which both have "in-kernel" ways of delivering HCI, compare.

VMware VSAN 6.2

Deployment types: Hybrid (using SSDs and spinning disks) or All-flash
Protocol support: Uses its own proprietary protocol within the cluster
License required: Licensed for either Hybrid or All-Flash
Supported workloads: Virtual machine storage
Hypervisor support: ESXi
Minimum nodes in a cluster: 2 (with a third node as witness) –> https://blogs.vmware.com/virtualblocks/2015/09/11/vmware-virtual-san-robo-edition/
Hardware supported: VSAN Ready Nodes, EVO:RAIL and Build Your Own based on the HCL –> http://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan
Disk requirements: At least one SSD and one HDD
Deduplication support: Yes, starting from 6.2, near-line (only within an all-flash configuration)
Compression support: Yes, starting from 6.2, near-line (only within an all-flash configuration)
Resiliency factor: Fault Tolerance Method (FTM) RAID-1 mirroring; RAID 5/6 were added in the 6.2 release
Disk scrubbing: Yes, as of the 6.2 release
Storage QoS: Yes, as of the 6.2 release (based upon a 32KB block size ratio); can be attached to virtual machines or datastores
Read cache: 0.4% of host memory is used for read cache on the host where the VM is located
Data locality: Sort of; it does not do client-side local read caching
Network infrastructure needed: 1Gb or 10Gb Ethernet (10Gb only for all-flash), multicast enabled
Maximum number of nodes: 64 nodes per cluster

An important thing to remember is that VMware VSAN stores data as objects. If we create a virtual machine on a Virtual SAN datastore, VSAN creates an object for each virtual disk, snapshot and so on, plus a container object that stores all the metadata files of the virtual machine. The availability factor can be configured per object. These objects are stored on one or multiple magnetic disks and hosts, and VSAN can access these objects remotely for both reads and writes. VSAN does not have a pure data locality model like some others do; a machine can be running on one host while its objects are stored on another, which gives consistent performance if we for instance migrate a virtual machine from one host to another. VSAN also has the ability to read from multiple mirror copies at the same time to distribute the IO equally.

VSAN also has the concept of stripe width, since in many cases we may need to stripe an object across multiple disks. The largest component size in VSAN is 255 GB, so if we have a VMDK which is 1 TB, VSAN needs to stripe that VMDK out across 4 components. The maximum stripe width is 12. The SSDs within VSAN act as a read cache and a write buffer.
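
Since availability and stripe width are policy settings applied per object, a quick way to see this in practice is through the SPBM cmdlets in PowerCLI. This is only a minimal sketch; the capability names (VSAN.hostFailuresToTolerate, VSAN.stripeWidth) and the VM name are assumptions on my part, so verify them with Get-SpbmCapability in your own environment.

# Build a rule set: tolerate one host failure, stripe each object across two disks
$ftt     = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
$stripe  = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 2
$ruleset = New-SpbmRuleSet -AllOfRules $ftt, $stripe
$policy  = New-SpbmStoragePolicy -Name "FTT1-SW2" -AnyOfRuleSets $ruleset

# Apply the policy to a VM, which in turn applies it to each of the VM's objects
Set-SpbmEntityConfiguration -Configuration (Get-SpbmEntityConfiguration (Get-VM "MyVM")) -StoragePolicy $policy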

image

Windows Server 2016 Storage Spaces Direct*

*Still only in Tech Preview 4

Deployment types: Hybrid (using SSDs and spinning disks) or All-flash
Protocol support: SMB 3
License required: Windows Server 2016 Datacenter
Supported workloads: Virtual machine storage, SQL databases, general purpose file server
Hypervisor support: Hyper-V
Hardware supported: Storage Spaces HCL (not published yet for Windows Server 2016)
Deduplication support: Yes, but still only for a limited set of workloads (VDI etc.)
Compression support: No
Minimum nodes in a cluster: 2 (and using some form of witness to maintain quorum)
Resiliency factor: Two-way mirror, three-way mirror and dual parity
Disk scrubbing: Yes, part of chkdsk
Storage QoS: Yes, can be attached to virtual machines or shares
Read cache: CSV read cache (which is part of the RAM on the host), also depending on deployment type. In hybrid mode the SSDs are used as read & write cache and are therefore not used for persistent storage.
Data locality: No
Network infrastructure needed: RDMA-enabled network adapters, including iWARP and RoCE
Maximum number of nodes: 12 nodes per cluster as of TP4
You can read more about what goes on under the hood of Storage Spaces Direct here –> http://blogs.technet.com/b/clausjor/archive/2015/11/19/storage-spaces-direct-under-the-hood-with-the-software-storage-bus.aspx
Hardware info: http://blogs.technet.com/b/clausjor/archive/2015/11/23/hardware-options-for-evaluating-storage-spaces-direct-in-technical-preview-4.aspx

An important thing to remember here is that we have a CSV volume which is created on top of an SMB file share. With Storage Spaces Direct, Microsoft leverages multiple features of the SMB 3 protocol, using SMB Direct and SMB Multichannel. Another thing to keep in mind is that since there is no form of data locality here, Microsoft depends on RDMA-based networking to read and write data from other hosts in the network with low overhead, much lower than with TCP-based networks. Unlike VMware, Microsoft uses extents to spread data across nodes; these are by default 1 GB each.
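
For reference, the hyperconverged deployment option in TP4 boils down to a few PowerShell steps. This is only a rough sketch with placeholder node and volume names, and the cmdlet and parameter names may still change before Windows Server 2016 ships.

# Create the cluster from the nodes that contribute their local disks (names are placeholders)
New-Cluster -Name S2DCLUSTER -Node NODE1,NODE2,NODE3,NODE4 -NoStorage

# Enable Storage Spaces Direct, which claims the local disks and builds the storage pool
Enable-ClusterStorageSpacesDirect

# Carve out a mirrored CSV volume that Hyper-V can place virtual machines on
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName VMVOL01 -FileSystem CSVFS_ReFS -Size 2TB -PhysicalDiskRedundancy 1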

image

Now in terms of differences between these two: first of all it is the way they manage reads and writes of their objects. VMware has a distributed read cache, while Microsoft requires RDMA but can read/write with very low overhead and latency from different hosts. Microsoft does not have per-virtual-machine policies that define how resilient a virtual machine is; instead this is set on the share (which is a virtual disk), which defines what type of redundancy level it has. There are still things that are not yet documented about the Storage Spaces Direct solution.

So let us take a closer look at Nutanix.

Nutanix

Deployment types: Hybrid (using SSDs and spinning disks) or All-flash
Protocol support: SMB 3, NFS, iSCSI
Editions: http://www.nutanix.com/products/software-editions/
Supported workloads: Virtual machine storage, general purpose file services* (Tech Preview)
Hypervisor support: ESXi, Hyper-V, Acropolis (custom CentOS KVM build)
Hardware supported: Nutanix uses Supermicro general purpose hardware for their own models, but they also have OEM deals with Dell and Lenovo
Deduplication support: Yes, both inline and post-process (cluster wide)
Compression support: Yes, both inline and post-process
Resiliency factor: RF2, RF3 and erasure coding
Storage QoS: No, equal share
Read cache: Unified Cache (consists of RAM and SSD from the CVM)
Data locality: Yes, reads and writes are aimed at the local host that the compute resources run on
Network infrastructure needed: 1Gb or 10Gb Ethernet (10Gb only for all-flash)
Maximum number of nodes: ?? (not sure if there is any fixed maximum here)

The objects on Nutanix are broken down into vDisks, which are composed of multiple extents.

Source: http://nutanixbible.com/

Unlike Microsoft, Nutanix operates with an extent size of 1 MB, and the IO path is in most cases local to the physical host.

image

When a virtual machine running on the virtualization platform does a write operation, it writes to a part of the SSD on the physical machine called the OpLog. Depending on the resiliency factor, the OpLog will then replicate the data to the OpLog of other nodes to achieve the replication factor that is defined for the cluster. Reads are served from the Unified Cache, which consists of RAM and SSD from the Controller VM running on that host. If the data is not available in the cache, it can be fetched from the extent store, or from another node in the cluster.
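
If you want to check what replication factor a given container runs with (and therefore how many OpLog copies each write results in), the container listing in ncli shows it. A small sketch; the exact entity and field names can vary between NOS versions:

ncli> container list

The Replication Factor field on the container that holds your VMs is the number of copies the OpLog has to land before a write is acknowledged.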

Source: http://nutanixbible.com/

Now all three vendors have different ways of achieving this. In the case of VMware and Microsoft, which both have their solution in-kernel, Microsoft has focused on using RDMA-based technology to provide a low-latency, high-bandwidth backbone, which might work to their advantage when doing a lot of reads from other hosts in the network (when the traffic becomes unbalanced).

VMware and Nutanix, on the other hand, only require a regular 1/10 Gb Ethernet network. Nutanix uses data locality, and with new hardware becoming faster and faster that might work to their advantage, since the internal data buses in a host can deliver more and more throughput. The issue that might occur, which VMware pointed to in their VSAN whitepaper as the reason they did not design VSAN around data locality, is when doing a lot of vMotion, which would require a lot of data to be moved between hosts to re-establish data locality.

So what is the right call? I don't know, but boy, these are fun times to be in IT!

NB: Thanks to Christian Mohn for some clarity on VSAN! (vNinja.net)

#microsoft, #nutanix, #storage-spaces-direct, #vmware, #vsan

Windows Azure Stack–What about the infrastructure Story?

There is no denying that Microsoft Azure is a success story, going from the lame Silverlight portal with limited capabilities it once was to becoming a global force to be reckoned with in the cloud marketplace.

Later today Microsoft is releasing the first technical preview of Azure Stack, which allows us to bring the power of the Azure platform to our own datacenters. It brings the same consistent UI and feature set as Azure Resource Manager, which allows us to use the same tools and resources we have used in Azure against our own local cloud.

This will of course allow large customers and hosting providers to deliver the Azure platform from their own datacenter. The idea seems pretty good, though. But what is Azure Stack actually? It only delivers half of the promise of a cloud-like infrastructure, so I would place Azure Stack in the category of cloud management platforms, since it gives us the framework and the portal experience.

Now, when we eventually have this set up and configured, we get some of the benefits of the cloud:

  • Automation
  • Self-Service
  • A common framework and platform to work with

Now there are some important things we need to think about in terms of fitting within the cloud picture: the compute fabric, network fabric and storage fabric, which are missing from the Microsoft story. Of course Microsoft is a software company, but with their CPS solution with Dell they are moving a bit towards the hardware space, although they are nowhere close yet.

When I think about Azure I also think about the resources underneath: they are always available, not silo-based, and can scale up and down as I need them to. If we look at the way Microsoft has built their own datacenters, there is no SAN architecture at all, just a bunch of single machines with local storage, using software to connect all this storage and compute into a large pool of resources. That is the way it should be, since the SAN architecture simply cannot fit into a full cloud solution, and it is also the way it should be for an on-premises solution. If we are to deploy Azure Stack to deliver the benefits of a cloud solution, the infrastructure should reflect that. As of right now Microsoft cannot deliver a good enough storage/compute solution with Storage Spaces in 2012 R2, since there are limits to the scale, and points of failure that a public cloud does not have.

Nutanix is one of the few providers that delivers support for Hyper-V and SMB 3.0, does not have the same scale limits, and has the same properties as a public cloud solution. It aggregates all storage on local drives within each node into a pool of storage, with redundancy in all layers, and includes a REST API which can easily be integrated with Azure Stack. I can easily see that as the best way to deliver an on-premises cloud solution, and a killer combination.

#azure, #azure-stack, #hci, #nutanix, #windows-server-2016

New award, Nutanix Technology Champion!

Today Nutanix announced their list of Nutanix Technology Champions for 2016, and I am honored to be among the people on the list. Nutanix is doing a lot of cool things, and there is a lot more to come.

http://next.nutanix.com/t5/Nutanix-Connect-Blog/Welcome-to-the-2016-Nutanix-Technology-Champions/ba-p/6382?utm_content=buffer05396&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

I am delighted to announce the 2016 group of Nutanix Technology Champions. This year has seen enormous demand from the community to participate in this program, and on behalf of the entire community team here at Nutanix, we are grateful and honored by the volume and the quality of the feedback we’ve had.
The Nutanix Technology Champion program spans the globe and is comprised of IT professionals from every cloud, application group, and technology. They are committed to new ways of thinking that will power the next generation of enterprise computing.
I am looking forward to getting to know you all and will be contacting our new NTC members shortly with more details. Congratulate one another and if you are sharing on social, please do use #NutanixNTC so others can engage in the conversation. Thank you for believing in us and the larger community.

#nutanix

Setting up NFS Direct Veeam against Nutanix cluster

So the last couple of days I have tried to wrap my head around the Direct NFS support which is coming in Veeam v9. The cool thing about this feature is that Veeam has a custom-built NFS agent which will go directly to the NFS share (it only needs READ access) and export the snapshot data when doing a backup.

Note that it is important that Veeam is configured against a vCenter server (I tried many times against an ESXi host directly, and then NFS Direct didn't really work).

When setting up a Direct NFS backup solution, we first need to set up a Veeam Backup Proxy as we would in other scenarios. We need to include the Veeam Backup Proxy in the virtual vSwitch that Nutanix provisions within ESXi (note: do not change the vSwitch, just add the VM to the vSwitch network).

image

Then define an IP address for the Veeam Backup Proxy within the vSwitch so it can communicate with the Controller VM.

image

Note that since the vSwitch is an internal-only switch, we should set up a backup proxy per node to maximize performance. Even though NFS Direct from this node against other nodes will work as well, we would then be pushing the traffic across the Controller VM network. So when setting up backup jobs, try to make them use the local proxy on the host where the virtual machines reside; this will give the best throughput.

We also need to whitelist the IP address of the proxy so that it is allowed access to the NFS share (which in the case of Nutanix is the storage container the virtual machines reside on). This can be done at the container level or at the cluster level.
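
If you would rather do the whitelisting from the command line than from PRISM, there is a cluster-level ncli command for it. A quick sketch with an example subnet; double-check the parameter name against your NOS version:

ncli> cluster add-to-nfs-whitelist ip-subnet-masks=192.168.10.0/255.255.255.0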

image

Next we need to “force” Veeam to use the storage network on the proxies for backup traffic, which can be done in the central management pane within Veeam.

image

Lastly we need to rescan the storage attached to the infrastructure, which will allow Veeam to see the new NFS datastores and detect that it can access them using NFS Direct. This can be done here.

image

We can see from the statistics of this job that it is using NFS in the first screenshot

image

We can also see it in the backup job log file for the VM

image

and that we are using regular hotadd in the second one.

image

#nutanix, #veeam

Enabling file-level restore on Nutanix in NOS 4.5.1

Ever since I heard that this feature was included in 4.5 I have been eager to give it a spin, but since it is still a tech preview it was difficult to find any particular documentation about the feature; the only mentions were in the release notes and in some blog posts I found. Since it is a tech preview I guessed that the feature would not appear in PRISM, so I had to dig into the CLI.

I noticed that there was a command for FLR under virtualmachine, list-flr-snapshots. When I ran the command:

ncli> virtualmachine list-flr-snapshots vm-id=00051d07-74fe-2635-0000-00000000698a::5035e717-b404-916d-72d4-a8750120c633
Error: Nutanix Guest Tools are not enabled for this VM.

So again, where can I find the Guest Tools to do this? In the CLI :)

ncli> nutanix-guest-tools enable vm-id=00051d07-74fe-2635-0000-00000000698a::5035e717-b404-916d-72d4-a8750120c633

    VM Id                     : 00051d07-74fe-2635-0000-00000000698a::5035e717-b404-916d-72d4-a8750120c633
    Nutanix Guest Tools En… : true
    File Level Restore        : false

I saw that the file level restore option was disabled so I needed to enable it for a particular machine, which was in a protection domain.

ncli> nutanix-guest-tools enable-applications vm-id=00051d07-74fe-2635-0000-00000000698a::5035e717-b404-916d-72d4-a8750120c633 application-names="File Level Restore"

    VM Id                     : 00051d07-74fe-2635-0000-00000000698a::5035e717-b404-916d-72d4-a8750120c633
    Nutanix Guest Tools En… : true
    File Level Restore        : true

Then I needed to mount the guest tools to the VM

ncli> nutanix-guest-tools mount vm-id=00051d07-74fe-2635-0000-00000000698a::5035e717-b404-916d-72d4-a8750120c633
Successfully mounted Nutanix Guest Tools.

This in essence mounts an ISO in the VM's CD/DVD drive. Then came my first mistake:

image

After installing Java I could continue with the configuration. In the Nutanix Guest Tools CLI I can now list and mount my snapshots.

Using the commands

flr ls-snaps (to list the available snapshots)

flr attach-disk disk-label=labelname snapshot-id=idname

image

Then I can use a regular file explorer to get to my original content as it was at the time of the snapshot.

#nutanix

Pin-to-SSD Nutanix NOS 4.5

So earlier today I was looking at the Pin to SSD video from Andre Leibovici, shown here –> http://myvirtualcloud.net/?p=7334 and figured I wanted to give this feature a spin.

But no matter where I looked I didn't find the feature within PRISM; luckily, thanks to the Twitter gods, I got in touch with the Nutanix Bible author himself.

image

So off I went exploring in the CLI, and found that under ncli virtual-disk there is an option called update-pinning, which has three required attributes:

image

id, tier-name and pinned-space.

In order to get the ID of the virtual disk we want to pin to SSD, we use the virtual-disk list command. To get the names of the different tiers we can use the list tier command.

So once we have what we need, we can use the command

virtual-disk update-pinning id=idofthevdisk tier-name=idofthessdtier pinned-space=amountofGBtopintossd
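
As a worked example, with a made-up virtual disk id, and assuming the flash tier is named SSD-SATA (check the tier listing for the actual name on your cluster), pinning 50 GB would look something like this:

virtual-disk update-pinning id=254a78b3-1dd2-4e6c-a055-d38b45a6e732 tier-name=SSD-SATA pinned-space=50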

image

If we now run virtual-disk list we can see that it has pinned space, but note that this is not visible in PRISM.

#nutanix

Getting Started With Nutanix and PowerShell

Now that I have my hands on some Nutanix hardware, it was about time to play a little with the features available on the platform. Everything we do in PRISM is linked to the REST API, and Nutanix also has PowerShell cmdlets which leverage the same REST API.

Downloading the Nutanix cmdlets can be done from within PRISM

In order to connect to a cluster, use the following command line.

NOTE: for security reasons we should store the password as a secure string, by declaring these as variables before running the connect cmdlet.

$user = "your prism user"

$password = read-host "Please enter the prism user password:" -AsSecureString

connect-ntnxcluster -server ip -username $user -password $password -acceptinvalidcert (the last switch is only needed if you are using the self-signed certificate)

After we have connected we can use other commands such as

get-ntnxcluster

image

Using the command get-command -module NutanixCmdletsPSSnapin will list all cmdlets available in the snap-in. Most of the cmdlets have the same requirements in terms of input as the REST API http://prismdevkit.com/nutanix-rest-api-explorer/
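
As a quick example of how the cmdlets map onto the REST objects, the snippet below lists the VMs in the cluster together with a couple of their properties. Treat the cmdlet and property names here (Get-NTNXContainer, vmName, powerState, replicationFactor) as assumptions on my part and verify them with Get-Member, since they follow the REST API object model and may differ between versions.

# List all VMs the cluster knows about, with name and power state
Get-NTNXVM | Select-Object vmName, powerState

# The same pattern works for other entities, for example containers
Get-NTNXContainer | Select-Object name, replicationFactor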

But not all cmdlets are properly documented, so during the course of the week I found one line of code that was crucial.

Get-ntnxalert | resolve-ntnxalert

image

And if you have read my blog post on setting up Nutanix monitoring using Operations Manager, we can also use PowerShell to set up the SNMP config using these simple commands:

add-ntnxsnmptransport -protocol "udp" -port "161"

add-ntnxsnmpuser -username username -authtype SHA -authkey password -privtype AES -privkey password

BTW: Here is a reference poster for all PowerShell cmdlets for Nutanix http://go.nutanix.com/rs/031-GVQ-112/images/Powershell_Automation_Poster.pdf

#nutanix, #powershell

Setting up Operations Manager for Nutanix

Nutanix has management packs available for several monitoring solutions, such as SolarWinds, Nagios and of course Operations Manager, which allow us to monitor the hardware / DSF / hypervisor directly from Operations Manager. Combining this with the service management capabilities that Operations Manager has is a killer combo. The setup is pretty simple when run from a management server.

image

image

After the management packs are installed, new monitoring panes should appear within the console.

image

Now the management pack uses a combination of SNMP and the REST API. First we configure the SNMP properties for the management pack, which can be done in PRISM under Settings // SNMP.

image

From there we need to enable SNMP and set up a v3 user profile which OpsMgr will use to authenticate and encrypt traffic.

image

And lastly define a transport rule, which is UDP on port 161.

image

Next we can run a discovery wizard from within Operations Manager to search for the CVM machines.

image

Next we need to add each device and create a specific SNMP user that we can use to contact the Nutanix CVM.

image

image

image

Eventually the devices will appear under the discovered devices pane, which means that we can contact them using SNMP.

image

If we now head back to the monitoring pane we can see that the devices appear healthy.

image

Next is to add monitoring to the cluster. This uses the REST API to communicate with the cluster IP.

image

image

Now we should add both a PRISM account and an IPMI account (Note that I have excluded the IPMI part since I had some minor issues with the IPMI connection on my nodes at the time)

image

Eventually the nodes will appear in the monitoring pane, and we can extract performance information from the cluster as well.

image

If we go into health explorer of a CVM we can see all the different monitoring objects it checks.

image

Note: If you upgrade NOS you might need to rerun the cluster monitoring wizard.

#nutanix, #operations-manager, #scom-and-nutanix

New job! Systems Engineer at Exclusive Networks (BigTec)

So I have been on a job hunt for some time now, and I am quite picky about which job to take, both because of a lot of personal stuff happening which has put a lot of strain on me, and because moving two hours away from Oslo to the middle of nowhere in Norway makes things much more difficult from a job perspective.

Even so, I have now started at Exclusive Networks (BigTec) as a Systems Engineer.

So what will I be doing there? BigTec, which is the area I will be focusing on, is a part of Exclusive Networks, a value-add distributor focusing on datacenter transformation.

From a technical perspective I will be focusing on the different vendors that are part of the BigTec portfolio, such as Nutanix, vArmour, VMTurbo, Silver Peak and Arista.


So this is not my regular bread and butter, since I have been focusing on Microsoft-related technology for what feels like forever, but for my part it will be a good thing to expand my horizons to new products and other aspects of IT (and this is most likely going to affect my blog posts going forward as well, you have been warned!), moving more towards pure datacenter-related technologies and security.

If you want to know more about what we are doing, head on over to our website http://bit.ly/1PtizYx

#arista, #bigtec, #nutanix, #silver-peak, #varmour, #vmturbo, #vmware

Comparison Microsoft Storage Spaces Direct and Nutanix

There has been a lot of fuss around Storage Spaces Direct coming with Windows Server 2016, and I have been getting a lot of questions about it lately: “Will it solve my storage issues?”, “Can we replace our existing SAN?”, “When should we choose Storage Spaces Direct over a SAN?” and so on.

Now, as of right now not all the technical details around the feature are known and not all features are 100% in place, but this blog post will do a comparison between Nutanix and Storage Spaces Direct and show how they differentiate. Storage Spaces Direct is a more advanced Storage Spaces setup; it uses the same capabilities, but now we can aggregate local disks inside servers to set up an SMB 3.0 based file service.

This is an overview of what a Storage Spaces Direct setup might look like, since it has a requirement of 4 nodes and an RDMA backbone; I will come back to why this is a requirement. As I have mentioned previously, Storage Spaces Direct has an issue with data locality: Microsoft treats storage and compute as two separate entities, and that is reflected in the Storage Spaces Direct setup, since it can be deployed as two separate components, an SMB scale-out file server or hyperconverged.

When setting it up as hyperconverged, the following happens:

image

Let us say that we have VM01 running on NODE1, on top of a Storage Spaces Direct vDisk01 configured as a two-way mirror. What will happen is that Storage Spaces creates 1 GB extents of the vDisk and spreads the chunks across separate nodes. So even though VM01 is running on a specific host, its storage is placed randomly across the different hosts within the cluster, which means this will generate a lot of east-west traffic inside the cluster. That is why Microsoft has set a requirement for an RDMA network backbone on a Storage Spaces Direct cluster: it needs low-latency, high-throughput traffic to be efficient in this type of setup, since Microsoft just looks at the different nodes as a bunch of disks.
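
If you want to see this spreading for yourself on a TP4 cluster, you can ask Storage Spaces which physical disks back a given virtual disk. A minimal sketch, assuming the virtual disk is called vDisk01; the disks returned will belong to several different nodes, which is exactly the east-west traffic the RDMA backbone has to carry:

# Show the physical disks (spread across the cluster nodes) backing the virtual disk
Get-VirtualDisk -FriendlyName vDisk01 | Get-PhysicalDisk | Select-Object FriendlyName, SerialNumber, Size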

Nutanix, on the other hand, solves this in another manner, which I also think Microsoft should look at, namely data locality: for a VM running on a particular host, most of the content is served locally from the host the VM is running on, using the different tiers (Content Cache, Extent Store, OpLog).

image

This removes the requirement for any particular high-speed backbone.

#nutanix, #storage-spaces-direct