About two years ago, the term Hyper Converged infrastructure was not common in most discussions around the IT water cooler. Now it seems like it is all that people are talking about. Hyper Converged has skyrocketed onto the Gartner Magic Quadrant and customers are buying it faster than vendors can build it. Let’s take a step back and have a look at this video from Microsoft Product Manager Cosmos Darwin. In less than 5 minutes he explains what Hyper Converged Infrastructure is.
OK, so here is how I normally explain it to people. In traditional converged infrastructure, there are 3 common tiers:
- Network
- Compute
- Storage
Each tier is distinct from the others, and this is the model that we have trusted major vendors such as Dell/EMC, HPE, and IBM to build for us for years. The problem was that it boxed smaller OEM vendors out of the storage game. Honestly, those big three ruled the modern data center for years. Now what has started happening is a trend of customers wanting to reduce the complexity of their infrastructure by collapsing two of the above layers. So if we take layers 2 and 3 (compute and storage) and collapse them, we get Hyper Converged. That seems pretty simple, right?
Hyper Converged has been ignited in the industry by the leader Nutanix, who recently went public with their IPO in September. They have paved the way for so many other vendors that are looking to steal the converged storage and compute infrastructure away from the household brands. For all of my friends that work at Nutanix, thank you, as you and your company have done a wonderful job as an early innovator and leader in this space. However, Nutanix had better stay on their A game, as there is a new series of contenders that have thrown their hats into the ring this fall. Most notable is software giant Microsoft, who has back ported much of their knowledge, investment, and intellectual property into their new flagship operating system, Windows Server 2016, with a feature called Storage Spaces Direct or S2D. This new operating system actually gives customers the ability to build their own Hyper Converged infrastructure for free. Not only have they given it away for free, they have also introduced a hardware hardening process to ensure quality and consistent customer experiences.
For us in the MVP Community, having Microsoft jump into the game has been incredible news, as this now gives all of us a platform to work with that is 100% supported by the vendor we love and trust. To that end I have joined up with other Microsoft MVPs and future MVPs to form a community alliance called Project Codename “Data_Raft”. Our vision is that everyone should have access to this Hyper Converged infrastructure, and we will help the community by providing designs and architectural guidance. So far we have over 8 Microsoft MVPs signed up and many more are extremely interested. The uptake of this project we have incubated has been so incredible that we already have a long lineup of customers scheduling calls with us to come in and talk to them about what Microsoft can do for them in this space.
The economics of free are making sense for a lot of customers, especially in light of the current economic conditions in my home province of Alberta. Gone are the days when the IT Department could just stroke a cheque for 1 million dollars for a new Vblock from VCE. Now with these new DIY hardened configurations from Microsoft you can likely come in at 1/10 of that cost and have faster storage, better compute, and a solution that is completely supported by Microsoft.
So once again thank you to all the Vendors that are participating in the Hyper Converged space. You are paving the way for some really interesting projects for myself and my team in 2017.
Now, having implemented these solutions for some of my very best customers over the past two years, I can tell you there are some major drawbacks that often get left off the table. For the first time I am going to tell you about some of the secrets that the Hyper Converged vendors don’t want you to know. There are some massive drawbacks to look at with Hyper Converged solutions, especially from non-brand name vendors. Heed my words, read the list below, and take this information into your next meeting around hyper converged anything.
- Talk to your vendor about their support organization. How many people do they actually have? Do they share an on-call phone?
  - I have had horrific support experiences with smaller support organizations. Many of these startups begin with a handful of developers, a massive sales team, and fewer than 5 people in support.
- If your solution is going to be running on top of Windows Server, make sure you find out what is inside their image.
  - In many organizations, we spend many long months, if not years, hardening our images. The hardened image is known as a Standard Operating Environment or SOE. If you take a vendor’s appliance and slap it into your data center without checking, you could be violating some of your critical compliance regulations. So be careful and do your homework.
- Do your homework on their hardware platform of choice. Is it tier 1 (HPE, Dell, Lenovo) or is it a white box solution?
  - If it is a tier 1 solution, you are likely going to have a more positive experience. I have found in implementing white box solutions over the past 2 years that the issues are not actually with the hardware. They come from the vendor’s lack of engineering experience on that platform. Done correctly, a white box server solution can be rock solid. Done incorrectly, it can result in certain failure for your project. I have seen more engineering issues with this type of hardware in the past 18 months than I have seen in the rest of my 20+ year career. That being said, I have also seen white box solutions be widely successful. Such is the case with Nutanix and their white box platform. I never actually experienced hardware failures on the similar white box hardware that they were providing. So, back to my point: make sure your vendor knows what they are doing with the hardware they are working with.
- Watch your protection levels.
  - I find it very commonplace for hyper converged vendors to pitch their customers on space savings by using erasure coding. In plain English, erasure coding is like RAID over the network. When a vendor tells you that you are protected by a 2+1, it means you are running the equivalent of RAID 5 over the network. That means you can lose one drive, or all of the drives in a whole node at once. If two nodes go down, you are dead in the water. Make sure that you do your homework up front, as it may be worth your while to look at a technology that does 2-way or 3-way mirroring, or to expand your erasure coding protection level to something like a 4+2. 4+2 is obviously going to cost more money because you need a minimum of 6 nodes to achieve what we would know as a RAID 6 type technology, which survives 2 node failures.
- Your operational support costs increase with hyper converged infrastructure.
  - Now, didn’t we purchase this new solution to save money? Well, that is what your friendly vendors would like you to think. Hyper converged infrastructure actually increases support time and costs because you have combined your storage tier and your compute tier together. You can no longer simply live migrate your workloads to another compute host, patch the host that is in maintenance mode, move the VMs back, and move on. And by move on I mean quickly. Once you combine compute and storage, you MUST wait for the storage to rebuild before starting on the next node. If you move too fast, it is like pulling two drives from your RAID 5 array before the rebuild completes. How well has that worked out for us in the past? From my experience, it takes significantly longer to patch the host servers in a hyper converged design. Make sure you budget for the extra manpower required to look after this gear.
- How long do rebuilds of data take?
  - Let’s say something happens and the power goes out. Before you can gain access to your data, it needs to be rebuilt on the storage nodes. Even assuming all of the nodes come back up, it can take hours before you are back online. I have also personally witnessed data corruption and production data loss in these events. The vendors will all tell you to have a good UPS in front of the hyper converged servers. That doesn’t protect us from a PDU (Power Distribution Unit) failure or a UPS battery failure. You need to test this scenario with some live data prior to signing off on the project moving forward.
- You need solid backups.
  - At times it almost feels like corners are being cut all over the place with the hyper converged solutions that are flying off the shelves. I have had to recover MANY times from backups to get hyper converged infrastructure back up and running. This isn’t necessarily the fault of the hyper converged vendor, although in most cases I experienced it was. Many times, it was due to a network configuration issue and flapping ports on the switch. Your storage relies on that network, so guess what happens if it isn’t working.
- Nobody gets fired for buying the name brand solution.
  - It is very sad but true, unfortunately. Be extremely careful when buying solutions from smaller vendors. Microsoft has provided a solution via Windows Server 2016 and Storage Spaces Direct that allows you to run a hyper converged solution on tier 1 hardware free of charge. Personally, my preference moving forward is for customers to use tier 1 hardware even if it costs a bit more. In the event a customer wants to go for the more cost effective solution, make sure you do your research on that platform. One such platform that I have checked out is from DataON Storage. They have custom engineered their servers and solution specifically for Microsoft storage solutions. In fact, they are trusted by some extremely reputable Microsoft MVPs, myself included. You can visit their website at www.dataonstorage.com or follow DataON on Twitter @DataOn.
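To put some rough numbers on the protection levels point from the list above, here is a minimal Python sketch. It assumes a simplified model where each data fragment, parity fragment, or mirror copy lives on its own node, so the parity count equals the number of node failures you can survive. The class name `ProtectionScheme`, the scheme labels, and the 60 TB raw capacity are all hypothetical values I picked for illustration. Real products can have different minimum node counts and overheads, so treat this as back-of-the-napkin math to bring to your vendor, not as any vendor's actual sizing formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionScheme:
    """Simplified model: one fragment (or mirror copy) per node. Illustrative only."""

    name: str
    data: int    # number of data fragments; 1 for a mirror
    parity: int  # number of parity fragments; extra copies for a mirror

    @property
    def min_nodes(self) -> int:
        # Assumption: every fragment or copy sits on a different node.
        return self.data + self.parity

    @property
    def node_failures_tolerated(self) -> int:
        # Under this model you can lose as many nodes as you have parity/copies.
        return self.parity

    def usable_tb(self, raw_tb: float) -> float:
        # Usable capacity is the data share of every stripe written.
        return raw_tb * self.data / (self.data + self.parity)


SCHEMES = [
    ProtectionScheme("2-way mirror", 1, 1),
    ProtectionScheme("3-way mirror", 1, 2),
    ProtectionScheme("2+1 erasure coding", 2, 1),
    ProtectionScheme("4+2 erasure coding", 4, 2),
]

RAW_TB = 60.0  # hypothetical raw capacity across the cluster

for s in SCHEMES:
    print(
        f"{s.name:<20} min nodes: {s.min_nodes}  "
        f"survives {s.node_failures_tolerated} node failure(s)  "
        f"usable: {s.usable_tb(RAW_TB):.1f} TB of {RAW_TB:.0f} TB raw"
    )
```

Under these assumptions, 2+1 and 4+2 both give you two thirds of raw capacity as usable space. The extra money for 4+2 goes into the node count (6 instead of 3), and what you buy with it is the ability to survive a second node failure.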
Thanks and hope you enjoyed the read,
Dave