Windows Server Failover Clustering with shared storage deployment options on AWS

One of the questions customers ask most often during an on-premises to AWS migration is ‘Can I deploy a WSFC with shared storage on AWS?’. Before I walk you through the deployment options available on AWS, let me take a step back and address the question you may have in mind: ‘Why bother having a WSFC with shared storage?’. If you work with enterprise customers running a plethora of Windows Server based COTS applications, you will know that a Windows Server Failover Cluster (WSFC) with shared storage is the most commonly used option for achieving application HA. In most cases, the disk witness used for quorum majority, the application-specific data volumes and the database data volumes are shared between the nodes of a cluster. These block volumes are presented to the cluster nodes by iSCSI targets, which are typically SAN or NAS machines. The primary objective of a failover cluster is to provide HA for applications, and shared resources can move between cluster nodes, so most customers tend to keep the setup as-is on the cloud platform, especially if the migration methodology is Lift and Shift.
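To make the role of the disk witness concrete, here is a minimal sketch of the vote arithmetic behind quorum majority. It assumes static voting (each node and the witness holds exactly one vote, and the witness stays online); the function names are purely illustrative and real clusters may adjust votes dynamically.

```python
from typing import Tuple


def quorum_votes(node_count: int, has_disk_witness: bool) -> Tuple[int, int]:
    """Return (total_votes, votes_needed) under a simple majority rule.

    Assumption: every node and the disk witness carries exactly one vote.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    total = node_count + (1 if has_disk_witness else 0)
    needed = total // 2 + 1  # strict majority of all votes
    return total, needed


def survives_node_failures(node_count: int, has_disk_witness: bool,
                           failed_nodes: int) -> bool:
    """True if the surviving votes still form a majority.

    Assumption: the disk witness itself remains reachable.
    """
    total, needed = quorum_votes(node_count, has_disk_witness)
    return (total - failed_nodes) >= needed


if __name__ == "__main__":
    for witness in (False, True):
        total, needed = quorum_votes(2, witness)
        ok = survives_node_failures(2, witness, failed_nodes=1)
        print(f"2 nodes, witness={witness}: votes={total}, "
              f"needed={needed}, survives one node failure={ok}")
```

Under these assumptions a two-node cluster only tolerates a node failure when the shared disk witness supplies the third vote, which is why the witness disk is usually part of the shared storage.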

Although you can achieve HA for legacy/COTS/bespoke applications on the cloud by leveraging cloud-native features together with a little re-coding or re-architecting of the application, customers often avoid this path for various legitimate reasons. If you have to deploy a WSFC solution on AWS and are wondering how to provide shared storage for the cluster nodes, this article should be handy. But wait, why all this fuss? Why can’t I use EFS? Well, using Amazon EFS with Microsoft Windows Amazon EC2 instances is not supported. Yes, you read that right! But it is not the end of the world; here are some of the options available:

AWS Storage Gateway: Cached Volumes

In the cached volume mode, your data is stored in Amazon S3 and a cache of the frequently accessed data is maintained locally by the gateway. With this mode, you can achieve cost savings on primary storage, and minimize the need to scale your storage on-premises, while retaining low-latency access to your most used data.

Volumes created through AWS Storage Gateway can be presented as iSCSI targets. In my case, I created a 10 GB volume on the storage gateway and presented it to one of the cluster nodes. The cluster node connected to the iSCSI target through the iSCSI initiator, and the presented block volume appeared in the Disk Management console as unallocated storage. The other nodes in the cluster can connect to the same iSCSI target through their iSCSI initiators, and the volume can then be moved between cluster nodes.
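If you prefer to script the volume creation instead of using the console, the request could look roughly like the sketch below. It assumes the boto3 SDK is installed and credentials are configured; the gateway ARN, target name and IP address are placeholders I made up, not values from my setup. The helper that builds the request is plain Python, and only `create_cached_volume` actually talks to AWS.

```python
import uuid

GIB = 1024 ** 3  # bytes per GiB


def cached_volume_request(gateway_arn: str, target_name: str,
                          gateway_ip: str, size_gib: int = 10) -> dict:
    """Build the parameters for Storage Gateway's CreateCachediSCSIVolume.

    gateway_ip is the IPv4 address of the gateway network interface on
    which the iSCSI target should be exposed.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "GatewayARN": gateway_arn,
        "VolumeSizeInBytes": size_gib * GIB,
        "TargetName": target_name,
        "NetworkInterfaceId": gateway_ip,
        "ClientToken": str(uuid.uuid4()),  # makes the request idempotent
    }


def create_cached_volume(request: dict) -> str:
    """Send the request and return the ARN of the new iSCSI target.

    Assumption: boto3 is installed; it is imported lazily so the builder
    above can be used and tested without it.
    """
    import boto3

    client = boto3.client("storagegateway")
    response = client.create_cached_iscsi_volume(**request)
    return response["TargetARN"]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = cached_volume_request(
        gateway_arn="arn:aws:storagegateway:REGION:ACCOUNT:gateway/sgw-EXAMPLE",
        target_name="wsfc-disk1",
        gateway_ip="10.0.1.25",
    )
    print(req)
```

The target name ends up as the suffix of the target's IQN, which is what the cluster nodes later use in the iSCSI initiator.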

Gateway Instance:

Cached Volumes:

iSCSI Initiator on the Cluster Node:

Shared Storage in Disk Management Console:

The storage gateway instance can become a single point of failure in this deployment model, so it has to be architected for high availability, which I’ll cover in a different article.

SoftNAS

SoftNAS extends native AWS storage (EBS, S3) to create an enterprise-class, full-featured cloud NAS filer. You can add EBS volumes to SoftNAS as disk devices and create LUNs on top of them. In my case, I added a 50 GB EBS volume to SoftNAS and created LUNs of different sizes.

10 GB LUN in SoftNAS:

LUNs can be presented as block volumes to the Windows cluster nodes through the SoftNAS iSCSI LUN Targets feature. The Windows cluster nodes connect to the iSCSI target through the iSCSI initiator.
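The initiator side can also be scripted rather than clicked through. The sketch below only generates the PowerShell commands each cluster node would run; it does not execute anything. The portal address, the IQN and the node names are hypothetical, and the same steps apply to any iSCSI target, not just SoftNAS.

```python
from typing import Dict, List


def initiator_commands(portal_address: str, target_iqn: str) -> List[str]:
    """Return the PowerShell commands that connect one node to a target."""
    return [
        # Make sure the Microsoft iSCSI Initiator service runs now and at boot.
        "Set-Service -Name MSiSCSI -StartupType Automatic",
        "Start-Service -Name MSiSCSI",
        # Register the portal (the target machine's IP or DNS name).
        f"New-IscsiTargetPortal -TargetPortalAddress {portal_address}",
        # Log in to the target and keep the session across reboots.
        f"Connect-IscsiTarget -NodeAddress {target_iqn} -IsPersistent $true",
    ]


def commands_per_node(nodes: List[str], portal_address: str,
                      target_iqn: str) -> Dict[str, List[str]]:
    """Every cluster node runs the same steps against the shared target."""
    return {node: initiator_commands(portal_address, target_iqn)
            for node in nodes}


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    plan = commands_per_node(
        nodes=["wsfc-node1", "wsfc-node2"],
        portal_address="10.0.1.50",
        target_iqn="iqn.2000-01.com.example:wsfc-disk1",
    )
    for node, commands in plan.items():
        print(f"# {node}")
        print("\n".join(commands))
```

Once every node has logged in to the target, the disk can be brought online on one node, initialized, and added to the cluster as a shared disk.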

The SoftNAS instance can become a single point of failure, but you can leverage its built-in HA feature, SnapReplicate, to deploy a highly available SoftNAS solution on AWS.

Apart from these two deployment models, you can also use the Windows iSCSI Target Server feature to share block volumes between cluster nodes. In this case, enable the iSCSI Target Server feature on a common Windows machine and present block volumes from it to the cluster nodes.
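As a rough sketch of what that involves on the target machine, the snippet below generates the PowerShell commands for installing the feature, creating a virtual disk, and mapping it to a target that only the cluster nodes may access. The target name, disk path and initiator IQNs are made-up examples, and the script only builds the command strings.

```python
from typing import List


def target_server_commands(target_name: str, vhdx_path: str, size_gib: int,
                           initiator_iqns: List[str]) -> List[str]:
    """Return PowerShell commands to publish one shared block volume."""
    if not initiator_iqns:
        raise ValueError("at least one initiator IQN is required")
    # Restrict access to the cluster nodes, identified by their IQNs.
    ids = ",".join(f'"IQN:{iqn}"' for iqn in initiator_iqns)
    return [
        "Install-WindowsFeature -Name FS-iSCSITarget-Server",
        # The virtual disk file backs the LUN presented to the nodes.
        f'New-IscsiVirtualDisk -Path "{vhdx_path}" -SizeBytes {size_gib}GB',
        f'New-IscsiServerTarget -TargetName "{target_name}" -InitiatorIds {ids}',
        f'Add-IscsiVirtualDiskTargetMapping -TargetName "{target_name}" '
        f'-Path "{vhdx_path}"',
    ]


if __name__ == "__main__":
    # Hypothetical names and IQNs for illustration only.
    for command in target_server_commands(
        target_name="wsfc-shared",
        vhdx_path=r"D:\iSCSI\wsfc-disk1.vhdx",
        size_gib=10,
        initiator_iqns=[
            "iqn.1991-05.com.microsoft:wsfc-node1.example.local",
            "iqn.1991-05.com.microsoft:wsfc-node2.example.local",
        ],
    ):
        print(command)
```

Listing every cluster node in the initiator IDs is what allows all of them to log in to the same target.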

Windows Server iSCSI Target Server feature:

Cluster nodes connect to the iSCSI target server through the iSCSI initiator to access the shared block volumes. The iSCSI target server can become a single point of failure in this deployment model, so the solution has to be architected carefully to make it highly available. We deployed this solution for one of our customers using a combination of AMI snapshots, Lambda, Ansible and Auto Scaling to achieve HA. I’ve also seen customers use tools like SIOS DataKeeper to implement WSFC on AWS.

Once a Windows-compatible EFS becomes available, all these options will go out the window if your primary objective is to have shared storage for Windows cluster nodes. Until then, these deployment options can be extremely handy!

An AWS and Google Cloud certified Cloud Professional leading a team of 60+ world class engineers to deliver business value to enterprises across ASEAN