Vivek Saraswat

High Availability Architecture and Apps with Docker Datacenter (DDC)

High availability (HA) isn’t just about keeping the lights on all the time; it’s also about quickly turning them back on when they unexpectedly go out. With software, this means capabilities for fault tolerance as well as backup and recovery. Docker Datacenter (DDC) provides this for both the container-based applications as well as the application infrastructure components (such as cluster management, orchestration, account settings, etc.). In this post we will look at how high availability is achieved in the latest release of Docker Datacenter.

As a refresher, Docker Datacenter consists of the following software:

  • Universal Control Plane (UCP) with Swarm for cluster orchestration and management
  • Docker Trusted Registry (DTR) for secure image collaboration and distribution
  • Docker Engine with commercial support to run your containerized apps

Setting up HA on your DDC Deployment

Architecturally, let’s start with how HA is achieved for your DDC infrastructure. It all begins with setting up Universal Control Plane. In UCP, HA is achieved by running multiple UCP controllers, each on its own host. UCP uses a distributed key-value store to ensure that each controller is updated with the latest information on the cluster.

As of UCP 1.1, each controller is identical to the original, so if any controller goes down, the cluster keeps running and its configuration and state are preserved. Previously, replicas of the UCP controller did not replicate the Certificate Authorities (CAs); the cluster was still preserved if the primary controller failed, but certain actions related to adding nodes or generating user bundles were limited until the primary controller was brought back up. These limitations no longer apply, as the CAs are now replicated across the controllers as well.

In general, a cluster with N controllers can tolerate (N-1)/2 failures, rounded down. In the diagram below, 3 UCP controllers allow one of them to go down while still preserving the cluster. Losing an additional controller will cause the key-value store to lose quorum, thus breaking the cluster. Adding more controllers (up to 7) will increase fault tolerance, but requires more hosts and can slow down the cluster. After installing the controllers, you can add additional UCP nodes for user applications as needed for your deployment.
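To make the arithmetic concrete, here is a minimal Python sketch. It assumes the usual majority-quorum rule, which is what the (N-1)/2 figure (rounded down) corresponds to; the function names are ours for illustration and are not part of UCP.

```python
def quorum_size(controllers: int) -> int:
    """Smallest majority of `controllers` members (assumed quorum rule)."""
    if controllers < 1:
        raise ValueError("need at least one controller")
    return controllers // 2 + 1


def tolerated_failures(controllers: int) -> int:
    """Failures the cluster can absorb while keeping quorum: floor((N-1)/2)."""
    return controllers - quorum_size(controllers)


if __name__ == "__main__":
    # Print controller count, quorum size, and tolerated failures.
    for n in (1, 3, 5, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

One consequence worth noticing: under this rule an even controller count adds no tolerance, since 4 controllers survive one failure just as 3 do.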

 

Now that UCP is installed, we can start installing Docker Trusted Registry. DTR 2.0 has a fully redesigned cluster architecture that makes use of replicas for high availability. These replicas contain a distributed key-value store and a replicated database, and talk to each other over an overlay network. As with UCP, installing N DTR replicas allows the system to tolerate (N-1)/2 failures, rounded down.

 

A couple of tips for setup:

  • DTR 2.0 uses UCP for orchestration, authentication, and monitoring, and thus UCP must be installed first in order to bring up DTR.
  • It is strongly recommended to put the UCP controllers and DTR replicas on separate nodes. This is to ensure that a failure in one solution does not affect the other.
  • It is also generally recommended in large-scale production deployments to keep UCP nodes on separate hosts from the controllers/replicas, in order to ensure that application issues do not affect the integrity of the UCP cluster. This may not be necessary in test environments or small-scale deployments.
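The placement tips above can be expressed as a simple check. The following Python sketch is purely illustrative: the function and the host names are hypothetical, and it only inspects lists you give it rather than querying UCP or DTR.

```python
def placement_problems(controllers, dtr_replicas, app_nodes):
    """Return warnings for hosts that play more than one role.

    Each argument is an iterable of host names (hypothetical inventory data).
    """
    controllers = set(controllers)
    dtr_replicas = set(dtr_replicas)
    app_nodes = set(app_nodes)

    problems = []
    # Tip 2: UCP controllers and DTR replicas should be on separate nodes.
    for host in sorted(controllers & dtr_replicas):
        problems.append(f"{host}: UCP controller and DTR replica share a node")
    # Tip 3: application nodes should be kept apart from controllers/replicas.
    for host in sorted(app_nodes & (controllers | dtr_replicas)):
        problems.append(
            f"{host}: application workloads share a node with a controller/replica"
        )
    return problems


if __name__ == "__main__":
    print(placement_problems(
        controllers=["ucp1", "ucp2", "ucp3"],
        dtr_replicas=["dtr1", "dtr2", "ucp3"],  # ucp3 is doubly used
        app_nodes=["app1", "app2"],
    ))
```

In a test environment or small deployment you might choose to ignore the second kind of warning, as noted in the tips.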

 

Backup and Restore

Now that you have your architecture set up, you need the ability to save the state of your cluster (data such as user accounts and configuration settings) in order to recover from a failure. This is where UCP’s new backup/restore CLI commands come in. Using the UCP CLI tool you can take backups of a UCP controller, which saves the state of the cluster in a .tar file. Let’s say you have a UCP HA deployment consisting of Controllers A, B, and C. In case of a failure, here’s how you would recover:

  • Make sure you have previously taken backups of any one of the controllers. In this case, let’s say we have been taking regular backups of Controller A.
  • Use the UCP CLI tool “stop” command to stop all UCP system containers on the controllers you have not backed up (in this case, Controllers B and C).
  • Run the UCP CLI tool “restore” command on Controller A.
  • Run the UCP CLI tool “uninstall” command on Controllers B and C, then rerun the “join --replica” command on these controllers.
  • You should now have a restored cluster!
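The ordering of those steps matters, so here is a small Python sketch that lays them out for any set of controllers. It only builds a list of (command, controller) pairs using the command names mentioned above; it does not invoke the UCP CLI, and the function name is our own.

```python
def restore_plan(controllers, backed_up):
    """Return the ordered (command, controller) steps for a restore.

    `controllers` lists every controller in the HA deployment and
    `backed_up` is the one whose backup will be restored.
    """
    if backed_up not in controllers:
        raise ValueError(f"{backed_up!r} is not one of the controllers")

    others = [c for c in controllers if c != backed_up]
    plan = [("stop", c) for c in others]           # stop the controllers without a backup
    plan.append(("restore", backed_up))            # restore the backed-up controller
    plan += [("uninstall", c) for c in others]     # clear the remaining controllers
    plan += [("join --replica", c) for c in others]  # rejoin them as replicas
    return plan


if __name__ == "__main__":
    for command, controller in restore_plan(["A", "B", "C"], backed_up="A"):
        print(f"Controller {controller}: {command}")
```

A plan like this is handy as a checklist; the exact arguments each command takes should come from the UCP documentation for your version.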

This functionality is best used for recovering from catastrophic host failures with previous backups. However, in some scenarios it may be possible to take a backup after controller failure, particularly in a case where the cluster is broken due to loss of quorum from the controllers.

 

High Availability for Applications

So far we’ve been talking about UCP and DTR architectural HA. But what about the actual app containers? What do you do if a UCP node running some of your app containers fails? This is where container rescheduling comes in. This feature was experimental in previous versions of Swarm but is now generally available in Swarm v1.2 (which is used by UCP 1.1). With container rescheduling, you can set a label or environment variable on a container that tells Swarm to reschedule it (i.e. restart it on a different node) if the node it is currently on ever goes down.
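To illustrate the idea, here is a toy Python simulation of what happens when a node fails. It is a sketch under stated assumptions, not Swarm’s scheduler: the data structures are hypothetical, and picking the least-loaded healthy node is simply our choice for the example.

```python
def reschedule(placements, opted_in, failed_node, healthy_nodes):
    """Simulate rescheduling after `failed_node` goes down.

    placements: dict mapping container name -> node name
    opted_in: set of containers that carry a reschedule policy
    Returns a new dict; containers left down are mapped to None.
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes to reschedule onto")

    # Count containers already running on each healthy node.
    load = {node: 0 for node in healthy_nodes}
    for node in placements.values():
        if node in load:
            load[node] += 1

    result = dict(placements)
    for container, node in sorted(placements.items()):
        if node != failed_node:
            continue
        if container in opted_in:
            # Assumption for this sketch: choose the least-loaded healthy node.
            target = min(healthy_nodes, key=lambda n: (load[n], n))
            result[container] = target
            load[target] += 1
        else:
            result[container] = None  # no policy set, so it stays down
    return result


if __name__ == "__main__":
    placements = {"web": "node1", "db": "node1", "cache": "node2"}
    print(reschedule(placements, {"web"}, "node1", ["node2", "node3"]))
```

The point of the sketch is the opt-in behaviour: only containers that carry the policy are moved, and everything else on the failed node stays down until the node returns.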

For more on how to do this, read the Swarm container rescheduling documentation or watch the demo below:

We hope you’ve found this post useful in showing how you can use Docker Datacenter to provide high availability for both your application infrastructure and containers. Give it a spin via the links below or feel free to ask any questions on the forums.

 

Additional Resources on Docker Datacenter


 

6 Responses to “High Availability Architecture and Apps with Docker Datacenter (DDC)”

  1. Yogesh

    Vivek, thanks for the insight. I am running non-K8 and have a Docker registry in the back for hosting containers.

    Do we have a product/PS for getting HA for Docker Registry?

  2. Vivek Saraswat

    Hi Yogesh, HA as described above is for the commercially supported Docker Trusted Registry. It is not available for Docker Registry at this time.

  3. Enric

    Hi Vivek, quick question: does the new version of Docker Engine (1.12) that integrates Swarm work with Docker Datacenter UCP? Will UCP change the way of working when we go to Docker Engine 1.12?
    Many thanks!

  4. Vivek Saraswat

    Hi Enric,

    The current production build of UCP (1.1.X) runs on top of Commercially Supported Engine 1.11 and will continue to do so for the immediate future, for the purposes of stability and support. When CS Engine 1.12 is released (typically this happens a few weeks after the open-source release) then we should support UCP 1.1.X running on top of that version. However, UCP 1.1.X will still run using the "classic" Swarm with separate swarm-master/swarm-join containers, NOT with the integrated swarm-mode.

    At DockerCon we demoed a preview (in-development) build of UCP running on Engine 1.12 with the new built-in Swarm mode. This won't fundamentally change how you use UCP, but it will add new features like the "docker service" command.

    Check out this blog post for more info: https://blog.docker.com/2016/07/docker-datacenter-dockercon-2016-image-security-engine-1-12-and-burning-man/

  5. Matt

    Hi

    Is there a step-by-step guide for installing datacentre anywhere?

  6. Jatin

    Hi Vivek,

    Is Docker a replacement for, or a rival of, Veritas ApplicationHA?
    In other words, can Docker perform application-level HA integrated with VMware HA,
    so that when an application fails on a VMware VM, it fails that application over to another host VM?
