There are people who will tell you that the community has made up its mind when it comes to container orchestration.
Nothing could be further from the truth. A recent survey of over 500 respondents, covering DevOps, microservices, and the public cloud, revealed a three-way orchestration race between Docker Swarm, Google Kubernetes, and Amazon EC2 Container Service (ECS).
When you think about which orchestration tool is right for your environment, we believe the following three key things must be considered:
- Performance: How fast can I get containers up and running at scale? How responsive is the system when under load?
- Simplicity: What’s the learning curve to set up, and what is the ongoing burden to maintain? How many moving parts are there?
- Flexibility: Does it integrate with my current environment and workflows? Will my applications seamlessly move from dev to test to production? Will I be locked into a specific platform?
Docker Swarm leads in all three areas.
Performance at Scale
We released the first beta of Swarm just over a year ago, and since then we’ve made remarkable progress. In less than a year, we introduced Swarm 1.0 (November 2015) and made clear that Swarm can scale to support 1,000 nodes running in a production environment. Our internal testing bears that out.
Kubernetes previously published its own blog post detailing performance testing on a 100-node cluster. The problem for customers is that there was no way to really compare the results of these two efforts, as the test methodologies were fundamentally different.
In order to accurately assess performance across orchestration tools, there needs to be a unified framework.
To that end, Docker engaged Jeff Nickoloff, an independent technology consultant, to help create this framework and make it available to the larger container community for use in their own evaluations.
Today Jeff released the results of his independent study comparing the performance of Docker Swarm to Google Kubernetes at scale. The study and article, commissioned by Docker, tested the performance of both platforms while running 30,000 containers across 1,000-node clusters.
The tests were designed to measure two things:
- Container startup time: How quickly can a new container actually be brought online, as opposed to simply being scheduled to start?
- System responsiveness under load: How quickly does the system respond to operational requests under load (in this case, listing all the running containers)?
The test harness looks at both of these measurements as the cluster is built. A fully loaded cluster is 1,000 nodes running 30,000 containers (30 containers per node).
As nodes are added to the cluster, the harness pauses to measure container startup time and system responsiveness. These breakpoints occur when the cluster is 10%, 50%, 90%, 99%, and 100% full. At each of these load levels, 1,000 test iterations are executed.
What this means is that, for instance, when the cluster is 10% full (100 nodes and 3,000 containers), the harness pauses adding new nodes. It instead measures the time it takes to start up a new container (in this case the 3,001st container) and how long it takes to list all the running containers (3,001). It repeats this sequence 1,000 times: the 3,001st container is created, the startup and list times are measured, and the container is removed.
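The measurement procedure described above can be sketched roughly as follows. This is an illustrative outline only, not Jeff's actual harness: the cluster sizes and breakpoints come from the article, while `load_at`, `timed`, `measure_breakpoint`, and the three callables passed in are hypothetical names standing in for whatever the real harness does to start, list, and remove containers.

```python
import time
from typing import Callable, List, Tuple

# Figures taken from the article.
NODES_AT_FULL = 1000                      # a fully loaded cluster is 1,000 nodes
CONTAINERS_PER_NODE = 30                  # 30 containers per node
BREAKPOINTS = [0.10, 0.50, 0.90, 0.99, 1.00]
ITERATIONS = 1000                         # test iterations per breakpoint


def load_at(fraction: float) -> Tuple[int, int]:
    """Return (nodes, containers) when the cluster is `fraction` full."""
    nodes = round(NODES_AT_FULL * fraction)
    return nodes, nodes * CONTAINERS_PER_NODE


def timed(action: Callable[[], None]) -> float:
    """Run `action` once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def measure_breakpoint(
    start_container: Callable[[], None],   # hypothetical: start one extra container
    list_containers: Callable[[], None],   # hypothetical: list all running containers
    remove_container: Callable[[], None],  # hypothetical: remove the extra container
    iterations: int = ITERATIONS,
) -> Tuple[List[float], List[float]]:
    """Repeat the create / list / remove sequence and collect both timings."""
    startup_times: List[float] = []
    list_times: List[float] = []
    for _ in range(iterations):
        startup_times.append(timed(start_container))
        list_times.append(timed(list_containers))
        remove_container()  # restore the cluster to the breakpoint load level
    return startup_times, list_times
```

For example, `load_at(0.10)` gives `(100, 3000)`, so the container being started and removed at that breakpoint is the 3,001st.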
The results show that Swarm is on average 5X faster in terms of container startup time and 7X faster in delivering operational insights necessary to run a cluster at scale in production.
Looking more closely at the results for container startup time, there is a clear performance advantage for Swarm regardless of cluster load level.
From Jeff’s blog:
Half the time Swarm will start a container in less than .5 seconds as long as the cluster is not more than 90% full. Kubernetes will start a container in over 2 seconds half of the time if the cluster is 50% full or more.
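"Half the time" is a statement about the median of the measured startup times. As a minimal sketch of how such a figure could be derived from one breakpoint's samples (the sample values below are made up for illustration and are not taken from the study; `median_startup` and `fraction_under` are hypothetical helper names):

```python
import statistics
from typing import Sequence


def median_startup(samples: Sequence[float]) -> float:
    """Median startup time: half of the samples are at or below this value."""
    return statistics.median(samples)


def fraction_under(samples: Sequence[float], threshold: float) -> float:
    """Fraction of samples strictly faster than `threshold` seconds."""
    return sum(1 for s in samples if s < threshold) / len(samples)


# Made-up startup times in seconds, for illustration only.
example_samples = [0.31, 0.42, 0.45, 0.48, 0.52, 0.90]

# "Starts in less than 0.5 seconds half the time" holds when the median
# is below 0.5, or equivalently when at least half the samples are.
meets_claim = median_startup(example_samples) < 0.5
```

Reporting the median rather than the mean keeps a handful of very slow starts from dominating the summary.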
One important thing to note is that this test isn’t about container scheduling; it’s about getting containers running and doing work.
The reality is that nobody cares whether a container was “scheduled” to run; what they care about is that the container is actually running. I think about it like this: if I go out to eat, taking my order and handing it off to the kitchen is great, but what really matters is how long it takes to get my meal prepared and delivered to my table.
One of the promises of containers is agility and responsiveness. A 5X delay in container startup time wreaks havoc on distributed applications that need near real-time responsiveness. Even in cases where real-time responsiveness isn’t needed, taking all that extra time to bring up infrastructure is painful. Think about using orchestration as part of a continuous integration workflow: longer container startup times translate directly into longer test cycle times.
It’s one thing to scale a cluster to 30,000 containers, and a completely different thing to be able to efficiently manage that environment. System responsiveness under load is critical to effective management. In a world where containers may only live for a few minutes, a significant delay in gathering real-time insight into the state of the environment means you never really know what’s happening in your infrastructure at any particular moment.
In order to gauge system responsiveness under load, the test harness measured the time it took to list out all the running containers at various levels of cluster load.
The result: compared to Swarm, Kubernetes took up to 7X longer to list all the running containers as the cluster approached full load, taking over 2 minutes to list them. Furthermore, Kubernetes had a 98X increase in response time (that’s not a typo: it’s 98X, not 98%) as the cluster went from 10% to 100% full.
So why exactly is Kubernetes so much slower and less responsive than Swarm? It really comes down to system architecture. A quick glance at the diagrams from Jeff’s testing environments shows that Swarm has fewer moving parts than Kubernetes.
All of these components introduce a high degree of complexity to the setup process, inject latency into command execution, and make troubleshooting and remediation difficult. The diagram below depicts the number of component-level interactions in Kubernetes compared to Swarm. The 8X more “hops” needed to complete a command like list add latency and result in a 7X slower system for critical orchestration functions. Another impact of these many interactions is that when a command fails to complete, it is difficult to deduce at which point the failure occurred.
Kubernetes was born out of Google’s internal Borg project, so people assume it’s designed to perform well at “cloud scale”. The test results are one proof point that Kubernetes is fairly divergent from Borg. However, it does share one thing in common with Borg: being overly complex and needing teams of cloud engineers to implement and manage it day to day.
Swarm, on the other hand, shares in a core Docker discipline of democratizing complex cloud technologies. Swarm has been built from day one with the intent of being the best way to orchestrate containers for organizations of all sizes without requiring an army of engineers. It offers an easy-to-use experience that is the same whether you are testing a small cluster on your laptop, setting up some test servers in a datacenter, or running your production cloud infrastructure.
As Jeff said, “Docker Swarm is quantitatively easier to adopt and support than Kubernetes clustering components.”
Some might argue that Kubernetes is more complicated because it does more. But “doing more” does not bring any value to the table if the “more” isn’t anything you care about. And, in reality, it can actually end up being a detriment as “more” can introduce additional points of failure, increased support costs, and unnecessary infrastructure investments.
Or as Jeff describes it:
“…Kubernetes is a larger project, with more moving parts, more facets to learn, and more opportunities for failure. Even though the architecture implemented by Kubernetes can prevent a few known weaknesses in the Swarm architecture it creates opportunities for more esoteric problems and nuances.”
As I stated at the outset of this post, performance and simplicity are only two factors when considering an orchestration tool. The third critical element is flexibility, and flexibility itself means many things.
The previously mentioned survey results show that the three main orchestration tools companies are using or considering are Docker Swarm, Google Kubernetes, and Amazon EC2 Container Service (ECS).
Of those three, only Docker is fully committed to ensuring that your application runs unfettered across the full gamut of infrastructure: from your developers to your test environment to a production deployment on the platform of your choosing, whether on a laptop, in your private datacenter, or on the cloud provider of your choosing. Docker Swarm allows you to cluster hosts and orchestrate containers anywhere.
Beyond offering true portability of your workloads across public and private infrastructure, Docker features a plugin-based architecture. These plugins ensure that your Dockerized applications will work with your existing technology investments across networking, storage, and compute, and can be moved to a different network or storage provider without any change to your application code.
In the end, a compelling orchestration tool is a necessary part of any Container as a Service (CaaS) environment. The reality is that orchestration is not the platform but only one piece of a much larger technology stack.
We know this because the same survey also tells us that users want tools that address the full application lifecycle, feature integrated tooling for both their developers and operations engineers, and support the widest range of developer tools.
Join us for an online meetup on Thursday March 10th at 9am PT featuring Jeff Nickoloff to learn more about Swarm and scale testing.
- Read Jeff Nickoloff’s article
- Access the test harness here to try it yourself
- View all the test result data here
- Share the news that Docker Swarm exceeds Kubernetes performance at scale!
Don’t forget to participate in our DockerCon ticket raffle! Share a picture or description of your Swarm with us on Twitter and tag @docker and #SwarmWeek for a chance to win a free ticket to DockerCon 2016 in Seattle, June 20-21.
Here are some more Docker Swarm resources:
- Get started by downloading Docker Swarm and reading the docs
- Try Docker Swarm as part of Docker Datacenter
- Submit questions to Docker Forums or file issues in GitHub
- Contribute to the Docker Swarm project