Arnaud Porterie

Open Source at Docker, Part 3: The Tooling and Automation

The Docker open source project is among the most successful in recent history by every possible metric: number of contributors, GitHub stars, commit frequency, … Managing an open source project at that scale and preserving a healthy community doesn’t come without challenges.

This post is the last of a 3-part series on how we deal with those challenges on the Docker Engine project. Part 1 was all about the people behind the project, and Part 2 focused on the processes. In Part 3, we will cover tooling and automation.

There are many areas for automation in a project such as Docker. We wanted to present and share some of our tooling with you: the CI, the utility bots, and the project dashboards.

Continuous integration

Every project needs good CI, and it is as simple as that: CI is your safety net when moving fast. Docker supports a vast spectrum of platforms, as well as a few very special “orthogonal” features that require a test suite of their own (user namespaces come to mind).

Today on the Docker Engine, a single PR may require up to 10 different kinds of jobs, all orchestrated by Jenkins. There are many CI services out there that are easy to set up and free for open source projects, but proper testing of Docker requires extensive control of the host, sometimes down to kernel tuning. For these reasons, we host our own testing infrastructure, which spreads across a variety of hosts and cloud providers (for example, our Windows CI runs on Azure, and our ARM CI on Scaleway).


Not all of these jobs are automatically triggered on every pull request: maintainers use their best judgement to decide which additional jobs are relevant, and rely on Gordon through IRC to interact with the CI server.


Every project needs a turtle

Besides lurking on IRC to help maintainers with the CI, Gordon the Turtle does a few more things to automate the boring tasks, and save the maintainers some precious effort. For example, Gordon is known to help new contributors who forgot to sign their commits following our contributor guide:


You’d be surprised how many contributors thank Gordon for her help.
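The kind of sign-off check Gordon performs can be approximated as follows. This is a minimal sketch, not Gordon's actual code: it assumes a signed commit carries a `Signed-off-by: Name <email>` trailer (the convention produced by `git commit -s`), and both the function name and the `(sha, message)` input shape are hypothetical.

```python
import re

# Assumed trailer format, as produced by `git commit -s`.
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def unsigned_commits(commits):
    """Return the SHAs of commits whose message lacks a sign-off trailer.

    `commits` is an iterable of (sha, message) pairs. This shape is an
    illustrative assumption, not the format Gordon actually consumes.
    """
    return [sha for sha, message in commits if not SIGN_OFF.search(message)]


commits = [
    ("a1b2c3", "Fix typo in docs\n\nSigned-off-by: Jane Doe <jane@example.com>"),
    ("d4e5f6", "Add user namespace test"),
]
print(unsigned_commits(commits))  # ['d4e5f6']
```

A bot built on this could comment on the pull request whenever the returned list is not empty, pointing the contributor to the contributor guide.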

Over time, Gordon has also learned to recognize some typical pull requests and automatically label them for us:


You may also have noticed that Gordon labels issues according to the Docker version they apply to. This is particularly useful for maintainers to easily filter all issues for a particular release, and potentially close what applies to deprecated versions.
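Version labeling of this kind can be sketched in a few lines. The example below is an assumption about how such a rule might work, not a description of Gordon's implementation: it looks for a version string such as the one printed by `docker --version` in the issue body, and the `version/<major>.<minor>` label format and the function name are illustrative.

```python
import re

# Matches "Docker version 1.10.3" as well as "Version:      1.10.3".
VERSION = re.compile(r"\bversion:?\s+(\d+)\.(\d+)\.\d+", re.IGNORECASE)


def version_label(issue_body):
    """Return a label such as 'version/1.10', or None if no version is found."""
    match = VERSION.search(issue_body)
    if match is None:
        return None
    major, minor = match.groups()
    return f"version/{major}.{minor}"


print(version_label("Docker version 1.10.3, build 20f81dd"))  # version/1.10
```

Labeling by major and minor version only keeps the number of labels small while still letting maintainers filter issues by release.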


Currently, Gordon only takes care of those very basic tasks, but there’s a lot more we want to teach Gordon in the future: merging when tests are green and enough maintainers gave an LGTM, closing pull requests after a long period of inactivity, etc.

Metrics, metrics, metrics

A lot of the processes I described in Part 2 of this blog series involve measurements of some sort: who are the most active people on the repository, how fast are we to merge pull requests, how many pull requests did we process this week, etc. As much as we love GitHub and use it in our daily workflow, we need customizable visibility into those things.

We built a tool called vossibility (very original contraction of “OSS visibility”) to assist us with that. Simply put, it captures every single event happening on any of Docker’s open source repositories, augments it with extra information, transforms the data to make it easier to consume, and stores it in ElasticSearch. The result of all this is the ability to use the wonderful Kibana to build all the dashboards we need.
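To make the "augment and transform" step more concrete, here is a minimal sketch of what it could look like. It is not vossibility's code: the function name, the output field names, and the maintainer set are hypothetical, and only the `action`, `repository.full_name`, and `sender.login` fields of the GitHub webhook payload are assumed.

```python
def transform(event_type, payload, maintainers):
    """Flatten a raw GitHub webhook payload into a document ready for indexing."""
    sender = payload["sender"]["login"]
    return {
        "type": event_type,
        "action": payload.get("action"),
        "repository": payload["repository"]["full_name"],
        "sender": sender,
        # Augmentation step: information GitHub does not provide by itself.
        "sender_is_maintainer": sender in maintainers,
    }


payload = {
    "action": "opened",
    "repository": {"full_name": "docker/docker"},
    "sender": {"login": "alice"},
}
document = transform("pull_request", payload, maintainers={"alice"})
print(document)
```

Storing flat documents like this one is what makes it practical to filter and aggregate on any field from a dashboard.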

 

Community members

I mentioned in an earlier post that becoming a maintainer requires regular activity over an extended period of time in the open source repository. We measure this not by the number of pull requests created or commits authored (which would give us a measure of how active a contributor is), but rather by the number of unique issues and pull requests one has interacted with. In this context, what counts as an interaction can be a comment on a commit, on an issue, or on a pull request, as well as the creation of an issue or of a pull request. This gives us a sense of who is actively participating in the community, and captures broader interest than her or his contributions alone.
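The metric itself is simple to express. The sketch below assumes events have already been reduced to `(user, item)` pairs, where `item` identifies the issue, pull request, or commit that was touched; that input shape and the function name are illustrative assumptions.

```python
from collections import defaultdict


def interaction_counts(events):
    """Count the unique items each user has interacted with.

    `events` is an iterable of (user, item) pairs. Several interactions
    with the same item count only once.
    """
    items_by_user = defaultdict(set)
    for user, item in events:
        items_by_user[user].add(item)
    return {user: len(items) for user, items in items_by_user.items()}


events = [
    ("alice", "docker/docker#100"),  # opened a pull request
    ("alice", "docker/docker#100"),  # commented on the same pull request
    ("alice", "docker/docker#101"),  # commented on an issue
    ("bob", "docker/docker#100"),    # commented on alice's pull request
]
print(interaction_counts(events))  # {'alice': 2, 'bob': 1}
```

Counting unique items rather than raw events keeps a long back-and-forth on a single thread from inflating someone's score.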


Besides revealing things we already know (for example that Gordon just cannot keep up with Sebastian), this is the ideal way to discover active members of the community who aren’t yet maintainers but maybe should be, as well as maintainers who moved on to other things.

 

Repository activity

Another use of vossibility is to give us an immediate overview of recent history. For example, this is a dashboard showing the last 30 days of activity on the github.com/docker/docker repository alone:

 

 

This shows us tons of useful data including:

  • Our average, 90th percentile, and 100th percentile of number of days to process a pull request. For the curious: it takes us on average less than 6 days to process a pull request, and less than 22 days for 90% of all cases.
  • The number of pull requests we processed over this period of time.
  • The most commented items, which is surprisingly useful for surfacing older issues that are suddenly getting more attention.
  • The geometry of our community (the origin of pull requests between Docker employees, external maintainers, and the broader contributors).
  • The notable pull requests that got merged, as indicated by bearing the `impact/changelog` label.
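For readers who want to compute the first of these figures on their own repository, here is a minimal sketch. Our dashboards get these numbers from Kibana aggregations; the code below is only an illustration, using the nearest-rank definition of a percentile, made-up durations, and hypothetical function names.

```python
import math
from datetime import datetime

GITHUB_TIME = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the GitHub API


def days_open(created_at, closed_at):
    """Number of days between the opening and closing of a pull request."""
    delta = datetime.strptime(closed_at, GITHUB_TIME) - datetime.strptime(
        created_at, GITHUB_TIME
    )
    return delta.total_seconds() / 86400


def percentile(values, p):
    """Nearest-rank percentile: smallest value covering p% of the samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


durations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data, in days
print(sum(durations) / len(durations))  # 7.5
print(percentile(durations, 90))        # 9
print(percentile(durations, 100))       # 30
```

The example also shows why the three numbers are worth tracking together: a single long-lived pull request pulls the average and the 100th percentile up, while the 90th percentile stays representative of the typical case.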

 

A visual history of the project

The project has changed a lot in its 3 years of existence, and we can for example see over time how the number of repositories multiplied, and how the relative “weight” of them has evolved.

 

 

There are a ton of other uses of the dashboard that I could go over, from the maintainers who say LGTM the most, to those who most often close without merging (aka “The Dream Killers”). The data is also consumed by other tools, for example docker-bulletin, where Gordon commits a weekly activity report.

 

That’s all folks!

This concludes this blog series on how we do open source at Docker. The adventure continues at DockerCon 2016, where the maintainers and I will be happy to answer all your questions in the sessions of the “Contribute and Collaborate” track, and maybe help you get your very first contribution in!


 
