Michael Crosby

Dolly Demo at LinuxCon: Rapid cloning of existing services with runC


At LinuxCon in August, I presented a keynote demo with Diogo Mónica and Marianna Tessel. The goal of the demo was to show that checkpoint and restore of containers can be used not only for migrating stateful services (i.e., stopping a service and moving it), but also for rapidly cloning existing services. Cloning existing services quickly is one way to scale an application as demand increases.

The most important problem we had to tackle was cache warming. A cache takes a long time to warm, which can hinder your ability to scale an application out horizontally. By cloning an existing, already-warm cache, we avoid paying the cost of filling it, which lets us quickly clone our services across multiple containers. You can check out the code on GitHub.



The dolly application had a very standard setup for a modern web application. A stateless web frontend pulled messages out of a local cache on each server and returned the message on each request, along with some additional metadata about the frontend that served the request. We used Redis as the local cache in the demo, but memcached would have worked too.
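The frontend/cache split described above is the common cache-aside pattern. Here is a minimal Python sketch of it, not the dolly code itself: a plain dict stands in for the local Redis instance, and `slow_backend` is a hypothetical stand-in for the backend database, with a `sleep()` playing the role of the deliberate delays in the demo.

```python
import time
from typing import Callable, Dict


class CacheAsideFrontend:
    """Toy frontend: serve from a local cache, fall back to a slow backend on a miss.

    The dict stands in for the local Redis instance and `backend_fetch` for the
    backend database; both are hypothetical stand-ins, not the dolly code.
    """

    def __init__(self, backend_fetch: Callable[[str], str], hostname: str) -> None:
        self.cache: Dict[str, str] = {}
        self.backend_fetch = backend_fetch
        self.hostname = hostname
        self.hits = 0
        self.misses = 0

    def handle(self, key: str) -> Dict[str, str]:
        if key in self.cache:
            self.hits += 1
            message = self.cache[key]
        else:
            # Cold cache: pay the backend cost once, then keep the result locally.
            self.misses += 1
            message = self.backend_fetch(key)
            self.cache[key] = message
        # Return the message plus metadata about which frontend served it.
        return {"message": message, "served_by": self.hostname}


def slow_backend(key: str) -> str:
    time.sleep(0.01)  # simulated backend latency, like the demo's sleep() calls
    return f"message for {key}"


if __name__ == "__main__":
    frontend = CacheAsideFrontend(slow_backend, hostname="frontend-1")
    keys = [f"k{i}" for i in range(5)]
    for _ in range(2):
        for k in keys:
            frontend.handle(k)
    # First pass misses every key; second pass hits every key.
    print(frontend.hits, frontend.misses)
```

A fresh frontend starts with an empty cache, so every first request goes to the slow backend; cloning an already-warm cache skips that first, expensive pass.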

Whenever our application is deployed to a new server, it takes the frontend a few minutes to warm the cache from the backend. Because of the load of filling the frontend's cache, plus a few carefully placed sleep() calls in the code, you will see slower response times until the cache is full.

To avoid a slow startup, slow requests, and extra load on the backend databases while the cache is warming, we can use containers with checkpoint/restore via CRIU to checkpoint the Redis container (our cache). We then move the checkpoint to the new host using rsync, scp, tar, or whatever file transfer tool you prefer, and restore the container there.

To achieve this, we used runC with CRIU to run and checkpoint the Redis container, then used rsync to copy the checkpointed state (the container's memory, open file descriptors, open connections, and all other runtime information) to the new host. Finally, we used runC to restore the container from this persisted state back into a running container.
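As a rough sketch of that checkpoint, copy, restore workflow, the Python below builds the three commands and, by default, only prints them. The assumptions are ours, not the demo's: the container directory has the same path on both hosts, the root filesystem is already deployed there, and root SSH access works; `bundle_dir` and the host name are hypothetical. The runC invocations mirror the `./runc checkpoint` and `./runc restore` commands used in this demo; current runC releases take different arguments.

```python
import shlex
import subprocess
from typing import List


def migration_commands(bundle_dir: str, dest_host: str) -> List[List[str]]:
    """Build the checkpoint -> copy -> restore command sequence.

    Assumptions (hypothetical, not from the dolly repo): the container
    directory lives at the same path `bundle_dir` on both hosts, the root
    filesystem is already deployed on `dest_host`, and root SSH access works.
    """
    checkpoint_dir = f"{bundle_dir.rstrip('/')}/checkpoint"
    return [
        # 1. Dump the running container's state into ./checkpoint (runs locally).
        ["sh", "-c", f"cd {shlex.quote(bundle_dir)} && ./runc checkpoint"],
        # 2. Copy only the checkpoint directory to the same path on the new host.
        ["rsync", "-a", f"{checkpoint_dir}/", f"{dest_host}:{checkpoint_dir}/"],
        # 3. Restore the container from the copied state on the new host.
        ["ssh", dest_host, f"cd {shlex.quote(bundle_dir)} && ./runc restore"],
    ]


def migrate(bundle_dir: str, dest_host: str, dry_run: bool = True) -> None:
    for cmd in migration_commands(bundle_dir, dest_host):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    migrate("/root/redis", "new-host")  # dry run: only prints the commands
```

Keeping the dry run as the default makes it easy to review the exact commands before anything touches a live container.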


Steps to replicate

Run everything as root and from the same directory!

1. Download a Redis image for runC: http://crosbymichael.com/packages/redis-demo.tar.gz

2. Make sure the server is set up using: https://github.com/crosbymichael/dolly/blob/master/server.sh

3. Extract the container to a directory as root.

4. cd into the directory.

5. The first time, run the container as root by typing: ./runc

6. In another terminal, run ./runc checkpoint to checkpoint the container. There is now a ./checkpoint directory in your current working directory; you can ls it to see what kinds of things are persisted during a checkpoint. This is the directory, containing the live process information and memory, that you can move across servers.

7. In another terminal, restore the container by typing: ./runc restore

8. Done!
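To go a step beyond ls in step 6, a small Python helper can list the largest files in the checkpoint directory. This is a sketch of our own, not part of the demo. CRIU writes the dumped state as image files (typically named *.img); for a memory-heavy process like Redis we would expect the memory-page images to account for most of what has to be moved, but check your own output.

```python
import os
import sys
from typing import List, Tuple


def largest_files(checkpoint_dir: str = "checkpoint", top: int = 10) -> List[Tuple[str, int]]:
    """List the biggest files in a checkpoint directory, largest first.

    Returns (path relative to checkpoint_dir, size in bytes) pairs.
    """
    entries = []
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in files:
            path = os.path.join(root, name)
            entries.append((os.path.relpath(path, checkpoint_dir), os.path.getsize(path)))
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries[:top]


if __name__ == "__main__":
    # Run from the container directory, or pass the checkpoint path explicitly.
    target = sys.argv[1] if len(sys.argv) > 1 else "checkpoint"
    for rel_path, size in largest_files(target):
        print(f"{size:>12}  {rel_path}")
```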


Additional tips

You need similar systems to migrate between: the same system specs, CPU instruction sets, etc. Most modern cloud providers, such as DigitalOcean, use similar hardware and CPUs for their VMs, which allows easy migration within a single data center or across data centers.
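One rough pre-check for "same CPU instruction sets" is to compare the CPU feature flags of the two hosts. The sketch below is our own heuristic, not something runC or CRIU does for you: it parses the text of /proc/cpuinfo (x86 layout, where features are listed on "flags" lines) and reports flags the source has but the destination lacks. The reasoning is that a running process may already be using any instruction set the source CPU offered, so restoring it on a CPU without those instructions is risky.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect CPU feature flags from /proc/cpuinfo contents (x86 "flags" lines)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(source_cpuinfo: str, dest_cpuinfo: str) -> Set[str]:
    """Flags present on the source CPU but absent on the destination.

    A non-empty result suggests the destination is not a safe restore target.
    """
    return cpu_flags(source_cpuinfo) - cpu_flags(dest_cpuinfo)
```

In practice you would read /proc/cpuinfo on each host (for example over ssh) and pass both texts to `missing_flags`; an empty set means the destination offers at least every feature the source did.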

Deploy the container's root filesystem to all the servers before migration. That way you are not copying the entire root filesystem every time, only the process's memory and runtime state. This keeps migrations fast, since there is much less data to transfer.
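To see how much pre-deploying the root filesystem saves, you can compare the size of the whole container directory with the size of just its checkpoint directory. This Python sketch assumes the layout from the steps, where ./checkpoint is created inside the extracted container directory alongside its root filesystem; the function names are our own.

```python
import os
from typing import Tuple


def tree_size(path: str) -> int:
    """Total size in bytes of the files under `path`, not following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def transfer_sizes(bundle_dir: str) -> Tuple[int, int]:
    """Return (bytes to copy the whole bundle, bytes to copy only ./checkpoint).

    Assumes the checkpoint directory lives inside `bundle_dir`, next to the
    root filesystem, as in the demo steps.
    """
    whole_bundle = tree_size(bundle_dir)
    checkpoint_only = tree_size(os.path.join(bundle_dir, "checkpoint"))
    return whole_bundle, checkpoint_only
```

Calling `transfer_sizes(".")` from the container directory after step 6 shows the difference between shipping everything and shipping only the checkpoint.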

runC creates and runs the container and provides the interface for checkpoint/restore, while CRIU does the heavy lifting of persisting and restoring the container's live process state.

General good software practices go a long way. When designing applications that can cope with services being migrated, make sure they are able to reconnect if a connection drops. Using connection pools is a good idea, and if the connection to the cache is interrupted, the application should reconnect instead of failing hard. Most applications can be checkpointed and restored without knowing that anything happened, but clients of these services should be able to reconnect after network failures.
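The reconnect advice can be sketched as a small retry wrapper with exponential backoff. This is a generic Python sketch, not code from dolly: `operation` and `reconnect` are hypothetical hooks for whatever client library you use, and only the built-in `ConnectionError` is treated as retryable. Real client libraries often raise their own exception types (redis-py's connection error, for example, is not a subclass of the built-in one), so adjust the `except` clause accordingly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_reconnect(
    operation: Callable[[], T],
    reconnect: Callable[[], None],
    retries: int = 5,
    base_delay: float = 0.1,
) -> T:
    """Run `operation`; on a connection error, reconnect and retry with backoff.

    `operation` and `reconnect` are hypothetical hooks for your client library.
    """
    for attempt in range(retries):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # give up after the last attempt instead of looping forever
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
            reconnect()
    raise ValueError("retries must be at least 1")
```

While the cache container is briefly down during a checkpoint and restore, a client wrapped this way waits and reconnects rather than crashing the frontend.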

To get started, check out the code in the GitHub repo.

runC was donated by Docker to the Open Container Initiative (OCI) and is an open source project. To learn more about it, check out the GitHub repo. The OCI encourages people to contribute to all of its projects, including runC, so please help build this great tool. Pull requests are always welcome.


