Ecosystem Technology Partner (ETP) Program: Logging

The Docker “Ecosystem Technology Partner (ETP) Program” is designed to highlight partners in the Docker ecosystem that have demonstrated quality integrations with the Docker platform and offer a compelling user experience.

The first set of partners to demonstrate their expertise in recording and managing log data for Dockerized applications includes Amazon CloudWatch from Amazon Web Services (AWS), Elastic, Graylog, Rapid7/Logentries, Loggly, Papertrail, Sematext Logsene, Sumo Logic and Treasure Data.

Logging is the process by which an application emits a stream of descriptive events about system behavior, events that are primarily useful for troubleshooting or analysis. Having accessible log data at the right level of abstraction is critical to performing root cause analysis and understanding system health in general.

On a modern Linux system, a daemon processes log stream sources, writing them to disk or forwarding them on to other systems. Traditionally, the dominant format has been the Syslog protocol, which defines a few fields describing the source and severity of an event alongside a generic message field. Specific implementations such as rsyslog, syslog-ng, and more recently systemd-journald have expanded on this basic functionality while remaining compatible with the original protocol. These log management services and the applications that feed them may require complex configuration to wire them together and customize routing.

Microservices and Docker

Application architectures are moving towards a microservices approach. From a logging standpoint, this implies an increasing number of log stream sources with potentially unpredictable lifecycles. For example, an application may consist of dozens of Docker containers running across a cluster of systems, and multiple copies of this application (or individual components within it) may be scaled independently of each other. An ideal log management solution would need to tame this chaos by capturing log data from all containers in a predictable way and empower engineers to quickly sift through the noise to get at the information they need.

The twelve-factor methodology, which defines best practices for microservices-based architectures and has been influential in the Docker community, suggests that log data from applications should be captured from the standard output stream only. The standard output and error streams, which pre-date Syslog and are a fundamental concept in *nix systems, carry arbitrary streams of log message data. While Syslog is a proven and standard way to capture log data, it imposes limitations such as a maximum message payload size, unnecessary complexity, and coupling between the application and its environment.

Microservice-based architectures are designed to be managed at a higher level of abstraction than the traditional host-based approach, so any unnecessary coupling between the application and the host system can get in the way. Like other design decisions for the Docker Engine that define a clear contract between the application and its environment, the decision to support stdout/stderr streams for log data is both highly portable and flexible.
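To see this contract in action, a container that simply writes events to stdout is immediately observable with the standard tooling. A quick sketch (the container name and command are arbitrary):

    # Run a throwaway container that emits an event every second
    docker run -d --name ticker busybox \
        sh -c 'while true; do echo "tick $(date)"; sleep 1; done'

    # Tail its log stream with the built-in tooling
    docker logs -f ticker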

NOTE: Not all applications support logging to stdout/stderr streams. There are numerous workarounds to choose from for these edge cases within a Docker environment, but they should only be used as a last resort due to the extra complexity introduced.
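One common workaround is to redirect an application's log files to the container's standard streams at image build time, as the official nginx image does. A sketch (the paths shown are nginx-specific and would differ for other applications):

    # Inside the image (e.g., in a Dockerfile RUN step): symlink the
    # application's log files to the container's stdout/stderr devices
    ln -sf /dev/stdout /var/log/nginx/access.log
    ln -sf /dev/stderr /var/log/nginx/error.log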

Logging in the Docker Ecosystem

The Docker Engine supports various methods of obtaining log data. At the container level, there are Docker API endpoints to obtain log data in bulk or stream it in real time, optionally over a WebSocket connection. Prior to the implementation of logging drivers, real-time log collection from the Docker Engine required external tools that maintained per-container connections to the Docker API based on container lifecycle events. This methodology still works well and is the basis for some of the partner implementations highlighted today, but it introduces a bit of overhead and is no longer necessary.
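For example, the logs endpoint can be queried directly over the daemon's Unix socket. A sketch, assuming a container named web and a curl build with --unix-socket support:

    # Fetch existing output and keep the connection open for new events
    curl --unix-socket /var/run/docker.sock \
        "http://localhost/containers/web/logs?stdout=1&stderr=1&follow=1"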

Starting with the 1.6 release in April 2015, logging drivers have abstracted away the complexity required in previous solutions by embedding log forwarding facilities directly into the Docker Engine. Configuration for how log data should be routed can be specified at daemon startup (for global defaults) or at individual container creation. Users can choose to store log data in the default JSON-based format on disk, forward it to various ingestion systems, or discard it entirely. It is also possible to pass along additional container metadata, depending on what options the particular logging driver supports. At the time of writing, the following logging drivers have been added since the feature was introduced in 1.6.0 (a brief configuration example follows the table):

Docker Engine Release | New Logging Drivers
1.6.0                 | json-file, syslog, none
1.7.0                 | journald
1.8.0                 | fluentd, gelf
1.9.0                 | awslogs

For more information on logging drivers and options, visit the docs.
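To make the two configuration points concrete, here is a minimal sketch (the driver choices are arbitrary, and the json-file size option assumes Docker 1.8 or later):

    # Set a global default for all containers at daemon startup
    docker daemon --log-driver=syslog

    # Override the default for an individual container at creation time
    docker run -d --log-driver=json-file --log-opt max-size=10m nginx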

None

The none logging driver is useful in a few scenarios: when you are already pushing log event data through a third-party client library embedded in your application code, when the application is not capable of producing log data through stdout/stderr, when the application utilizes an interactive terminal, or when you are simply not interested in the log data.
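Opting out is a single flag at container creation (the image name is a placeholder):

    # Discard all stdout/stderr log data from this container
    docker run --log-driver=none my-chatty-app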

Syslog and Journald

In recent Linux distribution releases, systemd has replaced a number of daemon service implementations. Logging is no different, hence the journald logging driver has been added to forward Docker log data to the systemd-journald daemon. While systemd-journald supports the Syslog protocol, it introduces a new, more flexible protocol for log ingestion, and it is this protocol the driver interfaces with. Since a Linux distribution has a number of logging sources outside of Docker containers, such as the kernel or system services, the syslog and journald logging drivers are both good options when a single, unified host-based approach is desired.
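A sketch of both approaches (the syslog address and image name are placeholders; the journald example assumes a systemd-based host):

    # Forward container output to a syslog endpoint
    docker run -d --log-driver=syslog \
        --log-opt syslog-address=tcp://192.168.0.42:514 my-app

    # Or hand it to systemd-journald, then query it alongside host logs
    docker run -d --log-driver=journald --name web my-app
    journalctl CONTAINER_NAME=web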

Fluentd

Fluentd is a highly configurable open source data collector/forwarder. It employs a plugin architecture that accommodates hundreds of input sources and output sinks, with advanced filtering and routing capabilities. With the explosion in tools and services for working with large data sets, utilizing a flexible routing system such as Fluentd (via the fluentd logging driver) can greatly simplify the work needed to aggregate data from many disparate sources. Treasure Data, the company behind the Fluentd project, provides a managed analytics service for working with large data sets, not specific to logging.
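A rough sketch, assuming a Fluentd instance is already listening on its default forward port (the image name is a placeholder, and the tag option is optional):

    # Ship container output to a local Fluentd collector
    docker run -d --log-driver=fluentd \
        --log-opt fluentd-address=localhost:24224 \
        --log-opt fluentd-tag='docker.{{.Name}}' my-app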

GELF

Graylog Extended Log Format (GELF) is a modern logging protocol which pairs nicely with the open source log management and analysis software Graylog Server. While Graylog Server supports hundreds of input sources and output sinks through a plugin architecture, it also builds on Elasticsearch for indexing and search, exposes a REST API, and provides a convenient web interface for analysis. By utilizing the gelf logging driver in combination with a Graylog Server, Docker users can quickly stand up an advanced log management and analysis pipeline with a minimum of effort. Graylog, the company behind GELF and Graylog Server, offers commercial support.
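A minimal sketch, assuming a GELF UDP input has been configured on the Graylog Server (the address and image name are placeholders):

    # Send container output to a Graylog GELF UDP input
    docker run -d --log-driver=gelf \
        --log-opt gelf-address=udp://graylog.example.com:12201 my-app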

AWS Logs

If you are deploying Docker on Amazon Web Services (AWS), Amazon CloudWatch provides a logging service that can monitor other AWS services and send alerts based on custom conditions. By utilizing the awslogs logging driver, your Docker deployments on AWS can integrate closely with CloudWatch to provide a unified view of your virtual application infrastructure.
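A sketch, assuming the daemon can reach AWS credentials (for example via an EC2 instance role) and that the log group already exists; the names are placeholders:

    # Send container output to an Amazon CloudWatch Logs group/stream
    docker run -d --log-driver=awslogs \
        --log-opt awslogs-region=us-east-1 \
        --log-opt awslogs-group=my-app-logs \
        --log-opt awslogs-stream=web-1 my-app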

Choosing the right logging solution for you

There are many compelling Docker log management solutions to pick from. In the default case, log data from container stdout/stderr will be persisted to disk, and can also be streamed in real-time via the Docker API. By utilizing one of the built-in logging drivers, it is possible to efficiently forward log data to local system services or external systems over the network. There are also a number of flexible log collection/forwarding/routing tools which can act as an intermediary buffer for log data emitted by the Docker Engine and its final destination. We’ve only covered a few here that are related to the current logging driver implementations, but a quick look through the open source projects produced by companies operating large scale streaming data systems will yield a dizzying array of tools that are applicable to log management.

The partner solutions linked to at the top of this post were selected because they provide an excellent log management and analysis experience for Docker users. In some cases, a “docker run” command or handful of daemon configuration options are all that is required to get started.

Get Started with Docker and Logging today

• Read Docker’s press release on the ETP for Logging
• Visit the docs for the latest resources on Docker logging
• Register for a Docker webinar
• Find a Docker Meetup group near you
• Register for upcoming Docker Online Meetups
• Start contributing to Docker

Learn more about our ETP logging partners and their solutions

Read the Sematext Logsene blog for more information and check out their demo on “Log Management for Docker”.

Read more about the integration between Loggly and Docker on the Loggly blog.

Learn more about the Logentries Docker Insights Dashboard on the Rapid7/Logentries blog.

Explore Papertrail’s log management solution.

Explore Sumo Logic’s log management solution.
