Centralized logging for Vert.x applications using the ELK stack

This post describes a solution for centralized logging of Vert.x applications based on the ELK stack, a set of tools comprising Logstash, Elasticsearch, and Kibana that are well known to work together seamlessly.

Preamble

This post was written in the context of the project titled “DevOps tooling for Vert.x applications”, one of the Vert.x projects taking place during the 2016 edition of Google Summer of Code, a program that aims to bring students together with open source organizations, in order to help them gain exposure to software development practices and real-world challenges.

Introduction

Centralized logging is an important topic when building a microservices architecture, and it is a step forward in adopting the DevOps culture. Having an overall solution partitioned into a set of services distributed across the Internet can make it challenging to monitor the log output of each of them, so a tool that aggregates those logs in one place is very helpful.

Overview

As shown in the diagram below, the general centralized logging solution comprises two main elements: the application server, which runs our Vert.x application, and a separate server hosting the ELK stack. Both elements are linked by Filebeat, a highly configurable tool capable of shipping our application logs to the Logstash instance, i.e., our gateway to the ELK stack.

Overview of centralized logging with ELK

App logging configuration

The approach described here is based on a Filebeat + Logstash configuration, which means we first need to make sure our app logs to a file, whose records will be shipped to Logstash by Filebeat. Luckily, Vert.x provides the means to configure alternative logging frameworks (e.g., Log4j, Log4j2, and SLF4J) besides the default JUL logging. However, Filebeat can be used independently of the logging framework chosen.

Log4j Logging

The demo that accompanies this post relies on Log4j2 as the logging framework. We instructed Vert.x to use this framework following the guidelines, and we made sure our logging calls are asynchronous, since we don’t want them to block our application. For this purpose, we opted for the AsyncAppender, which is included in the Log4j2 configuration together with the log output format, described in an XML configuration file available in the application’s resources folder.

<Configuration>
  <Appenders>
    <RollingFile name="vertx_logs" append="true" fileName="/var/log/vertx.log" filePattern="/var/log/vertx/$${date:yyyy-MM}/vertx-%d{MM-dd-yyyy}-%i.log.gz">
      <PatternLayout pattern="%d{ISO8601} %-5p %c:%L - %m%n" />
    </RollingFile>
    <!-- includeLocation lets the %L conversion pattern resolve real line numbers on the async path -->
    <Async name="vertx_async" includeLocation="true">
      <AppenderRef ref="vertx_logs"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="DEBUG">
      <AppenderRef ref="vertx_async" />
    </Root>
  </Loggers>
</Configuration>
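
For reference, Vert.x selects its logging backend through the vertx.logger-delegate-factory-class-name system property; for Log4j2 the delegate factory is io.vertx.core.logging.Log4j2LogDelegateFactory. A minimal sketch of the application side follows (the verticle and the log message are illustrative, not part of the demo):

// Start the JVM with:
// -Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.Log4j2LogDelegateFactory
import io.vertx.core.AbstractVerticle;
import io.vertx.core.logging.Logger;
import io.vertx.core.logging.LoggerFactory;

public class MainVerticle extends AbstractVerticle {

  // Resolved through the Log4j2 delegate configured by the system property above
  private static final Logger LOGGER = LoggerFactory.getLogger(MainVerticle.class);

  @Override
  public void start() {
    // Handled by the asynchronous appender and written to /var/log/vertx.log
    LOGGER.info("Verticle started");
  }
}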

Filebeat configuration

Now that we have configured the log output of our Vert.x application to be stored in the file system, we delegate to Filebeat the task of forwarding the logs to the Logstash instance. Filebeat can be configured through a YAML file containing the log output location and the pattern used to interpret multiline logs (i.e., stack traces). Also, the Logstash output plugin is configured with the host location, and a secure connection is enforced using the certificate from the machine hosting Logstash. We set the document_type to the type of instance this log belongs to, which can later help us when indexing our logs inside Elasticsearch.

filebeat:
  prospectors:
    -
      document_type: trader_dashboard
      paths:
        - /var/log/vertx.log
      multiline:
        pattern: "^[0-9]+"
        negate: true
        match: after
output:
  logstash:
    enabled: true
    hosts:
      - elk:5044
    timeout: 15
    tls:
      insecure: false
      certificate_authorities:
        - /etc/pki/tls/certs/logstash-beats.crt
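
To illustrate the multiline settings above: any line that does not start with a digit fails to match the ^[0-9]+ pattern and, because of negate: true and match: after, is appended to the line that precedes it. A hypothetical stack trace such as the following is therefore shipped as a single event rather than one event per line:

2016-08-28 10:15:32,156 ERROR io.example.PortfolioServiceImpl:87 - Failed to evaluate the portfolio
java.lang.NullPointerException: null
    at io.example.PortfolioServiceImpl.evaluate(PortfolioServiceImpl.java:87)
    at io.example.PortfolioVerticle.lambda$start$0(PortfolioVerticle.java:31)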

ELK configuration

To take full advantage of the ELK stack with respect to Vert.x and our app logs, we need to configure each of its individual components, namely Logstash, Elasticsearch, and Kibana.

Logstash

Logstash is the component within the ELK stack that is in charge of aggregating the logs from each of the sources and forwarding them to the Elasticsearch instance.
Configuring Logstash is straightforward with the help of the specific input and output plugins for Beats and Elasticsearch, respectively. In the previous section we mentioned that Filebeat can be easily coupled with Logstash. Now we see that this is done by simply declaring beats as the input plugin and setting the parameters needed for our shippers to reach it (listening port, SSL key, and certificate location).

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-beats.crt"
    ssl_key => "/etc/pki/tls/private/logstash-beats.key"
  }
}
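
The certificate and key referenced above must exist on the ELK host, and the certificate is also the file that each Filebeat machine lists under certificate_authorities. If you don’t have one yet, a self-signed certificate can be generated with openssl; the following sketch assumes the paths used throughout this post and elk as the hostname that Filebeat connects to:

# Generate a self-signed certificate/key pair valid for ten years
openssl req -x509 -batch -nodes -days 3650 -newkey rsa:2048 \
  -subj "/CN=elk/" \
  -keyout /etc/pki/tls/private/logstash-beats.key \
  -out /etc/pki/tls/certs/logstash-beats.crt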

Now that we are ready to receive logs from the app, we can use Logstash’s filtering capabilities to specify the format of our logs and extract fields from them, so they can be indexed more efficiently by Elasticsearch.
The grok filtering plugin comes in handy in this situation. This plugin lets us declare the log format using predefined and customized patterns based on regular expressions, and derive new fields from the information extracted from each log line. In the following block, we instruct Logstash to recognize our Log4j pattern inside the message field, which contains the log message shipped by Filebeat. After that, the date filtering plugin parses the timestamp_string field extracted in the previous step and uses it to replace the timestamp that Filebeat set when it read the log file.

filter {
  grok {
    break_on_match => false
    match => [ "message", "%{LOG4J}" ]
  }
  date {
    match => [ "timestamp_string", "ISO8601" ]
    remove_field => [ "timestamp_string" ]
  }
}

The LOG4J pattern is not among those shipped with Logstash; however, we can define it ourselves by combining the predefined data formats that do ship with Logstash, adapting them to the specific log format used by our application, as shown next.

# Pattern to match our Log4j format
SPACING (?:[\s]+)
LOGGER (?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*
LINE %{INT}?
LOG4J %{TIMESTAMP_ISO8601:timestamp_string} %{LOGLEVEL:log_level}%{SPACING}%{LOGGER:logger_name}:%{LINE:loc_line} - %{JAVALOGMESSAGE:log_message}
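
As a concrete illustration, applying the LOG4J pattern to the following hypothetical log line:

2016-08-28 10:15:32,123 INFO  io.example.MainVerticle:42 - Verticle started

would extract the fields below, after which the date filter shown earlier promotes timestamp_string to the event’s @timestamp:

timestamp_string : 2016-08-28 10:15:32,123
log_level        : INFO
logger_name      : io.example.MainVerticle
loc_line         : 42
log_message      : Verticle started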

Finally, we take a look at Logstash’s output configuration. It simply points to our Elasticsearch instance, instructs it to discover the list of all cluster nodes (sniffing), defines the name pattern for our indices, assigns the document type according to the metadata coming from Filebeat, and lets us provide a custom index template for our data.

output {
  elasticsearch {
    hosts => ["localhost"]
    sniffing => true
    manage_template => true
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    template => "/etc/filebeat/vertx_app_filebeat.json"
    template_overwrite => true
  }
}

Elasticsearch

Elasticsearch is the central component that enables the efficient indexing and real-time search capabilities of the stack. To get the most out of Elasticsearch, we can provide an indexing template for our incoming logs, which helps to optimize the data storage and to match the queries issued by Kibana at a later point.
In the example below, we see an index template that will be applied to any index matching the pattern filebeat-*. Additionally, we declare our new log fields type, host, log_level, logger_name, and log_message. The first three are set as not_analyzed, so they are indexed exactly as written, while the last two are set as analyzed, which enables full-text and regular-expression queries on them rather than restricting us to exact matches.

{
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": {
          "enabled": false
        }
      },
      "dynamic_templates": [
        {
          "template1": {
            "mapping": {
              "doc_values": true,
              "ignore_above": 1024,
              "index": "not_analyzed",
              "type": "{dynamic_type}"
            },
            "match": "*"
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "offset": {
          "type": "long",
          "doc_values": "true"
        },
        "type": { "type": "string", "index": "not_analyzed" },
        "host": { "type": "string", "index": "not_analyzed" },
        "log_level": { "type": "string", "index": "not_analyzed" },
        "logger_name": { "type": "string", "index": "analyzed" },
        "log_message": { "type": "string", "index": "analyzed" }
      }
    }
  },
  "settings": {
    "index.refresh_interval": "5s"
  },
  "template": "filebeat-*"
}
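
Once logs start flowing in, the resulting indices and mappings can be sanity-checked from the command line through Elasticsearch’s REST API; for instance (the query below is illustrative):

# List the filebeat indices created by Logstash
curl 'localhost:9200/_cat/indices/filebeat-*?v'

# Search for error-level entries across all filebeat indices
curl 'localhost:9200/filebeat-*/_search?q=log_level:ERROR&pretty'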

Kibana

Although we could fetch all our logs from Elasticsearch through its API, Kibana is a powerful tool that allows querying and visualizing them in a much friendlier way. Besides querying our data through the indexed field names and free-text search boxes, Kibana allows us to create our own visualizations and dashboards. Combined, they represent a powerful way to display data and gain insight in a customized manner. The accompanying demo ships with a couple of sample dashboards and visualizations that take advantage of the log fields specified in our index template and provide valuable insight, including: visualizing the number of log messages received by ELK, observing the proportion of messages each log source produces, and directly finding the sources of error logs.

Kibana Dashboard

Log shipping challenge

The solution presented here relies on Filebeat to ship log data to Logstash. However, if you are familiar with the Log4j framework you may be aware that there exists a SocketAppender that writes log events directly to a remote server using a TCP connection. Although adding the Filebeat + Logstash combination may sound like unnecessary overhead in the logging pipeline, it provides a number of benefits compared to the Log4j socket alternative:

  • The SocketAppender relies on the specific serialization of Log4j’s LogEvent objects, which is not an interchangeable format like the JSON used by the Beats solution. Although there are attempts to output the logs in a JSON format that Logstash understands, they don’t support multiline logs, which results in messages being split into different events by Logstash. Moreover, there is no official or stable Logstash input plugin for Log4j version 2.
  • While enabling Log4j’s async logging mode in an application delegates logging operations to separate threads, those threads coexist in the same JVM, so there is still a risk of data loss if the JVM terminates suddenly without properly closing the log channel.
  • Filebeat is a data shipper designed to cope, in a reliable manner, with the many constraints that arise in distributed environments, and it therefore provides options to tailor and scale this operation to our needs: load balancing between multiple Logstash instances, choosing the number of simultaneous Filebeat workers that ship log files, and setting a compression level to reduce the consumed bandwidth. Besides that, logs can be shipped in specific batch sizes, with a maximum number of retries and a configurable connection timeout (see the configuration sketch after this list).
  • Lastly, although Filebeat can forward logs directly to Elasticsearch, using Logstash as an intermediary offers the possibility of collecting logs from diverse sources (e.g., system metrics).
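
As a sketch of those tuning options (the values and the second host are illustrative; the option names are those documented for the Filebeat 1.x logstash output), the output section of the Filebeat configuration could be extended like this:

output:
  logstash:
    enabled: true
    hosts:
      - elk1:5044
      - elk2:5044
    loadbalance: true       # distribute events across the listed Logstash hosts
    worker: 2               # simultaneous Filebeat workers per configured host
    compression_level: 3    # gzip level, trading CPU for bandwidth
    bulk_max_size: 1024     # maximum number of events per batch
    max_retries: 3          # retries before events are dropped
    timeout: 15             # connection timeout in seconds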

Demo

This post is accompanied by a demo based on the Vert.x Microservices workshop, in which each microservice is shipped in its own Docker container, simulating a distributed system composed of independently addressable nodes.
Also, the ELK stack is provisioned using a preconfigured Docker image by Sébastien Pujadas.

Following the guidelines in this post, the demo configures each of the microservices of the workshop and sets up a Filebeat process on each of them to ship the logs to a central container hosting the ELK stack.

Installation

In order to run this demo, it is necessary to have Docker installed. Then proceed with:

  • Cloning or downloading the demo repository.
  • Separately, obtaining the source code of the branch of the Microservices workshop adapted for this demo.

Building the example

The Docker images belonging to the Vert.x Microservices workshop need to be built separately from this project before it can be launched.

Building the Vert.x Microservices workshop Docker images

Build the root project and the Trader Dashboard, followed by each of the modules contained in the solution folder. Issue the following commands to do so:

mvn clean install
cd trader-dashboard
mvn package docker:build
cd ../solution/audit-service
mvn package docker:build
cd ../compulsive-traders
mvn package docker:build
cd ../portfolio-service
mvn package docker:build
cd ../quote-generator/
mvn package docker:build

Running the example

After building the previous images, build and run the example in vertx-elk using the following command:

docker-compose up
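
Assuming the demo’s docker-compose.yml maps the stack’s default ports to the host (an assumption worth verifying in the compose file), the web interfaces can be sanity-checked once the containers are up:

# Kibana listens on its default port 5601
curl -I http://localhost:5601

# Elasticsearch's REST API listens on its default port 9200
curl http://localhost:9200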

The demo

You can watch the demo in ac­tion in the fol­low­ing screen­cast:

Conclusion

The ELK stack is a powerful set of tools that ease the aggregation of logs coming from distributed services into a central server. Its main pillar, Elasticsearch, provides the indexing and search capabilities over our log data. It is also accompanied by convenient input/output components: Logstash, which can be flexibly configured to accept different data sources, and Kibana, which can be customized to present the information in the most convenient way.

Logstash has been designed to work seamlessly with Filebeat, a log shipper that represents a robust solution adaptable to our applications without significant changes to our architecture. In addition, Logstash can accept varied types of sources, filter the data, and process it before delivering it to Elasticsearch. This flexibility comes at the price of having extra elements in our log aggregation pipeline, which can represent additional processing overhead or a point of failure. This overhead could be avoided if an application were capable of delivering its log output directly to Elasticsearch.

Happy logging!
