Reviewing some case studies

This section walks through real-world scenarios of Elasticsearch node failure and how to address them.

The ES process quits unexpectedly

A few weeks ago, we noticed in Marvel that the Elasticsearch process was down on one of our nodes. We restarted Elasticsearch on this node, and everything seemed to return to normal. However, when we checked Marvel later that week, the node was down again. We looked at the Elasticsearch log files but found no exceptions. Because nothing appeared in the Elasticsearch log, we suspected that the operating system had killed the Elasticsearch process. Checking syslog at /var/log/syslog, we saw the following error:

Out of memory: Kill process 5969 (java) score 446 or sacrifice child ...
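The manual syslog check can be automated. The following is a minimal Python sketch, not part of the original investigation, that scans log lines for out-of-memory kills of a given process. The names find_oom_kills, scan_file, and OomKill are hypothetical helpers introduced for illustration, and the pattern assumes the exact kernel wording shown above; other kernel versions phrase the message differently and would need an adjusted pattern.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumption: matches only the kernel wording quoted in this section.
OOM_PATTERN = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) "
    r"\((?P<name>[^)]+)\) score (?P<score>\d+)"
)


class OomKill(NamedTuple):
    """One out-of-memory kill reported by the kernel (hypothetical helper type)."""
    pid: int
    name: str
    score: int
    line: str


def find_oom_kills(lines: Iterable[str], process_name: str = "java") -> Iterator[OomKill]:
    """Yield an OomKill for each log line reporting that process_name was killed."""
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match and match.group("name") == process_name:
            yield OomKill(
                pid=int(match.group("pid")),
                name=match.group("name"),
                score=int(match.group("score")),
                line=line.rstrip("\n"),
            )


def scan_file(path: str = "/var/log/syslog", process_name: str = "java") -> List[OomKill]:
    """Read a log file and return every matching out-of-memory kill."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log_file:
        return list(find_oom_kills(log_file, process_name))
```

Calling scan_file() on the affected node returns one entry per matching kill, each carrying the process ID and score from the log line. Elasticsearch runs on the JVM, so it appears in these messages as java, which is why that name is the default filter. Reading /var/log/syslog may require elevated privileges on some systems.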
