Prometheus is a popular open-source monitoring system and timeseries database. One of the main architectural advantages of Prometheus is that it (typically) scrapes metrics from the systems it’s monitoring. It’s a pull system. This is more robust and flexible than, say, telemetry ingest where clients/services push data to a RESTful endpoint.

Neo4j Enterprise has a built-in Prometheus endpoint that can be enabled by setting a couple of properties in neo4j.conf.

In this article we’ll walk through the setup of Prometheus to capture metrics from a Neo4j database and the host it’s running on. We’ll also set up Grafana to visualize some of those metrics in a dashboard.

A compelling reason to use Prometheus/Grafana is that the Neo4j database doesn’t exist in isolation. Various applications might read from and write to the Neo4j database. And the database itself runs on a Linux host, which provides CPU, RAM, disk, etc. From a dev-ops perspective, it’s important that we can correlate events across systems.

In addition, a production system will integrate with an incident management service such as PagerDuty, VictorOps, or OpsGenie. Prometheus has integrations with these solutions.
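
For example, routing every alert to PagerDuty takes only a few lines of Alertmanager configuration. A minimal sketch (the integration key is a placeholder):

# alertmanager.yml
route:
  receiver: pagerduty
receivers:
  - name: pagerduty
    pagerduty_configs:
      - service_key: <your-pagerduty-integration-key>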

If alerting isn’t a requirement, and we only want visibility into Neo4j’s internals, Halin would be a much better fit.

Grafana, Prometheus, and Neo4j

prepare the Prometheus host

Neo4j was deployed on an AWS EC2 instance in the us-west-1 region. Although AWS has both Prometheus and Grafana services, they were not available in that region at the time of writing (2021-06-04).

We spun up a t2.large instance (2 vCPUs; 8GB of RAM) to host Prometheus and Grafana, and added a 100GB EBS volume for the Prometheus database data. It’s important that the Prometheus timeseries database has enough storage.
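
As a rough rule of thumb from the Prometheus storage documentation, disk need ≈ retention_time_seconds × ingested_samples_per_second × bytes_per_sample, where a sample typically costs one to two bytes. With illustrative numbers (15 days of retention at 2,000 samples/second):

1,296,000 s × 2,000 samples/s × 2 bytes/sample ≈ 5.2 GB

so a 100GB volume leaves ample headroom.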

For convenience, we added the following properties to our local ~/.ssh/config file:

Host woolford-prometheus
  HostName 52.52.149.133
  User ec2-user
  IdentityFile /Users/alexwoolford/.ssh/awoolford_neo4j_west.pem

This allowed us to connect to the Prometheus host without specifying the IP address, username, and PEM key: a simple ssh woolford-prometheus was enough to log in.

First, we ran the lsblk command to determine the name of the EBS device (i.e. xvdb):

#  lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk 
└─xvda1 202:1    0    8G  0 part /
xvdb    202:16   0  100G  0 disk 

We then created an xfs filesystem on that storage device:

# mkfs.xfs /dev/xvdb

By default, Prometheus (as installed by the Cloud Alchemy role below) stores its timeseries database in /var/lib/prometheus. We created that mount point:

# mkdir /var/lib/prometheus

To permanently mount the 100GB EBS volume to /var/lib/prometheus, we added an entry to /etc/fstab. To create that entry, we first had to get the UUID (universally unique identifier) for the xvdb block device:

# blkid
/dev/xvda1: LABEL="/" UUID="7b355c6b-f82b-4810-94b9-4f3af651f629" TYPE="xfs" PARTLABEL="Linux" PARTUUID="a5dcc974-1013-4ea3-9942-1ac147266613"
/dev/xvdb: UUID="82e76047-f822-4333-961e-67826af34721" TYPE="xfs"

This device was permanently mounted to /var/lib/prometheus by adding the following line to /etc/fstab:

UUID=82e76047-f822-4333-961e-67826af34721  /var/lib/prometheus  xfs  defaults,nofail  0  2
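
Before rebooting, the fstab entry can be sanity-checked in place: mount -a attempts to mount everything listed in /etc/fstab, and findmnt confirms the result:

# mount -a
# findmnt /var/lib/prometheus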

We rebooted the instance and confirmed that the 100GB drive had been mounted:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
...
/dev/xvda1      8.0G  1.6G  6.5G  20% /
/dev/xvdb       100G  135M  100G   1% /var/lib/prometheus

Ansible setup

Ansible install

Cloud Alchemy has made production-grade Prometheus/Grafana setup very simple via Ansible. We first installed Cloud Alchemy’s Ansible roles for Prometheus, the Prometheus Node Exporter, and Grafana on the deployment host (i.e. my laptop):

ansible-galaxy install cloudalchemy.prometheus
ansible-galaxy install cloudalchemy.node_exporter
ansible-galaxy install cloudalchemy.grafana
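
Alternatively, the three roles can be listed in a requirements.yml and installed in one shot:

# requirements.yml
- name: cloudalchemy.prometheus
- name: cloudalchemy.node_exporter
- name: cloudalchemy.grafana

ansible-galaxy install -r requirements.yml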

We then added the Neo4j and Prometheus hosts to the local Ansible inventory, under a group called woolford, in /etc/ansible/hosts:

all:
    ...
    woolford:
      hosts:
        woolford-prometheus:
        woolford-neo4j:

We then confirmed Ansible connectivity using the Ansible ping module:

% ansible woolford -m ping
woolford-prometheus | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "ping": "pong"
}
woolford-neo4j | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "ping": "pong"
}

Both hosts responded with a “pong”. That’s good.

enable the Neo4j Prometheus endpoint

Neo4j Enterprise has a built-in Prometheus endpoint that can be enabled by adding the following properties to neo4j.conf:

metrics.prometheus.enabled=true
metrics.prometheus.endpoint=0.0.0.0:2004

The neo4j service was restarted, and we confirmed that the endpoint was accessible from the Prometheus host using httpie (https://httpie.io/):

$ http 172.31.5.57:2004/metrics

neo4j_system_check_point_total_time_total 0.0
neo4j_bolt_connections_opened_total 0.0
...

Pro-tip: if you use curl, consider using httpie instead.
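
For comparison, the equivalent check with curl:

$ curl -s http://172.31.5.57:2004/metrics | head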

install the Prometheus Node Exporter

The Prometheus Node Exporter exposes hardware and kernel metrics from the host. The Ansible-based installation is far less error-prone than completing all the steps by hand.

To install the Node Exporter on the Neo4j host, we created an Ansible playbook, prometheus-node-exporter-install.yml, containing the following YAML:

- hosts: woolford-neo4j
  roles:
    - cloudalchemy.node_exporter

When we first attempted to install the Node Exporter, we ran into a macOS issue where “Python quit unexpectedly.” A quick Google search suggested setting the following environment variable (which disables macOS’s Objective-C fork-safety check) before running the playbooks:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

We added this one-liner to ~/.zshrc so this environment variable is always set when running CLI commands.

We then ran the playbook:

ansible-playbook prometheus-node-exporter-install.yml

The installation was successful. We confirmed that the node exporter was working by making an HTTP call to the endpoint on the Neo4j host (port 9100):

$ http 172.31.5.57:9100/metrics
node_disk_io_now{device="nvme0n1"} 0
node_disk_io_time_seconds_total{device="nvme0n1"} 7672.7
node_disk_read_bytes_total{device="nvme0n1"} 9.12474983424e+11
...

install Prometheus

Now that there were a couple of metric-producing endpoints to scrape, we installed Prometheus. As with the Node Exporter installation, we created an Ansible playbook, prometheus-install.yml, containing the following:

- hosts: woolford-prometheus
  roles:
    - cloudalchemy.prometheus
  vars:
    prometheus_targets:
      node:
      - targets:
        - 172.31.5.57:9100
        - 172.31.5.57:2004
        labels:
          env: woolford-neo4j

Note that we specify the target endpoints on the Neo4j instance, using its internal IP address to avoid unnecessary ingress/egress charges, since both the Neo4j and Prometheus instances reside in the same VPC.
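
Under the hood, the role templates prometheus_targets into Prometheus’s configuration. The result is roughly equivalent to the following scrape config (a sketch; the role actually writes file-based service discovery files, and the exact output varies by role version):

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 172.31.5.57:9100
          - 172.31.5.57:2004
        labels:
          env: woolford-neo4j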

We then ran the playbook:

ansible-playbook prometheus-install.yml

In order to connect to Prometheus from outside the VPC, it was necessary to create an inbound rule from our IP to port 9090. Once we created that rule, we could access the [somewhat limited] Prometheus GUI and view the metrics from the Neo4j host.
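
For a first sanity check in the expression browser, PromQL queries like these work against the Node Exporter and Neo4j metrics (illustrative; adjust the time range to taste):

# fraction of CPU time spent non-idle over the last 5 minutes
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))

# rate of Bolt connections opened, per second
rate(neo4j_bolt_connections_opened_total[5m])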

install Grafana

We then installed Grafana using an Ansible playbook, grafana-install.yml:

- hosts: woolford-prometheus
  roles:
    - role: cloudalchemy.grafana
      vars:
        grafana_security:
          admin_user: neo4j
          admin_password: neo4j

We ran that playbook:

ansible-playbook grafana-install.yml

In order to access Grafana, we added another inbound rule to the security group so that our IP could access port 3000 of the Prometheus host.

We added our Prometheus host as a datasource in Grafana. This allowed us to create a near real-time dashboard containing metrics from Neo4j and the host operating system.
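
The datasource can also be provisioned in the playbook itself; the cloudalchemy.grafana role accepts a grafana_datasources variable (a sketch, assuming Grafana and Prometheus share a host):

grafana_datasources:
  - name: prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true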

… in closing

Prometheus gathers metrics and alerts

Prometheus is a powerful monitoring solution that can capture metrics from a wide variety of systems, e.g. databases, messaging, storage, and APIs (see the Prometheus exporters and integrations page for a complete list). The ability to visually correlate timeseries data is important for engineers who manage solutions that integrate a variety of technologies. Prometheus also provides an easy integration point for incident management services (e.g. PagerDuty, VictorOps, OpsGenie), which is table stakes for complex production environments.

If you’re not already using Prometheus, it’s extremely robust and simple to set up with Ansible. Neo4j Enterprise’s built-in endpoint makes it a breeze to start capturing those metrics.