Percona Monitoring and Management, Meet Prometheus Alertmanager

One of the requests we get most often on the Percona Monitoring and Management (PMM) team is “Do you support alerting?” The answer to that question has always been “Yes,” but the feedback on how we offered it natively was that it was, well, not robust enough! We’ve been hard at work to change that and are excited to offer, starting with the newly released PMM version 2.3.0, a more dynamic alerting mechanism for your PMM installations: integration with Prometheus Alertmanager.

Prometheus Alertmanager

If you don’t know what Alertmanager is, you can read all about it on the Prometheus website, but the short version is that Alertmanager is a receiver, consolidator, and router of alerting messages that offers LOTS of flexibility when it comes to configuration. Back in my old days as a SysAdmin, the tools I used weren’t smart enough to deduplicate alerts, so I’d have my boss yelling, my coworkers emailing, and my phone (ok…Blackberry) draining its battery vibrating to the same alert over and over until I could manage to put the alert in maintenance mode and the queue of alerts drained. Alertmanager is smart enough to deduplicate alerts so you don’t get 50 pages telling you the disk is 90% full before you can grow the volume or purge files. It’s also extremely easy to group alerts so that you don’t get separate alerts for ‘Application Down’, ‘MySQL Down’, ‘CPU|RAM|Disk: Unavail’, etc. because someone rebooted the DB server without putting monitoring in maintenance mode. Alertmanager also offers many native integrations so you can route alerts to email, SMS, PagerDuty, Slack, and more!
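To make the grouping and routing ideas above a bit more concrete, here is a minimal sketch of what an Alertmanager configuration file might look like. Everything specific in it is an assumption for illustration only: the receiver names, the email addresses, the SMTP host, the PagerDuty key placeholder, and the timing values are all hypothetical, and grouping by service_name assumes your alerts carry that label.

```yaml
# alertmanager.yml (illustrative sketch, not a recommended production config)
global:
  # Hypothetical SMTP settings; replace with your own mail relay.
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'

route:
  # Default receiver for anything not matched by a child route.
  receiver: team-email
  # Alerts sharing these label values are bundled into one notification.
  group_by: ['alertname', 'service_name']
  group_wait: 30s       # wait before sending the first notification for a new group
  group_interval: 5m    # wait before notifying about new alerts added to a group
  repeat_interval: 4h   # wait before re-sending a notification that is still firing
  routes:
    # Send alerts labeled severity=error to the pager instead of email.
    # Newer Alertmanager releases prefer the 'matchers' syntax over 'match'.
    - match:
        severity: error
      receiver: team-pager

receivers:
  - name: team-email
    email_configs:
      - to: 'dba-team@example.com'
  - name: team-pager
    pagerduty_configs:
      - service_key: '<your-pagerduty-integration-key>'
```

With a layout along these lines, repeated firings of the same alert are collapsed into a single notification per group, and repeat_interval controls how often you are reminded while the problem persists.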

Now, this is our first iteration of Alertmanager support, so at this point you will need your own working Alertmanager installation that your PMM server can communicate with. The only other thing you’ll need is the set of rules you want to trigger alerts from. That’s basically it! You most likely already know how to create YAML-style rules, but for the curious, one looks something like this:

- alert: PostgresqlDown
  expr: pg_up == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: "PostgreSQL down (instance {{ $labels.service_name }})"
    description: "PostgreSQL instance is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

The above will trigger an alert to let you know which PostgreSQL instances monitored by PMM have been down for more than 5 minutes. Since this first pass targets experienced users, I’ll leave it to you to craft your own rules, but we’re really excited to be adding this sorely needed functionality!
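If you are newer to Prometheus rules, it may help to see the PostgreSQL rule in the context of a complete rule file. In standard Prometheus, alerting rules live inside a named group under a top-level groups key; the sketch below assumes PMM accepts that same layout, and the group name pmm_postgresql_alerts is a hypothetical label chosen for this example.

```yaml
# Illustrative Prometheus rule file; the group name is hypothetical.
groups:
  - name: pmm_postgresql_alerts
    rules:
      - alert: PostgresqlDown
        # pg_up is 0 when the exporter cannot reach PostgreSQL.
        expr: pg_up == 0
        # The condition must hold for 5 minutes before the alert fires.
        for: 5m
        labels:
          # Labels are attached to the alert and can drive Alertmanager routing.
          severity: error
        annotations:
          # Annotations carry human-readable text for the notification.
          summary: "PostgreSQL down (instance {{ $labels.service_name }})"
          description: "PostgreSQL instance is down\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
```

If you have promtool from a Prometheus distribution available, running promtool check rules against a file like this is a quick way to catch indentation or syntax mistakes before loading it.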

For more information, you can read our Alertmanager integration documentation and FAQs. Update your instance today and let us know what you think; we would love to hear your feedback!