Database Monitoring Built by DBAs, Not Dashboard Designers

We recently came in as consultants to a startup with PostgreSQL performance problems. They didn’t have a DBA on staff - not unusual for a company their size. They had monitoring: some CloudWatch dashboards, a Datadog subscription. But the data was scattered across tools, and none of it told the complete story.

They knew something was wrong. They didn’t have the visibility to know what.

This is a common pattern. Companies invest in monitoring, but they invest in infrastructure monitoring: CPU, memory, disk, network. Generic tools that treat a database the same as any other service. When performance degrades, these dashboards show symptoms, not causes.

We deployed PMM alongside their existing tools. Not to replace everything, but to fill the gaps. Within hours, we had the visibility we needed to diagnose the problems and fix them.

PMM wasn’t the hero of the story. It was the tool that completed our investigation.

The Percona Difference

Years ago, I worked as a Percona consultant. When I arrived at a customer site, the first thing I did was collect a snapshot of their environment: hardware audit, software audit, database settings, query samples, unused indexes, and so on. Each of us had our own tools and scripts to gather this information. Over time, you built up a personal toolkit for diagnosing database problems.

PMM consolidated that knowledge into a product. The dashboards aren’t random metrics someone thought looked interesting. They’re the same things we collected manually during consulting engagements, now automated, continuous, and presented in a way that leads to action. If a graph made it into PMM, it’s because someone needed that information to solve a real problem. If it wasn’t actionable, it didn’t make the cut.

PMM’s strength comes from Percona’s background. The company literally wrote the book on running databases in production. The people building PMM have decades of consulting experience, parachuting into broken production systems and finding problems fast. That expertise shows in every dashboard.

If you’re evaluating PMM, you already know whether it fits your needs. The question isn’t “should I use PMM?” - it’s “how do I run PMM properly in production?”

From Useful Tool to Production Deployment

PMM is a great tool. But there’s a gap between “great tool” and “running in production.”

Percona offers an AWS Marketplace AMI. You can run Docker on an EC2 instance. You can set up Kubernetes. Each approach works, but each leaves you with homework: Where does the data live? How do you back it up? What happens when the instance dies? How do you get HTTPS with a valid certificate?

These aren’t hard problems, but they take time. And if you skip them, you end up with a monitoring system that can’t survive an EC2 hardware failure, or that loses all your metrics when someone accidentally terminates the wrong instance.

We built a Terraform module that handles all of it.

What the Module Provides

One terraform apply and you have a production-ready PMM deployment:

Persistent storage. PMM data lives on a dedicated EBS volume, separate from the root filesystem. Encrypted at rest with KMS (AWS-managed or customer-managed keys), so your auditors are happy. Your metrics survive instance replacements, reboots, and Docker container restarts.
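To make the idea concrete, here is a minimal sketch of a dedicated, encrypted data volume in plain Terraform. It is not the module's actual code: the resource names, the device name, and both input variables are assumptions for illustration.

```hcl
# Illustrative sketch only; the module's real resources may differ.

variable "availability_zone" {
  description = "AZ of the PMM instance (hypothetical input)"
  type        = string
}

variable "pmm_instance_id" {
  description = "ID of the EC2 instance running PMM (hypothetical input)"
  type        = string
}

# Data volume that lives independently of the instance's root volume.
resource "aws_ebs_volume" "pmm_data" {
  availability_zone = var.availability_zone
  size              = 100 # GiB
  type              = "gp3"
  encrypted         = true # account default EBS key unless kms_key_id is set

  tags = {
    Name = "pmm-data"
  }
}

# Attach the volume to the instance; the OS still has to format and mount it.
resource "aws_volume_attachment" "pmm_data" {
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.pmm_data.id
  instance_id = var.pmm_instance_id
}
```

Because the volume is its own resource, replacing the instance does not touch the data on it.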

Daily backups. AWS Backup snapshots the EBS volume every day at 5 AM UTC. Default retention is 30 days. If something goes wrong, you restore from a snapshot and lose at most 24 hours of metrics.
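A schedule like this can be expressed with the standard AWS Backup resources. The sketch below is an assumption about how such a plan might look, not the module's source; the vault name, plan name, and the two input variables are hypothetical.

```hcl
# Illustrative sketch only; the module's real resources may differ.

variable "pmm_volume_arn" {
  description = "ARN of the PMM data volume (hypothetical input)"
  type        = string
}

variable "backup_role_arn" {
  description = "IAM role that AWS Backup assumes (hypothetical input)"
  type        = string
}

resource "aws_backup_vault" "pmm" {
  name = "pmm-backups"
}

resource "aws_backup_plan" "pmm" {
  name = "pmm-daily"

  rule {
    rule_name         = "daily-0500-utc"
    target_vault_name = aws_backup_vault.pmm.name
    schedule          = "cron(0 5 * * ? *)" # every day at 05:00 UTC

    lifecycle {
      delete_after = 30 # days of retention
    }
  }
}

# Tell the plan which resource to back up.
resource "aws_backup_selection" "pmm" {
  name         = "pmm-data-volume"
  plan_id      = aws_backup_plan.pmm.id
  iam_role_arn = var.backup_role_arn
  resources    = [var.pmm_volume_arn]
}
```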

Auto-recovery. CloudWatch monitors EC2 health checks. If the underlying hardware fails, AWS automatically recovers the instance. Typical recovery time is 2-5 minutes, and you don’t get paged for it.
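EC2 auto-recovery is usually wired up as a CloudWatch alarm on the system status check with the built-in recover action. The following is a minimal sketch under that assumption; the alarm name, thresholds, and input variables are illustrative and may not match what the module creates.

```hcl
# Illustrative sketch only; the module's real resources may differ.

variable "aws_region" {
  description = "Region the PMM instance runs in (hypothetical input)"
  type        = string
}

variable "pmm_instance_id" {
  description = "ID of the EC2 instance running PMM (hypothetical input)"
  type        = string
}

resource "aws_cloudwatch_metric_alarm" "pmm_auto_recover" {
  alarm_name          = "pmm-system-status-check-failed"
  namespace           = "AWS/EC2"
  metric_name         = "StatusCheckFailed_System"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1
  period              = 60 # seconds
  evaluation_periods  = 2  # two consecutive failed minutes

  dimensions = {
    InstanceId = var.pmm_instance_id
  }

  # Built-in EC2 action that moves the instance to healthy hardware.
  alarm_actions = ["arn:aws:automate:${var.aws_region}:ec2:recover"]
}
```

A recovered instance keeps its instance ID, private IP addresses, and attached EBS volumes, which is why the data volume does not need to be reattached.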

HTTPS with valid certificates. An Application Load Balancer terminates TLS using an ACM certificate. Your team accesses PMM at https://pmm.yourcompany.com with a certificate that browsers trust.
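In Terraform terms, TLS termination on an ALB comes down to an HTTPS listener that references an ACM certificate. This is a sketch, not the module's code; all three input variables are hypothetical and the TLS policy is just one reasonable choice.

```hcl
# Illustrative sketch only; the module's real resources may differ.

variable "alb_arn" {
  description = "ARN of the load balancer (hypothetical input)"
  type        = string
}

variable "acm_certificate_arn" {
  description = "ARN of an issued ACM certificate (hypothetical input)"
  type        = string
}

variable "pmm_target_group_arn" {
  description = "Target group pointing at the PMM instance (hypothetical input)"
  type        = string
}

resource "aws_lb_listener" "pmm_https" {
  load_balancer_arn = var.alb_arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  # Forward decrypted traffic to PMM.
  default_action {
    type             = "forward"
    target_group_arn = var.pmm_target_group_arn
  }
}
```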

Monitoring for the monitoring tool. The PMM instance gets CloudWatch alarms for disk space, memory usage, EBS burst balance, and instance health. If your monitoring system is unhealthy, you’ll know before it affects your visibility.

RDS access configured automatically. Pass your RDS security group IDs, and the module adds the ingress rules PMM needs to connect. No manual security group editing.
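The rules involved look roughly like the sketch below: one ingress rule per RDS security group, allowing traffic from the PMM instance's security group. This is an assumption about the shape of the rules, not the module's source. The variable `pmm_security_group_id` is hypothetical, and port 5432 assumes PostgreSQL; MySQL would use 3306.

```hcl
# Illustrative sketch only; the module's real resources may differ.

variable "rds_security_group_ids" {
  description = "Security groups attached to the RDS instances"
  type        = list(string)
}

variable "pmm_security_group_id" {
  description = "Security group of the PMM instance (hypothetical input)"
  type        = string
}

# count only needs the list length at plan time, so IDs of security
# groups created in the same apply are fine.
resource "aws_vpc_security_group_ingress_rule" "pmm_to_rds" {
  count = length(var.rds_security_group_ids)

  security_group_id            = var.rds_security_group_ids[count.index]
  referenced_security_group_id = var.pmm_security_group_id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
  description                  = "PMM server to PostgreSQL"
}
```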

The Architecture

┌─────────────────────────────────────────────────────────────┐
│                         Internet                            │
└────────────────────────┬────────────────────────────────────┘
                    ┌────▼────┐
                    │   ALB   │  HTTPS with ACM Certificate
                    └────┬────┘
        ┌────────────────▼────────────────────┐
        │   EC2 Instance (Ubuntu Pro 24.04)   │
        │   ┌──────────────────────────────┐  │
        │   │  PMM Docker Container        │  │
        │   │  └─ Grafana, ClickHouse,     │  │
        │   │     PostgreSQL, Prometheus   │  │
        │   └──────────────────────────────┘  │
        │   ┌──────────────────────────────┐  │
        │   │  EBS Volume (100GB GP3)      │  │
        │   │  Persistent metric storage   │  │
        │   └──────────────────────────────┘  │
        └─────────────────────────────────────┘

PMM runs as a Docker container managed by systemd. Metrics go to the EBS volume mounted at /srv. The ALB handles HTTPS termination and health checks. AWS Backup handles snapshots. CloudWatch handles alarms.

All the pieces you’d eventually build yourself, already wired together and tested.

Deployment

Here’s a minimal configuration:

module "pmm" {
  source  = "infrahouse/pmm-ecs/aws"
  version = "~> 1.1"

  public_subnet_ids  = ["subnet-abc123", "subnet-def456"]
  private_subnet_ids = ["subnet-ghi789"]

  zone_id      = "Z1234567890ABC"
  dns_names    = ["pmm"]
  environment  = "production"
  alarm_emails = ["oncall@yourcompany.com"]

  # Allow PMM to connect to your RDS instances
  rds_security_group_ids = [aws_security_group.rds.id]
}

Run terraform apply, wait about 8 minutes, and PMM is live at https://pmm.yourcompany.com.

Retrieve your admin password from Secrets Manager:

aws secretsmanager get-secret-value \
  --secret-id pmm-server-admin-password \
  --query SecretString --output text

Extending PMM with Custom Queries

PMM lets you collect additional metrics through custom SQL queries. The module supports this directly:

module "pmm" {
  source  = "infrahouse/pmm-ecs/aws"
  version = "~> 1.1"

  # ... other configuration ...

  postgresql_custom_queries_medium_resolution = file("${path.module}/queries/pg-activity.yml")
  postgresql_custom_queries_low_resolution    = file("${path.module}/queries/pg-indexes.yml")
}

For example, we added this query to track connection states and blocked processes at the client I mentioned earlier:

pg_activity:
  query: |
    SELECT datname, state, wait_event_type, wait_event,
           COUNT(*) as processes,
           MAX(extract(epoch from clock_timestamp() - xact_start))
             FILTER (WHERE xact_start IS NOT NULL
                     AND query !~* '.*autovacuum.*') as in_transaction_seconds,
           COUNT(*) FILTER (WHERE cardinality(pg_blocking_pids(pid)) > 0) as blocked
    FROM pg_stat_activity
    WHERE datname !~ '^(postgres|rdsadmin|template(0|1))$'
      AND pid <> pg_backend_pid()
    GROUP BY datname, state, wait_event_type, wait_event    

The data flows into the same dashboards and alerting system as built-in metrics.

The Cost

PMM is open source. The infrastructure to run it costs about $111/month:

Component                       Monthly Cost
EC2 m5.large (on-demand)        $70
Application Load Balancer       $23
EBS GP3 Storage (100GB)         $8
EBS Snapshots (daily backups)   $5
CloudWatch Logs & Metrics       $5
Total                           $111

This is a flat cost regardless of how many databases you monitor. If you’re running a dozen PostgreSQL instances, PMM at $111/month beats per-host pricing models quickly.

A 1-year Reserved Instance drops the EC2 cost to ~$42/month, bringing the total to around $83/month.
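The arithmetic behind both totals can be checked with a few Terraform locals. The figures are the ones quoted above. The `per_host_price_usd` variable is a hypothetical input for comparing against a per-host tool; no real vendor price is implied.

```hcl
# Cost figures are taken from the breakdown above (USD per month).

variable "per_host_price_usd" {
  description = "Hypothetical monthly per-host price of an alternative tool"
  type        = number
}

locals {
  monthly_cost_usd = {
    ec2_m5_large_on_demand = 70
    alb                    = 23
    ebs_gp3_100gb          = 8
    ebs_snapshots          = 5
    cloudwatch             = 5
  }

  # 70 + 23 + 8 + 5 + 5 = 111
  total_on_demand = sum(values(local.monthly_cost_usd))

  # Swap the on-demand EC2 price for the ~$42 reserved price: 111 - 70 + 42 = 83
  total_reserved = local.total_on_demand - local.monthly_cost_usd.ec2_m5_large_on_demand + 42

  # Number of hosts at which a per-host bill reaches PMM's flat on-demand cost.
  break_even_hosts = ceil(local.total_on_demand / var.per_host_price_usd)
}

output "pmm_monthly_cost_usd" {
  value = {
    on_demand        = local.total_on_demand
    reserved         = local.total_reserved
    break_even_hosts = local.break_even_hosts
  }
}
```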

Getting Started

The module is published on the Terraform Registry as infrahouse/pmm-ecs/aws. To start from a working example, clone the repository:

git clone https://github.com/infrahouse/terraform-aws-pmm-ecs
cd terraform-aws-pmm-ecs/examples/with-rds-monitoring

The repository includes deployment examples, sample custom queries, troubleshooting guides, and runbooks.


Summary

PMM is a useful tool built by people who understand database problems. It fills gaps that generic monitoring tools leave behind.

The InfraHouse Terraform module takes that useful tool and makes it production-ready on AWS: persistent storage, daily backups, auto-recovery, HTTPS, and monitoring for the monitoring system itself. One terraform apply, and you’re done.


InfraHouse builds production-ready Terraform modules for AWS. The terraform-aws-pmm-ecs module is open source under the Apache 2.0 license.
