Introduction

In the world of database observability, timing is everything — literally.

Recently, we ran into a puzzling issue while managing MySQL monitoring in Percona Monitoring and Management (PMM). Out of four database servers (Server1 through Server4), one of them, Server4, stubbornly refused to show metrics for time ranges shorter than 24 hours. When we selected “Last 24 hours”, everything looked fine. But when we switched to “Last 12 hours” or “Last 6 hours”, PMM threw up the dreaded message: “No data”.

At first glance, this seemed like a PMM agent issue or perhaps a problem in Prometheus scraping. But as we soon discovered, the real culprit was far simpler and far sneakier: system time drift.

In this blog, I’ll walk you through how we diagnosed and resolved this issue, what caused it, and why NTP (Network Time Protocol) deserves a permanent spot in your DevOps checklist. We’ll also discuss preventive strategies and observability best practices to ensure you never lose metrics again due to time mismatch.

The Setup: PMM for MySQL Monitoring

Our environment consists of four MySQL hosts: Server1, Server2, Server3, and Server4.

Each host runs:

  • MySQL 8.x
  • PMM Agent configured with the node and MySQL exporter
  • Metrics collected by Prometheus (PMM Server)
  • Dashboards viewed via Grafana

For months, everything worked flawlessly, until Server4 went silent for shorter time ranges.

The Symptom

When opening the PMM dashboard:

  • Last 24 hours: Data visible
  • Last 12 hours / 6 hours / 1 hour: “No data”

Everything else (queries, connections, replication, etc.) appeared normal. The PMM agent was running fine, and no scrape errors were visible in Prometheus targets.

Here’s what the situation looked like:

Server     PMM data (24h)    PMM data (6h)    Status
Server1    Visible           Visible          OK
Server2    Visible           Visible          OK
Server3    Visible           Visible          OK
Server4    Visible           No data          Problem

Initial Hypothesis

Whenever PMM shows partial or missing data, there are a few usual suspects:

  • PMM Agent not running properly
  • Exporter stopped or stale metrics
  • Prometheus scrape failure
  • Network delay or firewall issue
  • Time synchronization mismatch

Since PMM was showing 24-hour data, we could immediately rule out exporters and scraping.
That left us with one interesting clue: time.

Checking the Time

On the problematic server (Server4):

[admin@sgpserver4 ~]$ date
Thu Jul 18 05:38:18 AM IST 2024

[admin@sgpserver4 ~]$ timedatectl status
               Local time: Thu 2024-07-18 11:40:48 IST
           Universal time: Thu 2024-07-18 06:10:48 UTC
                 RTC time: Mon 2001-01-01 14:40:52
                Time zone: Asia/Kolkata (IST, +0530)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

That “System clock synchronized: no” caught our attention — and the RTC time from 2001 was a dead giveaway.

On another healthy server (Server2):

[admin@sgpserver2 ~]$ date
Tue Oct 28 09:11:30 AM IST 2025

Server4 was almost four hours behind the other servers. That explained why the 24-hour graphs worked: the metric timestamps were still within Prometheus retention windows. But when selecting the last few hours, PMM’s query window didn’t match the drifted timestamps — hence, “No data”.
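Before touching anything, it helps to quantify the drift. Here is a minimal bash sketch — the `ssh server4` reading in the comment is illustrative (it assumes passwordless SSH to the suspect host); the function itself just subtracts two epoch timestamps:

```shell
#!/usr/bin/env bash
# clock_drift.sh -- sketch: report how far one clock is from another.

drift_seconds() {
    # Difference between two epoch timestamps (first minus second).
    local here=$1 there=$2
    echo $(( here - there ))
}

# In practice the second stamp would come from the remote host, e.g.:
#   there=$(ssh server4 date +%s)
here=$(date +%s)
there=$here   # placeholder; replace with the remote reading
echo "drift: $(drift_seconds "$here" "$there")s"
```

A positive result means the remote clock is behind; a negative one means it is ahead. Even a rough number tells you whether you are chasing seconds of NTP jitter or hours of genuine drift.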

Root Cause: System Time Drift

Time drift occurs when a system’s internal clock runs out of sync with real-world time — usually because the NTP service isn’t active or functioning properly.

In this case:

  • The system clock was out of sync by ~4 hours.
  • PMM’s Prometheus recorded metrics with incorrect timestamps.
  • Grafana queries based on current time couldn’t find any data within the selected time range.

Thus, PMM wasn’t broken — it was just looking in the wrong time window.

The Fix: Realigning Time on Server4

To fix the time drift, we followed a safe and reversible approach.

Step 1: Disable automatic NTP sync temporarily

sudo timedatectl set-ntp false

Step 2: Manually set the correct time (matching other servers)

sudo timedatectl set-time "2025-10-28 09:11:30"

Step 3: Re-enable NTP service

sudo timedatectl set-ntp true

Step 4: Sync system time to hardware clock

sudo hwclock --systohc

Step 5: Restart PMM Agent

sudo systemctl restart pmm-agent

Step 6: Verify

timedatectl status
date

Output:

Local time: Tue 2025-10-28 09:11:30 IST
System clock synchronized: yes
NTP service: active

Verification in PMM

After correction:

Time Range       Data Visibility
Last 24 hours    Visible
Last 12 hours    Visible
Last 6 hours     Visible
Last 1 hour      Visible

We monitored for a few hours to confirm consistency — everything was stable.

Why Time Sync Is Critical in PMM (and Monitoring in General)

Time synchronization isn’t just a nice-to-have; it’s the backbone of distributed observability.

Here’s why:

  • Prometheus Relies on Accurate Timestamps:
    Each metric is stored with a timestamp. If one host’s clock drifts, metrics appear “out of order” or “in the future/past,” breaking visual continuity.
  • Grafana Query Windows Depend on Current Time:
    Dashboards query “last X hours” relative to now. If your system’s “now” is wrong, your data disappears.
  • Alert Rules May Misfire:
    Alerts using rate() or increase() functions assume monotonic timestamps. A time drift can lead to false positives or missed alerts.
  • Cluster Coordination Breaks:
    In replication, distributed locking, or orchestrators — inconsistent time can cause failover delays, wrong transaction ordering, or stale metrics.
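Because of all this, it is worth alerting on the magnitude of the offset, not just on loss of sync. node_exporter (which PMM’s node exporter is based on) exposes the kernel’s timex offset; a sketch rule, where the 50 ms threshold is an arbitrary example to tune for your fleet:

```yaml
# Sketch: alert on measurable clock offset, not only on the sync flag.
# node_timex_offset_seconds comes from node_exporter's timex collector;
# the 0.05s threshold is an example value -- tune it for your environment.
- alert: ClockOffsetHigh
  expr: abs(node_timex_offset_seconds) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Clock offset above 50ms on {{ $labels.instance }}"
```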

How to Prevent Future Time Drift

Enable NTP Across All Servers

sudo timedatectl set-ntp true

Or install Chrony (recommended for servers):

sudo yum install chrony -y   # RHEL-family; use 'apt install chrony' on Debian/Ubuntu
sudo systemctl enable --now chronyd
chronyc sources -v
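Chrony’s behavior is driven by /etc/chrony.conf. A minimal excerpt — the pool address is an example; point it at internal NTP servers if you have them:

```
# /etc/chrony.conf (excerpt)
pool pool.ntp.org iburst   # example time source; prefer internal NTP servers
makestep 1.0 3             # step the clock if offset > 1s during the first 3 updates
rtcsync                    # keep the hardware clock in sync via the kernel
```

The makestep directive is what prevents the kind of multi-hour drift we hit: it steps a badly wrong clock at startup instead of slewing it slowly toward the correct time.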

Standardize Timezone

Set all DB servers to a consistent timezone (e.g., IST):

sudo timedatectl set-timezone Asia/Kolkata

Verify with Automation

Create a daily cron or Ansible check:

timedatectl status | grep "System clock synchronized"

Alert if “no”.
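That grep can be wrapped into a small script suitable for cron or an Ansible task. A sketch — the logger tag is arbitrary, and `timedatectl show` needs a reasonably recent systemd:

```shell
#!/usr/bin/env bash
# timesync_check.sh -- sketch of a daily cron check for NTP sync state.

sync_state() {
    # Maps the NTPSynchronized property ("yes"/"no") to a verdict.
    [ "$1" = "yes" ] && echo "OK" || echo "DRIFT"
}

state=$(sync_state "$(timedatectl show -p NTPSynchronized --value 2>/dev/null)")
if [ "$state" = "DRIFT" ]; then
    # Route this to whatever alerting you use; syslog shown as an example.
    logger -t timesync_check "system clock is not NTP-synchronized" 2>/dev/null || true
fi
echo "$state"
```

Drop it into /etc/cron.daily/ (or run it from your configuration management) and alert on the “DRIFT” output.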

Add Prometheus Alerts

Add an alert rule like:

- alert: TimeNotSynced
  expr: node_timex_sync_status == 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "System time is not synchronized"
    description: "Host {{ $labels.instance }} has NTP sync disabled"

Include Time Drift in Health Checks

During any DB or PMM troubleshooting, always check time first.
A 10-second command can save hours of debugging.

Lessons Learned

  • Symptom: PMM “No Data” for short time ranges
  • Root Cause: Server’s clock 4 hours behind due to NTP desync
  • Fix: Manual time correction + re-enable NTP
  • Prevention: Standardize NTP and time checks across all servers

Final Checklist

Command                          Purpose
timedatectl status               Verify NTP and sync state
sudo timedatectl set-ntp true    Enable NTP
sudo hwclock --systohc           Sync hardware clock
date                             Confirm correct time
systemctl restart pmm-agent      Restart PMM agent
PMM → Dashboard                  Confirm data visibility

Conclusion

Monitoring systems like PMM, Prometheus, and Grafana are incredibly powerful — but they rely on one universal truth: time must be right.

This blog is based on a real-world issue faced during MySQL infrastructure monitoring at scale. If you’re working with PMM, Orchestrator, or any Prometheus-based observability stack — remember that the smallest configuration (like a few minutes of clock drift) can have the biggest impact.
