Introduction
In the world of database observability, timing is everything — literally.
Recently, we ran into a puzzling issue while managing MySQL monitoring in Percona Monitoring and Management (PMM). Out of four database servers (Server1 through Server4), one server, Server4, stubbornly refused to show metrics for time ranges shorter than 24 hours. When we selected “Last 24 hours”, everything looked fine. But when we switched to “Last 12 hours” or “Last 6 hours”, PMM threw up the dreaded message: “No data”.
At first glance, this seemed like a PMM agent issue or perhaps a problem in Prometheus scraping. But as we soon discovered, the real culprit was far simpler and far sneakier: system time drift.
In this blog, I’ll walk you through how we diagnosed and resolved this issue, what caused it, and why NTP (Network Time Protocol) deserves a permanent spot in your DevOps checklist. We’ll also discuss preventive strategies and observability best practices to ensure you never lose metrics again due to time mismatch.
The Setup: PMM for MySQL Monitoring
The environment consists of four MySQL hosts: Server1, Server2, Server3, and Server4.
Each host runs:
- MySQL 8.x
- PMM Agent configured with the node and MySQL exporter
- Metrics collected by Prometheus (PMM Server)
- Dashboards viewed via Grafana
For months, everything worked flawlessly, until Server4 went silent for shorter time ranges.
The Symptom
When opening the PMM dashboard:
- Last 24 hours: Data visible
- Last 12 hours / 6 hours / 1 hour: “No data”
Everything else (queries, connections, replication, etc.) appeared normal. The PMM agent was running fine, and no scrape errors were visible in Prometheus targets.
Here’s what the situation looked like:
| Server | PMM data (24h) | PMM data (6h) | Status |
|---|---|---|---|
| Server1 | Visible | Visible | OK |
| Server2 | Visible | Visible | OK |
| Server3 | Visible | Visible | OK |
| Server4 | Visible | No data | Problem |
Initial Hypothesis
Whenever PMM shows partial or missing data, there are a few usual suspects:
- PMM Agent not running properly
- Exporter stopped or stale metrics
- Prometheus scrape failure
- Network delay or firewall issue
- Time synchronization mismatch
Since PMM was showing 24-hour data, we could immediately rule out exporters and scraping.
That left us with one interesting clue: time.
Checking the Time
On the problematic server (Server4):
```shell
[admin@sgpserver4 ~]$ date
Thu Jul 18 05:38:18 AM IST 2024

[admin@sgpserver4 ~]$ timedatectl status
Local time: Thu 2024-07-18 11:40:48 IST
Universal time: Thu 2024-07-18 06:10:48 UTC
RTC time: Mon 2001-01-01 14:40:52
Time zone: Asia/Kolkata (IST, +0530)
System clock synchronized: no
NTP service: active
RTC in local TZ: no
```
That “System clock synchronized: no” caught our attention — and the RTC time from 2001 was a dead giveaway.
On another healthy server (Server2):
```shell
[admin@sgpserver2 ~]$ date
Tue Oct 28 09:11:30 AM IST 2025
```
Server4's clock had fallen far behind the other servers (note the mismatched dates in the two outputs above). That explained why the 24-hour graphs still worked: the wider window still overlapped metric timestamps within Prometheus retention. But when selecting the last few hours, PMM's query window no longer matched the drifted timestamps at all; hence, “No data”.
Root Cause: System Time Drift
Time drift occurs when a system’s internal clock runs out of sync with real-world time — usually because the NTP service isn’t active or functioning properly.
In this case:
- The system clock had drifted badly out of sync (and the RTC had reset to 2001).
- PMM’s Prometheus recorded metrics with incorrect timestamps.
- Grafana queries based on current time couldn’t find any data within the selected time range.
Thus, PMM wasn’t broken — it was just looking in the wrong time window.
The Fix: Realigning Time on Server4
To fix the time drift, we followed a safe and reversible approach.
Step 1: Disable automatic NTP sync temporarily

```shell
sudo timedatectl set-ntp false
```

Step 2: Manually set the correct time (matching other servers)

```shell
sudo timedatectl set-time "2025-10-28 09:11:30"
```

Step 3: Re-enable NTP service

```shell
sudo timedatectl set-ntp true
```

Step 4: Sync system time to hardware clock

```shell
sudo hwclock --systohc
```

Step 5: Restart PMM Agent

```shell
sudo systemctl restart pmm-agent
```

Step 6: Verify

```shell
timedatectl status
date
```

Output:

```
Local time: Tue 2025-10-28 09:11:30 IST
System clock synchronized: yes
NTP service: active
```
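For repeat use across hosts, the six steps can be wrapped in one small script. This is a sketch under the same assumptions as above (a systemd host with timedatectl and pmm-agent installed); the fix_time_drift name and the RUN=echo dry-run hook are illustrative additions, not part of PMM:

```shell
#!/bin/sh
# Sketch: bundle the time-drift fix into one function.
# Set RUN=echo to dry-run (print the commands instead of executing them).
fix_time_drift() {
  run="${RUN:-sudo}"
  $run timedatectl set-ntp false          # step 1: pause NTP
  $run timedatectl set-time "$1"          # step 2: set the correct time
  $run timedatectl set-ntp true           # step 3: resume NTP
  $run hwclock --systohc                  # step 4: sync the hardware clock
  $run systemctl restart pmm-agent        # step 5: restart the PMM agent
}

# Dry run example (prints each command without running it):
RUN=echo fix_time_drift "2025-10-28 09:11:30"
```

Pausing NTP before the manual set avoids the daemon fighting the correction; re-enabling it afterwards lets it keep the clock aligned going forward.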
Verification in PMM
After correction:
| Time Range | Data Visibility |
|---|---|
| Last 24 hours | Visible |
| Last 12 hours | Visible |
| Last 6 hours | Visible |
| Last 1 hour | Visible |
We monitored for a few hours to confirm consistency — everything was stable.
Why Time Sync Is Critical in PMM (and Monitoring in General)
Time synchronization isn’t just a nice-to-have; it’s the backbone of distributed observability.
Here’s why:
- Prometheus Relies on Accurate Timestamps: Each metric is stored with a timestamp. If one host's clock drifts, metrics appear “out of order” or “in the future/past,” breaking visual continuity.
- Grafana Query Windows Depend on Current Time: Dashboards query “last X hours” relative to now. If your system's “now” is wrong, your data disappears.
- Alert Rules May Misfire: Alerts using rate() or increase() functions assume monotonic timestamps. A time drift can lead to false positives or missed alerts.
- Cluster Coordination Breaks: In replication, distributed locking, or orchestrators, inconsistent time can cause failover delays, wrong transaction ordering, or stale metrics.
How to Prevent Future Time Drift
Enable NTP Across All Servers

```shell
sudo timedatectl set-ntp true
```

Or install Chrony (recommended for servers):

```shell
sudo yum install chrony -y
sudo systemctl enable --now chronyd
chronyc sources -v
```
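Once chronyd is running, `chronyc tracking` reports the estimated offset from NTP time. A small helper to pull out that line might look like this (parse_offset is a hypothetical name; the sample line mimics the “System time” field of `chronyc tracking` output):

```shell
# Sketch: extract the offset line from `chronyc tracking` output.
parse_offset() {
  awk -F' *: *' '/System time/ {print $2}'
}

# Real usage would be: chronyc tracking | parse_offset
printf 'System time     : 0.000123 seconds fast of NTP time\n' | parse_offset
```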
Standardize Timezone
Set all DB servers to a consistent timezone (e.g., IST):
```shell
sudo timedatectl set-timezone Asia/Kolkata
```
Verify with Automation
Create a daily cron or Ansible check:
```shell
timedatectl status | grep "System clock synchronized"
```
Alert if “no”.
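A minimal sketch of that check, with a hypothetical check_sync helper that reads timedatectl output from stdin so it can be exercised without root:

```shell
# Sketch: print OK when the clock is synchronized, an alert line otherwise.
check_sync() {
  if grep -q 'System clock synchronized: yes'; then
    echo "OK"
  else
    echo "ALERT: clock not synchronized"
  fi
}

# In a daily cron job: timedatectl status | check_sync
# (hooking the ALERT line into mail or a ticketing system is up to you)
printf 'System clock synchronized: no\n' | check_sync
```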
Add Prometheus Alerts
Add an alert rule like:

```yaml
- alert: TimeNotSynced
  expr: node_timex_sync_status == 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "System time is not synchronized"
    description: "Host {{ $labels.instance }} has NTP sync disabled"
```
Include Time Drift in Health Checks
During any DB or PMM troubleshooting, always check time first.
A 10-second command can save hours of debugging.
Lessons Learned
- Symptom: PMM “No Data” for short time ranges
- Root Cause: Server4's clock far behind real time due to NTP desync
- Fix: Manual time correction + re-enable NTP
- Prevention: Standardize NTP and time checks across all servers
Final Checklist
| Command | Purpose |
|---|---|
| timedatectl status | Verify NTP and sync state |
| sudo timedatectl set-ntp true | Enable NTP |
| sudo hwclock --systohc | Sync hardware clock |
| date | Confirm correct time |
| systemctl restart pmm-agent | Restart PMM agent |
| PMM → Dashboard | Confirm data visibility |
Conclusion
Monitoring systems like PMM, Prometheus, and Grafana are incredibly powerful — but they rely on one universal truth: time must be right.
This blog is based on a real-world issue faced during MySQL infrastructure monitoring at scale. If you're working with PMM, Orchestrator, or any Prometheus-based observability stack, remember that the smallest misconfiguration (like a few minutes of clock drift) can have the biggest impact.