Introduction — When Healthy Pods Still Cause Slow Databases
Modern database platforms are no longer limited to simple query processing. Systems built around MySQL, PostgreSQL, and MongoDB increasingly rely on background workers for replication, change data capture (CDC), asynchronous processing, and data synchronization.
While running these database services on Kubernetes, we observed an unexpected situation:
- Processing delays were increasing
- Event backlogs kept growing
- Autoscaling was enabled
- Pods appeared perfectly healthy
CPU utilization stayed around 30%, yet database lag continued rising.

Understanding Database Workloads
Before discussing autoscaling, it is important to understand how database workloads behave.
Unlike traditional web applications, database consumers are rarely CPU-intensive.
Typical database worker activities include:
- Reading replication logs
- Processing binlog/WAL events
- Waiting on disk operations
- Calling downstream APIs
- Performing asynchronous writes
These tasks are mostly I/O-bound, meaning the application spends more time waiting than computing.
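The gap between wall-clock time and CPU time is easy to demonstrate. The sketch below uses `time.sleep` as a stand-in for a disk or network wait; the worker is "busy" the whole time, yet consumes almost no CPU:

```python
import time

def process_event(event):
    # Simulate an I/O-bound step: the worker waits on disk or network,
    # burning wall-clock time but almost no CPU time.
    time.sleep(0.05)
    return event * 2

start_wall = time.perf_counter()
start_cpu = time.process_time()

results = [process_event(e) for e in range(10)]

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu

# Wall time is roughly 0.5 s here; CPU time is a tiny fraction of it.
print(f"wall={wall:.2f}s cpu={cpu:.3f}s")
```

A CPU-based autoscaler looking at this worker sees an almost idle container, even while ten events are flowing through it.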
Real Database Autoscaling Challenges
Modern database platforms introduce several scaling challenges that are often invisible at the infrastructure level.
Common scenarios include:
- replication workers waiting for disk synchronization
- slow downstream microservices delaying writes
- analytics pipelines consuming database change events
- batch updates triggered during peak business hours
In many cases, the database itself is not overloaded. Instead, the bottleneck exists in the processing layer consuming database events.
This creates a misleading situation where:
- CPU metrics appear healthy
- memory consumption remains stable
- yet user-facing latency increases
Autoscaling decisions based purely on infrastructure metrics fail to capture these operational realities.
CPU-Bound vs I/O-Bound Workloads
| CPU-Bound Applications | Database Processing Workers |
| --- | --- |
| Heavy computation | Waiting on I/O |
| High CPU usage | Low CPU usage |
| Easy to autoscale on resource metrics | Workload pressure is hard to detect |
Traditional Autoscaling in Kubernetes
Kubernetes provides automatic scaling through the Horizontal Pod Autoscaler (HPA).
HPA adjusts pod counts based on infrastructure metrics such as:
- CPU utilization
- Memory consumption
Typical scaling logic:
- If CPU usage increases → add pods
- If CPU usage is normal → keep pods unchanged
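Under the hood, the HPA's core rule is proportional: it scales replicas by the ratio of the observed metric to the target. A minimal sketch of that formula (numbers are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct):
    # Core HPA rule: desired = ceil(current * observed / target).
    return math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)

# CPU above a 60% target → scale out.
print(hpa_desired_replicas(4, 90, 60))  # 6

# CPU at ~30% with a 60% target → scale *in*,
# even if an event backlog is quietly growing.
print(hpa_desired_replicas(4, 30, 60))  # 2
```

The second case is exactly the failure mode described in this article: low CPU tells the autoscaler to shrink, regardless of how much work is queued.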
This approach works extremely well for:
- Web APIs
- Stateless services
- Compute-heavy workloads
However, database event processors behave differently.
Why CPU Metrics Mislead Platform Teams
CPU utilization became the default autoscaling signal because early cloud applications were compute-heavy. However, modern distributed systems operate differently.
Database consumers typically perform:
- network-bound operations
- transactional commits
- external API calls
- disk synchronization
During these operations, containers remain mostly idle from a CPU perspective while still actively processing work.
As a result, CPU usage reflects resource consumption, not system pressure.
This mismatch leads to delayed scaling reactions and growing operational risk during traffic spikes.
Why Autoscaling Failed for Database Processing
In our database services environment:
- Event generation increased rapidly
- Replication/change events accumulated
- Processing delays increased
Yet Kubernetes did not scale additional pods.
Why?
Because CPU usage remained low.
The autoscaler assumed everything was functioning normally while the actual workload pressure existed inside the event backlog.
Root Cause
Autoscaling was tied to resource consumption, not workload demand.
Observability: The Missing Piece in Autoscaling
Successful autoscaling requires visibility into application behavior rather than infrastructure alone.
Database platforms benefit from monitoring signals such as:
- replication lag duration
- pending background jobs
- transaction commit latency
- change event throughput
These metrics reveal the true workload demand.
By integrating observability platforms such as Prometheus, teams can expose meaningful database metrics that represent real processing pressure.
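Exposing such signals is straightforward: Prometheus scrapes a plain-text format, so a worker only needs to render its gauges at a `/metrics` endpoint. The metric names below (`db_replication_lag_seconds`, `db_pending_events`) are hypothetical; substitute whatever your platform actually measures:

```python
def render_metrics(replication_lag_seconds: float, pending_events: int) -> str:
    """Render workload-pressure gauges in the Prometheus text
    exposition format, ready to serve from a /metrics endpoint."""
    return (
        "# TYPE db_replication_lag_seconds gauge\n"
        f"db_replication_lag_seconds {replication_lag_seconds}\n"
        "# TYPE db_pending_events gauge\n"
        f"db_pending_events {pending_events}\n"
    )

print(render_metrics(4.2, 1500))
```

In practice you would populate these values from sources such as `pg_stat_replication` or `SHOW REPLICA STATUS`, or use a client library like `prometheus_client` instead of hand-rendering the format.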
Autoscaling then becomes proactive rather than reactive.
Event-Driven Autoscaling — A Better Approach

To solve this, we shifted from resource-based scaling to event-driven autoscaling using KEDA.
KEDA (Kubernetes Event-driven Autoscaling) is a lightweight, CNCF-graduated component that provides event-driven autoscaling for Kubernetes workloads. It enables applications to scale from zero to thousands of pods based on external metrics.
KEDA extends Kubernetes autoscaling by monitoring external events instead of CPU metrics.
Examples of scalable signals:
- Queue length
- Replication lag
- Pending database jobs
- Change stream events
Instead of asking “How busy is the CPU?”, we now ask:
“How much work is waiting to be processed?”
Implementation Approach
The implementation required minimal changes:
- Install KEDA in the cluster
- Define a scaling object connected to database metrics
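The steps above can be sketched as a KEDA `ScaledObject` using the Prometheus scaler. Assuming a deployment named `cdc-consumer` and a `db_pending_events` metric (both hypothetical names), it might look like:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cdc-consumer-scaler
spec:
  scaleTargetRef:
    name: cdc-consumer          # deployment processing database events
  minReplicaCount: 0            # allow scale-to-zero when idle
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(db_pending_events)
        threshold: "1000"       # roughly one pod per 1000 pending events
```

The exact query and threshold depend on your workload; the key point is that the trigger reads a workload-demand metric, not CPU.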
Operational Benefits Beyond Scaling
Event-driven autoscaling introduced benefits beyond performance improvements.
Operational advantages included:
- reduced manual intervention during peak load
- predictable recovery from traffic bursts
- improved database stability due to faster event processing
- simplified capacity planning
Engineering teams no longer needed to guess peak resource requirements. The system dynamically adapted to workload demand, allowing platform teams to focus on reliability instead of reactive scaling actions.
Observed Improvements
After adopting event-driven autoscaling:
| Metric | Before | After |
| --- | --- | --- |
| Processing Delay | High | Near real-time |
| Pod Count | Fixed | Dynamic |
| Resource Usage | Constant | Demand-based |
| Idle Infrastructure | Always running | Scales to zero |
Database backlogs cleared faster, and processing became predictable even during traffic spikes.
Cost Optimization Benefits

Traditional autoscaling typically maintains minimum running pods regardless of workload activity.
Event-driven autoscaling introduced:
- Automatic scale-to-zero capability
- Reduced idle compute usage
- Improved node utilization
- Lower infrastructure costs
Autoscaling finally aligned with actual demand rather than static assumptions.
Engineering Lessons Learned
This experience highlighted several important platform engineering principles:
- Autoscaling must follow the system bottleneck
- CPU usage does not always represent real workload pressure
- Database systems are inherently event-driven
- Observability metrics are more valuable than raw resource metrics
When Database Autoscaling Should Be Avoided
While event-driven autoscaling is powerful, it is not universally applicable.
Database workloads that may not benefit include:
- long-running analytical queries
- stateful database engines themselves
- tightly coupled legacy applications
- workloads requiring strict connection limits
Autoscaling should primarily target database consumers and processing services, rather than the database engine itself.
Understanding this distinction prevents architectural complexity and ensures stable production environments.
Conclusion
As database platforms evolve toward distributed and event-driven architectures, traditional CPU-based autoscaling becomes insufficient.
By adopting event-driven autoscaling, database services running on Kubernetes become:
- More responsive
- More efficient
- More cost-effective
- Better prepared for modern workloads
Scaling should reflect work waiting to be processed, not just resources being consumed.