Introduction:

Modern infrastructure is built on containers and Kubernetes, but databases still remain the hardest components to operate reliably. As Site Reliability Engineers (SREs), our responsibility is not just to deploy MySQL, but to ensure it remains available, consistent, observable, and recoverable under failure.

Running MySQL on Docker and Kubernetes is no longer a theoretical exercise; it is happening in production across startups and enterprises. However, MySQL is a stateful system, while Kubernetes was designed primarily with stateless workloads in mind. This mismatch introduces operational risks that SREs must understand deeply.

This blog focuses on how MySQL behaves when deployed using Docker and Kubernetes, what architectural patterns work, what fails in production, and what SREs must do to keep MySQL reliable in containerized environments.

Why MySQL Is Different from Typical Container Workloads:

Containers are designed to be:

  • Ephemeral
  • Easily replaceable
  • Horizontally scalable

MySQL, on the other hand, expects:

  • Stable disk storage
  • Predictable startup and shutdown
  • Low-latency I/O
  • Strong data consistency

From an SRE perspective, MySQL inside containers must be treated as stateful infrastructure, not just another microservice.

MySQL in Docker: Understanding the Basics

Running MySQL in Docker is often the first step before Kubernetes.

Basic MySQL Docker Run

docker run -d \
  --name mysql \
  -e MYSQL_ROOT_PASSWORD=secret \
  -v mysql_data:/var/lib/mysql \
  mysql:8.0

What an SRE Learns from This Setup

  • Containers can restart, but data must persist
  • Volumes are mandatory
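  • Only the error log reaches the container log stream (STDOUT/STDERR) by default; slow query and general logs must be enabled separately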
  • Resource isolation is controlled by Docker limits

SRE Rule:

Never run MySQL in Docker without persistent volumes.

Container Resource Limits and MySQL

Docker and Kubernetes enforce CPU and memory limits. MySQL is unaware of these limits unless explicitly tuned.

Common Failure Pattern

  • Pod hits memory limit
  • Kernel OOM kills MySQL
  • Kubernetes restarts the pod
  • Crash recovery increases startup time

SRE Mitigation

  • Set explicit memory limits
  • Tune innodb_buffer_pool_size
  • Avoid overcommitting memory
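
As a starting point, set innodb_buffer_pool_size to about 70% of the container memory limit, leaving the rest for connection buffers and other per-session memory.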
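
The 70% guideline can be computed from the container's limit instead of being hard-coded. The following is a minimal sketch, assuming a cgroup v2 host where the limit is exposed at /sys/fs/cgroup/memory.max. The function names are hypothetical, and the percentage is a tunable default, not a fixed rule.

```shell
#!/bin/sh
# Sketch: derive innodb_buffer_pool_size from the container memory limit.
# Function names are illustrative, not part of MySQL or Docker.

# buffer_pool_mb LIMIT_MB [PERCENT]: integer share of the limit, in MB.
buffer_pool_mb() {
  limit_mb="$1"
  percent="${2:-70}"
  echo $(( limit_mb * percent / 100 ))
}

# container_limit_mb: print the cgroup v2 memory limit of the current
# container in MB. Fails if no limit is set (the file contains "max")
# or the file is missing (for example on cgroup v1 hosts).
container_limit_mb() {
  limit_file=/sys/fs/cgroup/memory.max
  [ -r "$limit_file" ] || return 1
  limit_bytes=$(cat "$limit_file")
  if [ "$limit_bytes" = "max" ]; then return 1; fi
  echo $(( limit_bytes / 1024 / 1024 ))
}

# Example (not executed here): a 4 GB container with a matching buffer pool.
#   docker run -d --name mysql --memory=4g \
#     -e MYSQL_ROOT_PASSWORD=secret -v mysql_data:/var/lib/mysql \
#     mysql:8.0 --innodb-buffer-pool-size="$(buffer_pool_mb 4096)M"
```

Passing the value as a mysqld argument keeps the memory limit and the buffer pool size in one place, so they cannot drift apart when the limit changes.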

Why Kubernetes Makes MySQL More Challenging

Kubernetes provides:

  • Auto-restarts
  • Self-healing
  • Scheduling flexibility

But MySQL requires:

  • Stable network identity
  • Ordered startup
  • Safe shutdown
  • Disk affinity

This is why Deployments are unsuitable and StatefulSets are required.

MySQL on Kubernetes Using StatefulSets

Why StatefulSet Is Mandatory

| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod identity | Random | Stable |
| Storage | Shared | Dedicated PVC |
| Startup order | Parallel | Ordered |
| DNS name | Unstable | Predictable |

StatefulSets allow MySQL pods to maintain identity across restarts.

Minimal MySQL StatefulSet Example

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql  # must match a headless Service, created separately
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: secret  # demo only; use valueFrom.secretKeyRef in production
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi

Storage: The Most Critical Component

Most MySQL incidents on Kubernetes are storage-related, not CPU-related.

Storage Options Evaluation

| Storage Type | SRE Verdict |
|---|---|
| NFS | Unsafe |
| HostPath | Node-dependent |
| Cloud Block Storage | Recommended |
| Local SSD | Risky |

SRE Rule:

If storage latency spikes, MySQL performance collapses.

Readiness and Liveness Probes

Readiness Probe (Safe)

readinessProbe:
  exec:
    command:
    - mysqladmin
    - ping
    - "-psecret"
  initialDelaySeconds: 30

Readiness ensures traffic is only sent when MySQL is actually ready.

Liveness Probe (Dangerous if Misused)

Aggressive liveness probes can:

  • Kill MySQL during crash recovery
  • Cause data corruption
  • Increase downtime

Keep liveness checks lenient: use long timeouts and a high failure threshold, or add a startupProbe so that liveness checks begin only after MySQL has finished starting.

MySQL Replication in Kubernetes

MySQL replication is not Kubernetes-aware.

Typical setup:

  • One primary pod
  • Multiple replica pods
  • Manual or external failover logic

Challenges

| Issue | Impact |
|---|---|
| Pod restart | Replica breakage |
| Primary crash | Manual promotion |
| Split brain | Data inconsistency |

Kubernetes does not manage MySQL leadership.
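
Because Kubernetes has no view of replication state, any health check has to ask MySQL directly. Below is a small sketch of such a check, assuming MySQL 8.0.22 or later (where SHOW REPLICA STATUS is available), a root password in MYSQL_ROOT_PASSWORD, and pods reachable by their StatefulSet DNS names. The helper names parse_lag and replica_lag are hypothetical.

```shell
#!/bin/sh
# Sketch: read replication lag from a replica.
# parse_lag and replica_lag are illustrative helper names.

# parse_lag: read "SHOW REPLICA STATUS\G" output on stdin and print the
# lag in seconds. Prints "NULL" when replication is stopped or broken.
# Also matches the older field name Seconds_Behind_Master.
parse_lag() {
  awk -F': ' '/Seconds_Behind_(Source|Master):/ { print $2 }'
}

# replica_lag HOST: query one replica (needs the mysql client).
replica_lag() {
  mysql --host="$1" --user=root --password="${MYSQL_ROOT_PASSWORD}" \
    -e 'SHOW REPLICA STATUS\G' | parse_lag
}
```

With a StatefulSet and headless Service both named mysql, a call would look like `replica_lag mysql-1.mysql`. A NULL result is worth alerting on separately, since it means the replica is not applying changes at all.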

Operators: Production-Grade MySQL on Kubernetes

Operators automate:

  • Replication setup
  • Leader election
  • Backup and restore
  • Scaling

Popular operators:

  • Percona Operator for MySQL (based on Percona XtraDB Cluster)
  • MySQL Operator for Kubernetes (Oracle)
  • Vitess (large-scale)

SRE Recommendation:

Do not run self-managed MySQL on Kubernetes in production without an Operator.

Backups and Disaster Recovery

Backup Strategies

  • Logical backups (mysqldump)
  • Physical backups
  • Volume snapshots

SRE Reality

  • Backups without restore tests are useless
  • Snapshot restore time matters more than backup speed
  • Backups must be isolated from cluster failure
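
A logical backup is only useful if the dump is complete. The sketch below pairs a mysqldump call with a cheap completeness check, assuming a root password in MYSQL_ROOT_PASSWORD and gzip on the backup host. The helper names take_backup and dump_is_complete are hypothetical.

```shell
#!/bin/sh
# Sketch: logical backup with a cheap completeness check.
# take_backup and dump_is_complete are illustrative helper names.

# take_backup HOST OUTFILE: dump all databases from a consistent InnoDB
# snapshot and compress the result. Needs mysqldump and credentials.
take_backup() {
  mysqldump --host="$1" --user=root --password="${MYSQL_ROOT_PASSWORD}" \
    --single-transaction --routines --triggers --all-databases \
    | gzip > "$2"
}

# dump_is_complete FILE: mysqldump ends its output with a
# "-- Dump completed" comment, so a missing trailer means a truncated dump.
# The pipeline above returns gzip's exit status, which hides mysqldump
# failures; this check catches them.
dump_is_complete() {
  gzip -dc "$1" | tail -n 1 | grep -q '^-- Dump completed'
}
```

This check only proves the dump finished. It does not replace a restore test: loading the file into a scratch MySQL instance and querying it is still the only real verification.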

Monitoring MySQL in Kubernetes

Key Metrics

  • Replication lag
  • Disk I/O latency
  • Pod restarts
  • Query latency
  • Buffer pool usage

Common Tools

  • mysqld_exporter
  • Prometheus
  • Grafana

SRE Insight:

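Pod restarts usually point to platform issues (OOM kills, evictions, node failures) rather than query tuning problems, so investigate them at the container level first.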

Failure Scenarios Every SRE Should Test

Scenario 1: Pod Restart

  • Crash recovery duration
  • Readiness delay behavior

Scenario 2: Node Failure

  • PVC reattachment time
  • Data integrity verification

Scenario 3: Disk Latency Spike

  • Query timeout behavior
  • Thread pile-ups
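
Scenario 1 can be turned into a repeatable measurement. The following is a rough sketch, assuming kubectl access to a test cluster and a pod named by its StatefulSet ordinal (for example mysql-0). The function name measure_restart and the KUBECTL override are hypothetical.

```shell
#!/bin/sh
# Sketch: time how long a MySQL pod takes to become Ready after a restart.
# measure_restart and the KUBECTL override are illustrative names.
KUBECTL="${KUBECTL:-kubectl}"

# measure_restart POD [TIMEOUT_SECONDS]: delete the pod, wait for the
# StatefulSet controller to recreate it, and print seconds until Ready.
measure_restart() {
  pod="$1"
  timeout="${2:-600}"
  start=$(date +%s)
  $KUBECTL delete pod "$pod" >/dev/null || return 1
  # The replacement pod may not exist yet, so retry until wait succeeds.
  until $KUBECTL wait --for=condition=Ready "pod/$pod" --timeout=10s >/dev/null 2>&1; do
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "pod $pod not Ready after ${timeout}s" >&2
      return 1
    fi
    sleep 2
  done
  echo $(( $(date +%s) - start ))
}
```

A plain delete gives MySQL a graceful shutdown, so this measures a clean restart. To approximate a crash and exercise InnoDB crash recovery, the delete can be run with `--grace-period=0 --force` under write load, on a non-production cluster only.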

When MySQL Should NOT Run on Kubernetes

Avoid Kubernetes MySQL if:

  • Ultra-low latency is required
  • The database is very large, so restores and volume reattachment take too long
  • Team lacks DB SRE maturity

Sometimes the best solution is:

Managed MySQL > Kubernetes MySQL

Conclusion

Running MySQL on Docker and Kubernetes is not impossible, but it is operationally expensive.

For SREs, success depends on:

  • Storage reliability
  • Safe restarts
  • Tested backups
  • Controlled failover

Kubernetes does not magically solve database reliability. It only amplifies mistakes.

Key Rule:

Treat MySQL as critical state, not just another container.
