Managing Storage Growth in MongoDB: Fragmentation and Compaction

If you’ve ever deleted millions of documents from MongoDB and then checked your disk usage, you’ve probably asked yourself:

“Why hasn’t the database size decreased?”

This is one of the most common questions among developers, DBAs, and operations teams. At first glance, it may seem like MongoDB is holding on to storage unnecessarily. However, the reality is more nuanced.

MongoDB is designed to prioritize performance and efficient storage reuse rather than constantly shrinking and expanding data files. Understanding how MongoDB handles fragmentation, free space management, and compaction is essential for maintaining healthy production environments.

In this post, we’ll explore how fragmentation occurs, how the WiredTiger storage engine manages free space, the role of index fragmentation, when compaction makes sense, and what DBAs should consider before taking action.

Understanding Fragmentation in MongoDB

Think of your database storage like a parking lot.

Initially, every parking space is occupied by cars (your documents). Over time, some cars leave, creating empty spaces throughout the lot.

New cars arrive, but they may not fit perfectly into the available spaces. Some may need larger spots, while others fit into the gaps left behind.

This is essentially what fragmentation looks like in a database.

As documents are inserted, updated, and deleted, free spaces begin appearing throughout MongoDB’s storage files. Over time, these spaces can become scattered, resulting in fragmentation.

MongoDB experiences two primary types of fragmentation:

Collection Data Fragmentation

Collection fragmentation occurs when document storage becomes inefficient due to:

Frequent updates that increase document size
Large delete operations
Continuous insert-delete workloads
TTL-based document expiration

Consider a simple document:

{
  name: "John",
  email: "john@example.com"
}

Later, the application evolves and stores additional information:

{
  name: "John",
  email: "john@example.com",
  preferences: {...},
  orderHistory: [...],
  addresses: [...]
}

With WiredTiger, an update does not overwrite the document in place on disk. The new, larger version is written out when its page is next written to disk, and it may land in a different block of the data file. The blocks that held the old version become available for reuse.

After thousands or millions of similar updates, free space becomes distributed throughout the collection’s storage structures.

Index Fragmentation

While collection fragmentation often receives the most attention, index fragmentation can have an even greater impact on performance.

Indexes experience fragmentation due to:

Frequent inserts and deletes
Updates to indexed fields
Random key insert patterns
High-volume TTL deletions
Heavy write workloads

Potential consequences include:

Larger index sizes
Reduced cache efficiency
Increased disk I/O
Longer query execution times

For many production workloads, index fragmentation can affect query performance more significantly than collection fragmentation. Therefore, DBAs should evaluate both data and index storage utilization when assessing fragmentation.

How WiredTiger Handles Fragmentation

MongoDB has used the WiredTiger storage engine by default since version 3.2.

WiredTiger employs a page-based architecture that manages storage efficiently while minimizing operational overhead.

When documents are deleted:

The associated storage pages are marked as reusable.
Space becomes available for future inserts and updates.
Data files generally do not shrink immediately.

For example:

Metric	Value
Original Collection Size	100 GB
Data Deleted	30 GB
Disk File Size	~100 GB

At first glance, it may appear that 30 GB has been wasted. In reality, MongoDB can reuse that space whenever new data arrives.

This design improves performance because the database does not need to repeatedly allocate and release storage at the operating system level.
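
The behavior in the table above can be illustrated with a toy model. This is a simplified sketch and not WiredTiger's actual implementation: the class name ToyDataFile, the single free list, and the one-block-per-document layout are all assumptions made for illustration. It shows only the principle that deleted blocks are reused before the file grows.

```javascript
// Toy model only: this is NOT how WiredTiger is implemented internally.
// It illustrates the idea that freed blocks are reused before the file grows.
class ToyDataFile {
  constructor() {
    this.blocks = [];   // one entry per allocated block: a document id or null
    this.freeList = []; // indexes of blocks that are free for reuse
  }

  insert(docId) {
    // Prefer a previously freed block; extend the file only when none is free.
    if (this.freeList.length > 0) {
      const idx = this.freeList.pop();
      this.blocks[idx] = docId;
      return idx;
    }
    this.blocks.push(docId);
    return this.blocks.length - 1;
  }

  remove(idx) {
    // Deleting marks the block reusable; the file length does not change.
    if (this.blocks[idx] === null || this.blocks[idx] === undefined) return false;
    this.blocks[idx] = null;
    this.freeList.push(idx);
    return true;
  }

  get fileSize() { return this.blocks.length; }
  get freeBlocks() { return this.freeList.length; }
}

const toyFile = new ToyDataFile();
for (let i = 0; i < 100; i++) toyFile.insert(`doc-${i}`); // stands in for 100 GB of data
for (let i = 0; i < 30; i++) toyFile.remove(i);           // stands in for deleting 30 GB
const sizeAfterDelete = toyFile.fileSize;                 // the file has not shrunk
for (let i = 0; i < 30; i++) toyFile.insert(`new-${i}`);  // new data arrives
const sizeAfterReuse = toyFile.fileSize;                  // the file has not grown either

console.log({ sizeAfterDelete, sizeAfterReuse, freeBlocks: toyFile.freeBlocks });
```

In this model the file stays at 100 blocks through both the deletes and the later inserts, which mirrors why the on-disk size in the table stays near 100 GB.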

A Common Misconception About WiredTiger

Many people assume WiredTiger automatically defragments storage in the background.

This is not true.

WiredTiger efficiently reuses free pages, but it does not continuously reorganize data files to eliminate fragmentation.

As a result:

Internal free space may increase.
File sizes may remain unchanged.
Storage remains available for future growth.

Understanding this distinction is critical when evaluating storage metrics.

Measuring Fragmentation

One of the first places DBAs look is collection statistics:

db.collection.stats()

Example:

db.orders.stats()

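Typical output includes the following fields (values are in bytes by default; approximate sizes are noted alongside):

{
  "size": 42949672960,            // ~40 GB, uncompressed document data
  "storageSize": 75161927680,     // ~70 GB, allocated on disk
  "freeStorageSize": 26843545600, // ~25 GB, reusable space inside the data file
  "totalIndexSize": 12884901888   // ~12 GB
}

At first glance, a large difference between size and storageSize may suggest fragmentation.

However, this comparison alone is insufficient. size measures uncompressed document data, while storageSize measures the compressed on-disk allocation, so the two are not directly comparable. Where it is reported, freeStorageSize is the more direct indicator of reusable space inside the file.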

The difference may simply reflect:

Recently deleted data
Reserved growth capacity
Normal WiredTiger free-space reuse
Temporary workload fluctuations

DBAs should also evaluate:

Collection growth trends
Insert/delete ratios
Query performance metrics
Cache utilization
Disk utilization
Future growth projections

The presence of free space alone does not necessarily indicate a problem.
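
As a starting point for the evaluation above, the share of allocated storage that is reusable can be computed from the stats output. The following is a minimal sketch: the function name summarizeStorage is hypothetical, and it assumes the stats object carries storageSize and freeStorageSize in bytes (freeStorageSize may be missing on older servers, in which case it is treated as zero).

```javascript
// Summarize how much of a collection's on-disk allocation is reusable free space.
// `stats` is the object returned by db.<collection>.stats() in mongosh (values in bytes).
function summarizeStorage(stats) {
  const GiB = 1024 ** 3;
  const storage = stats.storageSize;
  const free = stats.freeStorageSize ?? 0; // field may be absent on older servers
  return {
    dataGiB: stats.size / GiB,       // uncompressed document size
    storageGiB: storage / GiB,       // on-disk allocation
    freeGiB: free / GiB,             // reusable space inside the file
    freeRatio: storage > 0 ? free / storage : 0,
  };
}

// The sample figures from the example above, expressed in bytes.
const statsSample = {
  size: 40 * 1024 ** 3,
  storageSize: 70 * 1024 ** 3,
  freeStorageSize: 25 * 1024 ** 3,
  totalIndexSize: 12 * 1024 ** 3,
};
const storageSummary = summarizeStorage(statsSample);
console.log(storageSummary.freeRatio.toFixed(2)); // prints 0.36

// mongosh usage: summarizeStorage(db.orders.stats())
```

A ratio like this is only one input. It says how much space is reusable, not whether that space will be reused soon or whether it is hurting performance.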

Monitoring Index Health

Index statistics should be monitored alongside collection statistics.

Useful commands include:

db.collection.stats({ indexDetails: true })

Key metrics:

totalIndexSize
indexSizes
Cache hit ratios
Query execution statistics
Working set efficiency
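
To see which indexes account for most of totalIndexSize, the indexSizes field can be ranked. This is a small sketch under stated assumptions: rankIndexes is a hypothetical helper, and the index names and byte counts in the sample are invented for illustration.

```javascript
// Rank indexes by on-disk size. `stats.indexSizes` maps index name -> bytes,
// as returned by db.<collection>.stats() in mongosh.
function rankIndexes(stats) {
  const total = stats.totalIndexSize || 0;
  return Object.entries(stats.indexSizes || {})
    .map(([name, bytes]) => ({ name, bytes, share: total > 0 ? bytes / total : 0 }))
    .sort((a, b) => b.bytes - a.bytes); // largest first
}

// Hypothetical sample; index names and sizes are made up for illustration.
const indexSample = {
  totalIndexSize: 1200,
  indexSizes: { _id_: 300, customerId_1: 600, createdAt_1: 300 },
};
const rankedIndexes = rankIndexes(indexSample);
console.log(rankedIndexes[0]); // the largest index and its share of the total

// mongosh usage: rankIndexes(db.orders.stats())
```

Tracking this ranking over time is usually more informative than a single snapshot, since steady growth in one index relative to the data it covers is what hints at index bloat.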

What Is Compaction?

Compaction is MongoDB’s mechanism for reorganizing storage to improve space utilization and reduce internal fragmentation.

Think of it like reorganizing a warehouse.

Instead of having empty shelves scattered throughout the facility, inventory is rearranged to make better use of available space.

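MongoDB provides the compact command:

db.runCommand({
  compact: "orders"
})

On WiredTiger, the command rewrites the collection’s data and indexes, attempting to move blocks toward the start of each file so that unused space at the end can be released.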

Potential benefits include:

Improved storage organization
Better page utilization
Reduced internal fragmentation
More efficient storage structures

Does Compaction Reduce Disk Usage?

This is one of the most frequently misunderstood aspects of MongoDB.

The answer is:

Sometimes, but not always.

Many administrators expect compaction to function as a “shrink database” operation.

In reality, compaction primarily reorganizes storage internally.

Whether physical disk space is returned to the operating system depends on:

MongoDB version
WiredTiger behavior
Filesystem characteristics
Available free pages

In some environments, disk usage may decrease noticeably. In others, the primary benefit may be improved storage efficiency rather than a smaller file size.
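
Because the outcome varies, it helps to measure it rather than assume it. The sketch below records storageSize and freeStorageSize before and after running compact. The helper name compactAndCompare is hypothetical, and the code assumes mongosh, where db.getCollection(...).stats() and db.runCommand(...) return their results directly.

```javascript
// Run compact on one collection and report how the on-disk sizes changed.
// Written for mongosh, where `db` helpers return results directly.
// The database handle is a parameter so the function can be tried with a stand-in object.
function compactAndCompare(database, collName) {
  const read = () => {
    const s = database.getCollection(collName).stats();
    return { storageSize: s.storageSize, freeStorageSize: s.freeStorageSize ?? 0 };
  };
  const before = read();
  const result = database.runCommand({ compact: collName });
  const after = read();
  return {
    ok: result.ok === 1,
    before,
    after,
    // Zero means the data file did not shrink, even if its layout improved.
    reclaimedBytes: before.storageSize - after.storageSize,
  };
}

// mongosh usage, on the member being compacted:
//   compactAndCompare(db, "orders")
```

A reclaimedBytes value of zero does not mean the command failed. It may simply mean no space could be returned to the operating system on that run.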

Operational Considerations Before Running Compaction

Compaction should never be treated as routine maintenance.

Before executing it in production, DBAs should carefully evaluate the operational impact.

Additional Disk Requirements

Compaction may temporarily require extra disk space during processing.

Always verify sufficient free storage is available before starting the operation.

Long Execution Times

Execution time depends on:

Collection size
Storage performance
Hardware configuration
Current workload

Large collections can take hours to compact.

Increased Resource Consumption

Compaction generates additional:

Disk I/O
CPU utilization
Storage activity

This may result in increased latency and slower application performance.

For production environments, compaction is generally scheduled during maintenance windows.

Alternatives to Compaction

In many cases, organizations choose alternative approaches for reclaiming storage.

Replica Set Resynchronization

A common strategy involves:

Adding a new secondary.
Allowing initial sync to complete.
Verifying replication health.
Rebuilding existing members sequentially.

Benefits:

Freshly built data files
Reduced fragmentation
Potentially greater storage savings

Operational considerations:

Increased network traffic
Extended synchronization times
Additional storage requirements
Replica set management complexity
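
The "verifying replication health" step can be partly automated by inspecting the output of rs.status(). The following is a rough sketch: replicationReport is a hypothetical helper, the 10-second lag threshold is an arbitrary illustration rather than a MongoDB default, and the sample host names and timestamps are invented. It relies on the members array and its name, health, stateStr, and optimeDate fields.

```javascript
// Decide whether a replica set looks settled enough to rebuild the next member.
// `status` is the document returned by rs.status() in mongosh.
// maxLagSeconds is an illustrative threshold, not a MongoDB default.
function replicationReport(status, maxLagSeconds = 10) {
  const dataMembers = status.members.filter((m) => m.stateStr !== "ARBITER");
  const primary = dataMembers.find((m) => m.stateStr === "PRIMARY");
  const members = dataMembers.map((m) => {
    const steady = m.stateStr === "PRIMARY" || m.stateStr === "SECONDARY";
    // Lag is the gap between the primary's last applied write and this member's.
    const lagSeconds = primary ? (primary.optimeDate - m.optimeDate) / 1000 : null;
    return { name: m.name, state: m.stateStr, healthy: m.health === 1 && steady, lagSeconds };
  });
  const ready =
    primary !== undefined &&
    members.every((m) => m.healthy && m.lagSeconds <= maxLagSeconds);
  return { ready, members };
}

// Hypothetical rs.status() excerpt: host names and timestamps are invented.
const primaryOptime = new Date("2024-01-01T00:00:10Z");
const rsSample = {
  members: [
    { name: "db1:27017", health: 1, stateStr: "PRIMARY", optimeDate: primaryOptime },
    { name: "db2:27017", health: 1, stateStr: "SECONDARY", optimeDate: new Date(primaryOptime.getTime() - 2000) },
    { name: "db3:27017", health: 1, stateStr: "STARTUP2", optimeDate: new Date(0) }, // still in initial sync
  ],
};
console.log(replicationReport(rsSample).ready); // false: db3 has not finished initial sync

// mongosh usage: replicationReport(rs.status())
```

A check like this complements, rather than replaces, reviewing logs and monitoring before taking the next member out of service.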

Backup and Restore

Logical backup and restore can also reorganize storage.

Tools:

mongodump

mongorestore

Advantages:

Rebuilds storage structures
Recreates indexes
Produces optimized file layouts

Challenges:

Long execution times
Potential downtime
Additional infrastructure requirements

This approach is often more suitable during migrations or major maintenance events.

Fragmentation in Sharded Clusters

Storage analysis becomes more complex in sharded deployments.

DBAs should evaluate:

Fragmentation on individual shards
Chunk distribution
Balancer activity
Data growth trends across shards

A storage issue affecting one shard may not exist across the entire cluster.

Maintenance decisions should therefore be made at the shard level whenever possible.
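
A per-shard view can be derived from collection statistics gathered through mongos. This sketch assumes the stats output for a sharded collection contains a shards object keyed by shard name, with each entry carrying storageSize and, where available, freeStorageSize. The helper name perShardFreeSpace and the shard names in the sample are hypothetical.

```javascript
// Break down reusable free space by shard for a sharded collection.
// `stats` is the result of db.<collection>.stats() run through mongos.
function perShardFreeSpace(stats) {
  return Object.entries(stats.shards || {})
    .map(([shard, s]) => {
      const free = s.freeStorageSize ?? 0; // treat a missing field as zero
      return {
        shard,
        storageSize: s.storageSize,
        freeStorageSize: free,
        freeRatio: s.storageSize > 0 ? free / s.storageSize : 0,
      };
    })
    .sort((a, b) => b.freeRatio - a.freeRatio); // most free space first
}

// Hypothetical sample: shard names and sizes are invented for illustration.
const shardedSample = {
  sharded: true,
  shards: {
    shardA: { storageSize: 1000, freeStorageSize: 100 },
    shardB: { storageSize: 1000, freeStorageSize: 600 },
    shardC: { storageSize: 500, freeStorageSize: 50 },
  },
};
const shardReport = perShardFreeSpace(shardedSample);
console.log(shardReport[0].shard, shardReport[0].freeRatio); // shardB 0.6
```

In this sample only shardB holds a large share of reusable space, so any maintenance would target that shard rather than the whole cluster.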

TTL Workloads and Fragmentation

TTL indexes are commonly used for logs, session data, telemetry, and event records.

Example:

db.logs.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 86400 }
)

Because TTL expiration continuously removes documents, these workloads often generate substantial reusable free space.

However, this does not automatically mean fragmentation requires intervention.

In many cases, MongoDB efficiently reuses this space for incoming data.

Archival Strategies and Long-Term Storage Management

Rather than repeatedly compacting collections, many organizations implement archival strategies.

Common approaches include:

Archive collections
Time-series collections
Tiered storage architectures
Data lifecycle management policies

Benefits include:

Improved storage efficiency
Better query performance
Reduced maintenance overhead
Simplified capacity planning

For large deployments, archival strategies are often more effective than periodic compaction.

Capacity Planning Considerations

Fragmentation should always be evaluated within the broader context of capacity planning.

Important questions include:

Is storage nearing capacity limits?
Will free space be reused soon?
Is application growth predictable?
Are performance metrics being affected?

In many environments, retaining reusable free space is actually beneficial because it supports future growth without requiring immediate storage allocation.
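
The question "Will free space be reused soon?" reduces to simple arithmetic once a growth rate is known. The sketch below is illustrative only: daysUntilFreeSpaceReused is a hypothetical helper, and the 0.5 GiB per day growth figure is invented. Both inputs should be on-disk (compressed) bytes, for example free space from freeStorageSize and growth estimated from the trend in storage actually consumed.

```javascript
// Estimate how long existing reusable space will absorb growth.
// Both arguments are on-disk bytes; the growth rate is net of deletes.
function daysUntilFreeSpaceReused(freeStorageBytes, netGrowthBytesPerDay) {
  if (netGrowthBytesPerDay <= 0) return Infinity; // not growing: free space is never consumed
  return freeStorageBytes / netGrowthBytesPerDay;
}

const GIB = 1024 ** 3;
// 25 GiB free (as in the stats example) with an assumed net growth of 0.5 GiB per day.
const daysOfHeadroom = daysUntilFreeSpaceReused(25 * GIB, 0.5 * GIB);
console.log(daysOfHeadroom); // 50
```

If the estimate is a few weeks, leaving the space in place is usually reasonable. If it is years, or the collection is no longer growing, reclaiming the space becomes easier to justify.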

When Should DBAs Take Action?

Fragmentation alone is rarely a sufficient reason to intervene.

Action is typically justified when:

Disk capacity is approaching critical limits.
Significant storage reclamation is required.
Query performance degradation is observed.
Index growth becomes excessive.
Capacity planning objectives cannot be met.
Infrastructure costs justify optimization efforts.

If none of these conditions exist, allowing WiredTiger to reuse available space is often the most efficient strategy.

Best Practices for Minimizing Fragmentation

While fragmentation cannot be completely eliminated, its impact can be minimized.

Design Stable Document Structures

Avoid document patterns that cause frequent and significant document growth.

Monitor Both Data and Index Storage

Review collection statistics and index metrics regularly.

Evaluate TTL Workloads Carefully

Heavy TTL activity may generate large amounts of reusable space.

Implement Archival Strategies

Move inactive data to archive storage rather than keeping it indefinitely in primary collections.

Avoid Unnecessary Compaction

Compaction should be driven by measurable operational requirements, not by routine scheduling.

Incorporate Fragmentation into Capacity Planning

Storage management decisions should align with expected growth and business requirements.

Common Misconceptions About MongoDB Fragmentation

Myth 1: Deleted Data Immediately Shrinks Database Files

Not necessarily. MongoDB usually retains freed space for reuse.

Myth 2: Fragmentation Always Causes Performance Problems

Moderate fragmentation is often handled efficiently by WiredTiger and may have minimal performance impact.

Myth 3: Compaction Should Be Scheduled Regularly

Routine compaction is rarely necessary. Modern MongoDB storage management is designed to reuse space automatically.

Myth 4: Large Storage Files Mean Wasted Space

A large file size may simply indicate MongoDB is retaining reusable space for future growth.

Final Thoughts

Fragmentation is a natural consequence of any database system that experiences ongoing inserts, updates, and deletes. MongoDB’s WiredTiger storage engine is designed to manage this efficiently by reusing freed space rather than constantly resizing data files.

For DBAs, the key is understanding the difference between allocated storage, reusable free space, and true storage inefficiency. A large database file does not necessarily indicate wasted storage, nor does a gap between size and storageSize automatically imply fragmentation problems.

Before choosing to compact collections, rebuild indexes, perform replica set resynchronizations, or execute backup-and-restore operations, DBAs should carefully evaluate workload behavior, operational impact, capacity requirements, and future growth projections.

In most production environments, the goal is not to eliminate fragmentation entirely, but to understand it, monitor it effectively, and manage it intelligently as part of an overall storage and capacity-planning strategy.
