If you’ve ever had someone say, “The query is slow, just add an index,” this post is for you. In my recent MongoDB indexing session, we didn’t talk about syntax or fancy commands. We talked about something more uncomfortable and more important: what an index does to your system over time. Not just today’s slow query, but your long‑term workload, your writes, your replication, and your future self who has to keep this thing alive.
Indexes Don’t Sit Quietly in the Corner
Let’s start with a mindset shift.
Most of us learn indexes as “the thing that makes queries fast.” That’s not wrong, but in production it’s incomplete.
Every time your application:
- Inserts a document
- Updates a field that’s indexed
- Deletes a document
MongoDB has to update all relevant index entries for that document.
So each new index is not a passive object. It’s a permanent increase in work for writes:
- More keys to maintain
- More structures to keep in sync
- More data to store and read from disk
Individually, that extra work looks small. But as data grows and write volume increases, it doesn’t grow linearly — it compounds.
That’s why I say: when you add an index, you’re not just optimizing a query. You’re changing the long‑term behavior of the system.
Index Creation: It’s a Workload Event, Not a Button Click
We also tend to treat “create index” as a harmless one‑time action.
Yes, MongoDB builds indexes online. Reads and writes continue while the index is being created. That’s good. But “online” does not mean “invisible.”
While an index build is running, MongoDB:
- Scans the existing collection
- Extracts index keys
- Organizes them into index structures
- Writes them to disk on the primary
- Replays that work on secondaries as the operation replicates.
During that time, what happens?

- CPU rises
- Memory activity changes
- Disk I/O goes up
If your system already runs close to capacity, that “temporary extra work” is exactly what exposes hidden bottlenecks.
So I don’t want you to fear index creation. I want you to respect it. Treat it like a workload event: plan it, schedule it, monitor it.
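To make that concrete, here is a mongosh sketch of how you might kick off a build and watch it from another session. The collection and field names are illustrative, not from any particular deployment:

```javascript
// Start the build (online, but not free)
db.orders.createIndex({ customerId: 1, status: 1 })

// In a second session, watch in-progress index builds:
db.currentOp({ "command.createIndexes": { $exists: true } })
```

Pair that with your usual CPU, memory, and disk I/O dashboards while the build runs, and you have the “treat it like a workload event” discipline in practice.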
COLLSCAN vs IXSCAN: The Story Behind Those Two Words
Let’s talk about the classic explain‑plan story.
At a very high level, MongoDB can do two things for a query:
- Scan the entire collection: COLLSCAN
- Use an index: IXSCAN

In our lab, I created a simple dataset: about 500,000 “order” documents with fields like customerId, status, amount, city, orderDate — very typical transactional data.
Then I ran this query:
Find orders where customerId = 500 and status = “SHIPPED”.
With no index, explain showed a full collection scan. MongoDB examined all ~500,000 documents to return around 20.
Think about that for a second:
- 500,000 documents scanned
- 20 documents returned
That’s not a “slow query” problem. That’s a workload design problem. Now imagine this same query running 200 times per second:
- 500,000 × 200 = 100 million document checks per second internally.
That’s where CPU climbs, memory pressure rises, and concurrency starts to suffer.
Then we created a compound index on { customerId, status } and ran the same query again.
This time, explain showed:
- About 20 keys examined
- About 20 documents examined
Same result. Dramatically less work.
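In mongosh, the lab steps above look roughly like this (collection and field names are the ones from the example, not a prescription):

```javascript
// Before: no index, explain reports a COLLSCAN over ~500,000 docs
db.orders.find({ customerId: 500, status: "SHIPPED" }).explain("executionStats")

// Create the compound index
db.orders.createIndex({ customerId: 1, status: 1 })

// After: explain reports an IXSCAN with ~20 keys and ~20 docs examined
db.orders.find({ customerId: 500, status: "SHIPPED" }).explain("executionStats")
```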
That’s the real message:
Indexes don’t improve correctness. They reduce work per request.
And when you’re trying to scale, reducing work per request is almost everything.
The Question Is Not “Is the Index Used?” It’s “Is It Used Efficiently?”
Here’s a trap I see often: someone opens explain, sees IXSCAN, and relaxes. “Good, the index is being used.”
Not so fast.
What really matters in explain is:
- totalKeysExamined
- totalDocsExamined
Execution time will bounce around with cache effects, background load, and so on. But docs examined and keys examined tell you whether this plan will scale.
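The numbers you care about are easy to pull out of the explain output. A mongosh sketch (names from the lab example):

```javascript
const stats = db.orders
  .find({ customerId: 500, status: "SHIPPED" })
  .explain("executionStats").executionStats;

print(stats.totalKeysExamined);  // index keys read
print(stats.totalDocsExamined);  // documents loaded
print(stats.nReturned);          // documents actually returned
// A healthy plan keeps keysExamined and docsExamined close to nReturned.
```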
In our lab, we tested two indexes:
- { customerId, status }
- { status, customerId }
Both are compound indexes. Both can be used by the same query. Both show IXSCAN in explain.
But behavior is not identical.
Why? Because field order matters.
- customerId had high selectivity (many different customers).
- status had only a few possible values.
When status came first, MongoDB had to scan a much larger portion of the index before narrowing down to a specific customer.
When customerId came first, the dataset shrank quickly.
At small scale, the difference might look minor. But at tens or hundreds of millions of documents, that early reduction translates directly into CPU saved and latency avoided.
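A toy model makes the field‑order effect tangible. To be clear, this is not MongoDB’s planner (with two equality predicates MongoDB can often bound both fields); it only illustrates how much each leading field narrows the candidate key range on its own, which is what drives plan efficiency once sorts and ranges enter the picture:

```python
# Toy model: compound-index keys are stored sorted, and a seek on the
# first field lands on a contiguous run of matching keys.
customers = range(1000)                         # high selectivity: 1,000 values
statuses = ["DELIVERED", "PENDING", "SHIPPED"]  # low selectivity: 3 values

idx_customer_first = sorted((c, s) for c in customers for s in statuses)
idx_status_first = sorted((s, c) for c in customers for s in statuses)

def keys_after_leading_seek(index, value):
    """Keys left in the candidate range after seeking on the first field."""
    return sum(1 for key in index if key[0] == value)

print(keys_after_leading_seek(idx_customer_first, 500))      # 3
print(keys_after_leading_seek(idx_status_first, "SHIPPED"))  # 1000
```

Seeking on the selective field first leaves 3 candidate keys; seeking on the low‑cardinality field first leaves 1,000. At hundreds of millions of documents, that gap is the CPU you save.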
So the real DBA question is never just:
“Is the index used?”
It’s:
“How many keys and documents does this plan examine, and will that still be okay when my data is 10× bigger?”
Sorting: The Silent Memory and Latency Killer
Sorting is another place where inefficiency hides quietly until traffic spikes.
If your query needs sorting and you don’t have a supporting index, MongoDB:
- Collects the matching documents
- Sorts them in memory
- And, if the data set is big enough, may spill to disk

Under light load, you may never notice. Under concurrency, this is where memory gets tight and latency becomes unpredictable.
We tested a query like:
Find all orders where status = “SHIPPED” and sort by orderDate descending.
Without a proper index, explain showed:
- A COLLSCAN
- Followed by a separate SORT stage
- ~500,000 documents scanned
- ~160,000 returned and sorted in memory
Now imagine that with multiple users, large result sets, and ongoing writes. If the sort needs more memory than allowed, it spills to disk — and that’s when latency spikes suddenly.
Then we created an index:
{ status: 1, orderDate: -1 }
Same query, new explain:
- IXSCAN
- No extra SORT stage
The result set is the same. The system behavior is not. MongoDB now reads results in the correct order directly from the index, without an in‑memory sort.
A good mental model for sorted queries:
- Put equality conditions first in the index
- Then the sort field
- Range fields come later
It’s not a rigid law, but it’s an excellent default to start from.
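Applied to the lab’s schema, that default might look like this in mongosh (the `amount` range predicate is an illustrative addition to show all three positions):

```javascript
// Equality field first, sort field second, range field last
db.orders.createIndex({ status: 1, orderDate: -1, amount: 1 })

// This query can then filter, sort, and range-scan from the index alone,
// with no blocking SORT stage in explain:
db.orders
  .find({ status: "SHIPPED", amount: { $gte: 100 } })
  .sort({ orderDate: -1 })
```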
Covered Queries: When MongoDB Never Touches the Document
One of my favorite concepts is the covered query — and it’s surprisingly underused.
A covered query is simple in idea:
MongoDB answers the query using only the index, without reading the actual documents.
No FETCH stage. No document I/O. Just index navigation.
When a query is not covered, MongoDB:
- Reads index entries
- Then FETCHes documents
- Pulls data pages into memory
That adds disk I/O and cache pressure, especially under high read concurrency.
When a query is covered:
- Results come straight from index pages
- No document fetch
- Much higher concurrency potential

In our lab we ran a query that filtered by customerId but only needed customerId and status in the result. Both fields were inside the compound index.
Explain showed:
- PROJECTION_COVERED
- totalDocsExamined = 0
MongoDB never touched the underlying documents. That’s what you want for very hot, read‑heavy endpoints.
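One detail worth showing, because it trips people up: `_id` is returned by default, and if `_id` is not in the index, returning it forces a document fetch. A mongosh sketch of the covered query from the lab (assuming the `{ customerId: 1, status: 1 }` index exists):

```javascript
// Project ONLY indexed fields and explicitly exclude _id,
// otherwise the query is no longer covered.
db.orders.find(
  { customerId: 500 },
  { _id: 0, customerId: 1, status: 1 }
).explain("executionStats")
// Look for PROJECTION_COVERED and totalDocsExamined: 0
```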
But there’s a trade‑off:
- Making indexes “wider” (more fields) increases index size and write cost.
So covered queries should be intentional, not the default for everything. Use them where read traffic is critical enough to justify the extra write overhead.
Before You Create an Index: The DBA Questions That Matter
By this point in the webinar, we’d seen a lot of behavior. The next question is: how do you decide whether an index should exist at all?

In production, index creation shouldn’t start with:
“Let’s add an index and see.”
It should start with questions.
Here are the ones I always ask myself.
1. Is this query pattern permanent or temporary?
Many performance complaints come from:
- One‑off reports
- Data migrations
- Ad‑hoc admin scripts
If you create indexes for temporary workloads, you carry their cost forever, long after the workload disappears.
Use:
- Query logs
- Slow query history
- Frequency over time
to decide whether this query deserves a permanent index.
2. Is the filter actually selective?
Not every field in a WHERE clause deserves an index.
If an index only reduces your dataset from 100 million documents to 60 million, it technically “works,” but operationally it’s weak.
Explain can tell you:
- Keys examined
- How much reduction you really get
3. Does this index also support the sort?
If the index helps the filter but not the sort, MongoDB still has to do a blocking sort.
So ask:
- Does field order align with the query’s filter and sort?
- Does explain still show a SORT stage? If yes, you’re not done.
4. What will this index do to writes?
Don’t skip this.
- Is the collection write‑heavy?
- Are you close to your write latency SLOs?
- Will this index be updated frequently as part of your main write path?
Every index is a write tax. Make sure the business value of the read justifies that tax.
5. If this turns out to be a mistake, how do we undo it?
Indexes should be reversible decisions.
Before creating one, you should already know:
- How to remove it safely
- Which queries depend on it
- When it can be dropped without causing outages
If you can’t answer these yet, the right DBA answer is not “yes” or “no.” It’s:
“Not yet.”
Different Index Types, Different Behaviors
MongoDB doesn’t give you just one type of index and wish you luck. It gives you several — each solving a problem, each changing behavior in a specific way.
Here’s how I like to “talk” about them, without going into syntax.
- Compound index
Index more than one field together — for example, customerId + status for queries that always filter on both.
They let MongoDB narrow down the result set faster, but every extra field makes each index entry wider and writes heavier.
- Partial index
“Only index the documents I really care about.”
Example: only index users where active = true.
Smaller, cheaper, great when your queries always include that condition — but if someone removes that filter in application code, the index may stop being used.
- Sparse index
Index only documents where the field exists. Good when only some documents carry that field (like referralCode).
But queries that expect documents without the field will not see them through this index.
- TTL index
“Auto‑expire this data after X time.” Perfect for sessions, tokens, temporary logs.
Under the hood, MongoDB periodically deletes expired documents — which is real write work, especially if many expire at once.
- Multikey index
Index arrays. A document with an array of 10 values becomes 10 index entries.
That’s powerful but dangerous when arrays are unbounded — index size can grow much faster than document count.
- Wildcard index
Great when your schema has dynamic fields and you don’t know in advance what will be queried.
But if you’re not careful, you end up indexing fields nobody ever queries, paying the cost for no benefit.
- Columnstore index
Designed for analytics. MongoDB can read specific fields very efficiently across many documents.
Perfect for reporting and aggregations, not for high‑frequency point lookups in OLTP paths.
- Vector / search‑style indexes (Atlas)
Built for similarity search, recommendations, and AI embedding workloads.
They use different structures and should be treated as infrastructure decisions, not “quick fixes” for slow filters.
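A few of these are one‑liners in mongosh. The collections and fields below are illustrative examples, not a recommended schema:

```javascript
// Partial: only index active users
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { active: true } }
)

// TTL: expire sessions one hour after createdAt
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Wildcard: index every field under a dynamic subdocument
db.events.createIndex({ "attributes.$**": 1 })
```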
The big idea here: every index type solves a real problem.
Every index type also introduces a long‑term behavior pattern you must understand.
After the Index Is Created: Governance, Not Forgetfulness

Here’s the part almost nobody talks about: what happens after the index exists.
In many teams, index creation is treated as the finish line:
- Slow query
- Add index
- Query fast
- Move on
From a DBA perspective, that’s actually where responsibility begins.
After an index goes live, I want to know three things.
1. Is the planner still using it the way we expected?
As data distribution changes, MongoDB may change its mind and pick a different plan.
So DBAs watch:
- Index usage over time
- Keys examined vs documents returned
- Plan stability as data grows
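The `$indexStats` aggregation stage is the starting point for that watching. A mongosh sketch (collection name illustrative; counters reset on mongod restart, so read them with that in mind):

```javascript
// How often has each index on this collection been used, and since when?
db.orders.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, "accesses.ops": 1, "accesses.since": 1 } }
])
```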
2. What did this index do to write behavior?
The write impact is often subtle:
- p99 write latency slightly higher
- More queueing
- Slower replication apply rate on secondaries
If you’re not looking, you’ll blame “the network” or “the hardware” instead of the new index.
3. How does this index affect replication?
Secondaries have to do the same index maintenance when applying oplog entries. If they were already close to their limits, a new index can tip them over.
That’s why I see index governance as “ongoing system care,” not “one‑time tuning.”
Index Cleanup: Not Cleaning Up Is the Real Risk
Over the years, indexes tend to accumulate:
- A feature was added, an index was created, the feature later died.
- A migration needed a temporary query, someone indexed it “just in case.”
- A couple of developers fixed separate issues with overlapping indexes.
Six months later, no one remembers why all of them exist. But your writes are paying for every single one.
From a DBA point of view, what’s risky is not cleaning up. It’s letting unused and overlapping indexes live forever.
Typical cleanup actions:
- Unused indexes
If an index shows no meaningful usage for a long period, and you’ve verified no critical query depends on it, it’s a candidate to drop.
- Overlapping indexes
Multiple indexes serving similar filters can often be replaced with one well‑designed compound index.
- Redundant single‑field indexes
Once you introduce a compound index, some single‑field indexes become redundant and can be removed if they don’t add independent value.
And here’s a safe pattern I really like:
- First, hide the index (hidden index).
- Observe behavior and usage via $indexStats.
- If everything looks good, then drop it.
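In mongosh (MongoDB 4.4+ for hidden indexes; the index name here is illustrative), the pattern looks like this:

```javascript
// Step 1: hide it — the planner stops considering it, but it is
// still maintained on writes, so un-hiding is instant.
db.orders.hideIndex("city_1")

// Step 2: observe query latency and $indexStats for days or weeks.
// If anything regresses, roll back immediately:
db.orders.unhideIndex("city_1")

// Step 3: only once you're confident nothing depends on it:
db.orders.dropIndex("city_1")
```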
That way, cleanup becomes a controlled, reversible activity, not a leap of faith.
Bringing It All Together: Three Principles I Want You to Remember
If you take nothing else from this blog, I’d like you to carry these three ideas with you.
1. Evidence over assumptions
Guessing is expensive in production.
- Look at explain plans.
- Look at totalDocsExamined and totalKeysExamined.
- Look at whether the planner actually prefers your index.
Let data, not intuition, drive your indexing decisions.
2. Decisions over syntax
Knowing how to write createIndex() is the easy part.
The real questions are:
- Should this index exist?
- Does it support a stable, long‑term query pattern?
- What write and storage trade‑offs are we accepting?
Every index is a trade‑off: read speed vs write cost, reduction vs maintenance.
3. Governance over tuning tricks
A fast query today is not a finish line.
Workloads change. Features evolve. Data grows.
Indexes that made perfect sense last year might be dead weight today. That’s why governance matters:
- Review index usage
- Clean up unused and overlapping ones
- Validate changes safely, especially in replica sets and sharded clusters
Indexing is not a “one‑time optimization.” It’s ongoing system care.
If this resonated with you, here’s a small suggestion: the next time someone says “just add an index,” pause for a second and ask:
“What long‑term behavior am I introducing into this system by doing that?”
If that question becomes a habit, your MongoDB clusters — and your future on‑call self — will thank you.