In the dynamic realm of MongoDB clusters, optimizing index creation is crucial for maintaining database performance and efficiency. Recently, we encountered an issue on a production MongoDB cluster where index creation took a very long time while CPU utilization was high. Real-world use cases often provide valuable learning experiences, so I’d like to share our experience and the solution we implemented. This may be useful to anyone facing similar issues.

Use Case – Brief About the Issue:

We managed a MongoDB cluster (version 5.0) with one primary node and eight secondary nodes. Our customer noticed that creating indexes took an unusually long time and that the indexes were applied inconsistently, with CPU utilization consistently peaking between 90% and 100%. Despite exhaustive checks of the logs and repeated attempts to run the index creation queries, the issue persisted.

The customer, eager for a solution, temporarily halted their pre-production application to reduce strain. However, this intervention failed to yield the desired results. We discovered that long-running queries lacking proper indexes were causing high CPU utilization. Thus, creating the missing indexes was the logical next step. However, this seemingly straightforward task proved more challenging than anticipated.
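
One way to spot such queries is to filter the output of db.currentOp() for operations that have been running for a while with a collection scan. The following is a minimal sketch rather than the exact procedure we followed; the helper name findSlowCollscans and the 10-second threshold are illustrative assumptions.

```javascript
// Hypothetical helper: pick long-running collection scans out of the
// `inprog` array returned by db.currentOp().
function findSlowCollscans(inprog, minSecs) {
  return inprog
    .filter(function (op) {
      // secs_running and planSummary are only present on some operations.
      return (op.secs_running || 0) >= minSecs &&
        typeof op.planSummary === "string" &&
        op.planSummary.indexOf("COLLSCAN") !== -1;
    })
    .map(function (op) {
      return { opid: op.opid, ns: op.ns, secs: op.secs_running };
    });
}

// In the mongo shell (the 10-second threshold is an arbitrary example):
//   findSlowCollscans(db.currentOp().inprog, 10).forEach(printjson);
```

Each namespace reported this way is a candidate for a missing index on the fields used by the query filter.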

Possible Approaches to Fix the Issue:

To address the issue, we proposed two potential solutions:

1. Rolling Index Builds: Take a secondary node out of the replica set, restart it as a standalone, build the indexes on it, and then reintegrate it into the replica set. Repeat the same steps for every member of the replica set.

2. Index Creation During Non-Peak Hours: Create the indexes on the primary node during periods of reduced workload, such as non-peak hours.
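
For the rolling approach, the order in which members are processed matters: secondaries go first, one at a time, and the primary goes last after it has been stepped down. As a rough sketch of that ordering only (the helper name rollingBuildOrder is hypothetical, and the restart and rejoin steps for each member are not shown):

```javascript
// Hypothetical helper: order replica set members for a rolling index build,
// secondaries first and the current primary last. Members in any other
// state (for example arbiters) are skipped because they hold no data.
function rollingBuildOrder(members) {
  var secondaries = members.filter(function (m) {
    return m.stateStr === "SECONDARY";
  });
  var primary = members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  return secondaries.concat(primary).map(function (m) {
    return m.name;
  });
}

// In the mongo shell:
//   rollingBuildOrder(rs.status().members).forEach(function (h) { print(h); });
```

With eight secondaries, this makes the time cost of the rolling approach easy to see: nine separate restart, build, and rejoin cycles.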

Solution Followed:

The first option, though theoretically viable, presented logistical hurdles due to the cluster’s complexity and the time-intensive nature of the process. Therefore, we decided to explore the second option first, given its simplicity and potential for immediate implementation.

Upon further examination, a crucial detail emerged: the index creation queries were configured with the background: true option. We tried an experiment: executing the query without the background option. To our surprise, the query executed immediately, without any noticeable delay. We therefore ran all the index creation queries without that option and the indexes were created immediately. After index creation, we saw an improvement in CPU utilization.
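
If the index definitions come from an existing script, the option can be dropped programmatically before the command is sent. This is only a sketch of the idea; stripBackground is a hypothetical helper name, and it assumes the specs are plain objects in the shape used by the createIndexes command.

```javascript
// Hypothetical helper: return copies of the index specs with the
// `background` field removed, leaving the input array untouched.
function stripBackground(indexes) {
  return indexes.map(function (spec) {
    var copy = Object.assign({}, spec);
    delete copy.background;
    return copy;
  });
}

// In the mongo shell, with `specs` holding the original definitions:
//   db.runCommand({
//     createIndexes: "Collection_name",
//     indexes: stripBackground(specs),
//     commitQuorum: "majority"
//   });
```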

Graph: variation in CPU utilization before and after index creation.

Sample Queries used in the Activity:

To select Database:

PRIMARY> use database_name
switched to db database_name

Index Creation Query:
PRIMARY> db.runCommand(
...   {createIndexes : "Collection_name",
...   indexes : [{ "name" : "state", "key" : { "state" : 1}},
...   { "name" : "categoriesId", "key" : { "categoriesId" : 1}},
...   { "name" : "mediaType", "key" : { "mediaType" : 1}},
...   { "name" : "createdAt", "key" : { "createdAt" : 1}},
...   { "name" : "updatedAt", "key" : { "updatedAt" : 1}},
...   { "name" : "id", "key" : { "id" : 1}},
...   { "name" : "mediaAssetId", "key" : { "mediaAssetId" : 1}},
...   { "name" : "title", "key" : { "title" : 1}},
...   { "name" : "description", "key" : { "description" : 1}},
...   { "name" : "currentStepName", "key" : { "lastWorkflows.workflowEngineObject.currentStepName" : 1}},
...   { "name" : "isPristine", "key" : { "lastWorkflows.workflowEngineObject.data.isPristine" : 1}},
...   { "name" : "allCurrentStepNames", "key" : { "lastWorkflows.allCurrentStepNames" : 1}},
...   { "name" : "metadataBusiness", "key" : { "metadataBusiness.$**" : 1}},
...   { "name" : "metadataTechnicals", "key" : { "metadataTechnicals.$**" : 1}},
...   { "name" : "comments", "key" : { "comments.$**" : 1}},
...   { "name" : "visibilityRules", "key" : { "visibilityRules.$**" : 1}},
...   { "name" : "content", "key" : { "content" : "text" }},
...   { "name" : "tags", "key" : { "tags" : 1}}],
...   commitQuorum : "majority" })

Output:
{
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 19,
    "createdCollectionAutomatically" : true,
    "commitQuorum" : "majority",
    "ok" : 1,
    "$clusterTime" : {
   	 "clusterTime" : Timestamp(1710957102, 2),
   	 "signature" : {
   		 "hash" : BinData(0,"5XflscqPgbNU3Y3cOZLS64Tcq0c="),
   		 "keyId" : NumberLong("7293623319293591556")
   	 }
    },
    "operationTime" : Timestamp(1710957101, 19)
}

To check the created indexes count of each collection:
PRIMARY> db.getCollectionNames().forEach(function(collectionName) {
    var indexCount = db.getCollection(collectionName).getIndexes().length;
    print("Collection: " + collectionName + ", Index Count: " + indexCount);
})

Connect to a secondary node and list indexes for the collection to verify if the newly created indexes are reflected:

SECONDARY> rs.slaveOk()
WARNING: slaveOk() is deprecated and may be removed in the next major release. Please use secondaryOk() instead.
rs.slaveOk() allows read operations to be performed on secondary nodes in a replica set. As the warning notes, it is deprecated in favor of rs.secondaryOk().
To check the index count on the secondary, run the same query that was used on the primary above.
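
A count alone does not say which index is missing. A slightly more precise check compares the expected index names against what getIndexes() returns on the secondary. The sketch below assumes a hypothetical helper named missingIndexes and a hand-written list of expected names.

```javascript
// Hypothetical helper: report which expected index names are absent from
// the array of index documents returned by getIndexes().
function missingIndexes(expectedNames, actualIndexes) {
  var present = actualIndexes.map(function (ix) {
    return ix.name;
  });
  return expectedNames.filter(function (name) {
    return present.indexOf(name) === -1;
  });
}

// On a secondary, after rs.secondaryOk():
//   missingIndexes(["state", "tags"],
//                  db.getCollection("Collection_name").getIndexes());
```

An empty array means every expected index has been replicated to that member.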

Investigating the Unexpected Behavior

Before version 4.2, MongoDB supported building indexes either in the foreground or in the background. Foreground builds held an exclusive lock on the database for the entire process, so no other operations could be performed on that database until the index build was complete. This made foreground builds fast but highly disruptive. Background builds, on the other hand, did not hold an exclusive lock and allowed read-write access during the build. This resulted in slower builds but minimized disruption to database operations.

From version 4.2 onwards, index builds use an optimized build process that holds an exclusive lock on the collection only at the beginning and end of the index build. MongoDB ignores the background option if it is specified to createIndexes or its shell helpers createIndex() and createIndexes(). By that description, removing the option should have made no difference on our 5.0 cluster, which is why the result surprised us.
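
When a build appears to hang, it helps to confirm whether it is actually running. The sketch below filters db.currentOp() output for index builds, loosely following the filter shown in the MongoDB documentation; the helper name indexBuildsInProgress is an assumption, and we did not capture this output during the incident.

```javascript
// Hypothetical helper: keep only the operations that look like index builds,
// either a createIndexes command or an op whose msg starts with "Index Build".
function indexBuildsInProgress(inprog) {
  return inprog.filter(function (op) {
    var isCreateCmd = op.op === "command" &&
      op.command !== undefined && op.command !== null &&
      op.command.createIndexes !== undefined;
    var isBuildMsg = typeof op.msg === "string" && /^Index Build/.test(op.msg);
    return isCreateCmd || isBuildMsg;
  });
}

// In the mongo shell:
//   indexBuildsInProgress(db.currentOp(true).inprog).forEach(printjson);
```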

To understand this unexpected behavior better, we studied it further and came up with these possible reasons:

  1. Stable API Specification/Backward Compatibility Feature
  2. Potential Bug

Stable API Specification:

MongoDB 5.0 introduced the Stable API, allowing applications to specify the API version they run against, ensuring long-term stability. If the application did not specify the API version, it might default to an older behavior, inadvertently making the background option valid and influencing the index creation process. Ensuring that the application specifies the correct API version is crucial to avoid such issues.
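
To rule this out, a command can be sent with the Stable API parameters set explicitly. The sketch below is an illustration under assumptions: withStableApi is a hypothetical helper, and it relies on the server accepting apiVersion and apiStrict as fields of the command document.

```javascript
// Hypothetical helper: return a copy of a command document with the
// Stable API parameters appended after the command's own fields.
function withStableApi(cmd, strict) {
  return Object.assign({}, cmd, {
    apiVersion: "1",
    apiStrict: strict === true
  });
}

// In the mongo shell:
//   db.runCommand(withStableApi({
//     createIndexes: "Collection_name",
//     indexes: [{ name: "state", key: { state: 1 } }]
//   }, true));
```

The command name stays the first field of the document, which the server requires, because Object.assign preserves the insertion order of the original keys.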

Backward Compatibility Feature:

MongoDB supports backward compatibility features. If the cluster was running in a mode that maintained backward-compatible behavior, the background option might still have had an effect, causing longer index creation times.
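
One setting worth checking here is the featureCompatibilityVersion, although treating it as the "mode" in question is our assumption and not something we confirmed. A small sketch, with fcvDiffersFromBinary as a hypothetical helper, compares it against the server version:

```javascript
// Hypothetical helper: true when the featureCompatibilityVersion reported by
// getParameter differs from the major.minor part of the server version.
function fcvDiffersFromBinary(fcvResult, serverVersion) {
  var fcv = fcvResult.featureCompatibilityVersion.version;     // e.g. "4.4"
  var binary = serverVersion.split(".").slice(0, 2).join("."); // "5.0.6" -> "5.0"
  return fcv !== binary;
}

// In the mongo shell:
//   fcvDiffersFromBinary(
//     db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 }),
//     db.version());
```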

Potential Bug:

According to the official documentation, when using Stable API V1 you cannot specify any of the following fields in the indexes array:

  • background
  • bucketSize
  • sparse
  • storageEngine

Starting in MongoDB 5.0, API V1 database commands raise an error if passed a parameter not explicitly accepted by the command. However, no error was thrown when the background option was included in our index creation command. This leads us to suspect a bug that keeps the background option effective despite the official documentation stating otherwise, and that this bug in the MongoDB version in use caused the option to affect index creation times.
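
Independently of how the server reacts, index definitions can be checked on the client side against the four fields listed above before they are sent. This is a sketch only; findDisallowedV1Fields is a hypothetical helper and not part of MongoDB.

```javascript
// The four fields that, per the list above, are not allowed in the
// indexes array under Stable API V1.
var DISALLOWED_V1_INDEX_FIELDS = ["background", "bucketSize", "sparse", "storageEngine"];

// Hypothetical helper: for each index spec that uses a disallowed field,
// report the index name and the offending fields.
function findDisallowedV1Fields(indexes) {
  return indexes
    .map(function (spec) {
      return {
        name: spec.name,
        fields: DISALLOWED_V1_INDEX_FIELDS.filter(function (field) {
          return Object.prototype.hasOwnProperty.call(spec, field);
        })
      };
    })
    .filter(function (result) {
      return result.fields.length > 0;
    });
}

// Example: findDisallowedV1Fields(specs).forEach(printjson);
```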

Since these reasons are speculative, we initiated a topic in the MongoDB Community Forum for discussions.

Conclusion:

By exploring these potential reasons, we can better understand why removing the background option helped optimize index creation during high CPU utilization. While the exact cause remains speculative, this investigation highlights the importance of understanding and correctly configuring database features and options. Keeping up with version changes and thoroughly testing new configurations can help prevent and resolve such issues effectively.
