Keeping a copy of your data is essential. Without one, resuming operations after a failure can be difficult, and starting from scratch can mean data loss or prolonged downtime. Maintaining synchronized copies of the data on multiple servers is known as replication. In this blog, we will discuss replication concepts on MongoDB servers.

What is MongoDB replication?

      MongoDB replication is a method for ensuring data redundancy and high availability by creating copies of the dataset across multiple MongoDB servers, typically configured as a replica set.

How does replication work?

In a MongoDB replica set, one node is the primary and receives all write operations. The other nodes are secondaries, which replicate the data from the primary.

If the primary node becomes unavailable, a secondary node takes over the primary role.

Heartbeat

The heartbeat is the process that tracks the current status of each MongoDB node in a replica set. The replica set nodes send pings to each other every two seconds. If a node does not respond within 10 seconds, the other nodes in the replica set mark it as inaccessible.
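The timeout rule can be illustrated with a small sketch. This is a simplified simulation, not MongoDB's internal implementation; the function name `findInaccessibleMembers` and the `lastReplyAt` map are hypothetical and exist only for this example.

```javascript
// Hypothetical simulation of the heartbeat timeout rule (not MongoDB internals).
const HEARTBEAT_INTERVAL_MS = 2000;  // members ping each other every two seconds
const HEARTBEAT_TIMEOUT_MS = 10000;  // no reply within 10 seconds => inaccessible

// lastReplyAt maps a member name to the time (in ms) of its last heartbeat reply.
function findInaccessibleMembers(lastReplyAt, now) {
  return Object.entries(lastReplyAt)
    .filter(([, repliedAt]) => now - repliedAt > HEARTBEAT_TIMEOUT_MS)
    .map(([name]) => name);
}

const now = 60000;
const lastReplyAt = {
  mongodb1: now - HEARTBEAT_INTERVAL_MS, // replied one interval ago
  mongodb2: now - 12000,                 // silent for 12 seconds
};

// Only mongodb2 exceeds the timeout and is reported as inaccessible.
console.log(findInaccessibleMembers(lastReplyAt, now));
```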

Working principle

  • Write operations on the primary: When a client sends a write operation (such as an insert, update, or delete) to the primary node, the primary processes the operation and records it in its oplog (operations log).
  • Oplog replication to secondaries: Secondary nodes continuously read new entries from the primary's oplog. The oplog contains a chronological record of all the write operations performed. The secondary nodes read the oplog entries and apply the same operations to their own data sets in the same order they were executed on the primary.
  • Achieving data consistency: Through this oplog-based replication, secondary nodes catch up with the primary's data over time. This process keeps the data on the secondary nodes consistent with the data on the primary.
  • Read operations: While the primary node handles write operations, both primary and secondary nodes can serve read operations. Clients can choose to read from secondary nodes, which distributes the read load and reduces the primary node's workload. However, note that a secondary node might have slightly outdated data due to replication lag.
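The write-then-replay flow described above can be sketched in a few lines. This is an illustrative in-memory model only; the object shapes and the function names `applyOp`, `writeToPrimary`, and `syncSecondary` are hypothetical and do not reflect MongoDB's real oplog format.

```javascript
// Illustrative model only; object shapes and function names are hypothetical.
function applyOp(data, entry) {
  if (entry.op === "delete") {
    data.delete(entry.id);
  } else {
    // "insert" and "update" both merge the fields into the stored document
    data.set(entry.id, { ...(data.get(entry.id) || {}), ...entry.doc });
  }
}

function writeToPrimary(primary, entry) {
  applyOp(primary.data, entry); // the primary processes the write
  primary.oplog.push(entry);    // and records it in its oplog
}

function syncSecondary(primary, secondary) {
  // Apply only the entries not seen yet, in the order the primary executed them.
  while (secondary.applied < primary.oplog.length) {
    applyOp(secondary.data, primary.oplog[secondary.applied]);
    secondary.applied += 1;
  }
}

const primary = { data: new Map(), oplog: [] };
const secondary = { data: new Map(), applied: 0 };

writeToPrimary(primary, { op: "insert", id: 1, doc: { name: "sarath" } });
writeToPrimary(primary, { op: "update", id: 1, doc: { active: true } });

console.log(secondary.data.size);   // the secondary is still empty: replication lag
syncSecondary(primary, secondary);
console.log(secondary.data.get(1)); // now holds the same document as the primary
```

A read served by the secondary before `syncSecondary` runs would miss both writes, which is the replication lag mentioned in the last bullet.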

Types of replications

  • Replica Sets: This is the primary replication mechanism in MongoDB. A replica set consists of a group of MongoDB instances that maintain the same dataset. It has one primary node and multiple secondary nodes. If the primary node fails, an election takes place to promote one of the secondaries to primary.
  • Sharded Clusters with Replication: This is a combination of sharding and replica sets. MongoDB splits data across multiple shards based on a shard key, and each shard in the cluster is a replica set for redundancy and high availability. The main purpose of a sharded cluster is to support extremely large data sets and high-throughput operations by horizontally scaling the workload.

The major difference between a replica set and a sharded cluster is:

In a replica set, all nodes contain the full dataset for redundancy, while a sharded cluster distributes portions of the data across shards to allow horizontal scaling and improved performance for large data sets.
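A toy sketch may make the contrast concrete. It is not how MongoDB distributes data internally: the modulo rule below is a deliberate simplification, and the function names `replicate` and `shard` are hypothetical.

```javascript
// Toy comparison; the modulo placement rule is a simplification, not MongoDB's
// actual chunk-based distribution.
const docs = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }];

// Replica set: every member stores the complete dataset.
function replicate(documents, memberCount) {
  return Array.from({ length: memberCount }, () => [...documents]);
}

// Sharded cluster: each document lives on exactly one shard, chosen from its shard key.
function shard(documents, shardCount, shardKey) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of documents) {
    shards[doc[shardKey] % shardCount].push(doc);
  }
  return shards;
}

console.log(replicate(docs, 3).map((member) => member.length)); // every member holds all 4 documents
console.log(shard(docs, 2, "id").map((s) => s.length));         // each shard holds only part of the data
```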

Minimum requirements for replication

Replica Set: The minimum recommended configuration for a replica set is a three-member replica set with three data-bearing members: one primary and two secondary members. A replica set can have up to 50 members, but only 7 of them can be voting members.
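One way to see why three members is the recommended minimum is to work out how many votes an election needs. The sketch below assumes every member votes, up to the limit of 7; the function name `electionMath` is hypothetical.

```javascript
// Sketch of election arithmetic, assuming all members vote up to the limit of 7.
const MAX_VOTING_MEMBERS = 7;

function electionMath(memberCount) {
  const voting = Math.min(memberCount, MAX_VOTING_MEMBERS);
  const majority = Math.floor(voting / 2) + 1; // votes needed to elect a primary
  return { voting, majority, faultTolerance: voting - majority };
}

console.log(electionMath(2)); // majority 2: losing either member leaves no majority
console.log(electionMath(3)); // majority 2: one member can fail
console.log(electionMath(50)); // only 7 members vote, so the majority is 4
```

With two voting members, the loss of either one leaves the survivor without a majority, so it cannot be elected primary. With three, the set tolerates one failure.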

Sharded Cluster: The minimum configuration for a sharded cluster is one mongos instance acting as a query router, a single shard deployed as a replica set (one primary and two secondaries), and a config server replica set with at least three members.

Steps to Configure Replication

1. Install mongod

2. Configure /etc/hosts and /etc/mongod.conf

3. Configure the firewall

4. Configure the MongoDB replica set

5. Initiate replication

6. Test the replication

To test locally, I need two virtual machines (VMs), so I first create them with the Vagrant tool.

What is Vagrant?
Vagrant is a tool for building and managing virtual machine environments in a single workflow. With an easy-to-use workflow and a focus on automation, Vagrant lowers development environment setup time and increases production parity.

Vagrant box: Boxes are the package format for Vagrant environments.

1. Install mongod

For testing purposes, I chose two Vagrant boxes running Ubuntu 22.04 and installed MongoDB Community Edition 7.0 on each.

Step 1:
Install gnupg and curl, which are needed to import the public key used by the package management system.

> sudo apt-get install gnupg curl

Import the MongoDB public GPG key.

> curl -fsSL https://pgp.mongodb.com/server-7.0.asc | sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg  --dearmor

Step 2:

Create a list file for MongoDB at /etc/apt/sources.list.d/mongodb-org-7.0.list.

> echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list

Step 3:

Reload the local package database.

> sudo apt-get update

Step 4:

Install the latest version of the MongoDB packages.

> sudo apt-get install -y mongodb-org

Start the MongoDB service

Start the mongod service using the systemctl command.

> sudo systemctl start mongod

Verify that the mongod process is running.

> sudo systemctl status mongod

2. Configure the /etc/hosts and /etc/mongod.conf files

By default, MongoDB keeps its configuration file at /etc/mongod.conf. Open the file with the vi editor and change the bind address using the bindIp option under net, so that the node listens on its own network address and not only on localhost. Then enable replication by removing the comment character (#) from the replication section and add a replica set name using the replSetName option. Here the replica set name is "rep".

> sudo vi /etc/mongod.conf

replication:
  replSetName: "rep"

Open the /etc/hosts file, add these lines, and save it.

> sudo vi /etc/hosts

 192.168.56.10 mongodb1
 192.168.56.20 mongodb2

3. Configure the firewall

First check the status of the firewall. If it is disabled, enable it, and then allow the default MongoDB port 27017.

> sudo ufw status
> sudo ufw enable
> sudo ufw allow 27017

Follow the same steps to set up the other node for replication.

4. Configure the replica set

The replica set is configured through the /etc/mongod.conf file, as described in step 2. After that, restart MongoDB on all servers and log in to the node that will become the primary using the mongosh shell.

> sudo systemctl restart mongod
> mongosh --host 192.168.56.10

5. Initiate the replication

Initiate the replication using the rs.initiate() method.

mongosh > rs.initiate()

After the initialization, add the other nodes using the rs.add() method.

mongosh > rs.add("192.168.56.20:27017")    // Pass the IP address and the port number.
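As an alternative to calling rs.initiate() and then rs.add(), rs.initiate() also accepts a configuration document that lists all members up front. The sketch below only builds such a document in plain JavaScript; the helper name `buildReplicaSetConfig` is hypothetical, and the set name and hosts are the ones used in this walkthrough.

```javascript
// Hypothetical helper that builds a replica set configuration document.
// The result has the shape rs.initiate() accepts: an _id (the replSetName)
// and a members array with a numeric _id and a host per member.
function buildReplicaSetConfig(name, hosts) {
  return {
    _id: name,
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const config = buildReplicaSetConfig("rep", [
  "192.168.56.10:27017",
  "192.168.56.20:27017",
]);

console.log(JSON.stringify(config, null, 2));
// In mongosh, an object like this could be passed as rs.initiate(config).
```

The _id of the document has to match the replSetName set in /etc/mongod.conf on every member.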

Replication status

Check the replication status using the rs.status() method.

mongosh > rs.status()
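The rs.status() output is long, so it can help to reduce it to one line per member. The sketch below uses the name, stateStr, and health fields that rs.status() reports for each member; the function name `summarizeStatus` is hypothetical, and `sampleStatus` is hand-written sample data, not captured output.

```javascript
// Hypothetical helper that condenses an rs.status() result to the essentials.
function summarizeStatus(status) {
  return status.members.map((member) => ({
    name: member.name,
    state: member.stateStr,       // e.g. "PRIMARY" or "SECONDARY"
    healthy: member.health === 1, // 1 means the member is reachable
  }));
}

// Hand-written sample shaped like rs.status() output, trimmed to the fields used here.
const sampleStatus = {
  set: "rep",
  members: [
    { name: "192.168.56.10:27017", health: 1, stateStr: "PRIMARY" },
    { name: "192.168.56.20:27017", health: 1, stateStr: "SECONDARY" },
  ],
};

console.log(summarizeStatus(sampleStatus));
// In mongosh, the same function could be called as summarizeStatus(rs.status()).
```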

6. Test the replication

Create a new database on the primary and check for it on the secondary.

Primary:
Switch to a new database 'demo' with the use command. MongoDB creates the database only when data is first stored in it.

mongosh > use demo

Secondary:

Once it contains data, the demo database is replicated to the secondary node.

Create a new collection:
Create a new collection 'test' on the primary.

Primary

mongosh > use demo
mongosh > db.createCollection("test")
mongosh > show collections

Secondary

The test collection is replicated to the secondary node.

Insert some data into the collection:

Primary

mongosh > db.test.insertOne({id:1,name:"sarath"})

The data was inserted into the collection successfully on the primary, but reading it on the secondary returned an error.

The error indicates that the read preference has not been set, so the secondary does not serve the read.

To fix the error, set the read preference on the secondary's connection:

mongosh > db.getMongo().setReadPref("secondary")
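The read preference can also be supplied in the connection string instead of being set inside the shell. The sketch below only assembles such a string using the standard replicaSet and readPreference options; the helper name `buildReplicaSetUri` is hypothetical, and the hosts, database, and set name are the ones used in this walkthrough.

```javascript
// Hypothetical helper that assembles a MongoDB connection string.
// replicaSet and readPreference are standard connection string options.
function buildReplicaSetUri(hosts, database, options) {
  const query = new URLSearchParams(options).toString();
  return `mongodb://${hosts.join(",")}/${database}?${query}`;
}

const uri = buildReplicaSetUri(
  ["192.168.56.10:27017", "192.168.56.20:27017"],
  "demo",
  { replicaSet: "rep", readPreference: "secondary" }
);

console.log(uri);
```

A client connecting with a string like this would direct its reads to a secondary, subject to the same replication lag caveat described earlier.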

Conclusion

MongoDB replication is a powerful feature that provides high availability, data redundancy, and load balancing for database read operations. In this blog, we explored the core concepts of MongoDB replication, its components such as the oplog and the heartbeat mechanism, and replication types such as replica sets and sharded clusters. I hope this is useful for beginners configuring replication for the first time. Stay tuned for more blogs.
