Replication is the ability of a database or data streaming system to maintain multiple copies of its data, or of the server itself. Data flows from a source node (often called the primary, master, or writer) to one or more target nodes (often called secondaries, slaves, replicas, or readers). Replication is one of the most important selling points for any software that handles customer data.
The reasons for replicating data are almost the same across technology platforms, but each technology implements replication in its own way, based on its needs and design. The fundamentals are also similar: the master (primary) server records its changes in a transaction log, the log is shipped to the slave (replica) server, and the replica applies the changes it receives to keep its state in sync with the primary. Databases such as MySQL (binary logs), PostgreSQL (WAL), and MongoDB (oplog) follow this pattern, and technologies such as Elasticsearch and Kafka have implemented replication in a similar way. We will discuss each of these technologies separately in upcoming posts.
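To make the log-shipping idea concrete, here is a minimal in-memory sketch in Python. It is not how MySQL, PostgreSQL, or MongoDB implement replication; the `Primary` and `Replica` classes and their methods are hypothetical names chosen for illustration. It only shows the general pattern: every write is appended to a log, and the replica replays the log entries it has not yet applied.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary node: applies writes and records them in a log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # append-only change log

    def write(self, key, value):
        # Apply the change locally, then record it with its log position.
        self.data[key] = value
        self.log.append((len(self.log), key, value))

    def read_log(self, from_position):
        # Return every log entry at or after the requested position.
        return self.log[from_position:]


@dataclass
class Replica:
    """Hypothetical replica node: replays the primary's log in order."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # position of the next log entry to apply

    def sync(self, primary):
        # Fetch only the entries this replica has not applied yet.
        for position, key, value in primary.read_log(self.applied):
            self.data[key] = value
            self.applied = position + 1


primary = Primary()
replica = Replica()
primary.write("user:1", "alice")
primary.write("user:2", "bob")
replica.sync(primary)            # replica catches up with the primary
primary.write("user:1", "carol")
replica.sync(primary)            # only the new entry is replayed
```

Because the replica remembers the last position it applied, it can disconnect and later resume from where it left off, which is the property real log-based replication relies on.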
What is Replication?
Software and hardware always come with known and unknown failure points. Failures caused by infrastructure outages, software bugs, database or infrastructure intruders, accidental user mistakes, and similar events cannot be avoided, so we must always design our systems on the assumption that they will fail someday. What we can achieve is a controlled response to failure. Native database replication and clustering solutions with an automated failover strategy help us design systems that come closer to being fault tolerant.
Why do we need Replication?
Replication offers a wide variety of options for scaling in all database technologies. Each application or organization has its own reasons for adopting replication; below are the most common use cases.
- Zero-downtime maintenance: For rebuilding tables, taking backups, and defragmentation.
- Minimize downtime: To maintain duplicate replicas to fail over to when a failure occurs.
- Delayed slave: A replica that deliberately lags behind the master can be maintained to recover faster from data corruption.
- Load balancing: To split read and write operations between the slave and master servers respectively, or, better still, to move non-essential traffic to slave nodes rather than the live primary (master) node.
- Backups: To take a consistent backup from a live replica without disrupting operations on the master node.
- Geolocation data merging: To consolidate data from masters in different locations into a single replica/slave for data analytics and reporting.
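As an illustration of the load-balancing use case, the sketch below routes read statements to replicas in round-robin order and everything else to the primary. The `Router` class and the node names are hypothetical, and the check is deliberately naive: it only looks at the first keyword of the statement and ignores replication lag, transactions, and locking reads such as `SELECT ... FOR UPDATE`, which a production router would need to handle.

```python
import itertools


class Router:
    """Hypothetical read/write splitter: reads go to replicas, writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through the replicas forever; None if there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        first = words[0].upper() if words else ""
        # Only plain SELECT statements are treated as reads in this sketch.
        if first == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = Router("primary", ["replica1", "replica2"])
read_target_1 = router.route("SELECT * FROM orders")
read_target_2 = router.route("select count(*) from orders")
write_target = router.route("INSERT INTO orders VALUES (1)")
```

With no replicas configured, every statement falls back to the primary, so the same routing code works on a standalone host.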
We have covered MySQL replication in detail in this blog post. We will cover replication in the other technologies we support in future posts in this series. Applications hosted on standalone hosts are the most at risk. Please reach out to us at firstname.lastname@example.org if you need help improving your database systems. We support a wide range of database and data streaming software; visit us at genexdbs.com to learn more.