This article explains how to configure Linux real-time bidirectional data replication and file synchronization (filesync) with EDpCloud software.
Introduction to Linux bidirectional file and data replication for backup and file mirroring
In this blog, we will learn how to configure EDpCloud software for Linux real-time bidirectional data replication, synchronizing data back and forth between two Linux servers (for backup, online file mirroring, or process automation). Whenever a file changes on one server, the same change is applied on the other server, and vice versa, so the two servers mirror each other's files and content.
Linux Replication Server
Our objective here is to back up one Linux server to another and vice versa.
For example, if you modify file “foo” on a server called “svr1”, file “foo” will also be modified on server “svr2”, and vice versa (only the deltas, that is, the parts of the file that changed, are sent).
You can do the same thing on Windows, but here I will specifically discuss Linux bidirectional replication.
Without delay, let's jump in and see how we can configure Linux real-time bidirectional replication and filesync to mirror file systems on servers or virtual machines.
Please read this article first:
Requirements for Bidirectional Linux Data Replication and remote file mirroring
1. Download EDpCloud by visiting https://www.enduradata.com/downloads. If you have 64-bit Linux, make sure you download the 64-bit package; otherwise download the x86 (32-bit) package.
2. Install EDpCloud replication software
Details about the installation can be found in the articles cited above.
3. Have your list of directories or files that you want to replicate handy: This is the list of directories that you want to monitor in real time for backup.
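To decide which package to download in step 1, you can check the machine architecture first. The following is a minimal sketch: the helper name `pkg_arch` is hypothetical (it is not part of EDpCloud), and it only recognizes the common x86 values reported by `uname -m`.

```shell
#!/bin/sh
# pkg_arch: hypothetical helper that maps a `uname -m` value to the
# package flavor to download (64-bit or 32-bit x86).
pkg_arch() {
    case "$1" in
        x86_64|amd64)        echo "64-bit" ;;
        i386|i486|i586|i686) echo "32-bit (x86)" ;;
        *)                   echo "unknown ($1)" ;;
    esac
}

# Report the flavor for the current machine.
pkg_arch "$(uname -m)"
```

If the result is "unknown", check the download page for a package that matches your platform rather than guessing.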
Configure Linux bidirectional data replication and file synchronization
1. cd $ED_BASE_DIR/etc
2. vi includes and add all the directories you want to include in replication
The includes and excludes files use regular expressions. For more information read the post on includes and excludes:
Edit data replication includes file
Your includes file must list only the directories you want included. If you don’t have an includes file, everything is included by default.
Add the following two lines using a text editor (or the browser-based GUI). This restricts the bidirectional data replication to the files that match the patterns below. Make sure you do the same on both servers.
This will only replicate and mirror files that are under /data and the directory /data itself.
Edit Linux replication excludes
Again, see the post about excluding files from replication. It is critical that you exclude system directories such as /etc, /var, and so on. The default excludes file has such a list. I cannot stress enough how important it is to use includes and excludes to avoid mirroring files in /etc, such as those holding passwords, host names, and IP addresses. Make sure you use the includes file to specify what to replicate.
EDpCloud requires fuse. You can find the sources under $ED_BASE_DIR/edpcloud/src
As root (replace the version numbers if needed):
tar xvf ed_fuse-2.9.5.tar.gz
You can install fuse under /usr/local or under $ED_BASE_DIR/edpcloud
Now add /usr/local/lib to /etc/ld.so.conf (assuming you installed fuse there).
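The library path step can be sketched as a small shell function. This is an illustration, not an official EDpCloud script: the function name `add_libdir` is hypothetical, and it assumes the configuration file ends with a newline. It appends the directory only if it is not already listed, so it is safe to run twice.

```shell
#!/bin/sh
# add_libdir CONF_FILE LIB_DIR
# Hypothetical helper: append LIB_DIR to CONF_FILE unless an identical
# line is already present (-x whole line, -F fixed string, -q quiet).
add_libdir() {
    conf_file=$1
    lib_dir=$2
    grep -qxF "$lib_dir" "$conf_file" 2>/dev/null ||
        printf '%s\n' "$lib_dir" >> "$conf_file"
}

# Usage, as root, assuming fuse was installed under /usr/local:
#   add_libdir /etc/ld.so.conf /usr/local/lib && ldconfig
```

Running `ldconfig` afterwards rebuilds the dynamic linker cache so the newly listed directory is picked up.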
Edit eddist.cfg replication configuration
We will need to create two links (also called replication sets): one going from svr1 to svr2 (l1) and one going from svr2 to svr1 (l2). File changes that take place on svr1 will be synchronized with svr2 using link (repset) l1. File changes that take place on svr2 will be synchronized with svr1 using link l2.
Here is the content of eddist.cfg
<?xml version="1.0" encoding="UTF-8"?>
<config name="enduradata" password="Addoud4d4ch1n1gh4T4s4" workers="4">
  <link name="l1" password="foo">
    <sender hostname="svr1" />
    <receiver hostname="svr2" storepath="/" />
  </link>
  <link name="l2" password="foo">
    <sender hostname="svr2" />
    <receiver hostname="svr1" storepath="/" />
  </link>
</config>
Note that the data replication configuration above will not propagate deletes between the two systems. To propagate deletes, we need to add the keyword deletes="1" to eddist.cfg, as the following example shows (but don’t use deletes="1" unless you know what you are doing).
<?xml version="1.0" encoding="UTF-8"?>
<config name="enduradata" password="Addoud4d4ch1n1gh4T4s4" workers="4">
  <link name="l1" password="foo">
    <sender hostname="svr1" />
    <receiver hostname="svr2" storepath="/" deletes="1"/>
  </link>
  <link name="l2" password="foo">
    <sender hostname="svr2" />
    <receiver hostname="svr1" storepath="/" deletes="1"/>
  </link>
</config>
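Before starting the services, it can help to confirm that both directions are actually defined in eddist.cfg. The sketch below is not an EDpCloud tool: `list_links` is a hypothetical helper that uses plain text matching (not a real XML parser) and assumes the attribute layout shown in the examples above. It prints one "link: sender -> receiver" line per link.

```shell
#!/bin/sh
# list_links CONFIG_FILE
# Hypothetical helper: print "link: sender -> receiver" for each <link>.
# It splits the file so every tag sits on its own line, then pulls the
# name/hostname attributes out with awk's match().
list_links() {
    tr '>' '\n' < "$1" | awk '
        /<link / {
            match($0, /name="[^"]*"/)
            link = substr($0, RSTART + 6, RLENGTH - 7)
        }
        /<sender / {
            match($0, /hostname="[^"]*"/)
            snd = substr($0, RSTART + 10, RLENGTH - 11)
        }
        /<receiver / {
            match($0, /hostname="[^"]*"/)
            print link ": " snd " -> " substr($0, RSTART + 10, RLENGTH - 11)
        }'
}

# Usage (pass the path to your own eddist.cfg):
#   list_links /path/to/eddist.cfg
```

For a bidirectional setup you would expect to see each host once as a sender and once as a receiver.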
Now the directory /data will be mirrored between the two servers svr1 and svr2. Any changes made on svr1 are made on svr2 and any changes made on svr2 are made on svr1 using file synchronization. Please note that there is no distributed lock manager between the two servers and the last writer wins (Same behavior as with NFS).
Start EDpCloud for Linux replication services
Run the following command to start EDpCloud. You will then be on your way to synchronizing data between the two servers in real time, so that each server's data is backed up on the other.
. $ED_BASE_DIR/bin/edpcloud.sh startall
Run edstat to make sure EDpCloud is running before you proceed to the next step. Do not proceed to the real-time configuration until EDpCloud runs correctly without it. If you run into problems, check eddist.log to find out what is wrong with your data replication (other logs can be found in $ED_BASE_DIR/logs).
Replicate some data by using:
edq -l linkname -r receivername -n dir_or_filename
Run edstat again and verify that your files were replicated to the receiver side.
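With the link and host names from the eddist.cfg example above, the on-demand command for each direction would look as follows. This is a dry-run sketch: `edq_cmd` is a hypothetical helper that only prints the command line, using the `edq -l ... -r ... -n ...` form shown above, so you can review it before running it on the sending host.

```shell
#!/bin/sh
# edq_cmd LINK RECEIVER PATH
# Hypothetical helper: print (do not run) the edq command for one link.
edq_cmd() {
    printf 'edq -l %s -r %s -n %s\n' "$1" "$2" "$3"
}

# To be run on svr1: replicate /data to svr2 over link l1.
edq_cmd l1 svr2 /data
# To be run on svr2: replicate /data to svr1 over link l2.
edq_cmd l2 svr1 /data
```

The /data path here matches the directory used throughout this post; substitute your own directory or file name.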
Do all the previous steps for all hosts in eddist.cfg. Once scheduled and on-demand replication and backup are working, proceed to configure the real-time replication and synchronization.
Edit edfsmonitor.cfg Linux Real Time Replication Configuration
Add this line in the file and save it.
This tells EDpCloud to monitor /data for changes and queue them for replication to the remote server.
Now that real time replication is configured, you need to start the real time file system changes monitor:
Source in the environment (notice the dot):
If all worked, you should see the directories listed in edfsmonitor.cfg mounted (use df to see what is mounted).
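The df check can be scripted. The following is a small sketch, assuming /data is the monitored directory as in this post: `is_mounted` is a hypothetical helper that looks for an exact mount point in the portable `df -P` output (mount points containing spaces are not handled).

```shell
#!/bin/sh
# is_mounted PATH
# Hypothetical helper: succeed if PATH appears as a mount point
# (last column) in the output of `df -P`.
is_mounted() {
    df -P | awk -v p="$1" 'NR > 1 && $NF == p { found = 1 } END { exit !found }'
}

# /data is the example directory used in this post.
if is_mounted /data; then
    echo "/data is mounted"
else
    echo "/data is NOT mounted; check the logs in \$ED_BASE_DIR/logs"
fi
```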
Your real-time bidirectional replication is now configured: any files that change in the directories listed in edfsmonitor.cfg on one system are replicated to the other system.