EnduraData Linux File Replication Software

Linux File Replication Installation and Configuration

A. A. El Haddi Linux File Replication

1. Introduction to EDpCloud Linux File Replication Software

EDpCloud is a Linux file replication and file synchronization software solution. It syncs data in real-time, scheduled, or ad hoc modes. It sends only the changed portions of files to one or more sites to create remote copies of the files.

EDpCloud is available for Windows and other platforms as well. With EDpCloud, you can replicate data between different platforms and operating systems over local area networks or between remote networks, clouds, etc. Hence EDpCloud can be used for data protection, file sharing, file mirroring and more.

EnduraData file synchronization and replication moves data between servers, virtual machines, or containers located on the same network, on different networks, or in multiple clouds. The solution replicates file changes, known as deltas, to synchronize data with one or more sites.

EDpCloud can replicate NFS, NTFS, CIFS, and many other file systems. It copies data and associated metadata (permissions, ownership, group, ACLs, etc.) to the remote site.

EDpCloud supports many topologies: one-to-one, many-to-one, bidirectional, etc.

Although EDpCloud is available as a Linux container, this post will cover file replication installation and configuration for virtual or physical machines.

2. Requirements For EnduraData Linux File Replication Software

The following are the minimum requirements:

  • A virtual machine or a physical machine running Linux (you can also use a container, but that is a different blog post).
  • A minimum of 8 GB of free RAM (more if you use more than two parallel data stream workers).
  • A minimum of 12 GB of free disk space for logs and journals. The more files you queue and the more records you keep, the more disk space you need.
  • Port 8888 open to the remote server to send data to the remote site. You can change this port.
  • Local port 9000 open for managing replication.
  • Chrome, Edge, or Firefox if using the web-based GUI.
  • Root access and password (you can run as a separate user; however, that is a different blog post).
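
The RAM and disk minimums above can be checked before installing. The following is a minimal sketch, not an EDpCloud tool: the function names and the TARGET_DIR variable are hypothetical, and it assumes a Linux kernel that reports MemAvailable in /proc/meminfo.

```shell
#!/usr/bin/env bash
# Preflight sketch (not an EDpCloud tool): compare available RAM and disk
# space against the minimums listed in this section.
MIN_RAM_GB=8
MIN_DISK_GB=12
TARGET_DIR="${TARGET_DIR:-/}"   # hypothetical: filesystem that will hold the install

# Available memory in KB; prints 0 if MemAvailable cannot be read.
avail_ram_kb() {
  local kb
  kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
  echo "${kb:-0}"
}

# Available disk space in KB on the filesystem that contains the given path.
avail_disk_kb() {
  local kb
  kb=$(df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $4}')
  echo "${kb:-0}"
}

# usage: meets_minimum <available_kb> <required_gb>
meets_minimum() {
  [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

ram_kb=$(avail_ram_kb)
disk_kb=$(avail_disk_kb "$TARGET_DIR")

if meets_minimum "$ram_kb" "$MIN_RAM_GB"; then
  echo "RAM: ok"
else
  echo "RAM: below ${MIN_RAM_GB} GB available"
fi

if meets_minimum "$disk_kb" "$MIN_DISK_GB"; then
  echo "Disk ($TARGET_DIR): ok"
else
  echo "Disk ($TARGET_DIR): below ${MIN_DISK_GB} GB available"
fi
```

Set TARGET_DIR to the filesystem that will hold the install path (for example, the one containing /opt2) before running the check.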

3. Some Terminology Used

3.1 Basedir

  The top directory where you will install EDpCloud. In this example, /opt2/enduradata.

The ED_BASE_DIR environment variable is set during installation and points to the top directory (Basedir) where the software is installed.

Accordingly, $edpcloud refers to $ED_BASE_DIR/edpcloud; in our examples, /opt2/enduradata/edpcloud.

3.2 Links

  • EDpCloud uses links, or replication sets, to associate a sender and one or more receivers. A link name is an alpha-numeric string that must start with [a-z].
  • A link is a logical way to associate one local data sender and one data receiver. A link can have one or multiple receivers, but creating a separate link for each receiver is better for granular replication control.
  • Keep link names short and meaningful. For example, london2ny  is a meaningful name for file transfers from host London to New York.
  • The eddist.cfg configuration file uses keyword="value" pairs.
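
The naming rule above can be checked with a small shell function before a link name goes into eddist.cfg. This is a sketch under one assumption: that uppercase letters are allowed after the first character, which the rule as stated does not settle. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check a proposed link name against the naming rule described above
# (alphanumeric, first character in [a-z]). The function name is hypothetical.
LC_ALL=C   # keep [a-z] limited to ASCII lowercase letters

valid_link_name() {
  [[ "$1" =~ ^[a-z][A-Za-z0-9]*$ ]]
}

for name in london2ny 2london ny-london; do
  if valid_link_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```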

We explain all keywords in eddist.cfg documentation. Use man eddist.cfg to learn more. You can also use HTML or PDF manuals.

4. Directory Structure:

  • $edpcloud/bin: contains all binaries and tools
  • $edpcloud/etc: contains configuration files
  • $edpcloud/doc: includes the documentation in man pages, HTML, and PDF formats
  • $edpcloud/etc/certs: SSL certificates top directory
  • $edpcloud/certs/public: public certificates
  • $edpcloud/certs/private: private certificates
  • $edpcloud/certs/ca: certificate authority (CA) certificates

5. Some Basic Commands of EnduraData Linux File Replication

  • edpcloud.sh startall: starts replication
  • edpcloud.sh stopall: stops replication
  • edq: Queues files on demand for replication
  • edmfq: Smart queueing of files
  • edstat: shows the status of replication
  • edjob: Lists past replication jobs, and manages jobs (i.e., cancel jobs, priority control, etc.).
  • edpause: pauses replication
  • edresume: resumes replication
  • mio: This is a tool you can use to generate test files (use edq to queue them for replication)
  • edcmpf: Compare local and remote files to see what needs to be replicated.

Both edq and edmfq can be used to create snapshots. Check the use of Macros in eddist.cfg for more information.

6. Downloading  EDpCloud Linux File Replication

We assume that you downloaded the latest package as of this writing:

edpcloud_LINUX_x64_v6_0_6_E-libc-ge-2.6.tar.gz

  • Obtain a demo or permanent license from EnduraData

7. Installation of Linux File Replication

7.1 Become root:

sudo su -

7.2 Extract the package:

tar xvf edpcloud_LINUX_x64_v6_0_6_E-libc-ge-2.6.tar.gz

7.3. Install the software

cd enduradata_edpcloud
./install.sh

You will be prompted for the following parameters (Use the defaults at first).

Accept the entire agreement [YES/NO]: yes 

Enter install path name [/usr/local/enduradata] : /opt2/enduradata

Management port: [9000] :

Content receiver port: [8888] : 

Run the WEB/browser-based GUI. (y|n): [Y] : 

Start running the services now. (y|n): [Y] :

7.4. Post installation

After installing EDpCloud, set up the environment by running:

. /opt2/enduradata/edpcloud/bin/enduradata_env


The command above sets up PATH and other environment variables used internally.
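
To confirm that the environment was loaded, a script can source the file and test ED_BASE_DIR. The sketch below assumes the install path used in this post; the ENV_FILE variable and the load_ed_env function are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: source the environment file if present and confirm ED_BASE_DIR is set.
# ENV_FILE defaults to the path used in this post; adjust it for your install path.
ENV_FILE="${ENV_FILE:-/opt2/enduradata/edpcloud/bin/enduradata_env}"

load_ed_env() {  # usage: load_ed_env <env-file>
  local env_file="$1"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 1
  fi
  . "$env_file"
  # Succeed only if the file actually defined ED_BASE_DIR.
  [ -n "${ED_BASE_DIR:-}" ]
}

if load_ed_env "$ENV_FILE"; then
  echo "ED_BASE_DIR=$ED_BASE_DIR"
else
  echo "EDpCloud environment not loaded"
fi
```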

Run the following command to start EDpCloud automatically after reboots:

/opt2/enduradata/edpcloud/bin/autostart/sys5debian_autostart

8. Configuration And Setup of Linux File Replication

All configuration files reside under $edpcloud/etc.

We will create the following data replication configurations:

  1. The file eddist.cfg: EDpCloud uses this file to configure which sender will send to which receiver and where to store data. It has many options.
  2. The file edpasswd: Although you can use SSL certificates for authentication, this file is still required to indicate which host is allowed to send or manage replication (In addition to the authentication)
  3. The file includes: A list of patterns to include in file synchronization. It uses regular expressions (one per line). If this file does not exist, everything not excluded will be replicated.
  4. The file excludes: This is a list of file and directory patterns to exclude from data replication. It uses regular expressions (one per line).
  5. edfsmonitor.cfg: This configuration has a list of directories to monitor in real-time (Exact name of the directory; one per line). Please note that these are not regular expressions but exact names of directories.

Please examine the man pages for these configuration files. The documentation is under $edpcloud/doc/

Use a text editor like vi or nano to edit the configuration files.

Once you finish editing all configurations, run the following commands:

edpcloud.sh startall
edstat

8.1 A simple eddist.cfg test configuration file example

Start with a simple configuration to ensure everything is normal, and start replicating from the local host to itself before you try syncing data to a remote site.

<?xml version="1.0" encoding="UTF-8"?>
<config name="enduradata" password="Addoud4d4ch1n1gh4T4s4" workers="4">
  <link name="l1">
    <sender hostname="127.0.0.1" />
    <receiver hostname="127.0.0.1" storepath="/data/incoming" />
  </link>
</config>
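
Before copying a configuration like this into $edpcloud/etc, it can help to confirm that the file is well-formed XML. The sketch below writes the loopback example to a scratch file and runs xmllint on it if that tool is installed. CFG_OUT and the placeholder password are illustrative assumptions, and xmllint checks XML syntax only, not EDpCloud keywords.

```shell
#!/usr/bin/env bash
# Sketch: write the loopback test configuration to a scratch file and check
# that it is well-formed XML before copying it into $edpcloud/etc.
# CFG_OUT and the placeholder password are illustrative assumptions.
CFG_OUT="${CFG_OUT:-$(mktemp)}"

cat > "$CFG_OUT" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<config name="enduradata" password="CHANGE_ME" workers="4">
  <link name="l1">
    <sender hostname="127.0.0.1" />
    <receiver hostname="127.0.0.1" storepath="/data/incoming" />
  </link>
</config>
EOF

# xmllint (from libxml2) only checks XML syntax; it knows nothing about
# EDpCloud keywords, so a pass here does not prove the configuration is valid.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout "$CFG_OUT" && echo "well-formed: $CFG_OUT"
else
  echo "xmllint not installed; skipping syntax check"
fi
```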

Apply the configuration by running:

edpcloud.sh startall

Verify the status by running:

edstat

Examine the logs in $edpcloud/logs, especially $edpcloud/logs/eddist.log, ed_sender*log, ed_receiver*log, to check for any errors.

8.2 Example of eddist.cfg file sync configuration with a local and remote server

We assume that the host named localservername is replicating to a host named remoteservername. The host remoteservername stores the replicated data under /data/incoming.

We also use four worker threads for a parallel data stream to sync files from the local server to the remote server.

<?xml version="1.0" encoding="UTF-8"?>
<config name="enduradata" password="Addoud4d4ch1n1gh4T4s4" workers="4">
  <link name="l1">
    <sender hostname="localservername" />
    <receiver hostname="remoteservername" storepath="/data/incoming" />
  </link>
</config>

8.3 Example of edpasswd file

The format is a list of  lines with a hostname and password separated by a pipe (“|” ): 

localservername|localpassword
remoteservername|myremotepassword

You can also use wild cards for the hostname or IP address, as shown below:

*.enduradata.com|12TikChbila42
192.168.200.*|Tiwliwla$%
192.168.5.10*|Tiwliwla$%

The hostnames, patterns, and passwords must match between the senders and receivers.
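
A quick format check can catch entries that were accidentally joined on one line or that lack a password. The following sketch assumes one entry per line and no "|" character inside passwords; neither assumption comes from the EDpCloud documentation, and check_edpasswd is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: flag edpasswd lines that are not of the form pattern|password.
# Assumes one entry per line and no "|" inside passwords (an assumption,
# not something this post states).
check_edpasswd() {  # usage: check_edpasswd <file>
  awk -F'|' '
    BEGIN { bad = 0 }
    NF != 2 || $1 == "" || $2 == "" { printf "line %d malformed\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Demonstration on a scratch file with made-up entries.
sample=$(mktemp)
printf '%s\n' 'localservername|localpassword' '192.168.200.*|examplepassword' > "$sample"
if check_edpasswd "$sample"; then
  echo "edpasswd format ok"
fi
rm -f "$sample"
```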

8.4 Example of $edpcloud/etc/includes

The format is one regular expression per line. All expressions are ORed.

/data/.*
/home/eh/source/.*\.c$
/home/eh/source/.*\.h$
/home/eh/source/.*makefile.*
/home/demo/.*

If the includes file is empty or does not exist, then the rule degenerates to “Replicate any files that do not match regular expressions in the excludes list.”

8.5 Example of excludes

EnduraData ships with a default excludes file. Edit this file with care. For example, we never want to replicate /etc, /var, /vmlinuz, /proc, and similar paths to a remote host unless the destination directory is something other than “/”. Otherwise, we would overwrite the remote host's own system configuration and log files and clobber the remote system.

8.5.1 Here is a portion of the default $edpcloud/etc/excludes file:

^/Volumes$
^/Volumes/.*
^/.*/edpcloud/data/.*
^/.*/edpcloud/logs/.*
^/initrd.img
^/initrd.img.old
^/vmlinuz
^/vmlinuz.old
^/bin/.*
^/boot/.*
^/cdrom.*
^/dev/.*
^/initrd/.*
^/lib/.*
^/.*/lost+found.*
^/media/.*
^/mnt/.*
^/opt/.*
^/proc/.*
^/root/.*
^/etc/.*

We can add our custom changes by appending the  following expressions to the end of the existing excludes file:

^/.*/LINUX/.*/debug
^/.*win.*/.*/debug
^/.*MACOSX.*/.*/debug
^/.*/core$
^/home/.*/.cache/
^/home/.*/.mozilla/firefox/hxpp9amo.default/
^/home/.*/.local/share/zeitgeist/
^/home/.*/.config/google-chrome/
^/home/.*/.pki/
.*/Thumbs.db$
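
The includes and excludes files together act as a filter: a path is replicated when it matches some includes expression (or the includes file is empty or missing) and matches no excludes expression. The sketch below imitates that rule with grep -E so that patterns can be tried on sample paths. It is only an approximation: the regular expression dialect and anchoring that EDpCloud uses may differ from grep's, and would_replicate is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of the selection rule: a path is replicated when it matches some
# includes expression (or includes is empty or missing) and matches no
# excludes expression. grep -E is only a stand-in; EDpCloud's regular
# expression dialect and anchoring may differ.
would_replicate() {  # usage: would_replicate <path> <includes-file> <excludes-file>
  local path="$1" inc="$2" exc="$3"
  if [ -s "$exc" ] && printf '%s\n' "$path" | grep -Eq -f "$exc"; then
    return 1   # excluded
  fi
  if [ -s "$inc" ]; then
    printf '%s\n' "$path" | grep -Eq -f "$inc"
  else
    return 0   # no includes: everything not excluded is replicated
  fi
}

# Demonstration with a few expressions taken from the examples in this post.
inc_file=$(mktemp)
exc_file=$(mktemp)
printf '%s\n' '/data/.*' '/home/eh/source/.*\.c$' > "$inc_file"
printf '%s\n' '^/proc/.*' '^/.*/core$' > "$exc_file"

for p in /data/report.txt /home/eh/source/main.c /home/eh/source/main.o /data/app/core /proc/cpuinfo; do
  if would_replicate "$p" "$inc_file" "$exc_file"; then
    echo "replicate: $p"
  else
    echo "skip:      $p"
  fi
done
rm -f "$inc_file" "$exc_file"
```

Note that grep searches for a match anywhere in the line, so an unanchored expression such as /data/.* would also match /backup/data/x in this sketch.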

8.6 Example of the real-time configuration edfsmonitor.cfg 

edfsmonitor.cfg is a list of directories that must be monitored in real-time for file data and metadata changes.

DO NOT use “/” in a line by itself. Use only specific directories under “/”.

 

/home/demo
/home/eh/source
/data

The above configuration will monitor /home/eh/source, /home/demo, and /data. All file changes or metadata changes under the directories above are queued automatically for synchronization if they match what is in $edpcloud/etc/includes and don’t match what is in excludes.

After you finish configuring, run the following:

edpcloud.sh startall
df -h 

If real-time monitoring is working correctly, you should see the directories configured for real-time as follows:

Filesystem Size Used Avail Use% Mounted on
eduswfs 1.3T 1.1T 103G 92% /home/demo
eduswfs 1.3T 1.1T 103G 92% /home/eh/source2
eduswfs 1.3T 1.1T 103G 92% /home/eh/source

Notice that eduswfs is an EnduraData filesystem monitor.
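
Two of the points above lend themselves to a quick script: edfsmonitor.cfg must not contain “/” on a line by itself, and each entry should be an existing directory. The sketch below checks a file for both and then lists any mount points whose filesystem column reads eduswfs, based on the df output shown above. The check_monitor_cfg function is a hypothetical helper, not an EDpCloud command.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an edfsmonitor.cfg-style list before starting EDpCloud.
# It rejects a bare "/" and reports entries that are not existing directories.
check_monitor_cfg() {  # usage: check_monitor_cfg <file>
  local status=0 dir
  while IFS= read -r dir || [ -n "$dir" ]; do
    [ -z "$dir" ] && continue
    if [ "$dir" = "/" ]; then
      echo "refusing to monitor /" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "not a directory: $dir" >&2
      status=1
    fi
  done < "$1"
  return "$status"
}

# Demonstration on a scratch file; point it at $edpcloud/etc/edfsmonitor.cfg instead.
cfg=$(mktemp)
printf '%s\n' /tmp > "$cfg"
if check_monitor_cfg "$cfg"; then
  echo "monitor list looks sane"
fi
rm -f "$cfg"

# List mount points reported with the eduswfs filesystem name (empty if none).
df -P 2>/dev/null | awk '$1 == "eduswfs" {print $NF}'
```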

9. Testing File Sync and Replication:

To test replication, use the following command to queue something on demand:

edq -n directorynametoreplicate

9.1  Testing on demand data replication

mkdir /home/username/test
ls -alt > /home/username/test/file1
edq -n /home/username/test

Then, examine the content of the destination path (/data/incoming in the configuration above)

9.2 Testing the edfsmonitor.cfg real-time configuration above

cd /home/demo
ls -alt > f1
ls -alt > f2
echo "foo: I changed f2" >> f2
chmod 777 f1
chmod 700 f2

You should find the files f1 and f2 under the destination directory you specified in eddist.cfg using the keyword storepath="/data/incoming".

9.3 Example output of edstat

[Figure: output of the edstat file replication command]

10. Creating Data Synchronization Jobs Using Edscheduler

edscheduler works like cron. Alternatively, you can use cron itself.

If you are not using the real-time configuration, you can schedule jobs to replicate data at:

  • Different days of the week
  • Hours
  • Minutes

You can combine synchronization methods: real-time, scheduled, and on-demand.

Using a text editor, create edscheduler.cfg.

The scheduler configuration file  format is as follows:

 minute hour day_of_month month day_of_week command [command arguments]

 Where:     

  • The minute is a number in the range [0 .. 59] or a “*”. A star indicates every minute. 
  •  The hour is a number in the range [0 .. 23] or a “*”. 0 is for midnight, 1 is for 1 am and 23 is for 11 pm. A star indicates every hour
  • day_of_month is a number in the range [1 .. 31] or a “*”. A star indicates every day of the month
  • The month is a number in the range [1 .. 12] or a “*”. A star indicates every month. 1 is for January, 2 for February, …
  • day_of_week is a number in the range [0 .. 6] or a “*”. A star indicates every day of the week. 0 is for Sunday, 1 for Monday, …, 6 is for Saturday
  • The command is the command you want to execute, followed by the arguments. The commands must have an absolute path or be in the PATH environment variable. You can call any script or binary.
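
The field ranges above can be checked mechanically before a line goes into edscheduler.cfg. The sketch below handles only “*” and plain integers, which are the forms described in this post; whether edscheduler accepts lists or ranges as cron does is not covered here. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate the five time fields of a scheduler line against the ranges
# listed above. Only "*" and plain integers are handled.
field_ok() {  # usage: field_ok <value> <min> <max>
  [ "$1" = "*" ] && return 0
  [[ "$1" =~ ^[0-9]+$ ]] || return 1
  # 10# forces base 10 so that values such as 08 are not read as octal.
  [ "$((10#$1))" -ge "$2" ] && [ "$((10#$1))" -le "$3" ]
}

schedule_ok() {  # usage: schedule_ok minute hour day_of_month month day_of_week
  field_ok "$1" 0 59 && field_ok "$2" 0 23 && field_ok "$3" 1 31 &&
    field_ok "$4" 1 12 && field_ok "$5" 0 6
}

# Quote "*" so the shell does not expand it to file names.
if schedule_ok 5 0 '*' '*' '*'; then
  echo "5 0 * * *: ok"
fi
if ! schedule_ok 60 0 '*' '*' '*'; then
  echo "60 0 * * *: field out of range"
fi
```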

10.1 Example of an edscheduler.cfg file

The following edscheduler.cfg configuration:

  • Checks the status of replication at minute 1 of every hour.
  • Checks the status of past jobs at 11:30 pm.
  • Replicates data under /data/tolondon to link us2london at 12:05 am.
  • Replicates /data/images to the receiver virginia09 at 5:15 am.
  • Replicates /home/users to host nybonds1 at 5:01 pm (17:01).

 1  * * * * /opt2/enduradata/edpcloud/bin/edstat >> /tmp/edstat.log
30 23 * * * /opt2/enduradata/edpcloud/bin/edjob >> /tmp/edjob.log
 5  0 * * * /opt2/enduradata/edpcloud/bin/edq -l us2london -n /data/tolondon >> /tmp/tolondon.log
15  5 * * * /opt2/enduradata/edpcloud/bin/edq -r virginia09 -n /data/images >> /tmp/images.log
 1 17 * * * /opt2/enduradata/edpcloud/bin/edq -r nybonds1 -n /home/users >> /tmp/users.log

11. Intelligent Linux File Replication Queuing For Synchronization

Another file queuing command is called “edmfq”.

With edmfq, you can queue files faster. For example:

  • queue files modified in the last hour
  • queue files modified between 5 am and 7:30 am
  • queue files older than file /tmp/test
  • check the manual page for edmfq for details
  • edmfq can be used for a dry run as well

11.1 Examples of intelligent file replication queuing:

  • Find and queue files under /data that are 0 to 3600 seconds older than file /tmp/trigger or that are no more than 30 seconds newer than trigger:
edmfq -t 3600 -T 30 -n /tmp/trigger -s /data -q

Without the “-q” option, files will not be queued. They will only be listed.

  • To automatically synchronize all files under /home that have changed in the last 10 days (10*24*60*60 = 864000 seconds) to link foo:

edmfq -t 864000 -s /home -q -l foo

  • To synchronize all files under /home that changed in the last month and to replicate only to link foo:

edmfq -t 1m -s /home -q -l foo

  • To synchronize the files that changed in the last five days:
edmfq -t 5d -s /home -q -l foo

Notice that edmfq can also be used for dry runs without replicating (i.e., to list the files that will be replicated), but use edcmpf to compare the local and the remote.
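
When the period is given in seconds, as in the -t 3600 example above, the conversion from days is simple arithmetic. The sketch below computes it and prints, without running, an edmfq command line built from the flags used in this post. The days_to_seconds helper is hypothetical; confirm the flags against the edmfq manual page before use.

```shell
#!/usr/bin/env bash
# Sketch: convert a number of days to seconds and print (not run) an edmfq
# command line that uses it. The flags mirror the examples in this post.
days_to_seconds() {  # usage: days_to_seconds <days>
  echo $(( $1 * 24 * 60 * 60 ))
}

secs=$(days_to_seconds 10)

# No -q here: per the note above, without -q files are listed, not queued.
echo "edmfq -t $secs -s /home -l foo"
```

For 10 days the script prints edmfq -t 864000 -s /home -l foo.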

12. Comparing Local And Remote Data Replication Content

You can use the edcmpf command to compare files on the local and remote systems.

13. Advanced Settings

The documentation for eddist.cfg lists additional keywords to use in the replication XML configuration.

14. Best Practices For Ensuring Data Integrity And Security.

By default, EDpCloud sets up $edpcloud permissions to be read, write, and execute (700) only by root. Make sure you protect the top directory. 

EDpCloud can use SSL certificates that you can generate on your own or through a third party.

15. Monitoring and Maintenance

  • Use edstat to see the status of replication.
  • Use edjob to list past jobs.
  • Examine local history: history_sender*
  • Examine remote history: history_receiver.log
  • Periodically remove files with the extension .edz from $edpcloud/logs, or move them to an archive directory.
  • Use edstat -D to shrink the stats database.
  • Use edstat -V to vacuum the journals.
  • Watch for running out of disk space on $edpcloud or on the destination directories.

16. Troubleshooting Common Issues

The most common issues in EDpCloud file replication for Linux are:

  • Permissions to receive data: the remote host is not allowed to send. Check edpasswd and eddist.cfg to make sure the sender is allowed to send data.
  • Remote receiver ports are blocked: use telnet or netcat to see if you can connect to the receiver port.
  • NAT IPs not allowed to send: check ed_receiver*log, test with alias="*", then replace it with the correct alias (alias="NATedIPs").
  • DNS name resolution issues or slow responses (use the /etc/hosts file first when resolving names).
  • Hostnames not in /etc/hosts (it is better to add the hostnames of all hosts in eddist.cfg to /etc/hosts, or to use IP addresses).
  • The sender and receiver passwords do not match: fix the edpasswd file.
  • The SSL certificates (if used) do not match: test with self-generated SSL certificates (use edgensslcerts).

EDpCloud uses a reverse lookup to check the sender. If your DNS is not set up correctly, you may use IP addresses instead of the hostnames. 

Use “edresolve hostname” to see how fast your DNS responds to EDpCloud queries (do this on both the sender and the receiver). You can also use netcat to verify connectivity.
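
If neither telnet nor netcat is installed, bash itself can test a TCP port. The sketch below uses bash's /dev/tcp redirection together with the coreutils timeout command; it assumes a bash build with network redirections enabled. RECEIVER_HOST and RECEIVER_PORT are placeholder names, with 8888 being the default content receiver port from the installation prompts.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections, using bash's /dev/tcp
# redirection and coreutils timeout. Host and port below are placeholders.
port_open() {  # usage: port_open <host> <port>
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

RECEIVER_HOST="${RECEIVER_HOST:-127.0.0.1}"   # hypothetical receiver address
RECEIVER_PORT="${RECEIVER_PORT:-8888}"        # default content receiver port

if port_open "$RECEIVER_HOST" "$RECEIVER_PORT"; then
  echo "$RECEIVER_HOST:$RECEIVER_PORT reachable"
else
  echo "$RECEIVER_HOST:$RECEIVER_PORT not reachable (firewall, wrong port, or service down)"
fi
```

Run it on the sender with RECEIVER_HOST set to the receiver's name or IP address.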

If you use NAT, you may want to add the NATed addresses to both the $edpcloud/etc/myaliases configuration file and the alias="IPaddresses" keyword in eddist.cfg.

$edpcloud/etc/myaliases is a list of the local host's aliases (in case you do not have access to DNS or cannot change /etc/hosts).

16.1 Example of the alias keyword to deal with NAT

       <sender hostname="london" alias="10.0.200.246|192.168.200.3" />

16.2 Example of $edpcloud/etc/myaliases

uh11
linuxrocks
uh11.somedomain.com
::1

17. Tips For Regular Maintenance To Ensure A Smooth Operation.

EDpCloud File replication for Linux uses extensive logs for audit trail. These logs grow depending on how many data files you replicate and how often.

When logs exceed a specific size limit, EDpCloud renames and compresses the files, adding the extension “.edz”. Use the edzdump command to uncompress files with the “.edz” extension.

A good practice is to clean up these logs after a particular time or move them to another archive space. You can use edscheduler.cfg to do this.
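
One way to move old compressed logs out of the way is a small function built on find. The sketch below is not an EDpCloud tool: the archive_edz name, the directory arguments, and the 30-day default are all assumptions to adapt.

```shell
#!/usr/bin/env bash
# Sketch: move compressed logs (*.edz) older than a given number of days into
# an archive directory. Directory names and the 30-day default are assumptions.
archive_edz() {  # usage: archive_edz <logs-dir> <archive-dir> [days]
  local logs="$1" archive="$2" days="${3:-30}"
  mkdir -p "$archive" || return 1
  # Only regular *.edz files directly under the logs directory are moved.
  find "$logs" -maxdepth 1 -type f -name '*.edz' -mtime +"$days" \
    -exec mv -- {} "$archive"/ \;
}

# Example call (commented out so nothing moves until the paths are reviewed):
# archive_edz /opt2/enduradata/edpcloud/logs /archive/edpcloud-logs 30
```

A script wrapping this function can be called from an edscheduler.cfg line, using the format described in section 10.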

You should also monitor the available storage space for $edpcloud. EDpCloud will stop working if it has no free space.

The history database file $edpcloud/data/statdb tends to grow. You can attempt to shrink it at run time using “edstat -D” and “edstat -V”.

You may need to clean or move it somewhere else when it gets huge. You must do this only after you verify that no files or jobs are pending for replication (use edstat to test for this) AND after you stop the software.

18. Technical Support

Contact EnduraData tech support using enduradata.com/contact if you need help with your trial.

Call +1-952-746-4160 for assistance.

19. Conclusion

We touched on some basic configurations of EnduraData Linux file replication: real-time synchronization, on-demand (ad hoc) file sync, and scheduled sync. Consult the documentation for additional configuration parameters and details about edq and edmfq.

Get your free trial copy of EDpCloud Linux file replication (or Windows replication, etc) from https://enduradata.com/downloads and get free support.

EnduraData also provides extended proof of concept (POC) support for businesses and government agencies.

20. Related Content

https://access.redhat.com/ecosystem/software/2930911

 

Linux File Replication Installation and Configuration was last modified: January 8th, 2024 by A. A. El Haddi
