EnduraData EDpCloud Cross Platform File Replication and Transfer Solutions(eddist.cfg)

EnduraData XML Configuration

eddist.cfg


NAME

eddist.cfg - Configuration file for EDpCloud's File Synchronization, Content Distribution and file replication Suite.

SYNOPSIS

EDpCloud uses eddist.cfg as its main configuration file. EDpCloud allows users to sync files and folders (unstructured data) to one or more remote locations. EDpCloud can be installed on physical or virtual machines. It can be used to synchronize data, as a backup solution (data protection and data recovery), and to share data between multiple servers and geographic locations. eddist.cfg can be tailored for granular control of your file and folder synchronization. This document explains how to do that.

The eddist.cfg is an XML configuration file for EnduraData's File sync, file replication and Content Distribution Suite. The same file sync syntax is also used for EnduraData Remote Directory Synchronization Suite. The file sync and replication software is designed to be configured using EnduraData's GUI or a text editor such as vi or notepad. The elements and keywords that constitute the file sync configuration are discussed here. Certain elements are required, others are optional. The configuration file allows the user to define what content to synchronize, replicate and distribute, where to send it and where to store it. A file synchronization configuration can be as simple or as complex as you want it to be. A configuration is a logical way to organize and manage content distribution and replication between multiple systems and geographic regions. You can create configurations for localhost to localhost, localhost to one or more remote hosts, many remote hosts to a single host or many remote hosts to many remote hosts. You will find a few examples in this document to help you get started. The following sections serve as a tutorial to configure EDpCloud for data recovery and data protection, to distribute all sorts of data between business processes, and to sync files between systems and geographic sites.

The etc directory contains a few examples that illustrate the syntax of the configuration file.

APPLYING THE CONFIGURATION

Once you have created a configuration you need to:

a. Verify that the configuration is valid, by executing edverify
b. Copy the configuration to $ED_BASE_DIR/etc/eddist.cfg
c. For Windows users, execute edrestartall
d. For Linux/Mac/Unix users execute edpcloud.sh restartall

Take a look in the bin directory and you will find a wealth of commands that can be used.

SECURITY WARNING

eddist.cfg must be readable only by the owner because it contains sensitive information.
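
As a rough illustration for Unix-like systems, the following Python sketch checks whether a file grants any permission to group or others, and tightens it to owner read/write if so. The helper names are hypothetical and are not part of EDpCloud.

```python
import os
import stat

def is_owner_only(mode):
    """Return True when no group or other permission bits are set."""
    return stat.S_IMODE(mode) & 0o077 == 0

def restrict_to_owner(path):
    """Limit the file to owner read/write (0600) if it is more permissive."""
    if not is_owner_only(os.stat(path).st_mode):
        os.chmod(path, 0o600)
```

Calling restrict_to_owner on the path of your eddist.cfg before restarting the services is one way to apply the warning above.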

TERMS AND DEFINITIONS

EnduraData's File synchronization, replication and Content Distribution Suite uses a few terms that we will define here:

configuration: A logical organization of a sync and content distribution network. An enterprise may have one or more configurations. Each configuration has (a) one or more senders, (b) one or more receivers and (c) one or more links. A configuration is identified by its name.

sender: A sender is a node or a device that sends data to one or more servers. The sender has a lot of parameters that we will define in subsequent sections. A sender is identified by its hostname and by the name of the parent link.

receiver: A receiver is a node that receives data from one or more senders (file sync and file replication receiver). The receiver has a lot of parameters that we will define in subsequent sections. A receiver is identified by its hostname and by the name of the parent link. Duplicate receivers are not allowed in the same link. See the counter examples for more information. The maximum number of receivers depends on the version of the software you licensed. The personal and home editions are limited to one receiver only.

The sender and receiver host names must be resolvable using DNS, NIS or the hosts file. If this cannot be done, you may use IPs or create aliases under the etc directory.

link: A link is a logical grouping of one and only one data sender and one or multiple file and data receivers. All file sync receivers in a link have one common file sender. The link has a name which is unique across the configuration. A link is identified by its name. A configuration has one or multiple links. The maximum number of file replication links depends on the version of the software you licensed. The personal and home editions are limited to one link only.

A sender can belong to one or more links. A receiver can also belong to one or more links.

Senders and receivers can be identified with hostnames, IP or fully qualified names. A sender must be able to connect to a receiver and a receiver must be able to connect to a sender.

FILE SYNC CONFIGURATION FORMAT AND SYNTAX

Each section in eddist.cfg inherits its defaults from the parent nodes (except for the name and hname). Therefore you can define some global parameters and they will be shared by all sections of your file synchronization configuration. You can override the global parameters by specifying new ones inside the new sections. Sibling parameters are not inherited, but parent parameters are inherited.

The following is an example of a simple configuration:

In this example, the configuration's name is simple. The configuration has only one link. This link is identified by name="tokyo". The link has one sender identified by hostname="tokyo.enduradata.com". The link has one receiver identified by hostname="london.enduradata.com". Notice the other required parameter: storepath. This parameter is described in the next section. The tokyo link synchronizes data from tokyo.enduradata.com to london.enduradata.com. The synchronized data on london.enduradata.com will be stored under /home/reports/london.

FILE SYNC AND REPLICATION CONFIGURATION PARAMETERS

The configuration parameters use the format: paramname="value".

The following is a list of supported parameters.

include="regular expression": This is a regular expression that lists what patterns are included in the files and folders synchronized from tokyo.enduradata.com to london.enduradata.com. Readers not familiar with regular expressions will need to read a regular expressions tutorial. The default for this parameter is everything or ".*". If no include parameter is used, the includes file in etc is used.

The following are some examples of some regular expressions:

^/home/.*: means everything under /home
(^/home/.*)|(^/data/.*): means everything under /home or everything under /data.
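
The two expressions above can be tried out with a short Python sketch. Python's re module is used here as a stand-in; it is not the POSIX regular expression engine, although these simple patterns behave the same way in both. The function name is hypothetical.

```python
import re

def matches_include(path, include=".*"):
    """Return True when the path matches the include expression."""
    return re.search(include, path) is not None

home_only = r"^/home/.*"
home_or_data = r"(^/home/.*)|(^/data/.*)"

for candidate in ["/home/ika/report.txt", "/data/run1/out.csv", "/var/log/syslog"]:
    print(candidate, matches_include(candidate, home_only),
          matches_include(candidate, home_or_data))
```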

storepath="destination directory name": This is the top level directory where anything received will be stored. In the previous configuration example, any data sent from /home will be stored under /home/reports/london/home. The receiver will not work unless this directory already exists. Therefore, you will need to create it before you start sending content to the receiving node, unless you set an environment variable called ED_CREATE_STOREPATH. storepath and rootpath are interchangeable. storepath may include patterns that will be resolved by the receiving nodes. Patterns are enclosed within percent signs. The following macro patterns are supported in the enterprise versions of the software. They are not supported in the personal or home editions. Macro names are case sensitive.

%ip% : When found in a storepath, the receiver will substitute %ip% with the sender's IP.
%hostname% : When found in a storepath, the receiver will substitute %hostname% with the sender's hostname.
%home% : When found in a storepath, and if a user name (loginname parameter) is supplied in the sender entry of the link you are replicating data over, the receiver will substitute %home% with that user's home directory. The macro is expanded only if the user exists on the remote system. This macro applies only to Unix/Linux/Mac systems as of this writing. If you choose to use this pattern, it should be the first element in the storepath.
%sender%: This pattern is substituted with the hname of the sender entry.
%receiver%: This pattern is substituted with the hname of the receiver entry.
%link%: This pattern is substituted with the name of the link entry.
%weekday%: This pattern is substituted with current week day (sunday-monday).
%day%: This pattern is substituted with current day of the month.
%date%: This pattern is substituted with current date (yyyymmdd format).
%month%: This pattern is substituted with current month(1-12).
%year%: This pattern is substituted with current year.
%dayofyear%: This pattern is substituted with day of the year (1-365)
%hour%: This pattern is substituted with current hour(0-23)
%minute%: This pattern is substituted with current minute(0-59)
%second%: This pattern is substituted with current second(0-59).
Example

storepath="%home%/backup/%ip%/%link%" will resolve to storepath="/home/ika/backup/192_168_19.10/atlanta" if your home directory is /home/ika, the sender's IP is 192.168.19.10 and your link name is "atlanta". storepath must exist on UNIX, Linux, AIX and Mac or replication will fail.
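
The substitution idea can be sketched in a few lines of Python. This is only an illustration of case-sensitive %name% replacement, not the receiver's implementation; the function name and the sample values are assumptions.

```python
import re

def expand_storepath(template, values):
    """Replace each %name% macro with values[name]; unknown macros stay as-is."""
    def substitute(match):
        # Lookup is case sensitive, so %Link% does not match the key "link".
        return values.get(match.group(1), match.group(0))
    return re.sub(r"%([A-Za-z]+)%", substitute, template)

example = expand_storepath(
    "/backup/%sender%/%link%/%year%",
    {"sender": "tokyo.enduradata.com", "link": "atlanta", "year": "2024"},
)
print(example)
```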

strip="path to strip from file and link names|path2|...": This is an optional parameter. If specified, the given path or paths are removed from the original file name. Multiple paths can be separated by a pipe (|) delimiter.

For example if strip="/backups", then "/backups/home/jj/interference/signals.txt" becomes "/home/jj/interference/signals.txt".

For example, if strip="c:\d1|c:\d2" then both c:\d1 and c:\d2 will be removed from the destination path. If nothing is left in the path then an error is raised.

The final file sync storage name will also be modified by the value of storepath.

This parameter is useful when restoring data from a remote site previously stored in a storepath other than the ROOT directory.

If you use strip, and you are experiencing sync problems, re-examine your include/exclude regular expressions.
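
A minimal Python sketch of the strip behavior described above follows. It assumes that strip entries are treated as leading prefixes and that the first matching entry wins; both are assumptions, and the function names are hypothetical.

```python
def apply_strip(path, strip):
    """Remove the first matching strip prefix; fail if nothing is left."""
    for prefix in strip.split("|"):
        if prefix and path.startswith(prefix):
            remainder = path[len(prefix):]
            if not remainder:
                raise ValueError("nothing left of the path after strip")
            return remainder
    return path

def stored_name(path, strip, storepath):
    """Prepend storepath to the stripped path to get the final storage name."""
    return storepath.rstrip("/") + "/" + apply_strip(path, strip).lstrip("/")

print(apply_strip("/backups/home/jj/interference/signals.txt", "/backups"))
print(stored_name("/backups/home/jj/interference/signals.txt", "/backups", "/restore"))
```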

stripdir="1": This is a dangerous parameter if storepath is "/" under Unix. It strips the entire directory path from the original file name and piles the files in storepath. So if you are replicating /dir1/f1 and /dir2/f1, one of them will be overwritten. Some users rely on this feature to extract data from filers and instruments; for everyone else it is mostly a risk. When stripdir is set to 1, only the file name is preserved, without the leading path. So if your storepath is something like /incoming and your remote file is /data/pricing/cmo.xls, the new file will be stored in /incoming/cmo.xls.
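
The overwrite risk can be seen in a small Python sketch. It only models the naming rule stated above (keep the file name, drop the directories); the function name is hypothetical.

```python
import posixpath

def stripdir_name(path, storepath):
    """Keep only the file name and place it directly under storepath."""
    return posixpath.join(storepath, posixpath.basename(path))

sources = ["/dir1/f1", "/dir2/f1"]
targets = [stripdir_name(p, "/incoming") for p in sources]
# Two different sources map to the same target, so one overwrites the other.
collision = len(set(targets)) < len(targets)
print(targets, collision)
```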

name="link_name": This is a required parameter for the config and link sections. The value assigned to name designates the name of the link or of the configuration. Link names must start with an alphabetic character. The name all is reserved and should not be used as a link name. The valid characters in a link name are only in the sets: [a-z][0-9].
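
The link name rules can be expressed as a short check. This Python sketch assumes that "alphabetic" means a lowercase letter, in line with the [a-z][0-9] sets given above; the function name is hypothetical.

```python
import re

# A letter first, then any mix of lowercase letters and digits.
LINK_NAME = re.compile(r"[a-z][a-z0-9]*")

def is_valid_link_name(name):
    """Reject the reserved name 'all' and anything outside [a-z][0-9]."""
    return name != "all" and LINK_NAME.fullmatch(name) is not None
```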

hostname="hostname_or_ip" or hname="hostname": This is a required parameter for both the source and the receiver. It is critical that you use the fully qualified hostname. To be certain, use the name returned by the hostname command. Please see the alias parameter for additional information. IPv6 addresses should be substituted with hostnames, and at times the interface must be specified. Additional aliases for localhost can be put in $ED_BASE_DIR/etc/myaliases.

password="passwordtext": This is a password to be used for authentication. The sender uses this to identify itself to the receiver.

include="regular_expressions_to_include": This is a required parameter for both the source and the receiver. This is a POSIX regular expression. The include parameter has expressions that will have to be matched by file (directory, link, ...) names in order to be included in the list to be synchronized.

exclude="regular_expressions_to_exclude": This is a required parameter for both the source and the receiver. This is a POSIX regular expression. The exclude pattern has a list of name patterns to be excluded from synchronization. The default exclude value is nothing.

The data distribution suite examines the include and exclude regular expressions on both sides. Data is sent by the local node and accepted by the remote node only if it is matched by the include patterns of both the local and remote systems and is not matched by the exclude patterns of either system.
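
The combined rule can be sketched as follows. Python's re module stands in for the POSIX engine, the sample patterns are invented for illustration, and the function names are hypothetical.

```python
import re

def accepted(path, include, exclude):
    """One side's decision: include must match and exclude must not."""
    if re.search(include, path) is None:
        return False
    # An empty exclude means nothing is excluded.
    return not (exclude and re.search(exclude, path))

def will_transfer(path, sender, receiver):
    """Both the sender's and the receiver's (include, exclude) pair must accept."""
    return accepted(path, *sender) and accepted(path, *receiver)

sender = (r"^/home/.*", r"\.tmp$")
receiver = (r".*", r"^/home/private/.*")
print(will_transfer("/home/a/report.txt", sender, receiver))
```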

storepath="destination_directory": This is a required parameter for the receiver. It is the name of the top directory where all received data will be stored. It is equivalent to the new root directory of the remote data.

incfile="filename_containing_patterns_to_include": This is the name of a file that has a list of regular expressions to be ORed and included in the synchronization. Use this if you have a huge list of expressions or if a workflow application generates a list of patterns or file names for you. There is one expression per line. All lines are ORed using '|' to form a larger expression. Use incfile for complex or very long expressions.

Example incfile="incfilename.txt" where incfilename.txt contains:

exfile="excludefile": This is the name of a file that has a list of regular expressions to be ORed for exclusion. Use this if you have a huge list of expressions or if a work flow application generates a list of patterns or file names for you. exfile has the same format as incfile.
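
How one-expression-per-line files turn into a single expression can be sketched in Python. Wrapping each line in parentheses is an assumption made here to keep the alternatives separate; the text only states that lines are joined with '|'. The function names are hypothetical.

```python
def combine_patterns(lines):
    """OR the non-empty lines into one expression: (p1)|(p2)|..."""
    patterns = [line.strip() for line in lines if line.strip()]
    return "|".join("(" + pattern + ")" for pattern in patterns)

def load_pattern_file(path):
    """Read an incfile or exfile style file and return the combined expression."""
    with open(path, encoding="utf-8") as handle:
        return combine_patterns(handle)

print(combine_patterns(["^/home/.*\n", "^/data/.*\n"]))
```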

alias="aliasregularexpressions": This parameter indicates that the sender can also be recognized by the receiver under another alias. It is a different way of letting the remote user accept connections from an entire domain or from IP patterns. This is a regular expression. Using this parameter, you can make a single configuration to use with many senders across an enterprise while referring to the sender's hostname as localhost only. The alias keyword is supported only in the enterprise edition of the software.

Example: alias="192.168.*|*.enduradata.com"

history="0|1": This parameter controls whether what was sent is logged.

versions="maxversionnumbers": This parameter controls the ability to go back and restore various versions of the file. DO NOT CONFUSE THIS PARAMETER WITH THE XML VERSION 1.0 IN THE HEADER. This parameter holds the number of versions of a file to keep around for snapshot purposes and for recovering deleted files. You can keep as many versions of a file as you want, as long as the receiver has enough storage for all of them. Older versions of the files are kept under names that match the pattern {oldfilename}.enduradata_snapshot.nnn, where nnn is a sequence number from 0 to maxversions. When you reach the maximum number of snapshots to keep around, the extension nnn is rotated and the older version is overwritten. To find out the file age, use creation time for sorting. This will allow you to recover older versions of your files and to recover from deletes. The enduradata_snapshot file pattern can be used to clean up older versions using a simple command like find -name "*.enduradata_snapshot.*" -exec some_command {} \;

If versions is negative, edpcloud will keep all previous versions. This is the preferred method; it is more storage efficient but has a little more overhead in terms of computational time. A negative versions value is an archive.

archive="0|1": This parameter controls the ability to go back and restore various versions of the file. This parameter is the same as a negative version. Set this parameter to any character and all versions of the file will be archived. The archive file name is derived using the original file name and the MD5 checksum of the file. All files will be in a directory called "enduradata_snapshot" unless archivedir is used.

archiveincxpr="archiveincludepatternfile": archive pattern include file contains regular expressions. If a file name matches the regular expressions then the old version of the file is archived.

archiveexcxpr="archiveexcludepatternfile": archive pattern exclude file contains regular expressions. If a file name matches the regular expressions then the file is excluded from the archive

archiveincfile="archiveincludefile": a file that contains archive patterns as regular expressions. If a file name matches one of the regular expressions then the file is included in the archive.

archiveexcfile="archiveexcludefile": a file that contains archive patterns as regular expressions. If a file name matches one of the regular expressions then the file is excluded from the archive.

By default, if file archiveexcludes is found under edpcloud etc directory, its content will be used to filter out files to exclude from archival. By default, if file archiveincludes is found under edpcloud etc directory, its content will be used to include files that match the regular expressions in the includes file.

archivedir="dirname": dirname is where the archive files are put. If no dirname is specified then the files will be in enduradata_snapshot, which is a sibling of the original file. archivedir can have the following macros (as described in storepath above): %sender%, %receiver%, %link%, %date%, %year%, %month%, %weekday%, %day%, %hour%, %minute%, %dayofyear%, %second%, %ip%, %hostname%.

A catalog of archived files is located in enduradata_farchive_list under archivedir. It shows the original file name and its original md5 and the time of change of the current file in storepath.

If archivedir is a real path then the file will be archived in that path. A good example is to put archives in a directory parallel to storepath. Example archivedir="/home/archives" for Linux or archivedir="d:\archivedata". These files tend to grow and you may want to set a schedule to clean them up when not needed.

Archive files may be safely compressed using gzip, zip or any other compression program that adds a ".gz" or ".zip" extension to the file name; a duplicate archive file will not be created. A file is archived only if none of file_md5, file_md5.gz or file_md5.zip is found in the archivedir. A list of the file archives and their modification times is available in the archive directory as well.
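
The duplicate check can be sketched in Python as follows. How the archive name itself is derived from the file name and MD5 checksum is not modeled here; archive_name is simply passed in, and the function name is hypothetical.

```python
def needs_archive(archive_name, existing_names):
    """Archive only if no plain, .gz or .zip copy is already present."""
    candidates = {archive_name, archive_name + ".gz", archive_name + ".zip"}
    return candidates.isdisjoint(existing_names)
```

Here existing_names would be the list of file names currently found in archivedir.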

bwmax="maxbytespersecond": This parameter is used for throttling the bandwidth. It is expressed in bytes per second. This is the maximum bandwidth to use when sending file data to a remote node. The maximum bandwidth applies per receiver. If you have more than one receiver per link, then the maximum bandwidth for the link is the sum of the bandwidth values used for each receiver. You can specify a different bandwidth for each receiver or the same for all receivers.

Example

        bwmax="65536" will give you approximately 64 Kbytes per second for each receiver in the link.

maxfailures="maximumfailures": This parameter sets the maximum number of failures allowed. It represents the maximum number of times edpcloud will retry sending your data to a remote site. Once maxfailures is reached, the system will stop trying to send the data. The default value is 256.

Example

        maxfailures="0" : never give up trying to send the data.
        maxfailures="32" : stop trying to send data to the remote site after 32 failures.
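
The retry rule, including the special meaning of 0, can be written as a small Python sketch. It illustrates the semantics described above and is not edpcloud's code; the function name is hypothetical.

```python
def should_retry(failures, maxfailures=256):
    """Decide whether to try again after `failures` failed attempts."""
    if maxfailures == 0:
        # 0 means never give up.
        return True
    return failures < maxfailures
```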

maxqlen="maxnumpayloads": This is how many payloads (a payload contains many files) will be queued and made available for the transport layer to send to the remote location. The default value is the number of workers times 6. An optimal value should be between (2*numworkers) and (numworkers*6).

Example

        maxqlen="10"
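
The default and the suggested range are simple arithmetic on the number of workers, shown here as a Python sketch. Treating both ends of the range as inclusive is an assumption; the function names are hypothetical.

```python
def default_maxqlen(workers):
    """Default queue length: number of workers times 6."""
    return workers * 6

def in_suggested_range(maxqlen, workers):
    """True when maxqlen lies between 2*workers and 6*workers."""
    return 2 * workers <= maxqlen <= 6 * workers
```

With 4 workers, for instance, maxqlen="10" falls inside the suggested range of 8 to 24.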

uid="userid": This parameter is used to allow userid to take ownership of the files.

gid="groupid": This parameter is used to indicate that the files will be part of the UNIX group groupid.

Both userid and groupid must exist in the password files. Valid values are numbers rather than the group or user names.

isscheduled="0|1": This parameter applies to the receiver and tells the system that the current sender will accept files manually or on a schedule. The default is 1.

isrealtime="0|1": This parameter applies to the receiver and tells the system that the current sender will accept file changes in real time. The default is 1.

workers="number_of_threads": This parameter is the number of streams to use for sending data to the link. The default is 1. This parameter is valid only for the receiver. This parameter must be tuned to get better performance. A high number may lead to disk contention and network congestion. Adjust it and measure your performance rather than setting it and forgetting about it.

filterlinks="0|1": This parameter if set to 1, instructs the system to apply includes/excludes to symbolic links. The default is 0.

rhistory="0|1": This parameter tells the receiver to save history of what was synchronized. The default is 1.

deletes="1|0|Y|n|y|N": This parameter is used to tell the receiver whether it should propagate deletes. When set to 1, a delete that happens on the sender side is also applied to the receiver side. The default value is 0 (deletes are not propagated). Users must be careful with this parameter, since directories that were deleted on the sender side are deleted recursively on the receiver side as well. This parameter applies to the real time product only.

forcedelete="1|0": This parameter is used to tell the receiver to force a delete of a protected file or directory. When set to one, the receiver will first change the permissions on the remote before attempting to delete the file or directory in question.

acls="y": This parameter is used to replicate ACLs. The default is no.

alwayscopy="1|0": If set, this parameter tells edpcloud to always copy the entire file even if it did not change.

alwayscopylarge="minsize": If set to minsize bytes other than 0, this parameter tells edpcloud to always copy the entire file if its size is larger than minsize.

alwayscopysmall="maxsize": If set to maxsize bytes greater than 0, this parameter tells edpcloud to always copy the entire file if its size is less than maxsize.
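
The three alwayscopy settings can be read together as one decision, sketched below in Python. The way the settings combine (any one of them being enough to force a full copy) is an assumption; the function name is hypothetical.

```python
def force_full_copy(size, alwayscopy=False, alwayscopylarge=0, alwayscopysmall=0):
    """Return True when the settings force copying the entire file."""
    if alwayscopy:
        return True
    if alwayscopylarge != 0 and size > alwayscopylarge:
        return True
    if alwayscopysmall > 0 and size < alwayscopysmall:
        return True
    return False
```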

nostatfailure="1|0": If set to 1, then metadata setup failures such as chmod, chown, chgrp or ACLs on the remote will be ignored. This is useful if someone removes a file or a directory on the remote while it is being replicated.

maxpayloadbytes="maxsize": If set to maxsize bytes greater than 0, this parameter tells edpcloud to batch files with a total size of no more than maxsize (sum of all file sizes).
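
One way to picture size-limited batching is the greedy Python sketch below. It is an illustration only: the order in which edpcloud groups files is not documented here, and placing a file larger than maxsize alone in its own payload is an assumption. The function name is hypothetical.

```python
def batch_by_size(files, maxpayloadbytes):
    """Group (name, size) pairs into payloads whose total size stays within the limit."""
    payloads, current, total = [], [], 0
    for name, size in files:
        # Start a new payload when adding this file would exceed the limit.
        if current and total + size > maxpayloadbytes:
            payloads.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        payloads.append(current)
    return payloads

print(batch_by_size([("a", 40), ("b", 50), ("c", 30), ("d", 200)], 100))
```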

procpriority="intvalue": This is the priority of the ed_sender transport process on the sender side. Under Linux and UNIX flavors, this is equivalent to nice values. Acceptable values for Windows: 0 (none), 1 (run only when idle), 2 (below normal), 3 (normal), 4 (above normal), 5 (highest). The default value is 0 (unchanged). Acceptable values for *NIX*: any increment of nice; see man nice.

Post and Pre processing and integration with workflow

Administrators may need to run some commands, that may be required for workflow purposes, before a file is sent(sender side) or before it is stored(receiver side). Administrators may also need to run other commands after a file is sent and after it received. They can use the post/pre keywords to specify the commands to run. The commands must be located in the directory postpre under base dir (same level as etc directory).

The user may specify whether the sender or receiver should wait for the post or pre commands to finish before proceeding to the next task, using prewait or postwait. However, with a value of 0 the commands may not run to completion, since the main worker thread may exit before the pre/post processing is done. For this reason, we suggest using the default value of 1 for postwait and prewait (wait for post/pre processing to finish).

The post and pre directives can be used to create automated processing like ETL (Extract, Transform, Load) and to distribute task execution while moving data to desired locations or aggregating data from desired locations.

post/pre commands

pre="precommand": This is the command to run before a file is sent(sender side) or stored(receiver side).

post="postcommand": This is the command to run after a file is sent(sender side) or stored(receiver side).

prewait="1": Wait for the pre command to finish.

prewait="0": Do not wait for the pre command to finish

postwait="1": Wait for the post command to finish.

postwait="0": Do not wait for the post command to finish

post/pre commands arguments

Both precommand and postcommand are passed a post/pre processing file as their first argument. The file contains the list of files that will be or were synced and to which the post/pre commands should be applied. The files are stored in a directory named postpre/postfiles under the base directory. The post/pre commands should dispose of these files once they are processed.

pre processing file content

The pre processing files have the following format. Variables are separated using a pipe. The content of the pre processing file is as follows:

post processing file content

The post processing files have the following format. Variables are separated using a pipe. The content of the post processing file is as follows:

When running the post commands, the command must examine the status variable and take action accordingly.
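
A post or pre command written in Python could start from the sketch below. It relies only on what is stated above: the processing file is the first argument, fields are separated by a pipe, and the command should dispose of the file when done. The order and meaning of the fields are not assumed, so each line is returned as a raw list; the function name is hypothetical.

```python
import os

def read_postpre_file(path):
    """Return one list of pipe-separated fields per non-empty line, then delete the file."""
    with open(path, encoding="utf-8") as handle:
        records = [line.rstrip("\n").split("|") for line in handle if line.strip()]
    os.remove(path)
    return records
```

A command would typically call read_postpre_file(sys.argv[1]) and then act on each record.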

Example of post/pre processing

In this case the receiver will run run_report1 and will not wait for it to finish. The sender will run notify_clients but it will wait for it to finish.

Encryption

EnduraData distinguishes between two types of encryption. The first is message and communication encryption, which applies to all communications between senders and receivers. The second is file encryption. The user may leave the files encrypted when sent (but they must remember the encryption keys if they want to get their data back), or they may decrypt files once they arrive at their destination. A user can use communication encryption, file encryption, or both. When using communication encryption alone, all data transmitted is encrypted by the sender and decrypted by the receiver if their keys match.

Encryption is controlled by a combination of parameters in eddist.cfg and files (edkeys, edfilekeys). Because of export restrictions, only AES 128 bit is available by default. Stronger encryption is available only in certain countries such as the US. Please check with your sales representative first.

Communication encryption

To encrypt communications between servers, create a file called edkeys in your etc directory:

edkeys file format

You can use regular expressions for the link, the sender and the receiver. The lines are evaluated last in, first out.

For every link, every sender, every receiver use foo as a key to encrypt communication.
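
Last in, first out evaluation means that a later, more specific line overrides an earlier catch-all. The Python sketch below models this with tuples of (link, sender, receiver, key). The tuples are a hypothetical stand-in and do not represent the edkeys file syntax; requiring each expression to match the whole name is also an assumption.

```python
import re

def pick_key(rules, link, sender, receiver):
    """Return the key of the last rule whose three expressions all match."""
    for link_re, sender_re, receiver_re, key in reversed(rules):
        if (re.fullmatch(link_re, link)
                and re.fullmatch(sender_re, sender)
                and re.fullmatch(receiver_re, receiver)):
            return key
    return None

rules = [
    (".*", ".*", ".*", "foo"),     # catch-all, listed first
    ("tokyo", ".*", ".*", "bar"),  # listed later, so it is tried first
]
print(pick_key(rules, "tokyo", "hosta", "hostb"))
```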

Example of multiple lines in edkeys

Encrypting files on flight or on disk

File encryption is configured by setting an encryption key and an encryption/decryption directive using: key, encrypt or decrypt.

key="secretkey"
encrypt="1|0"
decrypt="1|0"

The encrypt and decrypt parameters work in conjunction with the key parameter. encrypt and decrypt are mutually exclusive.

EDpCloud can be configured to use encryption in a multitude of ways. Users have the following choices:

a. Encrypt communications between servers: use key="secretkey" for the sender and receiver or create file edkeys under etc directory.
b. Encrypt all files before they leave the server and leave them encrypted on the remote server: Use fkey="keyname" on the sender side. And specify encrypt="1". fkey and encrypt keywords must be set together in order to encrypt.
c. Encrypt all files before they leave but decrypt them before storing them.

Users can either decrypt on restore or they can use edcryptor to decrypt in place(see html documentation).

Example of file encryption

This example will encrypt files using mysecretkey before they leave orders.companya.com. The data will remain encrypted on the remote machine (sales.companyb.com). The decrypt="0" setting determines whether data will be decrypted or not.

It is critical that the user remembers the encryption key and protects the directory where the configuration exists. If you lose the key, you will not be able to access the encrypted data.

Example of file encryption

This example will encrypt files using mysecretkey before they leave orders.companya.com. The data will be decrypted on the remote machine (sales.companyb.com).

EXAMPLE Backing up to a local drive (1 to 1 )

The following is a simple configuration that backs up the content of /home to /backup.

EXAMPLE2 Distributing content to many hosts (1 to many)

The following example has two links.

link1: sends data from localhost to localhost. Only content under /data1 and under /var/www/realm1 is sent. It is stored under /home/dest/link1.
link2: sends data from hostname targua to the following hosts: localhost, mumu, feddev0, mac11, sol13 and www2. Notice how link1 uses a different password than link2.
        Each receiver can use include and exclude to specify what content to accept and what content to reject.

EXAMPLE3 Many to one content collectors and consolidation

A simple configuration that can collect and aggregate content from many hosts to a single host (data protection/data recovery for example) is shown below. Although this configuration is simple, it allows all hosts with a fully qualified name that matches *.enduradata.com to send content to collector.enduradata.com for backup, data analytics or other decision making tools.

EXAMPLE4 Windows meets Linux, Mac or Unix

        The following configuration shows a combined configuration for Windows and *NIX* platforms.

COUNTER EXAMPLE1 A bad configuration

The following configuration will fail because localhost occurs twice as a receiver. Duplicate receivers within the same link are not allowed. Use edverify to detect these kinds of errors.

A correct way to configure this scenario is:

Now you can send data to all links and all receivers using: edq -n dirname

COUNTER EXAMPLE2 Another bad configuration

The following configuration will fail because host zulu is the same as localhost. In this case all content will be sent to /backup twice. Once for localhost and once for zulu instances.
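
Both counter examples come down to the same receiver appearing twice in one link, either literally or under another name for localhost. The Python sketch below shows the idea; it is not how edverify works, and the function and parameter names are hypothetical.

```python
def find_duplicate_receivers(receivers, local_aliases=()):
    """Return receiver names that repeat within one link.

    Names listed in local_aliases are treated as synonyms of localhost.
    """
    local = {"localhost", *local_aliases}
    seen, duplicates = set(), []
    for name in receivers:
        canonical = "localhost" if name in local else name
        if canonical in seen:
            duplicates.append(name)
        seen.add(canonical)
    return duplicates

print(find_duplicate_receivers(["localhost", "zulu"], local_aliases=["zulu"]))
```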

FILES

$ED_BASE_DIR/etc/eddist.cfg

This is the configuration file

$ED_BASE_DIR/etc/edpasswd

This is the password file. See edpasswd manual for more info

$ED_BASE_DIR/etc/myaliases

This file has one hostname or IP per line. This file lists other IPs or hostnames by which this host is also known. They are synonymous with "localhost" and local IP.

SEE ALSO

edintro(8) edresume(8) edpause(8) edjob(8) edstat(8) eddist.cfg(5) edfsmonitor.cfg(5) edscheduler.cfg(5)

ENVIRONMENT

SUPPORT

For more information contact <support@enduradata.com>

AUTHOR

A. A. El Haddi, elhaddi@ieee.org A. Taouil ataouil@enduradata.com S. Dimitri, sdimitri@enduradata.com