Using EDpCloud To Synchronize Data In Real Time To A Linux Virtual Machine Instance On Google’s Cloud.
Houssam El Haddi (Institut National Des Statistiques et de l’Economie Appliquee (INSEA))
A. A. El Haddi (EnduraData, MN)
In this tutorial we show how to synchronize data using EDpCloud file synchronization software (https://www.enduradata.com/download) to a virtual machine (VM) instance running Debian Linux on Google Cloud (for more, see https://console.developers.google.com/project and http://en.wikipedia.org/wiki/Google_Storage).
This scenario is useful if you want to copy and synchronize data from many regions to a central location, or to migrate data between clouds or cloud regions. Although this write-up focuses on a Linux VM instance on Google Cloud, you can also use Microsoft Azure, Amazon Windows or Linux instances, or instances in your own private cloud (running Linux (Intel, Power7), Mac, Windows, Solaris SPARC, Solaris x86, or AIX).
With EDpCloud you can back up data to a public, private, or hybrid cloud.
For a brief summary of what EDpCloud does, see the video clip at the bottom of this post.
This article is for system administrators and technical staff concerned with data protection or data migration to the cloud.
For this example, we will assume that:
- You already have a VM instance on Google Cloud. Its FQDN is gcloud100.google.com and its IP is 188.8.131.52. We will refer to this instance interchangeably as the remote, the destination, the target, the data sink, or the data reservoir.
- Your on-premise server or laptop is svr.example.com. Its IP is 184.108.40.206. We will refer to it as the local server or the source server.
- You will install EDpCloud data synchronization software under /usr/local/enduradata on both gcloud100.google.com and svr.example.com.
With these in place, you are ready to start migrating or protecting your data.
First, configure your Google cloud instance’s firewall
Allow the TCP ports that EDpCloud uses (if you did not change the defaults, allow ports 8888 and 9000). You can change these ports in enduradata_env under the bin directory.
While you are at it, configure SSH as well (see https://www.debian-administration.org/article/530/SSH_with_authentication_key_instead_of_password).
Consult Google's Compute Engine documentation for more information about Google instances.
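If you manage the firewall from the command line rather than the web console, the rule can be sketched with the gcloud CLI. The helper below only prints the command so that you can review it first. The rule name allow-edpcloud is a hypothetical choice, the source address is the on-premise IP assumed in this example, and the rule is assumed to apply to your project's default network.

```shell
# edp_firewall_cmd SOURCE_IP [PORTS]
# Print (do not run) a gcloud command that opens the EDpCloud TCP ports
# to a single source address. "allow-edpcloud" is a hypothetical rule name;
# 8888 and 9000 are the default ports mentioned in this article.
edp_firewall_cmd() {
    local source_ip="$1"
    local ports="${2:-8888 9000}"
    local allow="" p
    for p in $ports; do
        # Build a comma-separated list such as tcp:8888,tcp:9000
        allow="${allow:+$allow,}tcp:$p"
    done
    printf 'gcloud compute firewall-rules create allow-edpcloud --allow=%s --source-ranges=%s/32\n' \
        "$allow" "$source_ip"
}

# Review the printed command, then paste it into a shell where gcloud is configured.
edp_firewall_cmd 184.108.40.206
```

Restricting the source range to the on-premise address keeps the ports closed to everyone else. If you changed the ports in enduradata_env, pass them as the second argument, for example edp_firewall_cmd 184.108.40.206 "7000 7001".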
Copy and install EDpCloud on your instance
- Download EDpCloud x64 for Linux.
- scp edpcloud*.gz user@gcloud100.google.com:/tmp (replace user with your login on the instance)
- Log in to gcloud100.google.com
Install EDpCloud data synchronization software
- cd /tmp
- tar xvf edpcloud*gz
- cd /tmp/enduradata_edpcloud/
- Run the installer, accept the license, and answer a few questions (the defaults are reasonable and should work)
Add your VM’s external IP to EDpCloud’s myaliases file
myaliases is a handy configuration file that works around a common name resolution issue with most cloud providers: the external IP is not returned by ifconfig.
Add the external IPs of the instance to myaliases, one per line. Make sure that there are no extra spaces in any entry.
For this example, open myaliases in an editor and insert a line containing 188.8.131.52.
Save and exit.
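Stray whitespace in myaliases is easy to miss by eye. Here is a minimal sketch of a check you could run after editing the file. The function name check_aliases is hypothetical, and the script only looks for whitespace; it does not validate the addresses themselves.

```shell
# check_aliases FILE
# Report every line of FILE that contains whitespace (spaces, tabs, or a
# stray carriage return) and return non-zero if any such line was found.
check_aliases() {
    local file="$1" line n=0 bad=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        case "$line" in
            *[[:space:]]*)
                printf '%s: line %d contains whitespace\n' "$file" "$n" >&2
                bad=1
                ;;
        esac
    done < "$file"
    return "$bad"
}

# Usage (the location of myaliases depends on your installation):
#   check_aliases /path/to/myaliases && echo "myaliases looks clean"
```

The carriage-return case matters if the file was edited on Windows and copied over, because each line then ends with an invisible extra character.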
Create your data synchronization configuration file /usr/local/enduradata/edpcloud/etc/eddist.cfg
Here is the content of eddist.cfg
<?xml version="1.0" encoding="UTF-8"?>
<config name="enduradata">
<link name="gcloud1" passwd="foox"/>
<receiver hostname="188.8.131.52" storepath="/googlestore1"> </receiver>
</config>
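When a configuration like this is copied from a web page, the straight quotes are sometimes converted into typographic ones, and an XML parser will then reject the file. The sketch below shows one way to spot them with grep before you deploy the file. The function name find_curly_quotes is hypothetical.

```shell
# find_curly_quotes FILE
# Print (with line numbers) the lines of FILE that contain typographic quotes
# or a double-prime sign, and return 0. Return non-zero when none are found,
# which is what you want for a clean configuration file.
find_curly_quotes() {
    grep -nF -e '“' -e '”' -e '″' -e '‘' -e '’' "$1"
}

# Usage:
#   if find_curly_quotes /usr/local/enduradata/edpcloud/etc/eddist.cfg; then
#       echo "replace the quotes listed above with plain ASCII quotes"
#   fi
```

The -F option makes grep treat each pattern as a fixed string, so the check works the same way regardless of locale.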
On 188.8.131.52 (your Google instance), run mkdir /googlestore1 if the directory does not exist (you may need to provision additional storage first).
- Copy eddist.cfg to /usr/local/enduradata/edpcloud/etc/eddist.cfg on both gcloud100.google.com and svr.example.com
- Install your EDpCloud file synchronization software license on both gcloud100.google.com and svr.example.com: copy the license to /usr/local/enduradata/edpcloud/etc/edlicense
Configure the real time file replication updates
EDpCloud comes with a real time module for Linux and Windows. The module watches one or more designated directories for changes and queues the changes for synchronization to the remote location.
Add all the directories that you want to replicate in real time to /usr/local/enduradata/edpcloud/etc/edfsmonitor.cfg
If needed, you can also create a schedule to replicate and synchronize data from svr.example.com (184.108.40.206) to the Google Cloud server (188.8.131.52) at discrete intervals.
Example of a real time configuration
Start file synchronization software services
Restart the services on both the source (svr.example.com) and the cloud server destination (188.8.131.52):
. /usr/local/enduradata/edpcloud/bin/edpcloud.sh restartall
Make sure that everything works: test using edstat on the source first.
Then send a test file from the source:
- touch /home/foo/testfile.txt
- edq -l gcloud1 -n /home/foo/testfile.txt
If all is well, you should see your file on the Google instance at 188.8.131.52:/googlestore1/home/foo/testfile.txt
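The destination path in this example is the receiver's storepath followed by the full path of the file on the source. Assuming that mapping holds for your configuration, a small helper can compute where to look for any file you send. The function name remote_path is hypothetical.

```shell
# remote_path STOREPATH SOURCE_FILE
# Print where SOURCE_FILE should appear on the receiver, assuming the receiver
# appends the full source path to its storepath (as in the example above).
remote_path() {
    local storepath="${1%/}"   # drop one trailing slash, if any
    local src="$2"
    case "$src" in
        /*) printf '%s%s\n' "$storepath" "$src" ;;
        *)  printf 'source path must be absolute: %s\n' "$src" >&2
            return 1 ;;
    esac
}

# Example check over SSH (assumes key-based login to the instance works):
#   ssh gcloud100.google.com ls -l "$(remote_path /googlestore1 /home/foo/testfile.txt)"
```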
If you experience any problems, the culprits may be:
- The firewall has blocked your EDpCloud ports
- myaliases file is missing or incorrect
- an error in eddist.cfg
Examine the logs under /usr/local/enduradata/edpcloud/logs
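To narrow down a firewall problem, you can test from the source whether the EDpCloud ports on the instance accept TCP connections. The following is a minimal sketch that relies on bash's built-in /dev/tcp redirection and the coreutils timeout command. The function name port_open is hypothetical, and a failure can also mean that the service is simply not running on the instance.

```shell
# port_open HOST PORT
# Return 0 if a TCP connection to HOST:PORT succeeds within 5 seconds,
# non-zero otherwise. Uses bash's /dev/tcp redirection, so no extra
# network tools are required.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage, run on the source against the instance (default ports assumed):
#   for port in 8888 9000; do
#       port_open gcloud100.google.com "$port" || echo "port $port is not reachable"
#   done
```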
Perform initial synchronization
- edq -l gcloud1 -n /home
- edq -l gcloud1 -n /data
After the initial synchronization is done, the real time file system monitor will synchronize any files that change under /home, /data and /var/log
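If you have more than a couple of directories, the initial synchronization can be wrapped in a loop. This is only a sketch: it uses the edq options shown above (-l for the link, -n for the path), and the function name initial_sync and the EDQ override variable are hypothetical additions for previewing the commands.

```shell
# initial_sync LINK DIR...
# Queue each existing directory on LINK with edq, stopping at the first failure.
# Set EDQ=echo to preview the arguments without running edq.
initial_sync() {
    local link="$1" dir
    shift
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            printf 'skipping %s: not a directory\n' "$dir" >&2
            continue
        fi
        ${EDQ:-edq} -l "$link" -n "$dir" || return 1
    done
}

# Usage:
#   EDQ=echo initial_sync gcloud1 /home /data   # dry run, prints the edq arguments
#   initial_sync gcloud1 /home /data            # real run
```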
Happy data migration, and happy computing on the remote site as well: the data there is readable because it is not stored in a proprietary format.
Left as an exercise for you:
You have a few journalists in Africa, Europe, and Australia. You can have all of them file their stories and send their content in real time to a Google VM, where your other partners can access it. How would you do that? Easy!
It is free to try:
1. Go to Google and create your instance
2. Go to enduradata.com and download a copy of the software