Our History With Data Replication Software
Our history with file replication software started when Minnesota was the world leader in supercomputers.
A few words from our founder …
How did we get here?
It all started with major disaster recovery and data loss
The road to my interest in file replication software started a few months after starting a new position. I was a senior scientist and a data manager of the LTER project at the University of Minnesota. The LTER project was funded by the National Science Foundation.
The roof of my office collapsed as a result of a pipe freezing in the fourth-floor stock room on Christmas Eve. The University police called my house and gave me the bad news. I was devastated.
I spent the night and Christmas day moving boxes of disks and soaked servers to drier spots. And few of them there were.
The data losses were staggering. To recover, I had to work 7 days a week for many, many months along with my then-interns Dr. Erach E., Mr. Ronald Loi, Mr. Kartick Shredar, and Dr. Trung Nguyen (now at IBM Watson), as well as other work-study students and graduate students.
We were able to restore the data by re-entering it from hard copies and re-reading backups from thousands of 360 KB, 1.2 MB, 720 KB, and 1.44 MB floppy disks. (The losses could have been worse if not for the technology of the time; gone are the days!)
The University of Minnesota was self-insured, and we had to make do with what we had. The Reagan administration had cut part of the funding from the National Science Foundation, which we relied upon, and the cuts made matters worse. But the data was very valuable, both from a research perspective and for the impact the research would have on future discussions around global climate change. However, the real impetus for my cross-platform computing software came from my work as a graduate student in computer science and as a computer specialist at the National Oceanic and Atmospheric Administration (NOAA).
Floods, MIRR & Need For Cross-Platform Data Transfer
In the early 90s, I went back to grad school (again) for computer science while also working for NOAA. I was laboring on an algorithm for spatial data partitioning under the guidance of Dr. Shashi Shekhar, Dr. Tom Caroll, and Dr. Y. Saad. My task was to use nuclear detector technology on board an aircraft to detect natural radiation, map it, extrapolate it, and come up with an estimate of water content. This water content would be used to predict floods and to decide which zones would be evacuated. It was put to the test months later, when countless lives and much property were saved. Read on!
The idea was to spread spatial and temporal computations over a large number of distributed nodes (local and remote), and MIRR was born. I started by using PVM to distribute data between all nodes. I was trying to reduce the time it took to estimate soil moisture and flood risks across the United States and Canada by combining airborne survey data with satellite data, digital elevation model data, and much more. We had billions of data points from flights and satellite passes for a great many latitude and longitude points (the spatial factor). We also had a large volume of historical data (the temporal factor).
The first distributed GIS software application (MIRR) was born and we reduced our kriging times from several days to 2 to 3 hours. This was a great tool that saved many lives and property by alerting communities in the path of the devastating floods.
Credits for the sparks!
Discussions with Dr. Shekhar; Dr. Y. Saad; Dr. Bill Kenny (a veteran of high-performance computing in the defense and space programs: Univac, Control Data, General Dynamics); Randy Hills (NOAA and former Army Corps engineer); Captain Barry Choy (my dear friend and my great low-flying aircraft test pilot for my Airborne Gamma Detection and Analysis System, AGDAS; thanks to CAPT. Choy for giving me the confidence not to be scared during the high-risk flights. My pilot and my leader; yes, my captain!); Commander Poston; Commander Maxson (Maximum Bob); Ann McManamon (my dear friend, She Beast); the late Milan Alan (a remote sensing analyst; NOAA and …); and R. Hartman (my boss and our collective Captain Picard) guided my initial interests. These mentors and friends started to direct me into some interesting and fascinating areas in applied distributed computing and real-time data acquisition.
With more nodes came more operating systems and more communications overhead
I used a master/slave model where the master was in charge of distributing the load between all nodes (a sort of load balancer). This was before the birth of virtual machines. I started by running on AIX workstations, then I got my hands on a farm of HP 735s that ran HP-UX (the snakes, we called them then!), so I ported to HP-UX. The next challenge was finding binary builds of PVM for a new farm of free nodes I discovered at a participating lab. This time it was OS/2. I decided to rewrite the parts of PVM that I needed and removed PVM from the equation. A Cray 2 and some other obscure MIPS boxes entered the equation as well. An early version of the Slackware Linux distribution, along with the Halloween edition of RedHat, was in the mix too. I was hungry for nodes (feed me, Seymour!), so I kept looking for places where I could attach my MIRR like a parasite.
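To make the load-balancing idea concrete, here is a minimal sketch in Python of how a master might hand out work to nodes of different speeds. It is not MIRR's actual scheduler, which this account does not describe; the function name `assign_tiles`, the tile costs, and the node names and speeds are all hypothetical. It uses a simple greedy rule: take the largest remaining tile and give it to the node that frees up first.

```python
import heapq

def assign_tiles(tile_costs, node_speeds):
    """Greedy load balancing (an illustrative assumption, not MIRR's method).

    tile_costs:  {tile_name: estimated work units}
    node_speeds: {node_name: work units processed per unit of time}
    Returns {node_name: [tile_name, ...]}.
    """
    # Min-heap of (time at which the node becomes free, node name).
    heap = [(0.0, name) for name in sorted(node_speeds)]
    heapq.heapify(heap)
    plan = {name: [] for name in node_speeds}

    # Largest tiles first; ties broken by name so the result is deterministic.
    for tile, cost in sorted(tile_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        busy_until, name = heapq.heappop(heap)   # node that frees up first
        plan[name].append(tile)
        # A faster node finishes the same tile sooner.
        heapq.heappush(heap, (busy_until + cost / node_speeds[name], name))
    return plan
```

With a slow node and a node twice as fast, the faster node ends up with more tiles, which is the point of letting the master decide rather than splitting the work evenly.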
The need for data migration and data transfer
At the time, I was not aware of XML2 because it did not exist yet. But in my early years as a scientist at the University of Minnesota, I wrote some code to migrate data from a DOS database called KMAN (Knowledge Manager) to Informix. Working with an undergraduate student and a graduate student from computer science (Mr. Kartick Shredar, now at Microsoft, and Dr. Trung Nguyen, now at IBM Watson lab), we came up with a parser and a data format that allowed us to convert data from one format to another. A workshop sponsored by NSF at the National Center for Supercomputing Applications (NCSA; led then by Larry Smarr, and where Mosaic, the first widely popular graphical browser, came from) exposed me further to hierarchical data representations.
We called the CDR LTER format HDR (hierarchical data representation). A plugin design allowed us to reduce the number of format converters we needed. So instead of writing n x n format conversions, we wrote only n+1. HDR served as middleware that allowed us to convert to anything (we had as many formats as we had grad students, post-docs, faculty, and research partners). We learned to deal with little-endian and big-endian data representations along the way. Using this knowledge, I learned to pass data between machines and software systems. I had the encouragement and the freedom to make mistakes thanks to a star professor and researcher (the great American ecologist and mind Dr. David Tilman, a professor at both the University of Minnesota and the University of California, Santa Barbara, an elected Foreign Member of the British Royal Society, and before that a recipient of the BBVA Foundation Frontiers of Knowledge Award), who was my boss at the University of Minnesota and who had the biggest impact on my life and chances of success. I called him Akdeem out of respect; it means chief of all chiefs. His quick mind forced me to try and follow suit.
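The hub idea behind HDR can be sketched in a few lines of Python. This is only an illustration: the real HDR format and its plugins are not described here, so the hub representation (a list of dictionaries), the three toy formats, and the field names `plot` and `biomass` are all assumptions. One way to read the n+1 count is n small plugins around one shared hub, instead of a dedicated converter for every pair of formats.

```python
import csv
import io
import json

# Hub representation (assumed for this sketch): a list of dict records.

def read_csv(text):
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def write_csv(records):
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

def read_json(text):
    return json.loads(text)

def write_json(records):
    return json.dumps(records)

def read_kv(text):
    # Toy format: one record per line, "key=value" pairs separated by ";".
    return [dict(pair.split("=", 1) for pair in line.split(";"))
            for line in text.splitlines() if line]

def write_kv(records):
    return "\n".join(";".join(f"{k}={v}" for k, v in rec.items())
                     for rec in records)

# One plugin (a reader and a writer) per format, registered by name.
READERS = {"csv": read_csv, "json": read_json, "kv": read_kv}
WRITERS = {"csv": write_csv, "json": write_json, "kv": write_kv}

def convert(text, source, target):
    """Convert between any two registered formats by going through the hub."""
    return WRITERS[target](READERS[source](text))
```

For example, `convert("plot,biomass\n1,20.5\n", "csv", "kv")` goes CSV to hub to key-value text. Adding a fourth format means writing one more reader and writer, not a converter to and from each of the existing three.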
EH Socket library
Back to MIRR: I ended up writing my own socket library in C. It allowed me to transfer vectors and matrices of both integer and double values without the need to use PVM. Years later, Dr. Saad from the University of Minnesota introduced me to MPM and sparse matrices, so I learned a little bit more about compression and reducing communication overhead. Anyway, I was able to port my socket library to AIX, HP-UX, OS/2, Cray 2 UNICOS, OpenBSD, Linux (the Halloween version from RedHat), Solaris, SunOS, and even DOS. I was not fortunate enough to use GCC and a unified Makefile everywhere, as we do today. I had to use multiple compilers from IBM, Sun, and HP, as well as GCC and DJGPP (DJ Delorie's port of GCC to DOS). The road was painful, but I learned to write code once and rebuild each time using scripts, batch files, and Makefiles (not what Scott McNealy meant by building once and running many).
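As an illustration of what moving vectors between machines with different byte orders involves, here is a small Python sketch. The original library was written in C and its wire format is not given here, so everything below is an assumption: a 1-byte type tag, a 4-byte element count, then the values, all in network (big-endian) byte order so that little-endian and big-endian hosts agree on what the bytes mean.

```python
import socket
import struct

# Hypothetical type tags: "i" = 32-bit signed int, "d" = 64-bit IEEE double.
_TAGS = {"i": b"I", "d": b"D"}
_KINDS = {tag: kind for kind, tag in _TAGS.items()}

def pack_vector(values, kind):
    """Serialize a list of ints (kind='i') or doubles (kind='d')."""
    header = struct.pack("!cI", _TAGS[kind], len(values))   # "!" = network order
    body = struct.pack(f"!{len(values)}{kind}", *values)
    return header + body

def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than asked."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_vector(sock, values, kind):
    sock.sendall(pack_vector(values, kind))

def recv_vector(sock):
    tag, count = struct.unpack("!cI", recv_exact(sock, 5))
    kind = _KINDS[tag]
    size = struct.calcsize(f"!{count}{kind}")
    return list(struct.unpack(f"!{count}{kind}", recv_exact(sock, size)))
```

A matrix could be sent the same way by first sending its row and column counts as an integer vector and then its flattened values. Fixing the byte order on the wire is what lets the same sender and receiver code be rebuilt on each platform without per-pair special cases.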
The skills I learned came in handy when I went to work for EJV (now owned by Reuters), where we had to move tons of data several hours a day between our sites and our clients' sites (banks and financial institutions). EJV had SLAs with customers who needed to receive their data at specific times so they could take action.
I remember being up every night at 3 am because the network was unreliable then. I got up at three to “force the jobs to rerun on autosys” so the customers would have their data on time. (The bonuses [incentives] of Wall Street were one leg of the three-legged stool: control, incentives, and decision rights.) While waiting for the files to transfer from our main site in St. Louis, I made myself a pot of French Roast coffee and a loaf of wheat, oatmeal, and barley bread to share with my engineering friends and co-workers when I got to my 2nd work shift :-) at 8:30 am. It took me a few months to fix the problems and to start getting some much-needed sleep. Every day I kept saying that FTP, rdist, and everything like them had to go away.
Gone are the early days of network unreliability, but the friendships and the skills I learned remain. I will continue refining both to deliver better solutions to complex problems and to satisfy my intellectual curiosity.
The space shuttle to the rescue
In 2002 I co-founded ConstantData with three partners. We produced the first real-time replication software for Linux. We later added Windows and Solaris. We helped NASA keep an eye in real time on the space shuttle Discovery after the shuttle program had been grounded following the tragic loss of Columbia on reentry. Later we helped AOL back up user profiles, Bloomberg synchronize data between sites, and a few defense contractors deliver data wherever it needed to go. We sold the company to Backbone in 2005, and I went to work for Backbone for a year before I discovered the plans to outsource our jobs to India :-).
So I decided to go teach, mentor, and work with graduate students again.
A new cross-platform data communication was born
In 2006, I ventured into peer-to-peer file sharing after Backbone sent my job to India. Working with several of my graduate students, we wrote TAMDA to share files (a precursor to Dropbox and friends). I failed to commercialize it. We went back to the drawing board and started coding 24 hours a day to be able to move large amounts of data between meshes of servers located in Toronto, Casablanca, France, Minnesota, Seattle, Washington DC, London/UK, Hyderabad/India, and Hong Kong.
Some of the sites were chosen because of government snooping. Others were chosen because of unreliable networks. Others because of the bandwidth throttling that many carriers resorted to. We had a great testbed for our project. And we had a great time as geeks!
Our experiments resulted in seven graduate theses and a mountain of data and knowledge. In 2008, after helping Quantum with the design of their replication and building a prototype to replicate the StorNext File System, I decided to pivot from file sharing to a solution that would deliver automatic secure file sharing, remote online backup, and cross-platform file transfer and replication. EDWADDS was born in 2008 (EnduraData Wide Area Data Distribution; my wife hated the name but I did not know why!). In 2013, we renamed EDWADDS to EDpCloud (after I finished the MBA and learned one tiny thing about brands).
Today EDpCloud supports Windows XP, Windows 2003, Windows 7, Windows 2008, Mac, Linux, Solaris x86, Solaris SPARC, AIX, and other UNIX flavors, and runs in some very demanding environments such as government agencies, healthcare payers and providers, and global clinical research organizations.
We got here because of many things: airplanes, satellites, floods, space shuttles, mixed operating systems, natural disasters, war zones, and nature … But above all, we got here because of the help I received from many bright people and the feedback of many great system administrators. For that, I am very grateful to all of them. To the countless other stars and mentors I did not mention, I say thank you. I am more than humbled.
This was a small part of my story. Other paths, and the paths of others on the team, were omitted because this entry is getting too long.
Time for a cup of coffee and a new feature to solve a new problem or save a life.
It has been a wild and interesting experience. Thanks to all my mentors, managers, pilots, and friends. I am so grateful for all of your wisdom.
I have to run, write some code, and help a customer. That is what I enjoy doing. It is not work for me. It is my life and it is very fulfilling.
A. A. El Haddi
Eden Prairie, MN
Jan 29th, 2014
Updates (Aug 2019)
Today, EnduraData file replication software is used to move critical healthcare data (scans, X-rays, provider notes, insurance information, …), clinical research data, government data, auto manufacturing data, and other life-critical information.