The road to my interest in file replication software started when I was a scientist and a data manager at the University of Minnesota. The roof of my office collapsed as a result of a pipe freezing in the fourth-floor stock room. The data losses were staggering.
We were able to restore by reentering data from hard copies and re-reading backups from thousands of 360 KB, 1.2 MB, 720 KB, and 1.44 MB floppy disks. (The losses could have been worse if not for the technology of the time; gone are the days!)
The University of Minnesota was self-insured, and we were on the hook to make do with what we had. The Reagan administration had cut part of the funding from the National Science Foundation, which we relied upon, and the cuts made matters worse. But the data was very valuable, both from a research perspective and for the impact the research would have on future discussions around global climate change. However, the real impetus for my cross-platform computing software was born during my work as a graduate student in computer science and as a computer specialist at the National Oceanic and Atmospheric Administration (NOAA).
Floods, MIRR & the Need for Cross-Platform Data Transfer
In the early 90s, while in grad school for computer science and working for NOAA, I was laboring on an algorithm for spatial data partitioning under the guidance of Dr. Shashi Shekhar, Dr. Tom Caroll, and Dr. Y. Saad.
The idea was to spread spatial and temporal computations over a large number of distributed nodes (local and remote), and MIRR was born. I started by using PVM to distribute data between all nodes. I was trying to reduce the time it took to estimate soil moisture and flood risks across the United States and Canada by combining airborne survey operations data with satellite data, digital elevation model data, and much more. We had billions of data points from flights and satellite passes covering a great many latitude and longitude points (the spatial factor). We also had a large volume of historical data (the temporal factor). Our first distributed GIS software application, MIRR, reduced our kriging times from several days to 2 to 3 hours. It was a great tool that saved lives by alerting communities in the path of devastating floods.
Credits for the sparks!
Discussions with Dr. Y. Saad, Dr. Bill Kenny (a veteran of high-performance computing in the defense and space programs: Univac, Control Data, General Dynamics), Randy Hills (NOAA and former Army Corps engineer), Captain Barry Choy (a great low-flying-aircraft test pilot for my AGDAS system and a high-risk-flight pilot and leader; yes, my captain!), Commander Poston, Commander Maxson, Ann McManamon (She.B), the late Milan Alan (a remote sensing analyst at NOAA), and R. Hartman (Capt. Picard) started to direct me into some interesting and fascinating areas.
With more nodes came more operating systems and more communication overhead
I used a master/slave model where the master was in charge of distributing the load between all nodes (a sort of load balancer). The time frame was before the birth of virtual machines. I started by running on AIX workstations; then I got my hands on a farm of HP 735s that ran HP-UX (the snakes, we called them then!), so I ported to HP-UX. The challenge at the time was finding binary builds of PVM for a new farm of free nodes I discovered at a participating lab; this time it was OS/2. I decided to rewrite the parts of PVM that I needed and removed PVM from the equation. A Cray 2 and some other obscure MIPS boxes entered into the equation as well. I was hungry for nodes (feed me, Seymour!), so I kept looking for places where I could attach my MIRR like a parasite.
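To make the master/slave idea concrete, here is a minimal sketch in C of the master-side partitioning step: splitting the rows of a latitude/longitude grid into nearly equal blocks, one per worker. The grid size, node count, and names are hypothetical illustrations, not MIRR's actual code; the real master also shipped each block to its worker over a socket and gathered the results back.

```c
#include <stdio.h>

#define NUM_ROWS  4800  /* latitude rows in a hypothetical grid */
#define NUM_NODES 16    /* worker nodes available               */

int main(void)
{
    /* Split rows into nearly equal blocks, one per worker; the first
     * 'extra' workers absorb the remainder so no node sits idle. */
    int base  = NUM_ROWS / NUM_NODES;
    int extra = NUM_ROWS % NUM_NODES;
    int start = 0;

    for (int node = 0; node < NUM_NODES; node++) {
        int rows = base + (node < extra ? 1 : 0);
        printf("node %2d: rows %5d .. %5d\n",
               node, start, start + rows - 1);
        /* In the real system: send this block to the worker over a
         * socket and collect the results. */
        start += rows;
    }
    return 0;
}
```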
The need for data migration and data transfer
At the time, I could not lean on anything like XML because it did not exist yet. But in my early years as a scientist at the University of Minnesota, I wrote some code to migrate data from a DOS database called KMAN (Knowledge Manager) to Informix. Working with an undergraduate student and a graduate student from computer science (Mr. Kartick Shredar, now at Microsoft, and Dr. Trung Nguyen, now at the IBM Watson lab), we came up with a parser and a data format that allowed us to convert data from one format to another.
We called the format HDR (hierarchical data representation). A plugin architecture reduced the number of converters we needed: instead of writing n x n format conversions, we wrote only n + 1 (and we had as many formats as we had grad students, post-docs, and faculty). HDR served as middleware that allowed us to convert anything to anything. We learned to deal with little- and big-endian data representations along the way. Using this knowledge, I learned to pass data between machines and software systems. I had the encouragement and the freedom to make mistakes thanks to a star professor and researcher, the great American ecologist and mind Dr. David Tilman, who was my boss at the U of M and who had the biggest impact on my life and chances of success. Of course I called him Akdeem out of respect (it means chief of all chiefs); his quick mind forced me to try to follow suit.
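To illustrate the n + 1 idea, here is a small C sketch (all names here are hypothetical, not the original HDR code): each format registers a plugin that only converts to and from HDR, and any pair of formats is bridged by a two-step hop through HDR, so adding a new format costs one plugin rather than n new converters.

```c
#include <stdio.h>

typedef struct { char payload[256]; } Record;  /* stand-in for real data */

/* A plugin only knows its own format and HDR. */
typedef struct {
    const char *name;
    void (*to_hdr)(const Record *in, Record *hdr);
    void (*from_hdr)(const Record *hdr, Record *out);
} FormatPlugin;

/* Toy converters that just tag the payload; real plugins parsed and
 * emitted their native formats. */
static void kman_to_hdr(const Record *in, Record *hdr)
{ snprintf(hdr->payload, sizeof hdr->payload, "HDR<%s>", in->payload); }

static void informix_from_hdr(const Record *hdr, Record *out)
{ snprintf(out->payload, sizeof out->payload, "INFORMIX<%s>", hdr->payload); }

int main(void)
{
    FormatPlugin kman     = { "KMAN",     kman_to_hdr, 0 };
    FormatPlugin informix = { "Informix", 0, informix_from_hdr };

    Record src = { "plot=12,species=oak" }, hdr, dst;
    kman.to_hdr(&src, &hdr);         /* source format -> HDR */
    informix.from_hdr(&hdr, &dst);   /* HDR -> target format */
    printf("%s\n", dst.payload);
    return 0;
}
```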
EH Socket library
Back to MIRR: I ended up writing my own socket library that allowed me to transfer vectors and matrices of both integer and double values without needing PVM. Years later, Dr. Saad from the University of Minnesota introduced me to MPM and sparse matrices, so I learned a little bit more about compression and reducing communication overhead. Anyway, I was able to port my socket library to AIX, HP-UX, OS/2, Cray 2 UNICOS, OpenBSD, Linux (the Halloween version from Red Hat), Solaris, SunOS, and even DOS. I was not fortunate enough to have gcc and a unified Makefile as I do today. I had to use multiple compilers from IBM, Sun, and HP, plus gcc and DJ Delorie's DJGPP. The road was painful, but I learned to write code once and rebuild each time using scripts, batch files, and Makefiles (not quite what Scott McNealy meant by build once, run many).
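Endianness was exactly the kind of detail such a library has to hide. Below is a minimal, hypothetical sketch (assuming 8-byte IEEE-754 doubles, as on the platforms listed) of encoding a vector of doubles into a fixed big-endian wire order before a send() and decoding it on the other side; because it uses shifts rather than raw copies of multi-byte words, the same code is correct on both little- and big-endian hosts.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Encode one double into 8 big-endian bytes. */
static void encode_double_be(double value, uint8_t out[8])
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);            /* IEEE-754 bit pattern */
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(bits >> (8 * (7 - i))); /* most significant first */
}

/* Decode 8 big-endian bytes back into a double. */
static double decode_double_be(const uint8_t in[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

int main(void)
{
    double vec[3] = { 1.5, -2.25, 3.141592653589793 };
    uint8_t wire[3][8];               /* bytes as they would go to send() */

    for (int i = 0; i < 3; i++)
        encode_double_be(vec[i], wire[i]);

    /* Receiving side, regardless of its endianness: */
    for (int i = 0; i < 3; i++)
        printf("%.15g\n", decode_double_be(wire[i]));
    return 0;
}
```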
The skills I learned came in handy when I went to work for EJV (now owned by Reuters), where we had to move tons of data several hours a day between our sites and our clients (banks and financial institutions). EJV had SLAs with customers who needed to receive their data at specific times so they could take action.
I remember being up every night at 3 am because the network was unreliable then. I would get up at three and force the jobs to rerun on AutoSys so the customers would have their data on time (and the bonuses of Wall Street were part of the three-legged stool: control, incentives, and decision rights). While waiting for the files to transfer from our main site in St. Louis, I made myself a pot of French roast coffee and a loaf of wheat, oatmeal, and barley bread to share with my friends and co-workers when I got to my second (:-) shift at 8:30 am. It took me a few months to fix the problems and start getting some much-needed sleep.
Gone are the early days of network unreliability, but the friendships and the skills learned remain… I will continue refining both to deliver better solutions to complex problems and to satisfy my intellectual curiosity.
The space shuttle to the rescue
In 2002, I co-founded ConstantData with three partners. We produced the first real-time replication software for Linux. We later added Windows and Solaris. We helped NASA keep an eye on the Discovery space shuttle in real time after the shuttle program was grounded following the unfortunate loss of Columbia on reentry. Later we helped AOL back up user profiles, Bloomberg synchronize data between sites, and a few defense contractors deliver data. We sold the company to Backbone in 2005.
A new cross-platform data communication system was born
In 2006, I ventured into peer-to-peer file sharing after Backbone sent my job to India. Working with several of my graduate students, we wrote TAMDA to share files (a precursor to Dropbox and friends). We went back to the drawing board and started coding 24 hours a day to be able to move large amounts of data between meshes of servers located in Toronto, Casablanca, France, Minnesota, Seattle, Washington DC, London (UK), Hyderabad (India), and Hong Kong.
Some of the sites were chosen because of government snooping. Others were chosen because of unreliable networks. Others because of the bandwidth throttling that many carriers resorted to. We had a great test bed for our project. And we had a great time as geeks!
Our experiments resulted in seven graduate theses and a mountain of data and knowledge. In 2008, after helping Quantum with the design of their replication and building a prototype to replicate the StorNext File System, I decided to pivot from file sharing to a solution that would deliver secure file sharing, remote online backup, and cross-platform file replication. EDWADDS was born in 2008 (EnduraData Wide Area Data Distribution; my wife hated the name, but I did not know why!). In 2013, EDWADDS was renamed EDpCloud.
Today EDpCloud supports Windows XP, Windows 2003, Windows 7, Windows 2008, Mac, Linux, Solaris x86, Solaris SPARC, and other UNIX flavors.
So we got here because of many things: airplanes, satellites, floods, space shuttles, mixed operating systems, natural disasters, and nature… But above all, we got here because of the help I received from many bright people. For that, I am very grateful to all of them. To the countless other stars and mentors I did not mention, I say thank you. I am more than humbled.
This was a small part of my story; other paths, and the paths of others on the team, were omitted because this entry is getting too long.
Time for a cup of coffee and a new feature to solve a new problem.
A. A. El Haddi
Eden Prairie, MN
Jan 29th 2014