A little history
Dealing with data was not a hobby of mine. It was how I fed and raised my children, supported my 42 dependents and satisfied my love for science and discovery.
Almost 30 years ago and counting, I was hired by Dr. David Tilman for my first full-time job in the US, as a scientist and data manager at the University of Minnesota.
The tools of the trade then were Kman (or Knowledge Manager), FORTRAN, PASCAL, Stat Graphics, SigmaStat, SigmaPlot and APL! Later C, SAS, S, S+, Informix, 4GL, ESQL/C, Jump, etc. entered the equation. Our database grew exponentially since we had to track thousands of variables and thousands of species, weather, soil, etc. We transformed, analyzed, reduced and explored data.
Research, experimental design and data management
Our research was financed by the National Science Foundation (http://nsf.gov), the Department of Energy, the Mellon Foundation and a few others concerned about America’s future. It was at the height of the Cold War … We faced exactly the same problems that businesses are facing today. What raw data to keep, what to summarize? We had experimental data, observation data and modeling data. Modeling data was used, summarized and not kept. We would have run out of storage quickly if we had attempted to save it. Therefore, it was regenerated and reproduced when needed, using a fixed or a dynamic random seed. But even as we summarized the data, we continued to look for patterns. Our scripts, models and SAS applications running on many platforms were continuously searching for patterns.
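The idea of keeping only a summary and a seed, then regenerating the model output on demand, can be sketched as follows. This is an illustrative Python example, not the original models or tools: the function names, the Gaussian stand-in for a model and the seed values are all assumptions made for the sketch.

```python
import random
import statistics


def simulate_model_run(seed, n_values=1000):
    """Hypothetical stand-in for a model: draws n_values Gaussian numbers.

    A private generator seeded with `seed` makes the output repeatable.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_values)]


def summarize(values):
    """Reduce the raw model output to the few numbers worth storing."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
    }


# Fixed seed: chosen once and written down alongside the summary.
fixed_seed = 1990
stored = {"seed": fixed_seed, "summary": summarize(simulate_model_run(fixed_seed))}

# Dynamic seed: drawn at run time, so it must be recorded to allow a rerun.
dynamic_seed = random.SystemRandom().randrange(2**32)
stored_dynamic = {
    "seed": dynamic_seed,
    "summary": summarize(simulate_model_run(dynamic_seed)),
}

# Later, the raw values are rebuilt from the seed instead of read from disk.
regenerated = simulate_model_run(stored["seed"])
assert summarize(regenerated) == stored["summary"]
```

Only the seed and the summary need to be stored. The raw values cost compute time rather than disk space, which was the trade-off described above.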
Data analysis and mining
I will not forget the Saturday afternoon in the spring of 1990 when I finally hit the jackpot on the effect of climate variation and drought on species loss and biodiversity. Some may call it a fishing expedition. I knew what I was looking for but I did not know how to get there. Or did I really know? That is the beauty of science. I managed to find that link only because people at the state climatology office (Gregg Spoden and Jim Zandlo) had the vision to collect and curate raw data going back to the mid-1800s, and they shared that valuable data with me. And because I spent a great deal of time learning statistics and databases, and I questioned assumptions and more … and of course there was luck 🙂
Some of the oldest data for our project came from Fort Snelling. Other data sets were from Cedar and Cambridge, MN, USA. Combining these data sets with long-term experimental data from Cedar Creek showed us undeniable patterns about nature and human activity.
Later Dave told me that he took a big gamble on me when he hired me. See, I was an assistant professor in Morocco and immigrated to the US. I rose from the dust and Lady Liberty welcomed me. I landed at MSP with a one-way ticket and $1.54 in my pocket. I am glad Dave took that big gamble on me, and he was very glad he took it as well. It turned out to be a great symbiosis. I learned a lot from this star.
Data analytics and business
Fast forward to today: hardware, cloud and analytics vendors are all trying to stress the effect big data will have on business and people (except the poor, the hungry and those disconnected from the grid, although they too may come in contact with some sort of big data collectors, reservoirs or processors).
Many companies are in the early phases of creating data reservoirs that they hope to use in the near future to predict their consumers’ behaviors. Let no one be misled: the ultimate goal is to predict behavior, and the desired outcome is to get the consumer to take an action on behalf of one or more entities. This is more than predictive analytics. This is predictive analytics with a remote guidance system. Someone somewhere will use location services, behavior patterns, consumption patterns, income patterns, health patterns, ecosystem interaction patterns and long-term characterization and profiling of the individual and of his or her ecosystem to guide him or her through a well-defined set of paths, a series of choices that end in actions benefiting the entities I mentioned earlier. Some talking heads proclaim a paradigm shift. Others peddle their wares and goods about little and big data. Thomas Kuhn would shrug if he heard these talking heads … In nonscientific terms and thought, we are witnessing more than a paradigm shift. We are witnessing une bousculade technologique (a technological scramble)! But Kuhn would quickly reconcile it (again) with the gradualist models, if not with many others …
In all cases, you will need to move data from all its sources to your sinks, reservoirs, warehouses or analytic clusters before you process it, mine it, visualize it and explore it. Whether you are using Hadoop, MapR, SAS, R, Oracle or other tools, you need to bring your structured, semi-structured or unstructured data to one or more locations automatically, securely and efficiently.
What do you think?
A. El Haddi
EP, MN, USA