CIOReview, March 2015, p. 9

about eight years ago, and I remember when learning to break boards my instructor always encouraged us to focus our punches six inches past the board. This is how a board is broken: focus your energy at a point beyond the surface. In approaching the cloud, I have always asked, "What can't I put in the cloud?" Rather than focus on the incremental step of what to move from a data center to the cloud, focus on the provocative concept of putting everything in the cloud and consider only the exceptions that have to stay behind. This leads to critical thinking about how to operate and integrate fully within a cloud context, rather than finding yourself caught in a compromise hybrid mode that is neither cost-effective nor agile. Focus on what seems six inches past the impossible and smash through the wall of your data center into the cloud.

Now if clouds weren't buzzy enough, how about big data? Let me cover three concepts here: anti-patterns, the biggest of the three V's, and talent.

Some of the most useful lessons I've learned in the big data space are what not to do. A simple first anti-pattern is the habit of bringing data to an analytics engine. We are familiar with data warehouses and building data cubes that we can dice, slice, and mine to our heart's content. However, with the scale of data involved in today's analytics challenges, the appropriate pattern is to bring our analytics to the data and run them in place. Even given a distributed data model, many analytic tasks are loosely coupled or embarrassingly parallel, which enables distributed analysis. So to correct the anti-pattern of bringing your data to the analysis, apply the pattern of bringing your analysis to the data.

The second anti-pattern I've found is that of data myopia: the strong belief that simply more of a given class of data will provide more analytic depth or predictive value.
In fact, introducing different classes of data allows far greater insight and predictive value with a fraction of the data volume. A great example comes from drug discovery, where scientists seek to find biomarkers that predict a health outcome. By combining multiple data types such as imaging, genomics, proteomics, and clinical data, scientists have been able to define health outcomes with greater fidelity and data economy than with biomarkers based upon only a single class of data.

The third anti-pattern, and this one is a doozy, is "build it and they will come." It is sometimes also referred to as the "can't you just build me a search engine that will find what I want?" syndrome. There is a temptation in thinking that if I just bring enough data together, I'll be ready to answer any question. This concept is that of hypothesis-free big data, or emergent analytics. I am a big fan of hypothesis-driven big data solutions. Or put a different way: start first with the questions you seek to ask of any big data solution; these will then inform the architecture, data sets, algorithms, and analytics. In particular, one can then proceed iteratively: a question yields an answer, which leads to another question, which leads to another answer. Each of these iterations informs an increasing and organic set of data and methods, bootstrapping between successful demonstrations of value. The waterfall, Big Bang approach to building the Delphic oracle is an unfortunate big data anti-pattern that seems to rear its head far too often. Instead, start with a clear set of focused questions, and chart your big data journey with small and iterative steps.

Now, onto the Big V. Big data is sometimes described as the three V's: volume, velocity, and variety. In the life sciences we mostly see volume and variety, and although the volume is challenging with imaging, genomic, and real world evidence data that are very large, one

John Reynders