CIOReview
| January 2016

The centralized team manages the traditional enterprise data warehouse, which has now transformed into ever-growing Hadoop data lakes or reservoirs. There are obvious benefits to having all of your data in a single place managed by one team; however, central teams frequently cannot scale to meet the pace of change required by multiple stakeholders.

In contrast, decentralized teams were formed around narrower, business-unit-driven projects or objectives, closer to their internal customers and demanding an agility that central teams are typically unable to achieve. These more narrowly scoped projects result in quick turnaround to business insight, but at the expense of a well-architected, broader data environment that can achieve greater scale across a larger set of internal customers. While the insight gained from these "fit for purpose" data marts is significant, they create a host of other data issues that are magnified when multiplied by an ever-increasing number of data marts.
A few of these issues include:

· Data latency – data takes an ever-increasing number of hops and transformations before it resides in a place where it can deliver business value.

· Data veracity – with fit-for-purpose data marts we gained agility in unlocking new insights, but lost confidence in the data, as metrics are not consistent across the marts.

· Data quality and data lineage – a web of data marts, sometimes pulling data from each other, with multiple hops and transformations, frequently loses data nuances and creates a change-management nightmare, ultimately impacting data quality.

· Replication of data – at times in large quantities, causing uncontrolled cloud and internal infrastructure costs.

· Data governance – inability to create or enforce any type of data governance or compliance framework.

The answer to these seemingly conflicting issues relies on an architectural vision and framework built on several key tenets:

· Centralize (most of) your data management – embrace the data lake/reservoir architecture. Extract and load your data into a central Hadoop platform where most of the truly heavy data management lifting should be performed, done once and the same way for all to share, ensuring the highest data quality, especially for the core or critical business metrics.

· Enable a semantic layer, where business users can create their own data views without building yet another data mart.

· Enable your business users with a data-blending tool for out-of-the-box data management work, while still keeping them in a more defined and governed environment.

· Data governance by design – enable the concepts of data catalogs, certified data, and data lineage. This must be built into your core workflows to ensure usage, but it also adds value beyond pure data governance. The true value is access to highly consumable data assets and a compute platform that enables faster time to insight.
· Keep your analytics and data science closer to the businesses they serve. This is where the true magic comes to life when properly enabled with consumable, clean, and certified data.

A high degree of organizational buy-in and collaboration accompanies this sort of transformation, and it begins with a shared vision across the key stakeholders, who understand the obvious tradeoffs each approach brings. A sound, reusable data management framework should serve as the focal point for bringing scale to the insights your businesses demand. The hyper pace of technology and data science innovation, combined with the necessary organizational change, will certainly make for an interesting and fun journey.
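The semantic-layer tenet above can be made concrete with a small sketch. The idea is that a business unit gets its own lens on centrally managed, certified data as a view rather than as a copied-out data mart, so the metric logic lives in one governed place. This is an illustrative sketch only: it uses Python's built-in sqlite3 as a stand-in for the central platform (in a real lake this role would be played by Hive, Impala, or Spark SQL), and the table, view, and column names are hypothetical.

```python
import sqlite3

# Stand-in for the central platform; hypothetical schema for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One certified, centrally managed fact table -- loaded once,
# the same way, for all consumers to share.
cur.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EMEA", "widgets", 120.0),
        ("EMEA", "gadgets", 80.0),
        ("AMER", "widgets", 200.0),
    ],
)

# Semantic-layer view: a business-unit-specific lens on the certified data.
# No extract, no replication, no new data mart -- the revenue metric is
# defined once, so it stays consistent across consumers.
cur.execute(
    """
    CREATE VIEW emea_revenue AS
    SELECT product, SUM(amount) AS revenue
    FROM sales
    WHERE region = 'EMEA'
    GROUP BY product
    """
)

emea = dict(cur.execute("SELECT product, revenue FROM emea_revenue").fetchall())
print(emea)
```

Because the view reads the certified table directly, there is no extra hop for latency to accumulate in, and changing the metric definition in one place updates every consumer, which is the lineage and veracity win the column argues for.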