Meeting Details - 2015

February 10, 2015

Title: Hadoop Data Lake Controversy: Can You Have Your Lake And Use It Too?

Abstract:

Hadoop provides an ideal platform for storing many types of data that business users - data engineers, data scientists, data analysts, and business analysts - can leverage for data science and analytics. But Hadoop is a file system that lacks the automation to catalog what data it contains, and has no native way for users to find and understand the data they need for their data science and analytics projects. The lack of automation is overlooked when a team conducts a pilot since the data set is known; however, it becomes debilitating as projects grow beyond a proof point or two. The end result is data anarchy where the business has to scavenge for data and hoard what it can find, while IT is desperately trying to manage the data to meet the needs of the business.

Using data in Hadoop is like scavenging at a flea market. It is impossible to know up front what data is there, and it would take too much time to browse through the entire market. In the case of Hadoop, it is not practical to browse through all the files in the cluster to find the right ones to wrangle or visualize.

The opposite of shopping at a flea market is shopping on Amazon.com. From a user perspective, it is easy to search and find the right product very quickly. A user doesn’t need to write code or browse through endless lists of items. Amazon.com provides a catalog of products with detailed information that anyone can use.

Waterline Data solves the challenges of finding, understanding, and governing data in Hadoop. Waterline Data is like Amazon.com for Hadoop data. Waterline helps anyone find and understand data in Hadoop without writing code or wasting time browsing through unintelligible files.  In addition to providing the self-service experience to find and understand the right data, Waterline Data also automates building and maintaining a data inventory, securely provisions data to users, and enables data governance throughout.

Presenter:  Alex Gorelik, Founder and CEO, Waterline Data

Alex created Waterline Data to accelerate the adoption of Big Data and data driven decision-making at enterprises.

Prior to Waterline Data, Alex served as general manager of Informatica’s Data Quality Business Unit, driving marketing, product management, and R&D. Also at Informatica, Alex managed a team of 400 engineers and product managers as SVP of R&D for Core Technology, developing Informatica’s platform and data integration technology.

Alex joined Informatica from IBM, where he was an IBM Distinguished Engineer for the Information Integration team. IBM acquired Alex's second startup, Exeros, which specialized in enterprise data discovery.

Previously, Alex was co-founder, CTO and VP of Engineering at Acta Technology (acquired by Business Objects and now marketed as SAP Business Objects Data Services).

Prior to founding Acta, Alex managed development of Replication Server at Sybase and worked on Sybase’s strategy for enterprise application integration (EAI). Earlier, he developed the database kernel for Amdahl’s Design Automation group.

Alex holds a B.S. in Computer Science from Columbia University School of Engineering and an M.S. in Computer Science from Stanford University.

(c) Copyright 2010, SF-DAMA