Hadoop Data Lake Controversy: Can
You Have Your Lake And Use It Too?
Hadoop provides an ideal platform for storing
many types of data that business users - data engineers, data scientists,
data analysts, and business analysts - can leverage for data science and
analytics. But Hadoop is a file system that lacks the automation to
catalog what data it contains, and has no native way for users to find and
understand the data they need for their data science and analytics
projects. The lack of automation is overlooked when a team conducts a
pilot since the data set is known; however, it becomes debilitating as
projects grow beyond a proof point or two. The end result is data anarchy
where the business has to scavenge for data and hoard what it can find,
while IT is desperately trying to manage the data to meet the needs of the
business.
Using data in Hadoop is like scavenging at a
flea market. It is impossible to know upfront what data is there and it
would take too much time to browse through the entire market. In the case
of Hadoop, it is not practical to browse through all the files in the
cluster to find the right ones to wrangle or visualize.
The opposite of shopping at a flea market is
shopping on Amazon.com. From a user's perspective, it is easy to search for and find the
right product very quickly. A user doesn’t need to write code or browse
through an endless list of items. Amazon.com provides a catalog of products
with detailed information that anyone can use.
Waterline Data solves the challenges of
finding, understanding, and governing data in Hadoop. Waterline Data is
like Amazon.com for Hadoop data. Waterline helps anyone find and
understand data in Hadoop without writing code or wasting time browsing
through unintelligible files. In addition to providing a
self-service experience to find and understand the right data, Waterline
Data also automates building and maintaining a data inventory, securely
provisions data to users, and enables data governance throughout.
Founder and CEO, Waterline Data
Alex created Waterline Data to accelerate the adoption of Big Data and
data-driven decision-making at enterprises.
Prior to Waterline Data, Alex served as general manager of Informatica’s
Data Quality Business Unit, driving marketing, product management and R&D.
Also at Informatica, Alex managed a team of 400 engineers and
product managers as SVP of R&D for Core Technology, developing
Informatica’s platform and data integration technology.
Alex joined Informatica from IBM, where he was an IBM Distinguished
Engineer for the Information Integration team. IBM acquired Alex’s second
startup, Exeros, which specialized in enterprise data discovery.
Previously, Alex was co-founder, CTO and VP of Engineering at Acta
Technology (acquired by Business Objects and now marketed as SAP
BusinessObjects Data Services).
Prior to founding Acta, Alex managed development of Replication Server at
Sybase and worked on Sybase’s strategy for enterprise application
integration (EAI). Earlier, he developed the database kernel for Amdahl’s
Design Automation group.
Alex holds a B.S. in Computer Science from Columbia University School of
Engineering and an M.S. in Computer Science from Stanford University.