Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues.
When designing an enterprise data lake, you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that raise tough questions about data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. For each topic, you will learn the concept, scope, application, and starting point.
Saurabh K. Gupta is a technology leader, published author, and data enthusiast with more than a decade of experience in data architecture, engineering, development, and administration. Working as Data & Analytics Manager at GE, he focuses on data lake analytical programs to build digital solutions for business stakeholders. In the past, he has worked extensively with Oracle database design and development, PaaS and IaaS cloud service models, consolidation, and in-memory technologies. Prior to authoring "Practical Enterprise Data Lake Insights" with Apress in 2018, he worked with Packt Publishing on "Advanced Oracle PL/SQL Developer's Guide" in 2016 and "Oracle Advanced PL/SQL Developer Professional Guide" in 2012. He is a frequent speaker at conferences organized by the data and analytics user community and technical institutions. He tweets at @saurabhkg and blogs at sbhoracle.wordpress.com.
For me, this book is also a good primer on some of the problems related to data lakes. It covers everything from setup to roles and security and, finally, observability.