PRACTICAL ENTERPRISE DATA LAKE INSIGHTS: HANDLE DATA-DRIVEN CHALLENGES IN AN ENTERPRISE BIG DATA LAKE [Paperback] Gupta

Name: PRACTICAL ENTERPRISE DATA LAKE INSIGHTS: HANDLE DATA-DRIVEN CHALLENGES IN AN ENTERPRISE BIG DATA LAKE [Paperback] Gupta
Rating: 4.06 (5 reviews)
ISBN: 9781484246061

Gupta

Chapter 1: Data Lake Concepts Overview
This chapter highlights the key concepts of a Data Lake and its tech stack. It briefs readers on the background of data management, the need for a Data Lake, and the latest trends.
Pages: 20
Sub-topics:
1. Familiarization with the Enterprise Data Lake ecosystem
2. Understand key components of a Data Lake
3. Data understanding - structured vs. unstructured
Chapter 2: Data Replication Strategies
This chapter focuses on how to replicate data into Hadoop from source systems. Depending on the nature of the source systems, strategies may change. The chapter starts with a discussion of trivial approaches to ETL data into Hadoop and then dives into the latest trends in change data capture.
Pages: 25
Sub-topics:
1. Conventional ETL strategies
2. Change data capture for relational data
3. Change data capture for time-series data
Chapter 3: Bring Data into Hadoop
This chapter focuses on how to get data into a Hadoop cluster. It discusses several approaches and utilities that can be used to bring data into Hadoop for processing.
Pages: 30
Sub-topics:
1. RDBMS to Hadoop
2. MPP database systems to Hadoop
3. Unstructured data into Hadoop
Chapter 4: Data Streaming Strategies
This chapter takes a deep dive into the data streaming principles of Kafka. It explains how Kafka works and how it resolves the challenge of getting data into a Data Lake.
Pages: 50
Sub-topics:
1. How to stream the data? Kafka
2. How to persist the changes
3. How to batch the data
4. How to massage the data
5. Tools and technologies - HVR, Oracle GoldenGate for Big Data
Chapter 5: Data Processing in Hadoop
This chapter provides insight into various data querying platforms. It all started with MapReduce, but Hive is quickly acquiring de facto status in the industry. The chapter takes a deep dive into Hive and its SQL-like semantics, and showcases its most recent capabilities. A dedicated section on Spark gives a detailed walk-through of the Spark approach to processing data in Hadoop.
Pages: 30
Sub-topics:
1. MapReduce
2. Query engines - intro/bigdata sql/bigSQL
3. Hive - focus
4. Spark - focus
5. Presto
Chapter 6: Data Security and Compliance
This chapter discusses the security aspects of a data lake in Hadoop. The fact that organizations have deliberately compromised on security in the past carries weight. The chapter explains how to build a safety net around a data lake and mitigate the risks of unauthorized access or injection attacks.
Pages: 20
Sub-topics:
1. Encryption in transit and at rest
2. Data masking
3. Kerberos security and LDAP authentication
4. Ranger
Chapter 7: Ensure Availability of a Data Lake
This chapter sheds light on yet another key aspect of the data landscape: availability. It discusses disaster recovery strategies, how to set up replication between two data centers, and how to handle the consistency and integrity of data.
Pages: 20
Sub-topics:
1. Disaster Recovery Strategies
2. Setup Data cente

Paperback

1 person is currently reading

26 people want to read

About the author

Gupta

289 books, 3 followers

Community Reviews

5 stars

7 (41%)

4 stars

7 (41%)

3 stars

1 (5%)

2 stars

1 (5%)

1 star

1 (5%)

Displaying 1 - 5 of 5 reviews

Jennifer

8 reviews

July 25, 2023

It's an overview focused on Hadoop technology. It is not 100% practical: about 70% of the content is theory and 30% is practical use cases.

Martín Galán merchán

15 reviews, 1 follower

July 26, 2018

A good set of practical tutorials on big data issues.

Adarsh

4 reviews, 1 follower

August 29, 2021

Matches the title; ready for an enterprise-level description.

Christoph Kappel

463 reviews, 9 followers

December 13, 2022

For me this book is also a good primer for some of the problems related to data lakes. It covers everything from setup, to roles/security and finally observability.

2022 big-data english