eBOOK

Engineering Lakehouses with Open Table Formats

Name: Engineering Lakehouses with Open Table Formats
Brand: Packt Publishing
Price: 49.99 AUD
Availability: PreOrder

Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake

By: Dipankar Mazumdar, Vinoth Govindarajan

eBook | 12 September 2025

At a Glance

Format
ePUB

eBook

RRP $54.99

$49.99

Available: 12th September 2025

Preorder. Download available after release.

Read on

IOS

Android

Desktop

Windows

eReader

Jumpstart your journey towards mastering open data architectural patterns by learning the fundamentals and applications of open table formats

Key Features

Build open lakehouses with open table formats using popular compute engines such as Apache Spark, Apache Flink, Trino, and Python
Optimize lakehouse performance with advanced techniques such as pruning, partitioning, compaction, indexing, and clustering
Learn how to enable seamless integration, data management, and interoperability using Apache XTable
Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake. If you are a data engineer or architect looking to understand the intricacies of open lakehouse architectures, this book is for you. You'll start by exploring the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You'll also work with each table format through hands-on exercises using popular compute engines and tools such as Apache Spark, Apache Flink, Trino, dbt, and Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you'll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize lakehouses. By the end, you'll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, so that you can make informed decisions when selecting the right architecture for your organization's data needs.

What you will learn

Explore lakehouse fundamentals such as table formats, file formats, compute engines, and catalogs
Gain a complete understanding of data lifecycle management in lakehouses
Integrate lakehouses with Apache Airflow, dbt, and Apache Beam
Optimize performance with sorting, clustering, and indexing techniques
Use open table format data with ML frameworks such as Spark MLlib, TensorFlow, and MLflow
Interoperate across different table formats with Apache XTable and UniForm
Secure your lakehouse with access controls and ensure regulatory compliance

Who this book is for

This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake, and learn how they are used to build lakehouses. It is also a good fit for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.

More in Data Warehousing

Amazon Redshift Cookbook

Recipes for building modern data warehousing solutions

Data Contracts in Practice

Master data contracts to boost efficiency, align data understanding, and support data governance

The Definitive Guide to OpenSearch

Discover advanced techniques and best practices for efficient search and analytics with OpenSearch

Health Information Processing. Evaluation Track Papers

10th China Health Information Processing Conference, CHIP 2024, Fuzhou, China, November 15-17, 2024, Proceedings

Chatbots and Human-Centered AI

8th International Workshop, CONVERSATIONS 2024, Thessaloniki, Greece, December 4-5, 2024, Revised Selected Papers

Building Medallion Architectures

Designing with Delta Lake and Spark

Data Engineering Fundamentals

Building scalable data solutions with ETL pipelines and strategic data architecture design (English Edition)

CockroachDB: The Definitive Guide

Guide: Distributed Data at Scale

Distributed Caching & Data Management

Mastering Redis, Memcached, And Apache Ignite Caching

Data Warehouse Essentials

Mastering the Foundations of Data Management

Time Series Analysis with Spark

A practical guide to processing, modeling, and forecasting time series with Apache Spark

Implementing Analytics Solutions Using Microsoft Fabric—DP-600 Exam Study Guide

Boost your skills with expert insights and certification-ready strategies for Microsoft analytics

Ultimate Snowflake Architecture for Cloud Data Warehousing

Architect, Manage, Secure, and Optimize Your Data Infrastructure Using Snowflake for Actionable Insights and Informed Decisions

Technologies and Applications of Artificial Intelligence

29th International Conference, TAAI 2024, Hsinchu, Taiwan, December 6-7, 2024, Proceedings, Part II

Health Information Processing

10th China Health Information Processing Conference, CHIP 2024, Fuzhou, China, November 15-17, 2024, Proceedings, Part I

Health Information Processing

10th China Health Information Processing Conference, CHIP 2024, Fuzhou, China, November 15-17, 2024, Proceedings, Part II

Technologies and Applications of Artificial Intelligence

29th International Conference, TAAI 2024, Hsinchu, Taiwan, December 6-7, 2024, Proceedings, Part I

From Zero to Oracle Hero

A Journey Through SQL, PL/SQL, and DBA Dark Arts

AI Strategy

Unleash the Power of Artificial Intelligence in Your Business

MDATA Cognitive Model

Theory and Applications

Science of Cyber Security

6th International Conference, SciSec 2024, Copenhagen, Denmark, August 14-16, 2024, Proceedings

SQL Interview Success From Beginner To Pro

Navigating Complexity: Advanced Decision Support Systems for Healthcare Professionals
