#7. Late arriving digest

Topics: Architecture, AWS, Delta Lake, AWS Glue, Apache Hudi, Apache Iceberg, Apache Kafka, pipelines, storage.


Apache Kafka Rebalance Protocol, or the magic behind your streams applications — Florian Hussonnois @ StreamThoughts.

If you would like to dig a little deeper into the foundations of the Apache Kafka consumption mechanism, this is an excellent article to start with. Personally, I didn’t clearly understand the tons of “rebalancing” log entries until I read it.

level:advanced topic:kafka


How we deal with Data Quality using Circuit Breakers — Sandeep Uttamchandani @ Wrong AI.

An interesting use of the circuit breaker pattern inside data pipelines: when a quality check fails, the circuit opens and low-quality data is stopped from propagating to downstream processes.
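
As a rough illustration of the idea (not code from the linked article), here is a minimal Python sketch of a data-quality circuit breaker. The class name `QualityCircuitBreaker`, the `non_null_ratio` check, the `user_id` field and the 0.9 threshold are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable


class CircuitOpenError(Exception):
    """Raised when a batch is withheld from downstream consumers."""


@dataclass
class QualityCircuitBreaker:
    # Hypothetical quality check: returns a score between 0.0 and 1.0.
    check: Callable[[list], float]
    # Batches scoring below this value open the circuit.
    threshold: float = 0.95
    is_open: bool = False

    def guard(self, batch: list) -> list:
        """Return the batch if it passes the check, otherwise open the circuit."""
        if self.is_open:
            raise CircuitOpenError("circuit is open; batch withheld")
        score = self.check(batch)
        if score < self.threshold:
            self.is_open = True
            raise CircuitOpenError(f"quality score {score:.2f} below {self.threshold}")
        return batch

    def reset(self) -> None:
        """Close the circuit again, e.g. after the upstream issue is fixed."""
        self.is_open = False


def non_null_ratio(batch: list) -> float:
    """Share of rows with a non-null user_id (an assumed example metric)."""
    if not batch:
        return 0.0
    return sum(1 for row in batch if row.get("user_id") is not None) / len(batch)


breaker = QualityCircuitBreaker(check=non_null_ratio, threshold=0.9)
good_batch = [{"user_id": 1}, {"user_id": 2}]
bad_batch = [{"user_id": None}, {"user_id": 3}]

published = breaker.guard(good_batch)  # passes through unchanged
try:
    breaker.guard(bad_batch)           # fails the check and opens the circuit
except CircuitOpenError as err:
    print(f"blocked: {err}")
```

Once the circuit is open, every later batch is withheld until someone calls `reset()`, which is the point of the pattern: bad data stops at the breaker instead of reaching downstream jobs.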

level:advanced topic:data-quality topic:pipelines


Hudi, Iceberg and Delta Lake: Data Lake Table Formats Compared — Oz Katz @ lakeFS blog.

Mutable table formats are the new trend in data lake storage. Hudi, Iceberg, Delta Lake… which one suits your needs best? Check out this article by @lakeFS and choose wisely!

level:medium topic:deltalake topic:hudi topic:iceberg topic:storage


Design a data mesh architecture using AWS Lake Formation and AWS Glue — AWS Big Data Blog.

What is a data mesh, and how do you implement one? This post walks through an AWS implementation of a data mesh architecture.

level:medium topic:architecture topic:aws topic:glue


4 Layers of a Modern Data Pipeline — Thomas Kane.

A short but comprehensive post describing the author’s take on how modern pipelines are built and what they consist of (the AWS stack and open-source alternatives).

level:medium topic:architecture topic:aws


Written on July 19, 2021