#7. Late arriving digest
Topics: Architecture, AWS, Delta Lake, AWS Glue, Apache Hudi, Apache Iceberg, Apache Kafka, pipelines, storage.
Apache Kafka Rebalance Protocol, or the magic behind your streams applications — Florian Hussonnois @ StreamThoughts.
If you would like to dig a little deeper into the foundations of the Apache Kafka consumption mechanism, this is an excellent article. Personally, I didn’t clearly understand the tons of “rebalancing” log entries until I read it.
How we deal with Data Quality using Circuit Breakers — Sandeep Uttamchandani @ Wrong AI.
This is an interesting way to use the circuit breaker pattern inside a pipeline to prevent low-quality data from propagating to downstream processes.
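To make the idea concrete, here is a minimal Python sketch of a data-quality circuit breaker. It is not taken from the article: the class name `QualityCircuitBreaker`, the `max_invalid_ratio` threshold, the `has_amount` check, and the choice to treat an empty batch as failing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


class CircuitOpenError(Exception):
    """Raised when a batch fails its quality check and must not be published."""


@dataclass
class QualityCircuitBreaker:
    # Hypothetical threshold: maximum tolerated share of invalid records.
    max_invalid_ratio: float = 0.05
    is_open: bool = False

    def check(self, records: List[dict], is_valid: Callable[[dict], bool]) -> List[dict]:
        """Return the batch unchanged if it passes, otherwise open the circuit."""
        if not records:
            # Assumption: an empty batch is treated as a quality failure.
            invalid_ratio = 1.0
        else:
            invalid = sum(1 for record in records if not is_valid(record))
            invalid_ratio = invalid / len(records)

        self.is_open = invalid_ratio > self.max_invalid_ratio
        if self.is_open:
            # Stop here so that downstream steps never see the bad batch.
            raise CircuitOpenError(
                f"invalid ratio {invalid_ratio:.2f} exceeds {self.max_invalid_ratio:.2f}"
            )
        return records


def has_amount(record: dict) -> bool:
    """Illustrative validity rule: the record must carry a non-null amount."""
    return record.get("amount") is not None
```

A pipeline step would call `breaker.check(batch, has_amount)` before publishing the batch. When the circuit opens, the step fails fast and the batch can be quarantined for inspection instead of flowing downstream.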
Hudi, Iceberg and Delta Lake: Data Lake Table Formats Compared — Oz Katz @ lakeFS blog.
Table formats that support updates and deletes are the new trend in data lake storage. Hudi, Iceberg, Delta Lake… which one suits your needs better? Check out this article by @lakeFS and choose wisely!
Design a data mesh architecture using AWS Lake Formation and AWS Glue — AWS Big Data Blog.
What is a data mesh, and how do you implement one? This post walks through an AWS implementation of a data mesh architecture.
4 Layers of a Modern Data Pipeline — Thomas Kane.
It’s a short but comprehensive post that describes the author’s take on how modern pipelines are built and what they consist of (AWS stack and open-source alternatives).