#26. Pandora's box

Topics: Architecture, Amazon Athena, AWS, data modeling, Docker, Git, Apache Kafka, storage, storage engine

Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It? — Konstantine Karantasis @ Confluent Blog.

Kafka rebalancing again. What is the difference between the eager and incremental cooperative rebalancing protocols? What problems made a new protocol necessary? How does it work at a high level?
You can also read the KIP to understand the problem in more depth.

Git’s database internals — Derrick Stolee @ GitHub Blog.

A five-part series that looks at Git’s internals from the perspective of a database.

Improve federated queries with predicate pushdown in Amazon Athena — AWS Big Data Blog

Let’s talk about query optimization in Athena, especially predicate pushdown for different databases.

Mussel — Airbnb’s Key-Value Store for Derived Data — The Airbnb Tech Blog

It’s Airbnb’s turn to build their own database. Meet Mussel, a persistent, highly available, low-latency key-value storage engine for accessing derived data from offline and streaming events.

Behind the Hype: Should you ever build a Data Vault in a Lakehouse? — Simon Whiteley @ Advancing Analytics Channel

The most expressive talk about Data Vault. And yes, we definitely like Simon :)

Written on November 4, 2022