#26. Pandora's box

Topics: Architecture, Amazon Athena, AWS, data modeling, Docker, Git, Apache Kafka, storage, storage engine


Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It? — Konstantine Karantasis @ Confluent Blog.

Kafka rebalancing again. What is the difference between the eager and the incremental cooperative rebalancing protocols? What problems made a new protocol necessary? How does it work at a high level?
Additionally, you can go through the KIP to understand the problem in more depth.
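
To make the difference concrete, here is a toy model in Python. It is only a sketch of the idea, not Kafka's actual code: the function names and the sample assignment are made up for illustration. It assumes that under the eager protocol every member gives up all of its partitions before reassignment, while under the cooperative protocol a member gives up only the partitions that have to move.

```python
def eager_revocations(current, target):
    """Eager (stop-the-world): every member revokes everything it owns."""
    return {member: set(parts) for member, parts in current.items()}


def cooperative_revocations(current, target):
    """Cooperative: a member revokes only partitions it owns now
    but does not own in the target assignment."""
    return {
        member: set(parts) - set(target.get(member, ()))
        for member, parts in current.items()
    }


# Hypothetical scenario: consumer c3 joins a group of two consumers.
current = {"c1": {0, 1, 2}, "c2": {3, 4, 5}}
target = {"c1": {0, 1}, "c2": {3, 4}, "c3": {2, 5}}

eager = eager_revocations(current, target)
coop = cooperative_revocations(current, target)

# Number of partitions that stop being processed during the rebalance.
eager_count = sum(len(parts) for parts in eager.values())
coop_count = sum(len(parts) for parts in coop.values())

print("eager revokes:", eager_count)
print("cooperative revokes:", coop_count)
```

In this made-up scenario the eager variant pauses all six partitions, while the cooperative one pauses only the two that move to the new consumer.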

level:medium topic:kafka


Git’s database internals — Derrick Stolee @ GitHub Blog.

  1. Packed object store
  2. Commit history queries
  3. File history queries
  4. Distributed synchronization
  5. Scalability

A five-part series that looks at Git’s internals from the perspective of a database.

level:advanced topic:architecture topic:git topic:storage


Improve federated queries with predicate pushdown in Amazon Athena — AWS Big Data Blog

Let’s talk about query optimization in Athena, in particular how predicate pushdown behaves across different databases.
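
If predicate pushdown is new to you, the toy Python sketch below shows the general idea. It is not Athena's API: `scan`, `ROWS`, and `is_eu` are hypothetical names, and the "remote database" is just an in-memory list. The point is that both approaches return the same result, but with pushdown the filter runs at the source, so far fewer rows cross the wire.

```python
# Hypothetical remote table: 1000 rows, every fourth one in region "eu".
ROWS = [{"id": i, "region": "eu" if i % 4 == 0 else "us"} for i in range(1000)]


def scan(predicate=None):
    """Hypothetical connector call. When a predicate is given, it is
    applied at the source, so only matching rows are transferred."""
    if predicate is None:
        return list(ROWS)
    return [row for row in ROWS if predicate(row)]


def is_eu(row):
    return row["region"] == "eu"


# Without pushdown: transfer everything, then filter in the query engine.
transferred_all = scan()
result_no_pushdown = [row for row in transferred_all if is_eu(row)]

# With pushdown: the source filters, the engine receives only matches.
transferred_pushed = scan(is_eu)
result_pushdown = transferred_pushed

print("rows transferred without pushdown:", len(transferred_all))
print("rows transferred with pushdown:", len(transferred_pushed))
```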

level:medium topic:athena topic:aws topic:docker topic:storage-engine


Mussel — Airbnb’s Key-Value Store for Derived Data — The Airbnb Tech Blog

It’s Airbnb’s turn to build its own database. Meet Mussel, a persistent, highly available, low-latency key-value storage engine for accessing derived data from offline and streaming events.

level:medium topic:storage-engine


Behind the Hype: Should you ever build a Data Vault in a Lakehouse? — Simon Whiteley @ Advancing Analytics Channel

The most expressive talk about Data Vault. And yes, we definitely like Simon :)

level:medium topic:data-modeling type:video


Written on November 4, 2022