#26. Pandora's box
Topics: Architecture, Amazon Athena, AWS, data modeling, Docker, Git, Apache Kafka, storage, storage engine
Incremental Cooperative Rebalancing in Apache Kafka: Why Stop the World When You Can Change It? — Konstantine Karantasis @ Confluent Blog.
Kafka rebalancing again. What is the difference between the eager and the incremental cooperative rebalancing protocols? What problems made a new protocol necessary? How does it work at a high level?
Additionally, you can go through the corresponding KIP to understand the problem in more depth.
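To make the eager vs. cooperative difference concrete, here is a toy simulation in Python. It is only a sketch and not Kafka's actual assignor code: `balanced_sticky_target` is a hypothetical, simplified assignment function, and the real cooperative protocol spreads the work over more than one rebalance round (revoke first, then assign the freed partitions). The sketch only counts how many partitions each protocol forces consumers to give up when a new consumer joins.

```python
def balanced_sticky_target(current, members):
    """Hypothetical helper: balanced target that keeps partitions with their owners where possible."""
    total = sum(len(ps) for ps in current.values())
    base, extra = divmod(total, len(members))
    ordered = sorted(members)
    # Each member's quota; the first `extra` members get one partition more.
    quota = {m: base + (1 if i < extra else 0) for i, m in enumerate(ordered)}

    target, pool = {}, []
    for m in ordered:
        owned = sorted(current.get(m, []))
        target[m] = owned[:quota[m]]      # keep what fits in the quota
        pool.extend(owned[quota[m]:])     # the surplus must move
    for m, ps in current.items():
        if m not in members:              # partitions of departed members
            pool.extend(ps)
    pool.sort()
    for m in ordered:
        while len(target[m]) < quota[m]:
            target[m].append(pool.pop(0))
    return target


def eager_revocations(current):
    """Eager protocol: every member gives up everything before reassignment."""
    return {m: sorted(ps) for m, ps in current.items()}


def cooperative_revocations(current, target):
    """Cooperative protocol: members give up only partitions that change owner."""
    return {m: sorted(set(ps) - set(target.get(m, []))) for m, ps in current.items()}


current = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
target = balanced_sticky_target(current, ["c1", "c2", "c3"])  # c3 joins the group

eager_count = sum(len(ps) for ps in eager_revocations(current).values())
coop_count = sum(len(ps) for ps in cooperative_revocations(current, target).values())
print("target:", target)
print("revoked (eager):", eager_count)        # 6
print("revoked (cooperative):", coop_count)   # 2
```

In this toy setup the eager protocol stops processing on all six partitions, while the cooperative one pauses only the two partitions that actually move to the new consumer.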
Git’s database internals — Derrick Stolee @ GitHub Blog.
- Packed object store
- Commit history queries
- File history queries
- Distributed synchronization
- Scalability
A five-part series that looks at Git’s internals from the perspective of a database.
Improve federated queries with predicate pushdown in Amazon Athena — AWS Big Data Blog
Let’s talk about query optimization in Athena, especially predicate pushdown to the different databases behind federated queries.
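As a rough illustration of what predicate pushdown buys you, the Python sketch below uses an in-memory SQLite database as a stand-in for a remote source. This is not Athena's connector API; the `orders` table and the function names are made up for the example. It compares how many rows cross the "wire" when the filter runs in the engine versus when it is pushed down to the source.

```python
import sqlite3


def make_source():
    """Build a stand-in 'remote' source with 1000 rows, a quarter of them in region 'eu'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    rows = [(i, "eu" if i % 4 == 0 else "us", float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def without_pushdown(conn, region):
    """Fetch every row from the source, then filter in the query engine."""
    fetched = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    result = [row for row in fetched if row[1] == region]
    return len(fetched), result


def with_pushdown(conn, region):
    """Send the predicate to the source so only matching rows are transferred."""
    fetched = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return len(fetched), fetched


source = make_source()
moved_plain, rows_plain = without_pushdown(source, "eu")
moved_pushed, rows_pushed = with_pushdown(source, "eu")
print("rows transferred without pushdown:", moved_plain)   # 1000
print("rows transferred with pushdown:", moved_pushed)     # 250
```

Both variants return the same 250 rows; the difference is only in how much data leaves the source.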
Mussel — Airbnb’s Key-Value Store for Derived Data — The Airbnb Tech Blog
It’s Airbnb’s turn to build its own database. Meet Mussel, a persistent, highly available, low-latency key-value storage engine for accessing derived data from offline and streaming events.
Behind the Hype: Should you ever build a Data Vault in a Lakehouse? — Simon Whiteley @ Advancing Analytics Channel
The most expressive talk about Data Vault. And yes, we definitely like Simon :)