#12. Become an open-source star

Topics: Apache Airflow, architecture, community, late arriving data, storage engine, streaming


Why you should try something else than Airflow for data pipeline orchestration — Mehdi Ouazza.

The article’s title speaks for itself!

level:medium topic:airflow


How Airbnb Enables Consistent Data Consumption at Scale, Part III: Building a coherent consumption experience — Airbnb Blog.

The Airbnb metrics series is back! This part explains how and why they built an API layer over metrics and dimensions, and how it was integrated into BI and reporting tools.

Previously on:
Part I, Part II

level:medium topic:architecture


Streaming 101: The world beyond batch: Part I, Part II — Tyler Akidau.

These two foundational articles take you deeper into streaming theory and explain the key difference between batch and stream processing in terms of time. They also help you gather the right questions to ask yourself when working with watermarks, triggers, and windows. These articles changed my understanding forever, and I hope they will change yours.

level:advanced topic:late-arriving-data topic:streaming


Success at Apache: from Mentee to PMC — Ephraim Anierobi @ Apache Blog.

Maybe it’s time to become an open-source star! Get inspired and start contributing to Apache projects ;)

level:medium topic:community


Distributed Query Engines vs. Data Lake Engines — Patrick Pichler.

What is the core difference between Presto, Dremio, Impala, and DBLink, and how can it affect your architecture?

level:medium topic:storage-engine


Upcoming conferences


Written on September 24, 2021