#12. Become an open-source star
Topics: Apache Airflow, architecture, community, late arriving data, storage engine, streaming
Why you should try something else than Airflow for data pipeline orchestration — Mehdi Ouazza.
The title speaks for itself!
How Airbnb Enables Consistent Data Consumption at Scale, Part III: Building a coherent consumption experience — Airbnb Blog.
The Airbnb metrics series is back! This part explains how and why they built an API layer over metrics and dimensions, and how it was integrated into BI and reporting tools.
Previously on:
Part I, Part II
Streaming 101: The world beyond batch: Part I, Part II — Tyler Akidau.
These two foundational articles help you go deeper into streaming theory and understand the key difference between batch and stream processing in terms of time. They also help you work out the right questions to ask yourself when you work with watermarks, triggers, and windows. These articles changed my understanding for good, and I hope they will change yours.
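To make the vocabulary a bit more concrete, here is a minimal Python sketch of event-time tumbling windows with a watermark. It is only an illustration under my own assumptions, not code from the articles: the function name `tumbling_window_counts`, the heuristic watermark (largest event time seen minus a fixed `allowed_lateness`), and the decision to drop late events are all hypothetical choices.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, allowed_lateness):
    """Count events per event-time tumbling window.

    `events` holds event-time timestamps in arrival (processing-time) order.
    Assumption: the watermark is a simple heuristic, the largest event time
    seen so far minus `allowed_lateness`. A window fires once the watermark
    passes its end; events for already-fired windows are dropped as late.
    """
    open_windows = defaultdict(int)  # window start -> running count
    fired = []                       # (window start, count), in firing order
    late = []                        # event times that arrived too late
    watermark = float("-inf")

    for event_time in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            # The window this event belongs to has already been emitted.
            late.append(event_time)
            continue

        open_windows[window_start] += 1
        watermark = max(watermark, event_time - allowed_lateness)

        # Trigger: emit every open window whose end is behind the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                fired.append((start, open_windows.pop(start)))

    # End of input: flush whatever is still open.
    for start in sorted(open_windows):
        fired.append((start, open_windows.pop(start)))

    return fired, late


if __name__ == "__main__":
    # Event times arrive out of order: 3 and 8 show up after 12 and 27.
    fired, late = tumbling_window_counts(
        [1, 4, 12, 3, 27, 8, 21], window_size=10, allowed_lateness=5
    )
    print(fired)  # [(0, 3), (10, 1), (20, 2)]
    print(late)   # [8]
```

The event with time 3 arrives out of order but still lands in its window, because the watermark has not yet passed the window's end. The event with time 8 arrives after the window [0, 10) has fired, so it is reported as late. A batch job over the same data would simply count four events in that window, which is the difference "in terms of time" in a nutshell.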
Success at Apache: from Mentee to PMC — Ephraim Anierobi @ Apache Blog.
Maybe it’s time to become an open-source star! Get inspired and start contributing to Apache projects ;)
Distributed Query Engines vs. Data Lake Engines — Patrick Pichler.
What is the core difference between Presto, Dremio, Impala, and DBLink, and how can it affect your architecture?
Upcoming conferences
- Open Source Data Stack Conference, September 28-30