#10. Slack time

Topics: Apache Airflow, architecture, data lake, data lineage, data quality, Apache Hudi, storage engine.


Data Lineage at Slack — Samuel Bock @ Slack Engineering.

The Slack team’s perspective on the problem of data lineage. The article describes the architecture of their in-house solution.

level:medium topic:architecture topic:data-lineage


DAG Writing Best Practices in Apache Airflow — Astronomer Guides

We all love best practices. Especially short ones. Especially ones with examples.

level:medium topic:airflow


Apache Hudi - The Data Lake Platform — Apache Hudi Blog

An article with a detailed description of Apache Hudi: what Hudi is, its storage architecture, indexes, concurrency, and caches.

level:medium topic:data-lake topic:hudi


What is Cost-based Optimization? — Alexey Goncharuk @ Querify Labs Blog

An answer to a popular question: what are the mythical units of query plan cost?

level:medium topic:storage-engine


How Uber Achieves Operational Excellence in the Data Quality Experience — Uber Engineering Blog

What basic principles lie at the heart of Uber's data quality platform? As always, a detailed and understandable article from Uber engineers.

level:medium topic:data-quality


Written on August 27, 2021