#6. Pipeline vibes
Topics: Dagster, data quality, Apache Kafka, late-arriving data, pipelines.
Data Quality Meetup #4 — Datafold Team.
The following topics are discussed:
- The Best Data Quality Investment in 2021
- Verity: Data Quality as a Service
- Unit Tests for SQL with dbt & Python
- Building Security Conscious Data Apps
- Fake It Till You Make It: A Backward Approach to Data Products
Thanks to @SimonOsipov for this topic.
Incrementally Adopting Dagster at Mapbox — Ben Pleasanton @ Dagster Blog.
We all know that Dagster may be (and maybe already is) the Next Big Thing after Airflow. But how many success stories have you actually read? Yes, adoption is growing, but has anyone done something with it that was impossible with Airflow? Well, the folks at Mapbox hit a wall of complexity with Airflow, and they chose Dagster to cope with it.
Understanding Kafka Topic Partitions — Dunith Dhanushka @ Event-driven Utopia.
Our editor Ksenia is a personal fan of Dunith Dhanushka's articles. Plenty of diagrams, easy to read, and practical!
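To make the core idea behind keyed topic partitions concrete, here is a minimal sketch, assuming a hypothetical topic with three partitions. It uses CRC32 as a stand-in hash; Kafka's default partitioner uses murmur2, so the partition numbers below will not match a real cluster. The helpers `choose_partition` and `route` are hypothetical names for illustration only. What carries over is the principle: a deterministic hash of the key, reduced modulo the partition count, sends every message with the same key to the same partition, which is what preserves per-key ordering.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic with three partitions


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: CRC32 instead of Kafka's murmur2. The principle is the same:
    # hash the key deterministically, then take it modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    # Append each (key, value) pair to its partition in arrival order,
    # mimicking an append-only log per partition.
    partitions = defaultdict(list)
    for key, value in events:
        partitions[choose_partition(key)].append((key, value))
    return dict(partitions)


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
routed = route(events)
```

Because all `user-1` events land in one partition, a consumer of that partition sees `login`, `click`, `logout` in order, even though ordering across different partitions is not guaranteed.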
A 29-page article on out-of-order data arrival. After reading it, this topic will be burned into your mind. If not, read the references too.
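One common way to reason about out-of-order arrival is a watermark. Here is a minimal sketch, assuming a simple heuristic watermark defined as "maximum event time seen so far minus an allowed lateness", similar in spirit to bounded-out-of-orderness watermarks in stream processors. It is not taken from the article; `Event` and `split_on_time_and_late` are hypothetical names. Events whose event time falls behind the current watermark are classified as late.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_time: int  # hypothetical: seconds since some reference point
    payload: str


def split_on_time_and_late(events, allowed_lateness):
    """Process events in arrival order and classify each as on-time or late.

    Heuristic watermark (an assumption for this sketch):
    watermark = max event time seen so far - allowed_lateness.
    """
    max_seen = None
    watermark = None
    on_time, late = [], []
    for ev in events:
        # Compare against the watermark built from events that arrived earlier.
        if watermark is not None and ev.event_time < watermark:
            late.append(ev)
        else:
            on_time.append(ev)
        # Advance the watermark; it never moves backwards.
        max_seen = ev.event_time if max_seen is None else max(max_seen, ev.event_time)
        watermark = max_seen - allowed_lateness
    return on_time, late


# Arrival order differs from event-time order.
arrivals = [Event(t, f"e{t}") for t in (10, 12, 11, 20, 13, 19)]
on_time, late = split_on_time_and_late(arrivals, allowed_lateness=5)
```

With an allowed lateness of 5, the event at time 11 is still accepted (the watermark is 7 when it arrives), while the event at time 13 is late because the event at time 20 already pushed the watermark to 15. A larger allowed lateness accepts more stragglers at the cost of waiting longer before results can be considered final.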
Data Fusion: A Code-Free Pipeline for Your Google Cloud Data Warehouse — Guillaume Dupont @ Towards Data Science.
Everybody loves getting results without writing code. One of the most popular tools for building pipelines without code is Apache NiFi. But what if you could build an entire warehouse, with all its processes, without writing code? It looks like Google Cloud Data Fusion can be used for exactly that!