#6. Pipeline vibes

Topics: Dagster, data quality, Apache Kafka, late arriving data, pipelines.

The following topics are discussed:

The Best Data Quality Investment in 2021
Verity: Data Quality as a Service
Unit Tests for SQL with dbt & Python
Building Security Conscious Data Apps
Fake It Till You Make It: A Backward Approach to Data Products

Thanks to @SimonOsipov for this topic.

Incrementally Adopting Dagster at Mapbox — Ben Pleasanton @ Dagster Blog.

We all know that Dagster may be (and maybe already is) the Next Big Thing after Airflow. But how many success stories have you read? Yes, adoption is growing, but does it offer anything that is impossible with Airflow? Well, the folks at Mapbox hit a wall of complexity with Airflow, and they chose Dagster to cope with it.

Understanding Kafka Topic Partitions — Dunith Dhanushka @ Event-driven Utopia.

Our editor Ksenia is a personal fan of Dunith Dhanushka's articles. Plenty of diagrams, easy to read, and practical!

An optimistic approach to handle out-of-order events within analytical stream processing — JetBrains.

29 pages about out-of-order data arrival. This topic will be burned into your mind. If not, read the references, too.

Data Fusion: A Code-Free Pipeline for Your Google Cloud Data Warehouse — Guillaume Dupont @ Towards Data Science.

Everybody loves getting results without writing code. One of the most popular solutions for building pipelines without code is Apache NiFi. But what if you could build the whole warehouse, with all of its processes, without code? It looks like Google Data Fusion may be usable for exactly that!

Written on July 2, 2021