#6. Pipeline vibes

Topics: Dagster, data quality, Apache Kafka, late arriving data, pipelines.


Data Quality Meetup #4 — Datafold Team.

The following topics are discussed:

  • The Best Data Quality Investment in 2021
  • Verity: Data Quality as a Service
  • Unit Tests for SQL with dbt & Python
  • Building Security Conscious Data Apps
  • Fake It Till You Make It: A Backward Approach to Data Products

Thanks to @SimonOsipov for this topic.

level:medium topic:data-quality type:video


Incrementally Adopting Dagster at Mapbox — Ben Pleasanton @ Dagster Blog.

We all know that Dagster may be (and maybe already is) the Next Big Thing after Airflow. But how many success stories have you read? Yes, adoption is growing, but does it do anything that is impossible with Airflow? Well, folks from Mapbox hit a wall of complexity with Airflow, and they chose Dagster to cope with it.

level:medium topic:dagster topic:pipelines


Understanding Kafka Topic Partitions — Dunith Dhanushka @ Event-driven Utopia.

Our editor Ksenia is a big fan of Dunith Dhanushka's articles: lots of diagrams, easy to read, and practical!

level:beginner topic:kafka


An optimistic approach to handle out-of-order events within analytical stream processing — JetBrains.

Twenty-nine pages on out-of-order data arrival. This topic will be burned into your mind. If not, read the references, too.

level:advanced topic:late-arriving-data type:whitepaper


Data Fusion: A Code-Free Pipeline for Your Google Cloud Data Warehouse — Guillaume Dupont @ Towards Data Science.

Everybody loves to get results without writing code. The most popular solution for building pipelines without code is Apache NiFi. But what if you could build a whole warehouse, with all of its processes, without code? It looks like Google Data Fusion can be used for that!

level:medium topic:pipelines


Written on July 2, 2021