#2. May the Force be with you

Topics: cost management, data quality, ETL, Apache Flink, Apache Iceberg, Apache Kafka, Kubernetes, streaming.

Byte Down: Making Netflix’s Data Infrastructure Cost-Effective — Netflix Technology Blog.

Some useful ideas on categorizing your infrastructure costs and keeping them under control.

level:medium topic:cost-management

Root Cause Analysis for Data Engineers — Towards Data Science @ Medium.

How to run a root cause analysis of data downtime: where to look, plus a rather controversial idea about the order in which to check potential causes.

level:advanced topic:data-quality topic:etl

Netflix Data Mesh: Composable Data Processing — Justin Cunningham @ Flink Forward.

A bird’s-eye overview of Netflix’s data processing architecture. The interesting part is how they handle schema changes separately from the data.

level:beginner topic:iceberg type:video

Running Apache Flink on Kubernetes — Empathy.co Blog @ Medium.

In a world where k8s has won the race, we try to run everything on it. Here is a recipe for running Flink on Kubernetes.

level:advanced topic:flink topic:kubernetes topic:streaming

Kafka Resiliency — Retry/Delay Topic, Dead Letter Queue (DLQ) — Sheshnath Kumar @ Medium.

Three typical architectures for resilient message handling in Kafka. Worth a read if your data pipelines consume from Kafka.

level:medium topic:kafka topic:streaming
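To make the retry/DLQ idea concrete, here is a minimal sketch of the common "retry topic + dead letter queue" routing logic. It is not taken from the article and does not use a real Kafka client: `InMemoryBroker` is a hypothetical stand-in where each topic is a FIFO queue, and the topic names `orders`, `orders.retry`, `orders.dlq` and the `MAX_ATTEMPTS` budget are assumptions for illustration. The delay a real retry consumer would apply before reprocessing is omitted.

```python
from collections import deque
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical budget: 1 original attempt + 2 retries


@dataclass
class Message:
    key: str
    value: str
    attempts: int = 0  # failed processing attempts so far


class InMemoryBroker:
    """Hypothetical stand-in for Kafka: each topic is a FIFO queue."""

    def __init__(self):
        self.topics = {}

    def produce(self, topic, msg):
        self.topics.setdefault(topic, deque()).append(msg)

    def poll(self, topic):
        q = self.topics.get(topic)
        return q.popleft() if q else None


def handle(broker, msg, process):
    """Process one message; on failure route it to the retry topic or the DLQ."""
    try:
        process(msg)
    except Exception:
        msg.attempts += 1
        if msg.attempts < MAX_ATTEMPTS:
            broker.produce("orders.retry", msg)  # try again later
        else:
            broker.produce("orders.dlq", msg)  # give up, park for inspection


def run(broker, process):
    # Drain the main topic, then the retry topic until it is empty.
    # A real retry consumer would wait before reprocessing each message.
    for topic in ("orders", "orders.retry"):
        while (msg := broker.poll(topic)) is not None:
            handle(broker, msg, process)


processed = []


def process(msg):
    if msg.value == "bad":
        raise ValueError("cannot parse payload")
    processed.append(msg.key)


broker = InMemoryBroker()
for k, v in [("a", "ok"), ("b", "bad"), ("c", "ok")]:
    broker.produce("orders", Message(k, v))
run(broker, process)
print(processed)  # ['a', 'c']
print([(m.key, m.attempts) for m in broker.topics["orders.dlq"]])  # [('b', 3)]
```

The poison message `b` does not block `c`: it is moved aside to the retry topic, and once the attempt budget is spent it lands in the DLQ instead of being retried forever.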

Written on May 4, 2021