#2. May the Force be with you
Topics: cost management, data quality, ETL, Apache Flink, Apache Iceberg, Apache Kafka, Kubernetes, streaming.
Byte Down: Making Netflix’s Data Infrastructure Cost-Effective — Netflix Technology Blog.
Some useful ideas on categorizing your infrastructure costs and keeping them under control.
Root Cause Analysis for Data Engineers — Towards Data Science @ Medium.
How to perform data downtime analysis: where to look, plus a rather controversial idea about the order in which to examine potential causes.
Netflix Data Mesh: Composable Data Processing — Justin Cunningham @ Flink Forward.
A bird’s-eye overview of Netflix’s data processing architecture. The interesting part is how they handle schema changes separately from the data.
Running Apache Flink on Kubernetes — Empathy.co Blog @ Medium.
In a world where Kubernetes has won the race, we try to run everything on it. Here is a recipe for running Flink on Kubernetes.
Kafka Resiliency — Retry/Delay Topic, Dead Letter Queue (DLQ) — Sheshnath Kumar @ Medium.
Three typical architectures for resilient message handling in Kafka. Worth a read if you have a Kafka source in your data pipelines.
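To make the retry topic and DLQ idea concrete, here is a minimal sketch, not the article’s implementation. It uses in-memory queues instead of a real Kafka client, and the topic names, `MAX_ATTEMPTS`, and the `handle` function are all hypothetical. A real delay topic would also wait before re-consuming a message, which this sketch skips.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the article


def handle(payload):
    """Hypothetical business logic: rejects payloads flagged as bad."""
    if payload.get("bad"):
        raise ValueError("cannot process payload")
    return payload["id"]


def consume(topics, name, processed):
    """Drain one topic; failures go to the retry topic or, once the budget is spent, the DLQ."""
    while topics[name]:
        message = topics[name].popleft()
        try:
            processed.append(handle(message["payload"]))
        except ValueError as exc:
            attempts = message["attempts"] + 1
            failed = {
                "payload": message["payload"],
                "attempts": attempts,
                "error": str(exc),
            }
            target = "orders.dlq" if attempts >= MAX_ATTEMPTS else "orders.retry"
            topics[target].append(failed)


def run(payloads):
    """Feed payloads through the main topic, then the retry topic, and return the results."""
    # In-memory stand-ins for Kafka topics (hypothetical names).
    topics = {"orders": deque(), "orders.retry": deque(), "orders.dlq": deque()}
    for payload in payloads:
        topics["orders"].append({"payload": payload, "attempts": 0})

    processed = []
    consume(topics, "orders", processed)        # first attempt on the main topic
    consume(topics, "orders.retry", processed)  # further attempts until success or DLQ
    return processed, list(topics["orders.dlq"])


if __name__ == "__main__":
    done, dead = run([{"id": 1}, {"id": 2, "bad": True}, {"id": 3}])
    print("processed:", done)
    print("dead letters:", dead)
```

The point of the design is that a poison message never blocks the main topic: it is moved aside, retried a bounded number of times, and finally parked in the DLQ with its error for later inspection.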