#21. Read me in any order
Topics: Architecture, consistency, culture, databases, data quality, education, storage engine
5 Books that Make You a Better Data Engineer — Steve Russo
You know, we really like these books!
Data Quality Monitoring is dead. Say Hello to Full Data Stack Observability — Salma Bakouk
New terms keep appearing in the world of data engineering. Today we’re talking about Data Observability.
Introduction to the Join Ordering Problem — Alexey Goncharuk @ Querify Labs Blog
Query optimization details from Querify Labs. With lots of pictures!
Machine Learning Operations (MLOps): Overview, Definition, and Architecture — Dominik Kreuzberger, Niklas Kühl, Sebastian Hirschl
This article comes up quite often in Twitter reposts and is definitely worth your attention. It does a very good job of systematising what MLOps is about, and it’s easy to read.
Uber’s Highly Scalable and Distributed Shuffle as a Service — Mayank Bansal, Bo Yang, Mayur Bhosale, Kai Jiang @ Uber Engineering Blog
This article is worth reading for a deeper understanding of data shuffle in Spark.