#21. Read me in any order

Topics: Architecture, consistency, culture, databases, data quality, education, storage engine


5 Books that Make You a Better Data Engineer — Steve Russo

You know, we really like these books!

level:medium topic:education


Data Quality Monitoring is dead. Say Hello to Full Data Stack Observability — Salma Bakouk

New terms keep appearing in the world of data engineering. Today we’re talking about Data Observability.

level:medium topic:data-quality


Introduction to the Join Ordering Problem — Alexey Goncharuk @ Querify Labs Blog

Query optimization details from Querify Labs. With lots of pictures!

level:medium topic:databases topic:storage-engine


Machine Learning Operations (MLOps): Overview, Definition, and Architecture — Dominik Kreuzberger, Niklas Kühl, Sebastian Hirschl

This article comes up quite often in Twitter reposts and is definitely worth your attention. It does a very good job of systematising your knowledge of what MLOps is about, and it is also easy to read.

level:medium topic:mlops type:whitepaper


Uber’s Highly Scalable and Distributed Shuffle as a Service — Mayank Bansal, Bo Yang, Mayur Bhosale, Kai Jiang @ Uber Engineering Blog

This article is worth reading if you want a deeper understanding of how data shuffle works in Spark.

level:advanced topic:spark


Written on July 15, 2022