#11. Back to school with Andy Pavlo

Topics: Courses, databases, data mesh, Apache Hive, Apache Kafka, practices, Apache Spark


Courses: Intro to Database Systems (Fall 2019), Advanced Database Systems (Spring 2020) — Andy Pavlo.

Intro to Database Systems

This is a broad, foundational course on the design and implementation of database management systems.

Advanced Database Systems

This is a continuation of the first course and covers more advanced database topics, such as multi-version concurrency control, join algorithms in detail, and query optimizer implementation.
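To give a flavour of the join algorithms material, here is a minimal sketch of a classic in-memory hash join. It is written for this digest, not taken from the course. The table names, columns and the `hash_join` helper are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """In-memory hash join: build a hash table on one input (ideally the
    smaller one), then probe it with every row of the other input."""
    table = defaultdict(list)
    for row in build_rows:                      # build phase
        table[row[build_key]].append(row)
    result = []
    for row in probe_rows:                      # probe phase
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})     # merge the two matching rows
    return result


# Hypothetical tables: users is the smaller input, so we build on it.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},  # no matching user, so it is dropped
]
joined = hash_join(users, orders, "user_id", "user_id")
print(joined)
```

Building on the smaller side keeps the hash table small. Real systems add partitioning (for example, grace hash join) when the table does not fit in memory.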

These courses demand a lot of time and discipline, but you will be rewarded with very useful knowledge about databases. Also, Andy Pavlo is a little eccentric, but he is a very impressive guy who truly loves databases :)

PS: Here is a book of more than 1,000 pages that Andy recommends.

level:medium topic:databases type:course type:video


On Spark, Hive, and Small Files: An In-Depth Look at Spark Partitioning Strategies — Airbnb Engineering Blog.

Millions of small files in HDFS, or almost everything you want to know about Spark partitioning. At least for a start :)
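The following rough pure-Python simulation (not Spark code) makes the small-files problem concrete. It models a partitioned write, assuming each write task produces one file per distinct value of the partition column it holds and ignoring settings such as `maxRecordsPerFile`. The dataset, the `date` column, the task count and the bucket count `K` are all hypothetical.

```python
import random


def count_output_files(rows, num_tasks, assign_task):
    """Simulate a partitioned write: each task writes one file per distinct
    partition value ("date") among the rows routed to it."""
    files = set()
    for row in rows:
        task = assign_task(row) % num_tasks
        files.add((task, row["date"]))
    return len(files)


# Hypothetical dataset: 100,000 rows spread randomly over 30 dates.
rng = random.Random(42)
dates = [f"2021-08-{d:02d}" for d in range(1, 31)]
rows = [{"id": i, "date": rng.choice(dates)} for i in range(100_000)]
NUM_TASKS = 200  # assumed number of shuffle partitions / write tasks

# 1. Rows are routed without regard to the date: nearly every task holds
#    nearly every date, so we get close to NUM_TASKS * len(dates) files.
scattered = count_output_files(rows, NUM_TASKS, lambda r: rng.randrange(NUM_TASKS))

# 2. Shuffle by the partition column: all rows of a date go to one task,
#    so exactly one file per date (but a large date can skew one task).
by_date = count_output_files(rows, NUM_TASKS, lambda r: hash(r["date"]))

# 3. Shuffle by (date, random bucket in [0, K)): at most K files per date,
#    spreading big dates over several tasks while capping the file count.
K = 4
by_date_and_bucket = count_output_files(
    rows, NUM_TASKS, lambda r: hash((r["date"], rng.randrange(K)))
)

print(scattered, by_date, by_date_and_bucket)
```

In PySpark, strategy 3 might look roughly like `df.repartition("date", (F.rand() * K).cast("int")).write.partitionBy("date")`, with `F` being `pyspark.sql.functions`. Treat this as a sketch of the idea rather than the article's exact code.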

level:medium topic:hive topic:spark


Data engineering failure — Why is it almost impossible to meet deadlines? — Christophe Blefari.

Do you meet your deadlines? A nice article about expectations and the problems with meeting deadlines, including those specific to data engineering. You will find some perhaps obvious but useful solutions in it.

level:medium topic:practices


Understanding Kafka partition assignment strategies and how to write your own custom assignor — Florian Hussonnois @ StreamThoughts.

Do you like Kafka as much as we do? This continues the article from Digest #7. The author goes deeper into how consumer groups work and how partitions are assigned to consumers.
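Here is a simplified Python sketch of two standard strategies, loosely modelled on Kafka's `RangeAssignor` and `RoundRobinAssignor`. It assumes every consumer subscribes to every topic and ignores member metadata. The real assignors are Java classes inside the Kafka client that handle more cases, and the topic and consumer names below are hypothetical.

```python
def range_assign(consumers, partitions_per_topic):
    """Range-style: for each topic separately, give each consumer a
    contiguous block of partitions; earlier consumers get the remainder."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    for topic, num_partitions in sorted(partitions_per_topic.items()):
        per_member, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            count = per_member + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


def round_robin_assign(consumers, partitions_per_topic):
    """Round-robin-style: deal all topic-partitions out one at a time."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    all_partitions = [
        (topic, p)
        for topic, n in sorted(partitions_per_topic.items())
        for p in range(n)
    ]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment


topics = {"orders": 3, "payments": 3}
consumers = ["c1", "c2"]
ranged = range_assign(consumers, topics)       # c1 gets 4 partitions, c2 gets 2
rr = round_robin_assign(consumers, topics)     # 3 partitions each
print(ranged)
print(rr)
```

The example shows the classic drawback of range assignment. The remainder of every topic goes to the same first consumer, so the imbalance accumulates across topics, while round-robin spreads partitions evenly.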

level:medium topic:kafka


Data Mesh Principle — Zhamak Dehghani.

This is the next article on the Data Mesh idea from its originator, Zhamak Dehghani: a deeper look at the processes and organization around the concept. We are also looking forward to the Data Mesh book at the end of the year; an early release is already available on the O’Reilly platform.

level:medium topic:data-mesh


Two great free conferences are taking place this month.


Written on September 10, 2021