#11. Back to school with Andy Pavlo
Topics: Courses, databases, data mesh, Apache Hive, Apache Kafka, practices, Apache Spark
Intro to Database Systems
This is a foundational, broad course on the design and implementation concepts behind database management systems.
Advanced Database Systems
This is a continuation of the first course and covers more advanced database topics such as multi-version concurrency control, join algorithms in detail, and how query optimization is implemented.
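To give a flavor of the join algorithms mentioned above, here is a minimal in-memory sketch of the classic hash join: build a hash table on one input's join key, then probe it with the other input. It is a simplified illustration, not code from the course; the function name, row format (lists of dicts), and field names are hypothetical, and a real engine would also handle memory limits, spilling, and partitioning.

```python
from collections import defaultdict


def hash_join(build, probe, build_key, probe_key):
    """Equi-join two lists of dicts with an in-memory hash join (sketch)."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the other input in the hash table.
    # .get() avoids inserting empty entries for keys with no match.
    result = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})  # merge the matching rows
    return result


users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},
]
joined = hash_join(users, orders, "user_id", "user_id")
```

Here `joined` holds two rows (orders 10 and 11, both for `ann`); order 12 is dropped because no user has `user_id` 3, as in an inner join.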
These courses require a lot of time and discipline, but you will be rewarded with very useful knowledge about databases. Also, Andy Pavlo is a bit eccentric, but he is a very impressive lecturer who clearly loves databases :)
PS: Here is a book of more than 1,000 pages that Andy recommended.
On Spark, Hive, and Small Files: An In-Depth Look at Spark Partitioning Strategies — Airbnb Engineering Blog.
Millions of small files in HDFS, or almost everything you want to know about Spark partitioning. At least for a start :)
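The small-files problem comes largely from how partitioned writes work: each task writes a separate file for every partition value present in its slice of the data, so rows scattered across many tasks multiply the file count. The sketch below models that in plain Python rather than Spark. The one-file-per-(task, value) rule is a simplifying assumption (it ignores settings such as `maxRecordsPerFile`), and the `date` column, row counts, and helper name are hypothetical.

```python
import random


def count_output_files(rows, num_tasks, assign):
    """Count files in a simplified model of a partitionBy-style write.

    Assumption: each task writes one file for every distinct partition
    value ("date") among the rows routed to it.
    """
    files = set()
    for row in rows:
        task = assign(row) % num_tasks  # which task receives this row
        files.add((task, row["date"]))
    return len(files)


rng = random.Random(0)
dates = [f"2021-01-{d:02d}" for d in range(1, 31)]
rows = [{"id": i, "date": rng.choice(dates)} for i in range(100_000)]

# Rows spread over 200 tasks regardless of date (e.g. after a round-robin shuffle).
scattered = count_output_files(rows, 200, lambda r: r["id"])
# Rows hash-partitioned by the partition column, so each date lands in one task.
clustered = count_output_files(rows, 200, lambda r: hash(r["date"]))
print(scattered, clustered)
```

With random dates, almost every task holds every date, so `scattered` comes out close to 200 × 30 files, while `clustered` equals the number of distinct dates. In Spark terms, the clustered case roughly corresponds to calling `df.repartition("date")` before `df.write.partitionBy("date")`; the 200 tasks mirror the default value of `spark.sql.shuffle.partitions`.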
Data engineering failure — Why is it almost impossible to meet deadlines? — Christophe Blefari.
Do you meet your deadlines? A nice article about expectations and the problems of meeting deadlines, including those specific to data engineering. You will find some perhaps obvious, but useful, solutions in it.
Understanding Kafka partition assignment strategies and how to write your own custom assignor — Florian Hussonnois @ StreamThoughts.
Do you like Kafka as much as we do? This is a continuation of the article from Digest 7. The author goes deeper into how consumer groups work and how partitions are assigned to their members.
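To make the difference between two built-in strategies concrete, here is a plain-Python sketch that approximates the behavior of Kafka's range and round-robin assignors. It is not the Kafka client API. It assumes every consumer subscribes to every topic, and the consumer ids and topic names are hypothetical. The range strategy splits each topic's partitions into contiguous chunks, with the remainder going to the first consumers in sorted order. The round-robin strategy deals out all topic-partitions across the sorted consumers one by one.

```python
def range_assign(consumers, topics):
    """Approximate range assignment: contiguous chunks per topic."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    for topic, num_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` consumers each take one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


def round_robin_assign(consumers, topics):
    """Approximate round-robin assignment across all topic-partitions."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    all_partitions = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment


topics = {"orders": 3, "payments": 3}
by_range = range_assign(["c2", "c1"], topics)
by_round_robin = round_robin_assign(["c2", "c1"], topics)
```

With two consumers and two 3-partition topics, the range sketch gives `c1` four partitions and `c2` two, because `c1` takes the extra partition of every topic, while round-robin gives each consumer three. That imbalance is one reason to consider a different assignor.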
Data Mesh Principles and Logical Architecture — Zhamak Dehghani.
This is the next article on the Data Mesh idea from its founder, Zhamak Dehghani: a deeper look at the processes and organization around the concept. We are also looking forward to the Data Mesh book at the end of the year; you can find the early release on the O’Reilly platform.