#4. A spark of the summer sun

Topics: architecture, Fugue, Pandas, team, Apache Spark, story, streaming.


Creating Pandas and Spark Compatible Functions with Fugue — Kevin Kho @ Towards Data Science.

Everybody knows about Apache Arrow, which aims to provide an efficient in-memory format that different libraries and frameworks can share. But what if there were a similar universal layer, not for data, but for functions? There is one: Fugue.
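To make the idea concrete, here is a minimal sketch in plain pandas. It is not Fugue's actual API. The logic lives in an ordinary pandas-to-pandas function, and a hypothetical `run_partitioned` helper stands in for a distributed engine by applying that function to each partition separately. Fugue's pitch is that the same kind of function can be handed to Spark without being rewritten. The column names and data below are made up for illustration.

```python
import pandas as pd

# Business logic written once, against plain pandas types only.
# It knows nothing about Spark or about how the data is partitioned.
def add_total(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["total"] = out["price"] * out["qty"]
    return out

def run_partitioned(df: pd.DataFrame, fn, key: str) -> pd.DataFrame:
    """Hypothetical stand-in for a distributed engine (not a Fugue function):
    split the data by key, apply fn to each partition independently,
    and concatenate the partial results."""
    parts = [fn(group) for _, group in df.groupby(key, sort=True)]
    return pd.concat(parts, ignore_index=True)

orders = pd.DataFrame({
    "shop": ["a", "b", "a", "b"],
    "price": [2.0, 3.0, 5.0, 1.0],
    "qty": [1, 2, 3, 4],
})

local = add_total(orders)  # run directly on a single pandas DataFrame
distributed_like = run_partitioned(orders, add_total, key="shop")

# Same logic and same per-row results, regardless of how it was executed.
check = distributed_like.sort_values(["shop", "price"]).reset_index(drop=True)
expected = local.sort_values(["shop", "price"]).reset_index(drop=True)
assert check.equals(expected)
```

The design point is that `add_total` never mentions the execution engine, so swapping the executor does not touch the business logic. This only works cleanly for logic that is valid per partition. A global aggregation would need a different plan.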

level:advanced topic:fugue topic:pandas topic:spark


Keystone Real-time Stream Processing Platform — Netflix Technology Blog.

A high-level overview of the design principles and approaches behind Keystone, Netflix's real-time stream processing platform.

level:medium topic:architecture topic:streaming


Using Distributed Computing for Neuroimaging — Dr. Alessandro Crimi @ Towards Data Science.

Have you ever thought that data engineering could literally save lives? It turns out it can. Here is a real-world example.

level:medium topic:spark


Uber’s Data Journey 100+PB with Minute Latency — Reza Shiftehfar @ Uber.

In this talk, Uber reflects on its journey of scaling data infrastructure from 1PB to 10PB to 100PB and beyond, while reducing latency from 24 hours to 3 hours to 1 hour to 10 minutes. There are a lot of details about how the architecture and tooling evolved. And one important point: at what stage you should start thinking about building a data platform.

level:medium topic:architecture topic:story type:video


How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh — Zhamak Dehghani @ Martin Fowler Blog.

Haven’t you read it yet? Data mesh is a hot topic right now. Read how to organize processes, teams, and storage so that you don’t have bottlenecks anywhere (hopefully).

level:medium topic:architecture topic:team


Written on June 4, 2021