#4. A spark of the summer sun
Topics: architecture, Fugue, Pandas, team, Apache Spark, story, streaming.
Creating Pandas and Spark Compatible Functions with Fugue — Kevin Kho @ Towards Data Science.
Everybody knows about Apache Arrow, which aims to provide an efficient in-memory storage format for exchanging data between different libraries and frameworks. But what if there were such a universal format not for storage, but for functions? There is one: Fugue.
Keystone Real-time Stream Processing Platform — Netflix Technology Blog.
A high-level overview of the design principles and approaches behind Netflix's stream processing platform.
Using Distributed Computing for Neuroimaging — Dr. Alessandro Crimi @ Towards Data Science.
Did you ever think that data engineering could literally save lives? It turns out it can, and here is a real-world example.
Uber’s Data Journey 100+PB with Minute Latency — Reza Shiftehfar @ Uber.
In this talk, Uber reflects on its journey of scaling its data infrastructure from 1 PB to 10 PB to 100 PB and beyond while reducing latency from 24 hours to 3 hours to 1 hour to 10 minutes. There are a lot of details about how the architecture and tooling evolved. It also addresses one important question: at what point should you think about building a data platform?
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh — Zhamak Dehghani @ Martin Fowler Blog.
Haven’t you read it yet? Data mesh is a popular topic now. Read how to organize processes, teams, and storage so you don’t have bottlenecks anywhere (maybe).