#13. Snake menace

Topics: Architecture, Python, Data Vault, Apache Spark, storage engine

Best practices for caching in Spark SQL — David Vrba @ Towards Data Science Blog.

A clear, approachable article on caching, with under-the-hood explanations and examples. Hopefully using the cache will now be much easier!

Why Python is Slow: Looking Under the Hood — Jake VanderPlas @ Pythonic Perambulations Blog.

Indeed, why? A great article by the author of several Python books.
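One commonly cited reason is that in CPython every integer is a full object with its own header, rather than a bare machine word. Here is a minimal sketch of that overhead, assuming CPython on a 64-bit platform; the function name `boxed_int_overhead` is made up for this illustration and the exact byte counts vary by interpreter version.

```python
import sys
from array import array


def boxed_int_overhead(n: int = 1000) -> dict:
    """Compare the size of a Python int object with a packed 64-bit integer."""
    values = list(range(n))
    # array("q") stores raw signed 64-bit integers back to back, like a C array.
    packed = array("q", values)
    return {
        # Size of one int object, including its reference count and type pointer.
        "bytes_per_boxed_int": sys.getsizeof(values[-1]),
        # Size of one raw element inside the packed array.
        "bytes_per_packed_int": packed.itemsize,
    }


report = boxed_int_overhead()
print(report)
```

The boxed integer takes several times more memory than the packed one, and every arithmetic operation has to go through that object wrapper first.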

Python Performance Tuning: 20 Simple Tips — Kevin Cunningham @ Stackify Blog.

What do we love more than performance tuning? Performance tuning in the most performant language!

Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB — Artem Chebotko @ DataStax Blog.

Shuffle join, broadcast join, co-located join, pre-computed join: which one is better? See how Google Cloud Spanner and DataStax Astra DB optimize distributed joins.
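To make the first two strategies concrete, here is a toy single-process sketch in plain Python. It is not how Spanner or Astra DB implement joins; the function names, the sample rows, and the partition count are all assumptions made for illustration. A broadcast join copies the small table to every partition of the big one, while a shuffle join repartitions both sides by key so that matching rows meet in the same partition.

```python
from collections import defaultdict


def broadcast_join(big_partitions, small_table):
    """Join each partition of the big side against a full copy of the small side."""
    lookup = defaultdict(list)
    for key, value in small_table:
        lookup[key].append(value)
    joined = []
    for partition in big_partitions:  # in a real cluster, one node per partition
        for key, left_value in partition:
            for right_value in lookup.get(key, []):
                joined.append((key, left_value, right_value))
    return joined


def shuffle_join(left_rows, right_rows, num_partitions=4):
    """Repartition both sides by key hash, then join matching partitions locally."""
    left_parts = [[] for _ in range(num_partitions)]
    right_parts = [[] for _ in range(num_partitions)]
    for key, value in left_rows:
        left_parts[hash(key) % num_partitions].append((key, value))
    for key, value in right_rows:
        right_parts[hash(key) % num_partitions].append((key, value))
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        # Rows with the same key always land in the same partition index.
        joined.extend(broadcast_join([left_part], right_part))
    return joined


orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]
customers = [(1, "alice"), (2, "bob")]

via_broadcast = sorted(broadcast_join([orders[:2], orders[2:]], customers))
via_shuffle = sorted(shuffle_join(orders, customers))
print(via_broadcast == via_shuffle)
```

Both strategies return the same rows; the difference is what moves over the network. Broadcast ships the small table everywhere, shuffle moves rows from both tables, which is why the relative table sizes usually decide which one wins.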

How to Build a Modern Data Platform Utilizing Data Vault — Brian Arnold @ phData Blog.

A brief overview of the Data Vault approach to modeling your data warehouse. The author focuses on where Data Vault fits in the overall data platform architecture.

Thanks to the https://t.me/rockyourdata community.

Written on October 8, 2021