#13. Snake menace

Topics: Architecture, Python, Data Vault, Apache Spark, storage engine


Best practices for caching in Spark SQL — David Vrba @ Towards Data Science Blog.

A clear, approachable article about caching in Spark SQL, with under-the-hood explanations and examples. Hopefully using the cache will be much easier now!

level:medium topic:spark


Why Python is Slow: Looking Under the Hood — Jake VanderPlas @ Pythonic Perambulations Blog.

Indeed, why? A great article by the author of several Python books.

level:medium topic:python


Python Performance Tuning: 20 Simple Tips — Kevin Cunningham @ Stackify Blog.

What do we love more than performance tuning? Performance tuning in the most performant language!

level:medium topic:python


Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB — Artem Chebotko @ DataStax Blog.

Shuffle join, broadcast join, co-located join, pre-computed join: which one is better? See how Google Cloud Spanner and DataStax Astra DB optimize distributed joins.
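
For readers new to these terms, here is a toy, single-process sketch of the difference between a broadcast join and a shuffle join. It is only an illustration under simplifying assumptions: the function names and the row format (plain dicts) are made up for this example, and it says nothing about how Spanner or Astra DB actually implement their joins.

```python
from collections import defaultdict


def broadcast_join(small, large, key):
    """Build a hash table from the small side, then stream the large side past it.

    In a cluster, the small table would be copied ("broadcast") to every node.
    """
    lookup = defaultdict(list)
    for row in small:
        lookup[row[key]].append(row)
    return [{**s, **l} for l in large for s in lookup.get(l[key], [])]


def shuffle_join(left, right, key, partitions=2):
    """Partition both sides by hash of the key, then join matching partitions.

    In a cluster, each partition pair would land on the same node after a shuffle.
    """
    def partition(rows):
        buckets = [[] for _ in range(partitions)]
        for row in rows:
            buckets[hash(row[key]) % partitions].append(row)
        return buckets

    joined = []
    for left_part, right_part in zip(partition(left), partition(right)):
        # Rows with equal keys always fall into the same bucket index,
        # so a local hash join per partition pair is enough.
        joined.extend(broadcast_join(left_part, right_part, key))
    return joined
```

Both functions return the same set of joined rows (possibly in a different order). The trade-off is in data movement: broadcasting copies one small table everywhere, while shuffling moves both tables once.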

level:medium topic:storage-engine


How to Build a Modern Data Platform Utilizing Data Vault — Brian Arnold @ phData Blog.

A brief overview of the Data Vault approach to modeling your data warehouse structure. The author focuses on where Data Vault fits in the overall data platform architecture.

Thanks to the https://t.me/rockyourdata community.

level:medium topic:architecture topic:data-vault


Written on October 8, 2021