#24. Jack of all trades
Topics: Architecture, data governance, data modeling, GCP, Apache Iceberg, Apache Kafka, MLOps, Apache Spark
Data governance building blocks on Google Cloud for financial services — Daryus Medora, Oscar Pulido @ Google Cloud blog
This article lists GCP services, plus open source and third-party technologies, that you may want to use as building blocks for your full data governance cycle. It also contains a basic architecture for a data governance implementation. If you work with GCP and are at the very beginning of your data journey, I recommend taking a look at this article to get a good overview of the tech stack.
A 2020 Reader’s Guide to The Data Warehouse Toolkit — Cedric Chin @ Holistics Blog
Some books are timeless, and Kimball's is one of them. But like "War and Peace", it needs some kind of runbook on how to read it.
2 talks from Apache Kafka® Meetup
Here are 2 talks from Apache Kafka® Meetup. The first one is The Silver Bullet for Endless Rebalances from Confluent. It is highly useful for understanding the theory behind the rebalancing protocol and the trade-offs of different implementations. Even if it’s not useful for you right now, it is interesting to a wide audience from a theoretical perspective. The second one is Kafka as a service at Dropbox. Engineers from Dropbox share their experience with providing Kafka as a service at scale and the problems they ran into.
Upgrading Data Warehouse Infrastructure at Airbnb — Ronnie Zhu @ The Airbnb Tech Blog
Airbnb decided to upgrade Spark and start using Iceberg. The article covers the motivation, case studies, and tuning experience.
Shopify’s Playbook for Scaling Machine Learning — Solmaz Shahalizadeh @ Shopify Engineering Blog
If you’re just starting to build ML infrastructure and launch models, this is worth reading. It offers well-structured points about what to focus on at each stage of maturity.
Books
- Snowflake: The Definitive Guide
A new book on Architecting, Designing, and Deploying on the Snowflake Data Cloud.