#9. Goddess blessing

Topics: Architecture, benchmark, Docker, Apache Hadoop, Apache Spark, tooling.

Containerizing Apache Hadoop Infrastructure at Uber — Uber Engineering.

A detailed technical article about re-architecting Uber’s Hadoop deployment stack for 21,000+ hosts: cluster management, Docker containers, configs, and much more.

Higher-Order Functions with Spark 3.1 — David Vrba @ Towards Data Science Blog.

New functions for manipulating arrays have been released. Check them out: maybe you can forget about UDFs.

Data Movement in Netflix Studio via Data Mesh — Netflix Technology Blog.

Another interesting article about the Netflix platform. How do you combine Data Mesh (though not quite the usual kind), CDC, schema evolution, data enrichment, and data quality? Read the article.

How Airbnb Achieved Metric Consistency at Scale Part I, Part II — The Airbnb Tech Blog.

Do you have data marts in your DWH? How do you build a clear single source of truth for business metrics? How do you make them resistant to change? How do you validate and orchestrate data mart jobs? Read about Minerva, Airbnb’s metric platform that covers the full life cycle of a metric.

Database-like ops benchmark — H2O.ai.

A benchmark of several database-like tools on simple operations, such as grouping and joins, across various data sizes.

Thanks to @SimonOsipov for this topic.

Written on August 13, 2021