#9. Goddess blessing

Topics: Architecture, benchmark, Docker, Apache Hadoop, Apache Spark, tooling.


Containerizing Apache Hadoop Infrastructure at Uber — Uber Engineering.

A detailed technical article about re-architecting Uber’s Hadoop deployment stack for 21,000+ hosts. It covers cluster management, Docker containers, configs, and much more.

level:advanced topic:docker topic:hadoop


Higher-Order Functions with Spark 3.1 — David Vrba @ Towards Data Science Blog.

New functions for manipulating arrays have been released. Check them out: maybe you can forget about UDFs.

level:medium topic:spark


Data Movement in Netflix Studio via Data Mesh — Netflix Technology Blog.

Another interesting article about the Netflix platform. How do you combine a Data Mesh (not quite the one the name suggests) with CDC, schema evolution, data enrichment, and data quality? Read the article.

level:medium topic:architecture


How Airbnb Achieved Metric Consistency at Scale Part I, Part II — The Airbnb Tech Blog.

Do you have data marts in your DWH? How do you build a clear single source of truth for business metrics? How do you make them resilient to change? How do you validate and orchestrate data mart jobs? Let’s read about Minerva, Airbnb’s metric platform that covers the full life cycle of a metric.

level:medium topic:architecture


Database-like ops benchmark — H2O.ai.

A benchmark of several database-like tools on simple operations across various data sizes.

Thanks to @SimonOsipov for this topic.

level:medium topic:benchmark topic:tooling

Written on August 13, 2021