#27. New Year with a New Database
Topics: Apache Airflow, benchmark, databases, data thoughts, GCP, Spanner, storage, testing
Databases in 2022: A Year in Review — Andy Pavlo @ OtterTune blog
An interesting review by Andy Pavlo of the state of databases at the end of 2022. Andy touches on database companies’ funding, blockchain, new database systems that gained popularity in 2022, and a few other topics.
Cost Efficiency @ Scale in Big Data File Format — Uber Engineering blog
If you need to choose a compression type for Parquet files in your data lake, this article is a good starting point.
A deep dive into Spanner’s query optimizer — Campbell Fraser, Vlad Lifliand @ Google Cloud Blog
A good introduction to how the query optimizer works, using a simple example. In this article, you will find which types of optimizations Spanner uses, which optimizer statistics are collected, and how to deal with different optimizer versions.
Micropipelines: A Microservice Approach for DAG Authoring in Apache Airflow — Vikram Koka @ Astronomer Blog
No more monolithic pipelines. Please put your hands together for micropipelines!
Learn to Efficiently Test ETL Pipelines — Jacqueline Bilston
This is an absolutely amazing presentation about unit testing data pipelines. I hadn’t seen any resource this specific about data pipeline testing before.