#27. New Year with a New Database

Topics: Apache Airflow, benchmark, databases, data thoughts, GCP, Spanner, storage, testing


Databases in 2022: A Year in Review — Andy Pavlo @ OtterTune blog

An interesting review by Andy Pavlo of the state of databases at the end of 2022. Andy touches on database companies’ funding situation, blockchain, new database systems that gained popularity in 2022, and a few other topics.

level:beginner topic:databases topic:data-thoughts


Cost Efficiency @ Scale in Big Data File Format — Uber Engineering blog

If you need to choose a compression type for Parquet files in your data lake, this article is a good starting point.

level:beginner topic:benchmark topic:storage


A deep dive into Spanner’s query optimizer — Campbell Fraser, Vlad Lifliand @ Google Cloud Blog

A good introduction to how the query optimizer works, built around a simple example. The article covers which types of optimizations Spanner uses, which optimizer statistics are collected, and how to deal with different optimizer versions.

level:beginner topic:gcp topic:spanner


Micropipelines: A Microservice Approach for DAG Authoring in Apache Airflow — Vikram Koka @ Astronomer Blog

No more monolithic pipelines. Please put your hands together for micropipelines!

level:medium topic:airflow


Learn to Efficiently Test ETL Pipelines — Jacqueline Bilston

This is an absolutely amazing presentation about unit testing data pipelines. I hadn’t seen any resource this specific about data pipeline testing until now.

level:beginner topic:testing type:video


Written on February 4, 2023