#25. Back to basics

Topics: Apache Arrow, culture, data lineage, data warehouse, Apache Kafka

From Eager to Smarter in Apache Kafka Consumer Rebalances — Sophie Blee-Goldman @ Confluent Blog

We continue the Kafka topic with a look at the new rebalancing protocol from the consumer client's perspective.

level:medium topic:kafka

Open Sourcing Amundsen: A Data Discovery And Metadata Platform — Lyft Engineering Blog

A great overview and how-to guide on building a data discovery platform. Additionally, I had never heard of the Databuilder framework before.

level:medium topic:data-lineage

Write Better Commits, Build Better Projects — GitHub Engineering Blog

An article on rethinking how we write commits. At first glance, investing time in meaningful commits may seem like a waste. But everyone can remember getting stuck trying to track down a feature or a bug among commits like “fix bug A” or “add feature X” that carry no context. The authors suggest writing more meaningful commits that add understanding, which helps both reviewers and developers.

level:medium topic:culture

BigQuery vs Synapse — who is currently winning the battle? — Datasparq Technology Blog

Even if you aren’t interested in BigQuery and Synapse, it’s worth watching for the great meme about infrastructure costs :)

level:medium topic:data-warehouse

Introducing the Gandiva Initiative for Apache Arrow — Dremio Blog

Let’s take a break from our articles about Data Mesh and Data Lakehouses and go back to 2018, to Arrow and Gandiva.

level:medium topic:arrow


  • SmartData 2022 on October 17-18 (online) and October 29 (offline + online), 2022.

    SmartData is a data engineering conference. It’s full of tech talks related to all the topics that are important for a data engineer, ranging from reliability to MLOps. The conference is held by JUG Ru Group.
    Do not miss the Community Day: get a free ticket for the last day (the 18th) of the online conference.

  • Trino Summit on November 10th, 2022.

    Trino Summit is an event that convenes engineers, analysts, and superfans of the Trino project. This year, Trino Summit will be hosted as a hybrid event at the Commonwealth Club in San Francisco, CA on November 10th. For those who may not have heard, Trino is the new moniker for the project formerly known as PrestoSQL. We are thrilled to pull together the Trino community again to share our experiences, meet with professionals in the big data and analytics space, grow our skills, and learn about new possibilities.

Written on October 17, 2022