#16. We wish you a Merry Christmas Exactly-Once!
Topics: Apache Airflow, architecture, data lineage, data mesh, Apache Kafka, Apache Spark
A brief history of the metrics store — Nick Handel @ Towards Data Science.
We haven’t learned about feature stores yet, but that won’t stop us from reading about metric stores :)
Data Lineage with OpenLineage and Airflow — Astronomer.
A useful webinar on implementing data lineage with Marquez and Airflow, with real examples, not just theory!
Building data platform in PySpark. Part 1. Python and Scala interop — Sergey Ivanychev @ Joom Blog.
Why and how to use Scala in PySpark.
HelloFresh Journey to the Data Mesh — HelloFresh Blog.
Reading other people’s stories is interesting because they might look like yours.
Exactly-Once Semantics Are Possible: Here’s How Kafka Does It (Proposal: KIP-98) — Neha Narkhede @ Confluent Blog
A very interesting explanation of how to achieve exactly-once semantics. It is just two simple words, but it is far from trivial: you have to keep many different things in mind and think through many failure scenarios. I think this understanding can help you when you design your own pipelines.
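To make one piece of the idea concrete: the idempotent producer from KIP-98 attaches a producer ID and a per-partition sequence number to each record, so the broker can drop a record it has already written when the producer retries after a lost acknowledgement. The sketch below is a toy in-memory model of that deduplication. `ToyPartition`, `ToyProducer` and their methods are hypothetical names for illustration, not Kafka's API, and a real broker does much more (batches, epochs, persistence).

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Hypothetical in-memory stand-in for one partition log (not Kafka's API)."""
    log: list = field(default_factory=list)
    # Last sequence number accepted from each producer ID.
    last_seq: dict = field(default_factory=dict)

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        """Append a record unless it duplicates one already written.

        Returns True if appended, False if dropped as a duplicate.
        Raises ValueError if a sequence number was skipped.
        """
        expected = self.last_seq.get(producer_id, -1) + 1
        if seq < expected:
            # Already written: this is a retry after a lost ack, so ignore it.
            return False
        if seq > expected:
            # A gap means an earlier record never arrived.
            raise ValueError(f"out-of-order sequence {seq}, expected {expected}")
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return True


@dataclass
class ToyProducer:
    """Hypothetical producer that stamps every record with a sequence number."""
    producer_id: int
    next_seq: int = 0

    def send(self, partition: ToyPartition, value: str, retries: int = 0) -> None:
        seq = self.next_seq
        # Simulate the same request being re-sent `retries` extra times.
        for _ in range(1 + retries):
            partition.append(self.producer_id, seq, value)
        self.next_seq += 1


partition = ToyPartition()
producer = ToyProducer(producer_id=42)
producer.send(partition, "order-1")
producer.send(partition, "order-2", retries=2)  # delivered three times in total
producer.send(partition, "order-3")
print(partition.log)  # ['order-1', 'order-2', 'order-3']
```

Keep in mind that idempotence alone only removes duplicates caused by producer retries within a single partition; KIP-98 adds transactions on top of it for atomic writes across partitions.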