#8. 4 things and 5 more reasons

Topics: Architecture, data lake, data quality, data warehouse, Apache Iceberg, Apache Kafka, Apache Pulsar, tooling.


4 Things You Need to Know When Solving for Data Quality — Barr Moses @ Towards Data Science Blog.

Barr Moses specializes in making data reliable. The four key points in this article will restore a healthy dose of pragmatism. Definitely worth a read.

level:medium topic:data-quality


Data Lake vs. Data Warehouse — Subsurface Blog.

If you’re new to data engineering, this article is a chance to align your terminology and your understanding of the key concepts and how they differ. No fluff, just the essentials.

level:beginner topic:architecture topic:data-lake topic:data-warehouse


Migrating to Apache Iceberg at Adobe Experience Platform — Romin Parekh, Shone Sadler, and Miao Wang @ Adobe Tech Blog.

A great practical article about migrating from HDFS to Iceberg. It covers the motivation, the migration strategies considered and how the team chose between them, and the lessons learned.

level:medium topic:iceberg


5 More Reasons to Choose Apache Pulsar Over Apache Kafka — Chris Bartholomew @ DataStax Blog.

Do you like platform comparisons? I’ve read almost nothing about Apache Pulsar, but the point about tiered storage, which lets you move old messages to cheaper storage and keep them forever, got me inspired.

level:medium topic:kafka topic:pulsar


Create Cloud Architecture with Diagrams for AWS, Azure, and GCP — Bruce H. Cottman, Ph.D. @ Towards Data Science.

Sometimes we all need to put our cloud reality on paper. Did you know that you can do it with text?
Also, take a look at the Kroki tool.
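
To give a feel for what "diagrams as text" means, here is a minimal sketch in plain Python that emits Graphviz DOT text for a small pipeline. It is not the approach from the article; the `to_dot` helper and the example components are hypothetical and only illustrate the idea. The resulting text can be rendered with Graphviz or pasted into a tool such as Kroki.

```python
def to_dot(name, edges):
    """Render a list of (source, target) pairs as Graphviz DOT text."""
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]  # left-to-right layout
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pipeline, used only for illustration.
edges = [
    ("Kafka", "Spark job"),
    ("Spark job", "Data lake"),
    ("Data lake", "Warehouse"),
]

dot_text = to_dot("pipeline", edges)
print(dot_text)
```

Because the diagram is just text, it can live in version control next to the code it describes and be reviewed like any other change.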

level:beginner topic:architecture topic:tooling


Written on July 30, 2021