#22. Day of Knowledge is a digest day

Topics: Architecture, consistency, culture, databases, data quality, security, storage engine

The Seattle Report on Database Research (2022) — Murat Blog

A brief look at where the database world is heading.

level:medium topic:databases

A Zero-Rename Committer: Object-storage as a Destination for Apache Hadoop and Spark — Steve Loughran, Ryan Blue, Sanjay Radia, Thomas Demoor

Have you heard that S3 didn’t deliver the safe and performant operations that file committers expect? This paper goes deep into how Spark jobs can safely use it as a destination for their work. You will also learn exactly how performance and correctness were achieved in S3A.

level:advanced topic:storage-engine type:whitepaper

Data Mesh / Data Product Security Pattern — Eric Broda

A good introductory article on data security. It is mostly about structuring the concepts.

level:medium topic:architecture topic:security

Best practices to optimize cost and performance for AWS Glue streaming ETL jobs — Gonzalo Herreros @ AWS Big Data Blog

Just tips on how not to get ripped off…

level:medium topic:aws topic:glue

Building Spark Lineage For Data Lakes — Corey Fritz @ Monte Carlo Blog

How to automate Spark lineage with real-world examples.

level:medium topic:data-lineage


Written on September 1, 2022