#22. Day of Knowledge is a digest day
Topics: Architecture, consistency, culture, databases, data quality, security, storage engine
The Seattle Report on Database Research (2022) — Murat Blog
A little about where the database world is heading.
A Zero-Rename Committer: Object-storage as a Destination for Apache Hadoop and Spark — Steve Loughran, Ryan Blue, Sanjay Radia, Thomas Demoor
Have you heard that S3 didn’t deliver the safe and performant operations that file committers expect? This paper goes deep into how Spark jobs can safely use it as a destination for their work. You will also learn exactly how performance and correctness were achieved in S3A.
Data Mesh / Data Product Security Pattern — Eric Broda
A good introductory article on data security. It’s more about structuring the concepts than about implementation.
Best practices to optimize cost and performance for AWS Glue streaming ETL jobs — Gonzalo Herreros @ AWS Big Data Blog
Just tips on how not to get ripped off…
Building Spark Lineage For Data Lakes — Corey Fritz @ Monte Carlo Blog
How to automate Spark lineage with real-world examples.
Conferences
- Data + AI Summit 2022 North America — Virtual - All Sessions
306 videos. We can’t imagine how many hours of content that is.