Articles tagged with storage

Cost Efficiency @ Scale in Big Data File Format — Uber Engineering blog

If you need to choose a compression type for Parquet files in your data lake, this article is a good starting point.

level:beginner topic:benchmark topic:storage


Git’s database internals — Derrick Stolee @ GitHub Blog

  1. Packed object store
  2. Commit history queries
  3. File history queries
  4. Distributed synchronization
  5. Scalability

A five-part series that looks at Git’s internals from the perspective of a database.

level:advanced topic:architecture topic:git topic:storage


Cache in distributed systems — AKT @ Medium

A detailed, easy-to-understand guide to caches (14 min read).

level:beginner topic:storage


Hudi, Iceberg and Delta Lake: Data Lake Table Formats Compared — Oz Katz @ lakeFS blog

Mutable table formats are the new trend in data storage. Hudi, Iceberg, Delta Lake… which suits your needs best? Check out this article by @lakeFS and choose wisely!

level:medium topic:deltalake topic:hudi topic:iceberg topic:storage