Articles tagged with architecture

How Meta built the infrastructure for Threads — Laine Campbell, Chunqiang (CQ) Tang @ Engineering at Meta

It’s always interesting to read about or watch a real system’s design being explained, especially when you are trying to design such systems in your head.

level:medium topic:architecture


Data Domains — Where do I start? — Piethein Strengholt

A good, practical article about data domains, written not only from a data team perspective but also from a development perspective.

level:medium topic:architecture topic:data-mesh


Seven Principles of Cloud-Native Architecture — Alibaba Cloud Native Community Blog

Clear explanation of cloud-native architecture principles.

level:medium topic:architecture


Why Uber Engineering Switched from Postgres to MySQL — Uber Engineering Blog

Uber started with a monolithic backend application that used Postgres. As the company evolved and grew, it moved to microservices and changed its approach to working with data. The article explains why this migration happened and what benefits the company gained from it. There is nothing about OLAP in it, but it is still a very interesting story.

level:beginner topic:architecture topic:databases


5 Helpful Extract & Load Practices for High-Quality Raw Data — Sven Balnojan

What layer do you focus on when creating a data architecture? If it’s not Raw, then this article is for you. Five points that will save you time in the future.

level:medium topic:architecture topic:practices


Functional Data Engineering — a modern paradigm for batch data processing — Maxime Beauchemin

It’s quite old, but highly conceptual.

level:medium topic:architecture


Software Architecture and Design InfoQ Trends Report - April 2023 — InfoQ

Data engineers deal with code as much as with data, which means that software architecture and design are not alien to us.

level:medium topic:architecture


Using Metrics Layer to Standardize and Scale Experimentation at DoorDash — Arun Balasubramani @ DoorDash engineering blog

It’s not the first article about the metrics layer on our blog, but it is again worth a read. Motivation and challenges are waiting for you.

level:medium topic:architecture


Your Data Catalog Shouldn’t Be Just One More UI — Mahdi Karabiben

When a data catalog is not only for business users.

level:medium topic:architecture


Git’s database internals — Derrick Stolee @ GitHub Blog.

  1. Packed object store
  2. Commit history queries
  3. File history queries
  4. Distributed synchronization
  5. Scalability

A five-part series that looks at Git’s internals from the perspective of a database.

level:advanced topic:architecture topic:git topic:storage


Upgrading Data Warehouse Infrastructure at Airbnb — Ronnie Zhu @ The Airbnb Tech Blog

Airbnb decided to upgrade Spark and start using Iceberg. The article covers the motivation, case studies, and tuning experience.

level:medium topic:architecture topic:iceberg topic:spark


Uber Freight Carrier Metrics with Near-Real-Time Analytics — Uber Engineering Blog

It’s time to design systems with Uber. New use case: near-real-time data in the Carrier App.

level:medium topic:architecture


Data Mesh / Data Product Security Pattern — Eric Broda

A good introductory article on data security. It’s more about structuring the concepts.

level:medium topic:architecture topic:security


Designing Instagram — HighScalability Blog

Do you want to try to design Instagram with a Machine Learning Lead from Amazon? Well, now you can.
This article is a follow-up to DE or DIE Meetup #9 (in Russian).

level:medium topic:architecture


Powering real-time data analytics with Druid at Twitter — Twitter Engineering Blog

At least now we know that Druid has out-of-the-box ingestion connectors for Apache Kafka, and they seem to work great! Just check out Twitter’s streaming architecture.

level:medium topic:architecture topic:druid


Presto on Apache Kafka At Uber Scale — Uber Engineering Blog

We like Uber engineering posts so much because they read like ADRs (architecture decision records): problem, description of the current environment, alternatives, proposed architecture.

level:medium topic:architecture topic:kafka topic:presto


Understanding the Metrics Store — Joanna He @ Kyligence Blog.

Minerva, Airbnb’s metric platform (digests #9 and #12), may seem unique, but it actually isn’t. The term “metrics store” is seeping into everyday use and no longer looks so unusual. This simple article sheds light on the essence of the metrics store.

level:medium topic:architecture


A brief history of the metrics store — Nick Handel @ Towards Data Science.

We haven’t learned about feature stores yet, but that won’t stop us from reading about metrics stores :)

level:medium topic:architecture


How to Build Your Data Reliability Stack — Barr Moses @ Towards Data Science Blog.

What you need in order to stop worrying about your data.

level:medium topic:architecture


Landing data on S3: the good, the bad, and the ugly — Sergey Ivanychev @ Joom Blog at Medium.

How Joom consumes events from Kafka into the Data Platform.

level:medium topic:architecture


Separation of Compute and Data: A Profound Shift in Data Architecture — Billy Bosworth @ Dremio blog.

A splendid little article. It covers the pros, but not the cons, of separating storage and compute.

level:medium topic:architecture


Introducing Amazon S3 shuffle in AWS Glue — AWS Big Data Blog.

A good explanation of how shuffle works, how it uses local disks, how to track it in the AWS Glue UI, what problems can arise, and how to avoid them.

level:medium topic:architecture topic:spark


How to Build a Modern Data Platform Utilizing Data Vault — Brian Arnold @ phData Blog.

A brief overview of the Data Vault approach to modeling your data warehouse. The author focuses on where Data Vault fits in the overall data platform architecture.

Thanks to the https://t.me/rockyourdata community.

level:medium topic:architecture topic:data-vault


How Airbnb Enables Consistent Data Consumption at Scale, Part III: Building a coherent consumption experience — Airbnb Blog.

The Airbnb metrics series is back! Here we can find out how and why they built an API layer over metrics and dimensions, and how it was integrated into BI and reporting tools.

Previously on:
Part I, Part II

level:medium topic:architecture


Data Lineage at Slack — Samuel Bock @ Slack Engineering.

The Slack team’s perspective on the problem of data lineage. The article describes the architecture of their own solution.

level:medium topic:architecture topic:data-lineage


Data Movement in Netflix Studio via Data Mesh — Netflix Technology Blog.

A new, interesting article about the Netflix platform. How do you combine a Data Mesh (not quite the one you are thinking of), CDC, schema evolution, data enrichment, and data quality? Read the article.

level:medium topic:architecture


How Airbnb Achieved Metric Consistency at Scale Part I, Part II — The Airbnb Tech Blog.

Do you have data marts in your DWH? How do you build a clear single source of truth for business metrics? How do you make them resistant to change? How do you validate and orchestrate data mart jobs? Let’s read about Minerva, Airbnb’s metric platform that covers the full life cycle of a metric.

level:medium topic:architecture


Data Lake vs. Data Warehouse — Subsurface Blog.

If you’re new to data engineering, this article is a chance to align on terminology and on the key concepts and their differences. Without further ado, just the essence.

level:beginner topic:architecture topic:data-lake topic:data-warehouse


Create Cloud Architecture with Diagrams for AWS, Azure, and GCP — Bruce H. Cottman, Ph.D. @ Towards Data Science.

Sometimes we all need to put our cloud reality on paper. Did you know that you can do it with text?
Also, you may take a look at the Kroki tool.

level:beginner topic:architecture topic:tooling


Design a data mesh architecture using AWS Lake Formation and AWS Glue — AWS Big Data Blog.

What is a data mesh, and how do you implement it? An AWS implementation of a data mesh architecture.

level:medium topic:architecture topic:aws topic:glue


4 Layers of a Modern Data Pipeline — Thomas Kane.

It’s a short but comprehensive post that describes the author’s take on how modern pipelines are built and what they consist of (the AWS stack and open-source alternatives).

level:medium topic:architecture topic:aws


Keystone Real-time Stream Processing Platform — Netflix Technology Blog.

A high-level overview of Netflix design principles and approaches.

level:medium topic:architecture topic:streaming


Uber’s Data Journey 100+PB with Minute Latency — Reza Shiftehfar @ Uber.

In this talk, Uber reflects on its journey of scaling data infrastructure from 1PB to 10PB to 100PB and beyond while reducing latency from 24 hours to 3 hours to 1 hour to 10 minutes. There are a lot of details about the architecture’s evolution and tooling, plus one important question: at what point should you start thinking about building a Data Platform?

level:medium topic:architecture topic:story type:video


How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh — Zhamak Dehghani @ Martin Fowler Blog.

Haven’t you read it yet? Data mesh is a popular topic now. Read how to build processes, teams, and storage so you don’t have bottlenecks anywhere (maybe).

level:medium topic:architecture topic:team


Evolution to the Data Lakehouse — Bill Inmon, Mary Levins @ Databricks Blog.

The history of the evolution of data warehouses into data lakes and further into data lakehouses. Pros and cons of these approaches.

level:medium topic:architecture


You Can Replace Kafka with a Database — Emil Koutanov @ Towards Data Science (Medium).

Kafka is the new gold. What if you don’t like it and want to replace it? Of course, there are options like Apache Pulsar, but… is it possible to replace Apache Kafka with a relational DB? Looks like the answer is “yes”. Now it’s your turn to decide if you need it.

level:medium topic:architecture topic:kafka


Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing — Amey Chaugule @ Uber Engineering Blog.

Out-of-order data is a big problem that can often be mitigated, but in some cases it must be fully resolved. Usually this is done with a lambda architecture. Uber proposes its own approach to handling the problem in a kappa architecture.

level:advanced topic:architecture topic:late-arriving-data


Composable Data Processing with Apache Spark — Dilip Biswal, Shone Sadler @ Data & AI Summit.

Use it for ideas on the architecture of a single data pipeline: data parsing, validation, and quarantining. Much attention is paid to error handling, with great examples.

level:medium topic:architecture type:video