Articles tagged with architecture
InfoQ Software Architecture and Design Trends Report - April 2024 — InfoQ
Have you already heard about architecture as a team sport? About cell-based architecture? Review the latest innovations and trends in the design trends report.
How Meta built the infrastructure for Threads — Laine Campbell, Chunqiang (CQ) Tang @ Engineering at Meta
It’s always interesting to read or watch an explanation of a real system’s design, especially when you are trying to design such systems in your head.
Data Domains — Where do I start? — Piethein Strengholt
A good, practical article about data domains, written not only from a data team’s perspective but also from a development perspective.
Seven Principles of Cloud-Native Architecture — Alibaba Cloud Native Community Blog
Clear explanation of cloud-native architecture principles.
Why Uber Engineering Switched from Postgres to MySQL — Uber Engineering Blog
Uber started with a monolithic backend application that used Postgres. As the company evolved and grew, it moved to microservices and changed its approach to working with data. The paper explains why this migration happened and what benefits the company gets from it. There is nothing about OLAP in the paper but still a very interesting story.
5 Helpful Extract & Load Practices for High-Quality Raw Data — Sven Balnojan
What layer do you focus on when creating data architecture? If it’s not Raw, then this article is for you. 5 points that will save you time in the future.
Functional Data Engineering — a modern paradigm for batch data processing — Maxime Beauchemin
It’s quite old, but still conceptually valuable.
Software Architecture and Design InfoQ Trends Report - April 2023 — InfoQ
Data engineers are dealing with code as much as with data. It means that software architecture and design are not alien to us.
Using Metrics Layer to Standardize and Scale Experimentation at DoorDash — Arun Balasubramani @ DoorDash engineering blog
It’s not the first article about the metrics layer in our blog, but it is again worth reading. Motivation and challenges are waiting for you.
Your Data Catalog Shouldn’t Be Just One More UI — Mahdi Karabiben
When a data catalog is not only for business users.
Git’s database internals — Derrick Stolee @ GitHub Blog.
- Packed object store
- Commit history queries
- File history queries
- Distributed synchronization
- Scalability
A five-part series that looks at Git’s internals from the perspective of a database.
Upgrading Data Warehouse Infrastructure at Airbnb — Ronnie Zhu @ The Airbnb Tech Blog
Airbnb decided to upgrade Spark and start using Iceberg. The article covers the motivation, case studies, and tuning experience.
Uber Freight Carrier Metrics with Near-Real-Time Analytics — Uber Engineering Blog
It’s time to design systems with Uber. New use case: near-real-time data in the Carrier App.
Data Mesh / Data Product Security Pattern — Eric Broda
A good intro article on data security. It’s more about structuring the concepts.
Designing Instagram — HighScalability Blog
Do you want to try designing Instagram with a Machine Learning Lead from Amazon? Well, now you can.
This article is a follow-up to DE or DIE Meetup #9 (in Russian).
Powering real-time data analytics with Druid at Twitter — Twitter Engineering Blog
At least now we know that Druid has out-of-the-box ingestion connectors for Apache Kafka, and it seems to work great! Just check Twitter’s streaming architecture.
Presto on Apache Kafka At Uber Scale — Uber Engineering Blog
We like Uber engineering posts so much because they read like ADRs: problem, description of the current environment, alternatives, proposed architecture.
Understanding the Metrics Store — Joanna He @ Kyligence Blog.
The article about Minerva, Airbnb’s metric platform (digests #9 and #12), seems unique, but it’s actually not. The term is seeping into everyday use and no longer looks so unusual. This simple article sheds light on the essence of the metrics store.
A brief history of the metrics store — Nick Handel @ Towards Data Science.
We haven’t learned about feature stores yet. But that won’t stop us from reading about metrics stores :)
How to Build Your Data Reliability Stack — Barr Moses @ Towards Data Science Blog.
What you need in order not to worry about your data.
Landing data on S3: the good, the bad, and the ugly — Sergey Ivanychev @Joom Blog at Medium.
How Joom consumes events from Kafka into the Data Platform.
Separation of Compute and Data: A Profound Shift in Data Architecture — Billy Bosworth @ Dremio blog.
Splendid little article. It covers the pros, but not the cons, of separating storage and compute.
Introducing Amazon S3 shuffle in AWS Glue — AWS Big Data Blog.
A good explanation of how shuffle works, how it uses local disks, how to track it in the AWS Glue UI, which problems can arise, and how to avoid them.
How to Build a Modern Data Platform Utilizing Data Vault — Brian Arnold @ phData Blog.
A brief overview of the Data Vault approach to modeling your data warehouse structure. The author focuses on where Data Vault fits in the overall data platform architecture.
Thanks to the https://t.me/rockyourdata community.
How Airbnb Enables Consistent Data Consumption at Scale, Part III: Building a coherent consumption experience — Airbnb Blog.
The Airbnb metrics series is back! Here we can find out how and why they built an API layer over metrics and dimensions, and how it was integrated into BI and reporting tools.
Previously on:
Part I, Part II
Data Lineage at Slack — Samuel Bock @ Slack Engineering.
Slack team’s perspective on the problem of Data Lineage. The article describes the architecture of their own solution.
Data Movement in Netflix Studio via Data Mesh — Netflix Technology Blog.
A new, interesting article about the Netflix platform. How do you combine a Data Mesh (not quite the one you may be thinking of), CDC, schema evolution, data enrichment, and data quality? Read the article.
How Airbnb Achieved Metric Consistency at Scale Part I, Part II — The Airbnb Tech Blog.
Do you have data marts in your DWH? How do you build a clear single source of truth for business metrics? How do you make them resistant to changes? How do you validate and orchestrate data mart jobs? Let’s read about Minerva, Airbnb’s metric platform that covers the full life cycle of a metric.
Data Lake vs. Data Warehouse — Subsurface Blog.
If you’re new to data engineering, this article will help you align your terminology and your understanding of key concepts and their differences. No fluff, just the essence.
Create Cloud Architecture with Diagrams for AWS, Azure, and GCP — Bruce H. Cottman, Ph.D. @ Towards Data Science.
Sometimes we all need to put our cloud reality on paper. Did you know that you can do it with text?
Also, you may take a look at the Kroki tool.
Design a data mesh architecture using AWS Lake Formation and AWS Glue — AWS Big Data Blog.
What is a data mesh? How to implement it? An AWS implementation of data architecture for data mesh.
4 Layers of a Modern Data Pipeline — Thomas Kane.
It’s a short but comprehensive post that describes the author’s take on how modern pipelines are built and what they consist of (AWS stack and open-source alternatives).
Keystone Real-time Stream Processing Platform — Netflix Technology Blog.
A high-level overview of Netflix design principles and approaches.
Uber’s Data Journey 100+PB with Minute Latency — Reza Shiftehfar @ Uber.
In this talk, Uber reflects on their journey with scaling Data Infrastructure from 1PB to 10PB to 100PB and beyond while reducing latency from 24 hours to 3h to 1h to 10 minutes. There are a lot of details about architecture evolution and tooling. And one important question: at what point should you think about building a Data Platform?
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh — Zhamak Dehghani @ Martin Fowler Blog.
Haven’t you read it yet? Data mesh is a popular topic now. Read how to organize processes, teams, and storage so you don’t have bottlenecks anywhere (maybe).
Evolution to the Data Lakehouse — Bill Inmon, Mary Levins @ Databricks Blog.
The history of the evolution of data warehouses into data lakes and further into data lakehouses. Pros and cons of these approaches.
You Can Replace Kafka with a Database — Emil Koutanov @ Towards Data Science (Medium).
Kafka is the new gold. What if you don’t like it and want to replace it? Of course, there are options like Apache Pulsar, but… is it possible to replace Apache Kafka with a relational DB? Looks like the answer is “yes”. Now it’s your turn to decide if you need it.
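To make the idea concrete, here is a minimal sketch of an append-only log with per-consumer-group offsets on top of a relational database. It is not the article’s implementation: the table names, the function names (`open_log`, `produce`, `poll`, `commit`), and the use of SQLite are assumptions chosen for illustration, and partitioning, retention, and coordination between consumers in one group are left out.

```python
import sqlite3


def open_log(path=":memory:"):
    """Create the two (hypothetical) tables: the event log and the committed offsets."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS consumer_offsets ("
        " consumer_group TEXT NOT NULL,"
        " topic TEXT NOT NULL,"
        " committed INTEGER NOT NULL,"
        " PRIMARY KEY (consumer_group, topic))"
    )
    return conn


def produce(conn, topic, payload):
    """Append one event and return its sequence number (the 'offset')."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO events (topic, payload) VALUES (?, ?)", (topic, payload)
        )
    return cur.lastrowid


def poll(conn, group, topic, limit=10):
    """Return up to `limit` events after the group's committed offset, in order."""
    row = conn.execute(
        "SELECT committed FROM consumer_offsets"
        " WHERE consumer_group = ? AND topic = ?",
        (group, topic),
    ).fetchone()
    committed = row[0] if row else 0
    return conn.execute(
        "SELECT seq, payload FROM events"
        " WHERE topic = ? AND seq > ? ORDER BY seq LIMIT ?",
        (topic, committed, limit),
    ).fetchall()


def commit(conn, group, topic, seq):
    """Record that the group has processed everything up to and including `seq`."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offsets"
            " (consumer_group, topic, committed) VALUES (?, ?, ?)",
            (group, topic, seq),
        )
```

Each consumer group keeps its own position, so two groups polling the same topic read the same events independently, while a group that commits an offset will not see earlier events again.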
Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing — Amey Chaugule @ Uber Engineering Blog.
Out-of-order data is a big problem. Often it can merely be mitigated, but in some cases it must be fully resolved. Usually this is done with a lambda architecture. Uber proposes its own approach to handling the problem in a kappa architecture.
Composable Data Processing with Apache Spark — Dilip Biswal, Shone Sadler @ Data & AI Summit.
Use it for ideas on the architecture of a single data pipeline: data parsing, validation, quarantining. Much attention is paid to error handling, with great examples.
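As a rough illustration of the parse, validate, quarantine flow mentioned above, here is a minimal sketch in plain Python rather than Spark. It is not taken from the talk: the schema in `REQUIRED_FIELDS`, the `PipelineResult` container, and the function names are hypothetical. The point is that a bad record is set aside together with the reason it failed, instead of aborting the whole run.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PipelineResult:
    valid: list = field(default_factory=list)        # records that passed every check
    quarantined: list = field(default_factory=list)  # raw input plus the error message


# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"id": int, "amount": (int, float)}


def parse(raw):
    """Turn one raw line into a dict; raises ValueError on malformed input."""
    record = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    return record


def validate(record):
    """Check required fields, their types, and one simple business rule."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected):
            raise ValueError(f"bad type for field: {name}")
    if record["amount"] < 0:
        raise ValueError("amount must be non-negative")
    return record


def run(raw_lines):
    """Process every line; failures go to quarantine with the reason attached."""
    result = PipelineResult()
    for raw in raw_lines:
        try:
            result.valid.append(validate(parse(raw)))
        except ValueError as err:
            result.quarantined.append({"raw": raw, "error": str(err)})
    return result
```

Keeping the raw input next to the error message makes it possible to inspect quarantined records later and replay them once the cause is fixed.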