Articles tagged with data-quality
Data Quality Score: How We Evolved the Data Quality Strategy at Airbnb — Clark Wright @ Netflix Data Engineering Open Forum 2024
Some time ago, we shared an article from Airbnb about the Data Quality framework they had built. Now you can watch the video for more details. We are inspired by the level of platform solutions they’re building.
Data Quality Score: The next chapter of data quality at Airbnb — Airbnb blog
An absolutely brilliant, and quite realistic, approach to assessing data quality across all of a company’s data assets.
Data Quality Monitoring is dead. Say Hello to Full Data Stack Observability — Salma Bakouk
New terms keep appearing in the world of data engineering. Today we’re talking about Data Observability.
Towards data quality management at LinkedIn — Liangzhao Zeng @ Towards Data Science
A short article about the top-level architecture of LinkedIn’s Data Health Monitor system.
How Airbnb Built “Wall” to prevent data bugs — Subrata Biswas @ Airbnb Tech Blog.
Data Wall sounds impressive, especially since it’s Airbnb’s framework for offline data quality checks.
Automating Large-Scale Data Quality Verification — Amazon Research.
In this whitepaper, the Amazon Research team presents a system for automating data quality verification at scale. They discuss their design decisions, describe the resulting system architecture, and give an experimental evaluation on various datasets.
How Uber Achieves Operational Excellence in the Data Quality Experience — Uber Engineering Blog
What basic principles lie at the heart of the Uber data quality platform? As always, a detailed and understandable article by Uber engineers.
4 Things You Need to Know When Solving for Data Quality — Barr Moses @ Towards Data Science Blog.
Barr Moses is a professional in making data reliable. Here are 4 key points that will restore a healthy sense of pragmatism. Definitely recommended reading.
How we deal with Data Quality using Circuit Breakers — Sandeep Uttamchandani @ Wrong AI.
An interesting way to use the circuit breaker pattern inside pipeline processes to prevent low-quality data from propagating to downstream processes.
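To make the idea concrete, here is a minimal sketch of a data-quality circuit breaker. It is not the implementation from the article: the `DataCircuitBreaker` class, the check functions, and the in-memory `published` and `quarantined` lists are all hypothetical names chosen for illustration. The assumption is that each batch is validated before it is handed downstream, and that a failed check "opens" the circuit so the batch is held back instead of published.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Batch = List[Row]


@dataclass
class DataCircuitBreaker:
    """Hypothetical breaker: publishes a batch only if every check passes."""
    checks: Dict[str, Callable[[Batch], bool]]
    published: List[Batch] = field(default_factory=list)    # stands in for downstream
    quarantined: List[Batch] = field(default_factory=list)  # held back for inspection
    is_open: bool = False

    def process(self, batch: Batch) -> List[str]:
        """Run all checks and return the names of the ones that failed."""
        failed = [name for name, check in self.checks.items() if not check(batch)]
        if failed:
            # Open the circuit: the bad batch never reaches downstream consumers.
            self.is_open = True
            self.quarantined.append(batch)
        else:
            # Close the circuit again and let the batch through.
            self.is_open = False
            self.published.append(batch)
        return failed


# Illustrative checks (assumptions, not taken from the article).
def non_empty(batch: Batch) -> bool:
    return len(batch) > 0


def no_null_ids(batch: Batch) -> bool:
    return all(row.get("id") is not None for row in batch)


if __name__ == "__main__":
    breaker = DataCircuitBreaker(
        checks={"non_empty": non_empty, "no_null_ids": no_null_ids}
    )
    print(breaker.process([{"id": 1}, {"id": 2}]))     # passes, batch is published
    print(breaker.process([{"id": 1}, {"id": None}]))  # fails, batch is quarantined
    print(breaker.is_open)
```

In a real pipeline the two lists would likely be replaced by a write to the target table and a write to a quarantine location, and an open circuit would typically also trigger an alert.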
Data Quality Meetup #4 — Datafold Team.
The following topics are discussed:
- The Best Data Quality Investment in 2021
- Verity: Data Quality as a Service
- Unit Tests for SQL with dbt & Python
- Building Security Conscious Data Apps
- Fake It Till You Make It: A Backward Approach to Data Products
Thanks to @SimonOsipov for this topic.
Data Quality Roadmap Part I and Data Quality Roadmap Part II — Alexander Eliseev @ Medium.
Contrary to what we usually do (just building tests and measuring time), folks from Wrike are creating a comprehensive matrix of what data quality is, how to achieve it, and how to measure it. More than that, they discuss what could go wrong without these metrics! (Anything, of course.)
Root Cause Analysis for Data Engineers — Towards Data Science @ Medium.
How to perform data downtime analysis: where to look, plus a rather controversial idea about the order in which to examine potential causes.