Articles tagged with data-quality

Data Quality Score: The next chapter of data quality at Airbnb — Airbnb blog

A brilliant and quite realistic approach to assessing data quality across all of a company's data assets.

level:advanced topic:data-quality


Data Quality Monitoring is dead. Say Hello to Full Data Stack Observability — Salma Bakouk

New terms keep appearing in the world of data engineering. Today we’re talking about Data Observability.

level:medium topic:data-quality


Towards data quality management at LinkedIn — Liangzhao Zeng @ Towards Data Science

A short article about the top-level architecture of LinkedIn's Data Health Monitor system.

level:medium topic:data-quality


How Airbnb Built “Wall” to prevent data bugs — Subrata Biswas @ Airbnb Tech Blog.

Data Wall sounds impressive, especially as Airbnb's framework for offline data quality checks.

level:medium topic:data-quality


Automating Large-Scale Data Quality Verification — Amazon Research.

In this whitepaper, the Amazon Research team presents a system for automating data quality verification at scale. They discuss their design decisions, describe the resulting system architecture, and give an experimental evaluation on various datasets.

level:advanced topic:data-quality type:whitepaper


How Uber Achieves Operational Excellence in the Data Quality Experience — Uber Engineering Blog

What basic principles lie at the heart of the Uber data quality platform? As always, a detailed and understandable article by Uber engineers.

level:medium topic:data-quality


4 Things You Need to Know When Solving for Data Quality — Barr Moses @ Towards Data Science Blog.

Barr Moses is an expert in making data reliable. Her 4 key points will restore some healthy pragmatism. Definitely recommended reading.

level:medium topic:data-quality


How we deal with Data Quality using Circuit Breakers — Sandeep Uttamchandani @ Wrong AI.

This is an interesting way to use circuit breaker patterns inside the pipeline processes to prevent low-quality data from propagating to downstream processes.

level:advanced topic:data-quality topic:pipelines
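
To make the circuit breaker idea from the entry above concrete, here is a minimal sketch in Python. It is not taken from the article: the names `QualityCheck`, `null_rate_at_most`, `publish_if_healthy`, and `CircuitOpenError` are hypothetical, and the null-rate check is just one assumed example of a quality rule. The point is only that a batch is published downstream when every check passes, and the "circuit opens" (the batch is held back) otherwise.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


class CircuitOpenError(Exception):
    """Raised when a quality check fails and the batch must not propagate."""


@dataclass
class QualityCheck:
    # Hypothetical container: a readable name plus a predicate that
    # returns True when the batch looks healthy.
    name: str
    predicate: Callable[[Rows], bool]


def null_rate_at_most(column: str, threshold: float) -> QualityCheck:
    """Build a check that passes when the share of nulls in `column` is <= threshold."""
    def predicate(rows: Rows) -> bool:
        if not rows:
            return False  # assumption: an empty batch counts as unhealthy
        nulls = sum(1 for row in rows if row.get(column) is None)
        return nulls / len(rows) <= threshold
    return QualityCheck(f"null_rate({column}) <= {threshold}", predicate)


def publish_if_healthy(rows: Rows, checks: List[QualityCheck],
                       publish: Callable[[Rows], None]) -> None:
    """Run every check; publish only if all pass, otherwise open the circuit."""
    failed = [check.name for check in checks if not check.predicate(rows)]
    if failed:
        raise CircuitOpenError(f"circuit open, failed checks: {failed}")
    publish(rows)
```

In a real pipeline the `publish` callable would be whatever step writes to the downstream table or topic, and the raised error would typically trigger an alert and quarantine the batch instead of silently dropping it.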


Data Quality Meetup #4 — Datafold Team.

The following topics are discussed:

  • The Best Data Quality Investment in 2021
  • Verity: Data Quality as a Service
  • Unit Tests for SQL with dbt & Python
  • Building Security Conscious Data Apps
  • Fake It Till You Make It: A Backward Approach to Data Products

Thanks to @SimonOsipov for this topic.

level:medium topic:data-quality type:video


Data Quality Roadmap Part I and Data Quality Roadmap Part II — Alexander Eliseev @ Medium.

Contrary to what we are used to doing (just building tests and measuring time), the folks from Wrike are creating a comprehensive matrix of what data quality is, how to achieve it, and how to measure it. More than that, they discuss what could go wrong without these metrics (anything, of course).

level:medium topic:data-quality


Root Cause Analysis for Data Engineers — Towards Data Science @ Medium.

How to perform data downtime analysis: where to look, and a rather controversial idea about the order in which to examine potential causes.

level:advanced topic:data-quality topic:etl