Articles tagged with late-arriving-data

Streaming 101: The world beyond batch: Part I, Part II — Tyler Akidau.

Here are two fundamental articles that help you go deeper into streaming theory and understand the key difference between batch and stream processing in terms of time. They also help you collect the right questions to ask yourself when you work with watermarks, triggers, and windows. These articles changed my understanding for good, and I hope they will change yours.

level:advanced topic:late-arriving-data topic:streaming


An optimistic approach to handle out-of-order events within analytical stream processing — JetBrains.

A 29-page article about out-of-order data arrival. This topic will be burned into your mind. If not, read the references, too.

level:advanced topic:late-arriving-data type:whitepaper


Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming — Tathagata Das @ Databricks Blog.

If you use Spark Structured Streaming and are interested in handling late-arriving data, this article gives you a practical guide to which windowing strategy to use and how watermarks can help you.

level:medium topic:streaming topic:spark topic:late-arriving-data
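
To make the watermark idea concrete, here is a minimal pure-Python sketch. It is not taken from the article and does not use Spark's API. It assumes tumbling windows of 10 seconds and an allowed lateness of 5 seconds, and it advances the watermark after every event, whereas a real engine typically updates it at batch boundaries. The names `WINDOW`, `LATENESS`, and `aggregate` are hypothetical.

```python
from collections import defaultdict

WINDOW = 10    # tumbling window length in seconds (assumed value)
LATENESS = 5   # how far the watermark trails the max event time (assumed value)


def aggregate(events):
    """Count events per tumbling window and drop events that arrive too late.

    `events` is an iterable of event-time timestamps (seconds), in arrival order.
    Returns (counts keyed by window start, list of dropped timestamps).
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None

    for ts in events:
        # The watermark is based on the events seen before the current one.
        watermark = None if max_event_time is None else max_event_time - LATENESS

        window_start = ts - ts % WINDOW
        window_end = window_start + WINDOW

        if watermark is not None and window_end <= watermark:
            # The window is already closed: the event is too late to count.
            dropped.append(ts)
        else:
            counts[window_start] += 1

        max_event_time = ts if max_event_time is None else max(max_event_time, ts)

    return dict(counts), dropped


# Timestamps 3 and 8 both arrive out of order; only 8 is too late.
counts, dropped = aggregate([1, 4, 12, 3, 27, 8, 21])
print(counts)   # {0: 3, 10: 1, 20: 2}
print(dropped)  # [8]
```

The event at second 3 arrives after second 12 but is still counted, because the watermark (12 - 5 = 7) has not yet passed the end of window [0, 10). The event at second 8 arrives after second 27, when the watermark is 22, so its window is closed and the event is dropped.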


Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing — Amey Chaugule @ Uber Engineering Blog.

Out-of-order data is a big problem. Often it is enough to mitigate it, but in some cases it must be fully resolved. Usually this is done with a Lambda architecture. Uber describes its approach to handling the problem in a Kappa architecture.

level:advanced topic:architecture topic:late-arriving-data


Handling Late Arriving Dimensions Using a Reconciliation Pattern — Databricks Blog.

Several approaches to handling late-arriving dimensions: the problem statement and the evolution of the different approaches. Not as deep as you might expect, but the core ideas are clear.

level:advanced topic:late-arriving-data