Articles tagged with streaming

Streaming SQL in Data Mesh — Netflix Blog

If you’re building a data platform at your company, you won’t find many new insights here. But at least one is guaranteed: an idea of what a data platform’s UI can look like.

level:medium topic:data-platform topic:streaming


Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi — Uber Engineering Blog

As usual, Uber’s blog is a fantastic resource for new ideas. This recent article explains how to build a streaming ETL framework and how to address some well-known issues along the way.

level:medium topic:streaming


Driving efficiency and developer productivity at Facebook scale, Asynchronous computing at Meta: Overview and learnings — Engineering at Meta Blog

Judging by the titles, you might think this is something from another world. You would be only partially right. These are two articles on how to build a mix of scalable ETL and distributed computation in-house.

level:medium topic:streaming


A Survey on Transactional Stream Processing — Shuhao Zhang, Juan Soto, Volker Markl

The authors conducted an extensive survey of different stream processing systems and their transactional guarantees. You will find a classification of transactional stream processing systems and the trade-offs in the design of real implementations.

level:advanced topic:streaming type:whitepaper


Watermarks in Stream Processing Systems: Semantics and Comparative Analysis of Apache Flink and Google Cloud Dataflow — Tyler Akidau

A new white paper from Tyler Akidau (author of the very cool book "Streaming Systems") that compares several aspects of watermarks in Google Cloud Dataflow and Apache Flink. Watermarks represent the temporal completeness of an out-of-order data stream. Reasoning about the completeness of infinite streams is one of the most critical challenges faced by stream processing systems. It is also one of the least understood and least adequately addressed.
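To make the idea of a watermark concrete, here is a minimal sketch of one common heuristic: the watermark trails the largest event time seen so far by a fixed delay, and any event older than the current watermark counts as late. This is an illustration only, not the mechanism of either system in the paper; the class name, the `max_delay` parameter, and the sample timestamps are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoundedDelayWatermark:
    """Hypothetical watermark tracker: watermark = max event time seen - max_delay."""

    max_delay: int  # assumed tolerance, in the same unit as the event timestamps
    max_event_time: Optional[int] = None

    def current(self) -> Optional[int]:
        # No watermark exists until at least one event has been observed.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_delay

    def observe(self, event_time: int) -> bool:
        """Record an event and return True if it arrived behind the watermark."""
        watermark = self.current()
        late = watermark is not None and event_time < watermark
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return late


def late_events(event_times: List[int], max_delay: int) -> List[int]:
    """Return the event times that arrived after the watermark had passed them."""
    tracker = BoundedDelayWatermark(max_delay=max_delay)
    return [t for t in event_times if tracker.observe(t)]


# Out-of-order arrival: the event stamped 13 shows up after 20 has been seen.
print(late_events([10, 12, 11, 20, 13, 21], max_delay=5))
```

With a delay of 5, the event stamped 11 is still accepted (the watermark is at 7 when it arrives), while the event stamped 13 is flagged because the watermark has already advanced to 15. A larger delay tolerates more disorder at the cost of higher latency before results can be considered complete.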

level:advanced topic:streaming type:whitepaper


4 Key Design Principles and Guarantees of Streaming Databases — Guozhang Wang @ Confluent.

In the modern world, we increasingly have to deal with streaming data. The approaches to designing streaming databases differ from those for the familiar “static” relational databases. This article introduces important principles of how to build such systems.

level:medium topic:streaming


Streaming 101: The world beyond batch: Part I, Part II — Tyler Akidau.

Here are two fundamental articles that help you go deeper into streaming theory and understand the key difference between batch and stream processing in terms of time. They also help you collect the right questions to ask yourself when you work with watermarks, triggers, and windows. These articles changed my understanding forever, and I hope they will change yours.

level:advanced topic:late-arriving-data topic:streaming


Event-time Aggregation and Watermarking in Apache Spark’s Structured Streaming — Tathagata Das @ Databricks Blog.

If you use Spark Streaming and are interested in handling late-arriving data, this article gives practical guidance on which windowing strategy to use and how watermarks can help you.

level:medium topic:streaming topic:spark topic:late-arriving-data


Keystone Real-time Stream Processing Platform — Netflix Technology Blog.

A high-level overview of Netflix’s design principles and approaches.

level:medium topic:architecture topic:streaming


Change Data Capture with Flink SQL and Debezium — Marta Paes @ DataEngBytes.

A good overview of Flink and Debezium and how they can work together.

level:medium topic:debezium topic:flink topic:streaming type:video


Running Apache Flink on Kubernetes — Empathy.co Blog @ Medium.

In a world where k8s has won the race, we’re trying to run everything on it. Here is a recipe for running Flink on Kubernetes.

level:advanced topic:flink topic:kubernetes topic:streaming


Kafka Resiliency — Retry/Delay Topic, Dead Letter Queue (DLQ) — Sheshnath Kumar @ Medium.

Three typical architectures for resilient message handling in Kafka. If you have a Kafka source in your data pipelines, this may be of interest.
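As a rough illustration of the retry topic and dead letter queue named in the title, the sketch below models topics as in-memory queues: a failed message goes to a retry queue with an incremented attempt counter, and once it exhausts its attempts it is parked in the DLQ. This is not necessarily one of the article's three designs and uses no Kafka client; `run_pipeline`, `MAX_ATTEMPTS`, and the sample handler are all hypothetical.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget per message


def run_pipeline(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages, routing failures to a retry queue and then to a DLQ."""
    main = deque({"value": m, "attempts": 0} for m in messages)  # stands in for the main topic
    retry = deque()  # stands in for the retry/delay topic
    dlq = []         # stands in for the dead letter queue
    done = []

    while main or retry:
        # Drain the main queue first, then work through pending retries.
        msg = main.popleft() if main else retry.popleft()
        try:
            done.append(handler(msg["value"]))
        except Exception as exc:
            msg["attempts"] += 1
            msg["error"] = repr(exc)
            if msg["attempts"] >= max_attempts:
                dlq.append(msg)    # give up: keep the message for inspection
            else:
                retry.append(msg)  # try again later
    return done, dlq


calls = {}


def sample_handler(value):
    """Hypothetical handler: "bad" always fails, "flaky" fails on its first attempt."""
    calls[value] = calls.get(value, 0) + 1
    if value == "bad":
        raise ValueError("cannot parse")
    if value == "flaky" and calls[value] == 1:
        raise TimeoutError("try again")
    return value.upper()


done, dlq = run_pipeline(["ok", "flaky", "bad"], sample_handler)
print(done, [m["value"] for m in dlq])
```

A real deployment would add a delay before each retry and would keep retries off the main topic so that one poisoned message does not block the rest of the partition.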

level:medium topic:kafka topic:streaming