#3. Architectural one

Topics: architecture, Debezium, Apache Flink, Apache Kafka, late-arriving data, streaming.


Evolution to the Data Lakehouse — Bill Inmon, Mary Levins @ Databricks Blog.

A history of how data warehouses evolved into data lakes and then into data lakehouses, with the pros and cons of each approach.

level:medium topic:architecture


You Can Replace Kafka with a Database — Emil Koutanov @ Towards Data Science (Medium).

Kafka is the new gold. What if you don’t like it and want to replace it? Of course, there are options like Apache Pulsar, but… is it possible to replace Apache Kafka with a relational database? It looks like the answer is “yes”. Now it’s up to you to decide whether you need to.

level:medium topic:architecture topic:kafka


Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing — Amey Chaugule @ Uber Engineering Blog.

Out-of-order data is a big problem. It can often be mitigated, but in some cases it has to be resolved completely. The usual solution is a lambda architecture; Uber describes its approach to handling the problem in a kappa architecture.

level:advanced topic:architecture topic:late-arriving-data


Change Data Capture with Flink SQL and Debezium — Marta Paes @ DataEngBytes.

A good overview of Flink and Debezium and how they can work together.

level:medium topic:debezium topic:flink topic:streaming type:video


Composable Data Processing with Apache Spark — Dilip Biswal, Shone Sadler @ Data & AI Summit.

A source of ideas for the architecture of a single data pipeline: data parsing, validation, and quarantining. Much attention is paid to error handling, and the examples are great.

level:medium topic:architecture type:video


Written on May 20, 2021