#3. Architectural one

Topics: architecture, Debezium, Apache Flink, Apache Kafka, late arriving data, streaming.

Evolution to the Data Lakehouse — Bill Inmon, Mary Levins @ Databricks Blog.

A history of how data warehouses evolved into data lakes and then into data lakehouses, with the pros and cons of each approach.

You Can Replace Kafka with a Database — Emil Koutanov @ Towards Data Science (Medium).

Kafka is the new gold. What if you don’t like it and want to replace it? Of course, there are options like Apache Pulsar, but… is it possible to replace Apache Kafka with a relational DB? It looks like the answer is “yes”. Now it’s your turn to decide whether you need it.
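To make the idea more concrete, here is a minimal sketch of a database used as an append-only log with per-consumer offsets. It is not taken from the article: the table names (`log`, `consumer_offsets`), the function names, and the choice of SQLite are all assumptions made for illustration, and it ignores partitioning, retention, and concurrent consumers.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open a database holding an append-only log and consumer offsets (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS consumer_offsets ("
        "consumer TEXT NOT NULL, topic TEXT NOT NULL, last_seq INTEGER NOT NULL, "
        "PRIMARY KEY (consumer, topic))"
    )
    return conn


def publish(conn, topic, payload):
    """Append a message and return its sequence number."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO log (topic, payload) VALUES (?, ?)", (topic, payload)
        )
    return cur.lastrowid


def poll(conn, consumer, topic, limit=10):
    """Return up to `limit` messages after the consumer's last committed position."""
    row = conn.execute(
        "SELECT last_seq FROM consumer_offsets WHERE consumer = ? AND topic = ?",
        (consumer, topic),
    ).fetchone()
    last_seq = row[0] if row else 0
    return conn.execute(
        "SELECT seq, payload FROM log WHERE topic = ? AND seq > ? ORDER BY seq LIMIT ?",
        (topic, last_seq, limit),
    ).fetchall()


def commit(conn, consumer, topic, seq):
    """Record that the consumer has processed everything up to `seq`."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offsets (consumer, topic, last_seq) "
            "VALUES (?, ?, ?)",
            (consumer, topic, seq),
        )
```

Each consumer tracks its own position, so two consumers can read the same topic independently, which is the part of Kafka's model that this sketch tries to imitate.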

Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing — Amey Chaugule @ Uber Engineering Blog.

Out-of-order data is a big problem. Often it can simply be mitigated, but in some cases it has to be resolved completely. Usually that is done with a lambda architecture. Uber describes its approach to handling the problem in a kappa architecture.
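For readers new to the problem, here is a minimal sketch of the common mitigation: tumbling windows with a watermark and an allowed lateness. This is a generic illustration, not Uber's approach from the article; the function name `window_counts` and its parameters are hypothetical, and event times are plain integers.

```python
from collections import defaultdict


def window_counts(events, window_size=10, allowed_lateness=5):
    """Count events per (window_start, key), tolerating bounded out-of-order arrival.

    `events` is an iterable of (event_time, key) pairs in arrival order.
    The watermark trails the largest event time seen by `allowed_lateness`;
    an event is dropped once its window has ended at or before the watermark.
    """
    counts = defaultdict(int)
    dropped = []
    max_seen = float("-inf")
    for event_time, key in events:
        max_seen = max(max_seen, event_time)
        watermark = max_seen - allowed_lateness
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            # Too late: the window is already considered closed.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped
```

The `dropped` list is exactly the data that mitigation alone loses. Fixing those results after the fact requires some form of reprocessing, which is the case the article addresses.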

Change Data Capture with Flink SQL and Debezium — Marta Paes @ DataEngBytes.

A good overview of Flink and Debezium and how they can work together.

Composable Data Processing with Apache Spark — Dilip Biswal, Shone Sadler @ Data & AI Summit.

Use it as a source of ideas for a single data pipeline architecture: data parsing, validation, and quarantining. Much attention is paid to error handling, with great examples.
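The parse, validate, quarantine pattern can be sketched in a few lines. The talk itself is about Apache Spark; the version below is plain Python, and the function name `parse_and_validate` and the required fields `id` and `amount` are hypothetical, chosen only to show the shape of the idea.

```python
import json


def parse_and_validate(raw_lines, required_fields=("id", "amount")):
    """Split raw JSON lines into valid records and quarantined ones.

    Bad input never raises: it is set aside with the reason attached,
    so the rest of the batch keeps flowing.
    """
    valid, quarantined = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            quarantined.append({"raw": line, "error": f"parse: {exc.msg}"})
            continue
        if not isinstance(record, dict):
            quarantined.append({"raw": line, "error": "parse: not an object"})
            continue
        missing = [f for f in required_fields if f not in record]
        if missing:
            quarantined.append({"raw": line, "error": f"missing: {', '.join(missing)}"})
            continue
        valid.append(record)
    return valid, quarantined
```

Keeping the raw line next to the error makes it possible to inspect and replay quarantined records later instead of silently losing them.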

Written on May 20, 2021