Dreaming of better data processing


I’ve tried to summarize most of the ideas I have on better data processing. Of course, many of them are simplified and up for debate, but I guess this is a good starting point.

SQL is able to express batch and streaming…

Materialized views could be part of the solution
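Here is a minimal sketch of the idea. It assumes SQLite through Python's built-in `sqlite3` module. SQLite has no native materialized views, so a plain summary table kept current by an `AFTER INSERT` trigger stands in for one. The table and column names are made up for illustration. The point is that the aggregation is written once in SQL, and the incrementally maintained result (the streaming side) matches what a full batch query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY,
    customer TEXT    NOT NULL,
    amount   INTEGER NOT NULL
);

-- Stand-in for a materialized view: SQLite does not maintain this table
-- on its own, so the trigger below keeps it in sync.
CREATE TABLE revenue_by_customer (
    customer TEXT PRIMARY KEY,
    total    INTEGER NOT NULL
);

-- Incremental maintenance: each new order only touches its customer's row.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT OR IGNORE INTO revenue_by_customer (customer, total)
        VALUES (NEW.customer, 0);
    UPDATE revenue_by_customer
        SET total = total + NEW.amount
        WHERE customer = NEW.customer;
END;
""")


def batch_result(conn):
    """Recompute the aggregate from scratch, as a batch job would."""
    return dict(conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))


def view_result(conn):
    """Read the incrementally maintained summary table."""
    return dict(conn.execute(
        "SELECT customer, total FROM revenue_by_customer"))


# Orders arriving one at a time, as they would from a stream.
for customer, amount in [("ana", 10), ("bo", 5), ("ana", 7)]:
    conn.execute(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount))

assert view_result(conn) == batch_result(conn)
print(sorted(view_result(conn).items()))  # [('ana', 17), ('bo', 5)]
```

A streaming engine would also have to do this bookkeeping for updates and deletes. The trigger here only handles inserts.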

All actions are modeled as events
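To make "all actions are modeled as events" a bit more concrete, here is a minimal event-sourcing sketch. The append-only log of events is the source of truth, and any current state is just a fold over it. The `Event` shape and the deposit/withdrawal event types are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    """An immutable fact: something happened to an entity."""
    kind: str    # "deposited" or "withdrawn" (hypothetical event types)
    entity: str  # which account the event refers to
    amount: int


def apply(state: dict, event: Event) -> dict:
    """Return a new state with one event applied; the old state is untouched."""
    balances = dict(state)
    current = balances.get(event.entity, 0)
    if event.kind == "deposited":
        balances[event.entity] = current + event.amount
    elif event.kind == "withdrawn":
        balances[event.entity] = current - event.amount
    else:
        raise ValueError(f"unknown event kind: {event.kind!r}")
    return balances


def replay(events) -> dict:
    """Derive the current state by folding over the event log."""
    return reduce(apply, events, {})


log = [
    Event("deposited", "acc-1", 100),
    Event("withdrawn", "acc-1", 30),
    Event("deposited", "acc-2", 50),
]

print(replay(log))      # {'acc-1': 70, 'acc-2': 50}
print(replay(log[:1]))  # {'acc-1': 100}
```

Replaying a prefix of the log gives the state at that point in time, which is handy for audits and backfills.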

Data mesh is a reality so…

Each team owns their data…

And keeps it documented and up to date

A security data catalog. It should be a thing

Data lineage is a first-class citizen from the events to the user interface
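One way to picture this is to treat lineage as a directed graph. Each dataset points at the datasets it is derived from, from raw events up to a dashboard. The node names below are hypothetical, and the graph is hard-coded. In practice it would be collected from the pipelines themselves. Walking the graph upstream answers "where does this number come from?", and walking it downstream answers "what is affected if this event changes?".

```python
from collections import deque

# Hypothetical lineage: each node lists the nodes it is built from.
LINEAGE = {
    "dashboard.revenue_chart": ["marts.revenue_by_customer"],
    "marts.revenue_by_customer": ["staging.orders", "staging.customers"],
    "staging.orders": ["events.order_placed"],
    "staging.customers": ["events.customer_signed_up"],
    "events.order_placed": [],
    "events.customer_signed_up": [],
}


def upstream(node, graph=LINEAGE):
    """All nodes that `node` depends on, directly or transitively (BFS)."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen


def downstream(node, graph=LINEAGE):
    """All nodes affected by a change to `node` (impact analysis)."""
    return {other for other in graph if node in upstream(other, graph)}


print(sorted(downstream("events.order_placed")))
# ['dashboard.revenue_chart', 'marts.revenue_by_customer', 'staging.orders']
```

`downstream` simply re-runs `upstream` for every node. That is fine for a sketch, but a larger graph would keep a reverse index instead.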

The query engine is able to connect to most data systems
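Here is a toy version of a query engine that talks to several systems. Each source is wrapped in a small connector that exposes a common `scan` method, and the engine joins rows regardless of where they live. The `Connector` protocol, the two connectors and `hash_join` are hypothetical names for this sketch. One source is an in-memory SQLite database accessed through `sqlite3`. The other is CSV text read with the standard `csv` module.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Protocol


class Connector(Protocol):
    """The only thing the engine needs from a data system."""
    def scan(self, table: str) -> Iterator[dict]: ...


class SqliteConnector:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def scan(self, table: str) -> Iterator[dict]:
        # Table names cannot be bound as parameters; assume they are trusted here.
        cursor = self.conn.execute(f"SELECT * FROM {table}")
        names = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(names, row))


class CsvConnector:
    def __init__(self, files: Dict[str, str]):
        self.files = files  # table name -> CSV text

    def scan(self, table: str) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self.files[table]))


def hash_join(left: Iterator[dict], right: Iterator[dict], key: str) -> Iterator[dict]:
    """Inner join: index the right side by `key`, then probe it with the left side."""
    index: Dict[object, list] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [("ana", 10), ("bo", 5)])

warehouse: Connector = SqliteConnector(db)
files: Connector = CsvConnector({"customers": "customer,country\nana,PT\nbo,ES\n"})

for row in hash_join(warehouse.scan("orders"), files.scan("customers"), "customer"):
    print(row)
```

The engine never needs to know which system a row came from, which is what makes adding a new connector cheap.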

We can choose the storage that suits us best

We can define pre-aggregations for a table (aggregation index?)
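Here is a rough sketch of what an "aggregation index" could do. It assumes an additive measure (a sum). The table keeps a rollup at a coarser grain, and any query whose group-by columns are a subset of the rollup's columns is answered from the rollup instead of the raw rows. `PreAggregatedTable` and the sample columns are hypothetical. Non-additive measures such as distinct counts can't be re-aggregated this way.

```python
from collections import defaultdict


def group_sum(rows, dims, measure="amount"):
    """SUM(measure) GROUP BY dims over a list of dict rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


class PreAggregatedTable:
    def __init__(self, rows, rollup_dims):
        self.rows = rows
        self.rollup_dims = tuple(rollup_dims)
        # Store the rollup with the same row shape so group_sum can reuse it.
        self.rollup = [
            {**dict(zip(self.rollup_dims, key)), "amount": total}
            for key, total in group_sum(rows, self.rollup_dims).items()
        ]
        self.last_source = None  # which data answered the last query

    def sum_by(self, dims):
        # Sums are additive, so a coarser grouping can be computed from the rollup.
        if set(dims) <= set(self.rollup_dims):
            self.last_source = "rollup"
            return group_sum(self.rollup, dims)
        self.last_source = "raw"
        return group_sum(self.rows, dims)


rows = [
    {"day": "2024-01-01", "country": "PT", "product": "book", "amount": 10},
    {"day": "2024-01-01", "country": "PT", "product": "pen", "amount": 2},
    {"day": "2024-01-01", "country": "ES", "product": "book", "amount": 12},
    {"day": "2024-01-02", "country": "PT", "product": "book", "amount": 8},
]
table = PreAggregatedTable(rows, rollup_dims=["day", "country"])

print(table.sum_by(["country"]), table.last_source)
# {('PT',): 20, ('ES',): 12} rollup
print(table.sum_by(["product"]), table.last_source)
# {('book',): 30, ('pen',): 2} raw
```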

Indexes and partitions can be automated according to usage
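Here is a deliberately naive heuristic for what "automated according to usage" might mean. It counts how often each column appears in query filters and suggests an index once that count crosses a threshold. The structured query log, the `min_hits` threshold and the function name are assumptions. A real system would also weigh write overhead, selectivity and storage cost before creating anything. The same counting could drive partition-key suggestions.

```python
from collections import Counter


def recommend_indexes(query_log, existing, min_hits=3):
    """Suggest (table, column) pairs that are filtered on often but not yet indexed.

    query_log: iterable of {"table": str, "filters": [column, ...]} records
               (a hypothetical, already-parsed log format).
    existing:  set of (table, column) pairs that already have an index.
    """
    hits = Counter(
        (query["table"], column)
        for query in query_log
        for column in set(query["filters"])  # count each column once per query
    )
    return sorted(
        pair for pair, count in hits.items()
        if count >= min_hits and pair not in existing
    )


log = [
    {"table": "orders", "filters": ["customer_id"]},
    {"table": "orders", "filters": ["customer_id", "status"]},
    {"table": "orders", "filters": ["customer_id"]},
    {"table": "customers", "filters": ["email"]},
    {"table": "customers", "filters": ["email"]},
    {"table": "customers", "filters": ["email"]},
]

print(recommend_indexes(log, existing={("customers", "email")}))
# [('orders', 'customer_id')]
```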

If the above isn’t enough to meet requirements (e.g. SLAs, costs), data is automatically moved between data systems
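Here is a sketch of the "move data automatically" step as a placement policy. For each dataset, it picks the cheapest storage tier whose latency still meets the dataset's SLA, and it emits a move when that tier differs from where the data currently lives. The tiers, their latency and cost figures, and the dataset names are all invented for illustration. They are not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    p99_latency_ms: int       # illustrative numbers only
    cost_per_gb_month: float  # illustrative numbers only


TIERS = [
    Tier("in_memory", 5, 20.0),
    Tier("warehouse", 500, 2.0),
    Tier("object_storage", 10_000, 0.02),
]


def cheapest_tier(sla_ms, tiers=TIERS):
    """Cheapest tier whose latency still meets the SLA."""
    eligible = [t for t in tiers if t.p99_latency_ms <= sla_ms]
    if not eligible:
        raise ValueError(f"no tier can meet an SLA of {sla_ms} ms")
    return min(eligible, key=lambda t: t.cost_per_gb_month)


def plan_moves(datasets, tiers=TIERS):
    """datasets: name -> (current tier name, latency SLA in ms).

    Returns name -> (from tier, to tier) for every dataset that should move.
    """
    moves = {}
    for name, (current, sla_ms) in datasets.items():
        target = cheapest_tier(sla_ms, tiers).name
        if target != current:
            moves[name] = (current, target)
    return moves


print(plan_moves({
    "dashboard_kpis": ("warehouse", 50),    # needs to be fast
    "daily_report": ("warehouse", 2_000),   # already well placed
    "clicks_2019": ("warehouse", 60_000),   # rarely read, can be slow
}))
# {'dashboard_kpis': ('warehouse', 'in_memory'), 'clicks_2019': ('warehouse', 'object_storage')}
```

Running `plan_moves` periodically, with SLAs and access patterns fed in from monitoring, would be one way to make the movement automatic.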

DataOps improves everyone’s lives

Data tests. Let’s do more of them
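To give a flavour of what data tests look like, here is a minimal sketch with three checks. They are similar in spirit to the generic `unique`, `not_null` and `accepted_values` tests that tools such as dbt ship with. The functions are hypothetical and run over a list of dict rows. Each one returns the indexes of the failing rows, so an empty result means the test passed.

```python
def not_null(rows, column):
    """Indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def unique(rows, column):
    """Indexes of rows whose `column` value already appeared in an earlier row."""
    seen, failures = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue  # nulls are the job of not_null
        if value in seen:
            failures.append(i)
        seen.add(value)
    return failures


def accepted_values(rows, column, accepted):
    """Indexes of non-null values outside the accepted set."""
    return [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and row.get(column) not in accepted
    ]


def run_tests(rows, tests):
    """Run named tests and keep only the failing ones."""
    failures = {}
    for name, test in tests.items():
        failed_rows = test(rows)
        if failed_rows:
            failures[name] = failed_rows
    return failures


orders = [
    {"id": 1, "status": "paid"},
    {"id": 2, "status": None},
    {"id": 2, "status": "refunded"},
    {"id": 3, "status": "lost"},
]

report = run_tests(orders, {
    "id is unique": lambda rows: unique(rows, "id"),
    "id is not null": lambda rows: not_null(rows, "id"),
    "status is not null": lambda rows: not_null(rows, "status"),
    "status is valid": lambda rows: accepted_values(rows, "status", {"paid", "refunded", "pending"}),
})
print(report)
# {'id is unique': [2], 'status is not null': [1], 'status is valid': [3]}
```

In a pipeline, a non-empty report could fail the run before bad rows reach anyone downstream.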