Dreaming of better data processing
I’ve tried to summarize most of my ideas for better data processes. Many of them are simplified and up for debate, but I think this is a good starting point.
SQL is able to express batch and streaming…
Materialized views could be part of the solution
All actions are modeled as events
Data mesh is a reality so…
Each team owns their data…
And keeps it documented and up-to-date
Security data catalog. It should be a thing
Data lineage is a first-class citizen from the events to the user interface
The query engine is able to connect to most data systems
We can choose the storage that most suits us
We can define pre-aggregations for a table (aggregation index?)
Indexes and partitions can be automated according to usage
If the above isn’t enough to meet requirements (e.g. SLAs, costs), automatically move data between data systems
DataOps improves the lives of everyone
Data tests. Let’s do more of this
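
To make the last point a bit more concrete, here is a minimal sketch of what a data test could look like, in plain Python. The helper names (`check_not_null`, `check_unique`) and the sample `orders` rows are hypothetical and only for illustration; they are not taken from any particular tool, and a real setup would most likely run checks like these against a table rather than an in-memory list.

```python
from collections import Counter


def check_not_null(rows, column):
    """Return the rows whose value for `column` is missing (None or absent)."""
    return [row for row in rows if row.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(row.get(column) for row in rows)
    return sorted(
        value for value, n in counts.items() if n > 1 and value is not None
    )


# Hypothetical sample data: one missing customer and one duplicated order id.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": "b"},
]

null_failures = check_not_null(orders, "customer_id")
duplicate_ids = check_unique(orders, "order_id")

if null_failures or duplicate_ids:
    print(f"{len(null_failures)} null customer_id row(s), duplicate order_id values: {duplicate_ids}")
```

Each check returns the offending rows or values instead of just a boolean, so a failing test says what is wrong, not only that something is wrong.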