Dec 21, 2021

Reading Update | Hopes for 2022

Photo by Giammarco on Unsplash

2021 was a year in which I got to fulfill some of my goals. I built a better track record of writing articles (most of them reading updates), and I got to speak on a podcast and at Coalesce. Of course, this meant picking up new knowledge in areas like dbt and data testing. I also got started on a second brain, which is helping me learn more and better (writing really helps). These have been some of my main goals.

I got introduced to Scala and technologies like Spark, but for 2022 I’m hoping to:

Read a Scala book
Read a Flink book
Work with Apache Iceberg in a batch job, a streaming job, or both
Speak at two conferences/meetups
Write more in the second brain (a summary of the new things learned from each article)

Data Engineering

Don’t Let the Internet Dupe You, Event Sourcing is Hard - Event sourcing is great but, at least for now, can be quite a bit harder than batch mode
Metadata Indexing in Iceberg - The creator of Iceberg gives some insight into how its metadata indexing works
Iceberg Spec - Not an article, but I think it’s a must-read for those trying to use Apache Iceberg
How to ETL at Petabyte-Scale with Trino - Some ideas on how to run ETL using Trino
Announcing OpenMetadata - Standardization is great, and OpenMetadata is a step in the right direction
How Uber Achieves Operational Excellence in the Data Quality Experience - Another take on how Uber has standardized its data ops and how it ensures quality
Launching at LinkedIn: The Story of Apache Pinot - A history of how Apache Pinot started
Revisiting Java in 2021 - Java 17 has gained some great features since Java 8, and this article tries to summarize them
Using an ETL framework vs writing yet another ETL script - Airbyte is an open-source alternative to Fivetran, and the article tries to convey why it’s a better approach than closed-source ones
Series on Data protection at Airbnb