Reading Update | Hopes for 2022
2021 was a year in which I got to fulfill some of my goals. I built a better track record of writing articles (most of them reading updates), and I got to talk on a podcast and at Coalesce. Of course, this meant gaining new knowledge in areas like dbt and data testing. I also got started on a second brain, which is helping me learn more and better (writing really helps). These have been some of my main goals.
I was introduced to Scala and technologies like Spark, but for 2022 I’m hoping to:
- Read a Scala book
- Read a Flink book
- Work with Apache Iceberg in either a batch or a streaming job
- Talk at two conferences/meetups
- Write more in the second brain (a summary of the new things learned from each article)
Data Engineering
- Don’t Let the Internet Dupe You, Event Sourcing is Hard - Event sourcing is great but, at least for now, can be quite a bit harder than batch mode
- Metadata Indexing in Iceberg - The creator of Iceberg gives some insights into how its metadata indexing works
- Iceberg Spec - Not an article, but I think it’s a must for those trying to use Apache Iceberg
- How to ETL at Petabyte-Scale with Trino - Some ideas on how to do ETL using Trino
- Announcing OpenMetadata - Standardization is great, and OpenMetadata is a step in the right direction
- How Uber Achieves Operational Excellence in the Data Quality Experience - Another take on how Uber has standardized its data ops and how it ensures quality
- Launching at LinkedIn: The Story of Apache Pinot - A history of how Apache Pinot started
- Revisiting Java in 2021 - Java 17 has gained some great features since Java 8, and this article tries to summarize them
- Using an ETL framework vs writing yet another ETL script - Airbyte is an open source alternative to Fivetran, and the article tries to convey why open source is the best approach compared to closed source ones
- Series on data protection at Airbnb