Reading Update
Hey! I’ve been away for a bit but, nonetheless, I’ve kept up with the news while studying some Scala.
Data Engineering
- Airflow, Prefect, and Dagster: An Inside Look - A comparison of the three most talked-about schedulers. I’ve only used Airflow, but it’s interesting to see some challengers appear
- Improve Apache Spark performance with the S3 magic committer - For those using Hive tables on S3, this is a nifty trick. Hopefully you are already using a table format like Hudi, Iceberg, or Delta, in which case this won’t be much of an issue
- A brief history of the metrics store - Transform is one of the new players rethinking how an organization unifies and improves its metrics. This used to be done in the BI tools, and moving it into a dedicated layer looks like a great way to improve on the state of the art
Engineering
- Cost of Attrition - A good take on the true price of losing a team member :-(
- Git Organized: A Better Git Flow - Deliver fast, then clean up the Git commit history before merging to master
- Auto-Diagnosis and Remediation in Netflix Data Platform - Netflix gives some great insights into how their data platform proactively detects issues and responds to them automatically
- Consider SQLite - This article proposes using SQLite before moving to larger databases. I tend to agree that this could be a great fit for most projects, as very few of them ever go beyond its limits
- Best practices for writing code comments - Good tips from Stack Overflow on how to write comments
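To make the "start with SQLite" idea a bit more concrete, here is a minimal sketch using Python’s built-in `sqlite3` module (the language choice is mine, not the article’s, and the `events` table and its columns are made up for illustration). The point is how little setup it takes: there is no server to run, just a connection to a file or, as here, to an in-memory database.

```python
import sqlite3

# SQLite is embedded: no server process, the database lives in a single file.
# ":memory:" keeps everything in RAM for this demo; use a path like "app.db"
# to persist it on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema, only for illustration. Amounts are stored as integer
# cents to avoid floating-point rounding in the sums.
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO events (name, amount_cents) VALUES (?, ?)",
    [("signup", 0), ("purchase", 1999), ("purchase", 501)],
)
conn.commit()

# Regular SQL with parameter binding and aggregates works as usual.
count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount_cents) FROM events WHERE name = ?",
    ("purchase",),
).fetchone()
print(f"{count} purchases, {total} cents in total")
conn.close()
```

If a project does outgrow it later, the SQL itself mostly carries over, which is part of why starting small is a reasonable default.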
Others
- Ten Years of Logging My Life - For those trying to register as much of their life as possible 😅
Have a nice week :-)