Espresso #8: Why the data space struggles with standardization + Reflecting on turning 30

Nov 11, 2024

Hello data friends,

Grab your espresso (or cappuccino if you’re feeling adventurous) and let’s dive into this month’s brew! We’re tackling the barriers to standardization in the data space, some personal wisdom I’ve picked up in my 20s (it’s been a ride!), and the 100-page gems of The Do Book Co. So, without further ado, let’s talk data engineering (and personal updates) while the espresso is still hot.

Barriers to standardization in the data space

Maxime Beauchemin, one of the pioneers (THE pioneer?) of the data engineering field, recently wrote about why data teams keep reinventing the wheel by building their transformation layer from scratch. In his article (which I highly recommend), he offers a proposal relying on “Parametric Pipelines” and “Unified Models” to build reusable and generic data components/assets that are also flexible enough to accommodate the inevitable specificities of every business.

Maxime’s article navigates the nuances of one key part of the data stack (transformation) and why it lacks standards, but I think he touched on a topic that applies to the stack as a whole: We're great at standardizing the tools (thanks dbt & Airflow), but everything else feels like the Wild West - which is a shame since we universally agree that all data teams do slightly different versions of the same job.

I think this lack of standardization outside of the tools themselves is mainly due to two barriers we still need to overcome:

Every tool generates its own metadata (and doesn’t share it): What probably hurt data teams the most over the past five years is that every tool within the Modern Data Stack wanted to store its own metadata and build features on top of it. This meant that every tool knew a bit about the data and the pipelines, but no tool had the full story (not even the data catalog) - making it very difficult to centralize all of the stack’s metadata in one place and build insights on top of it (or even define standards for managing it). The ideal scenario here would be a standard representation of metadata that tools within the stack implement and then use to exchange metadata automatically, paving the way for standardized ways to manage the data itself. I still believe we’ll reach that state eventually, but the road ahead is, unfortunately, long.
We don’t write enough YAML (seriously - hear me out): Although the “data engineers are in fact YAML engineers” reflection started out of (valid) frustration with YAML, I think we’re still paying a high price for not doing (most) things in code. Even though the Modern Data Stack (RIP) brought a lot of advancements with it, the focus remained on the UI, and many technical workflows (essential for automating and standardizing things) were simply ignored. By moving more things (dashboards, tooling configuration, metrics, data observability, etc.) to code/YAML, we dramatically shorten the path toward standardization and open new doors for industry-wide collaboration. The “Post-Modern” Data Stack (PMDS?) is fortunately taking things in the right direction, and I believe we’ll overcome this barrier in the next few years.
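
To make the first barrier a bit more concrete, here is a minimal Python sketch of what a shared metadata representation could look like. Everything in it is hypothetical and only there for illustration: the `AssetMetadata` record, its fields, the `merge_metadata` helper, and the tool and table names are assumptions, not an existing standard or any tool's actual API. The idea is simply that if every tool emitted the same kind of record, stitching the full story together would be a small, boring function, and the result would be plain text that can live in version control next to the rest of the code/YAML.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssetMetadata:
    """Hypothetical shared record that any tool in the stack could emit."""
    asset: str                                              # e.g. "analytics.orders"
    producer: str                                           # tool that emitted this record
    columns: Dict[str, str] = field(default_factory=dict)   # column name -> type
    upstream: List[str] = field(default_factory=list)       # assets this one depends on
    owners: List[str] = field(default_factory=list)


def merge_metadata(records: List[AssetMetadata]) -> Dict[str, dict]:
    """Combine per-tool records into a single view per asset."""
    merged: Dict[str, dict] = {}
    for record in records:
        view = merged.setdefault(
            record.asset,
            {"asset": record.asset, "producers": [], "columns": {},
             "upstream": [], "owners": []},
        )
        view["producers"].append(record.producer)
        view["columns"].update(record.columns)
        # Keep list values unique while preserving the order they arrived in.
        for key, values in (("upstream", record.upstream), ("owners", record.owners)):
            for value in values:
                if value not in view[key]:
                    view[key].append(value)
    return merged


# Two made-up tools, each knowing only part of the story about the same table.
records = [
    AssetMetadata(
        asset="analytics.orders",
        producer="transformation_tool",
        columns={"order_id": "int", "amount": "float"},
        upstream=["raw.orders"],
    ),
    AssetMetadata(
        asset="analytics.orders",
        producer="orchestrator",
        upstream=["raw.orders", "raw.customers"],
        owners=["data-team"],
    ),
]

catalog = merge_metadata(records)
print(json.dumps(catalog, indent=2))
```

In this toy example neither tool has the full picture on its own, but the merged view has the columns, the lineage, and the owner in one place. The hard part in real life is of course not the merge function but getting tools to agree on the record in the first place.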

Out of the comfort zone: 5 learnings from my twenties

This September I turned 30 and decided that it was a good moment to reflect on the biggest learnings of my twenties (which were, in a nutshell, a rollercoaster). The end result felt like something worth sharing (more on that below), so here are the five learnings I’m most grateful for:

Figure out how you’re wired: What makes you you?
It’s all about resilience (Luctor et Emergo): You’ll inevitably face an insurmountable mountain, and you need to be ready for it.
Curiosity and open-mindedness are the path forward: The combination of these two characteristics, whether in a personal or professional context, ensures that you’re always on the right track toward becoming a better version of yourself.
Take measured risks: Understand what risk is and figure out how much of it you want in your life.
Learn from your losses, and celebrate your wins: As you progress in life, you’ll inevitably make many wrong turns and yet more right ones. Acknowledge the outcome of these turns and make them count.

Want to jump into the details? I wrote a full article about it! (Isn’t that what you do when you turn 30?)

Out of office: The 100-page gems of The Do Book Co

On a recent trip to Vienna (highly recommended!) I stumbled upon The Do Book Co’s “pocket guides” at a cool concept store called Calienna. These tiny books (around 100 pages long on average) are all written by subject matter experts and come in an ideal format: long enough to cover a complex topic and deep-dive into it, but not a big time commitment like a 400-page book.

The first two I read are:

Do Deal by Richard Hoare & Andrew Gummer: Fantastic advice and insights that cover all the stages of negotiations (in any given context).
Do Start by Dan Kieran: This one was so good that it deserved a Twitter/X post. If you're interested in working at startups or starting your own, this book is a must-read. Dan covers all the key areas of entrepreneurship while providing valuable insights and telling the inspiring story of Unbound.

They have a ton of other guides on all sorts of topics, so check them out!

If you enjoyed this issue of Data Espresso, feel free to recommend the newsletter to the people around you.

Your feedback is also very welcome, and I’d be happy to discuss one of this issue’s topics in detail and hear your thoughts on it.

Stay safe and caffeinated ☕

Data Espresso
