Espresso #12: Data modeling for data products, a Spotify Wrapped for everything, and building things that matter
A modern playbook for data modeling in a product-driven world, how AI can power a supercharged Spotify Wrapped, and a two-step formula for building valuable data products.
Hello data friends,
This month, we’re diving into the big comeback of data modeling (and how to adapt it to today’s data-product-driven world), the data product we all need (analytics and insights combining data from all the apps we use), and how to find data problems worth solving. So, without further ado, let’s talk data engineering while the espresso is still hot.
Data Modeling for Data Products: A Practical Guide
For the last couple of years, it feels like we’ve all time-traveled back to 1999. Nearly everyone in the data space is talking about Kimball, Inmon, and Data Vault again. This isn’t just nostalgia; it’s a very reasonable (and much-needed) reaction to the chaos of the last decade, where we threw endless compute and thousands of dbt models at every problem without a coherent strategy. After data budgets got much tighter in 2022, we all collectively realized that we needed a process to clean up the mess.
But here’s the tricky part: while we absolutely need the discipline of modeling, we can’t just copy-paste the old playbook. The slow, rigid, waterfall approach that defined the data warehousing era is not a great fit for modern data teams trying to efficiently build and ship data products at scale.
So, how do we bridge that gap? How do we take the timeless principles of good modeling and adapt them for a decentralized, product-driven world?
I decided to write down my full thoughts on this in a long-form article for Data Engineering Things. It’s a practical guide that lays out a modern playbook for data modeling, covering (among other things):
A "Go Wide, then Go Deep" strategy for balancing big-picture business alignment with the focused work of building a specific data product.
Why decentralized ownership isn’t just a buzzword, but a prerequisite for making this approach work.
The specific tools and frameworks (Metric Trees, the Semantic Layer, etc.) that help turn these principles into practice.
It’s my take on how we move forward: keeping the good parts of modeling without the old-school baggage. If you’re grappling with these questions on your own team, I think you’ll find it useful. (You can read the full article here.)
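Speaking of metric trees: to make that framework a bit more concrete, here's a minimal Python sketch of the idea - a top-level business metric decomposed into the input metrics that drive it. The metrics and the decomposition are invented for illustration; in practice this structure usually lives in a semantic layer, not in application code.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    """A node in a metric tree: a metric plus the input metrics it decomposes into."""
    name: str
    inputs: list["Metric"] = field(default_factory=list)

# Invented decomposition: revenue = orders x average order value,
# and orders break down further into sessions x conversion rate.
revenue = Metric("revenue", inputs=[
    Metric("orders", inputs=[Metric("sessions"), Metric("conversion_rate")]),
    Metric("avg_order_value"),
])

def print_tree(metric: Metric, depth: int = 0) -> None:
    """Walk the tree top-down - handy for seeing which team owns which input."""
    print("  " * depth + metric.name)
    for child in metric.inputs:
        print_tree(child, depth + 1)

print_tree(revenue)
```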
A Spotify Wrapped for Everything Else
Every year, Spotify Wrapped comes out and provides users with a wide range of analytics about the music they listened to throughout the year - and unsurprisingly, people love it. It’s a genuinely great data product that gives you a fun, insightful look at your own habits. But it’s also a rare exception. We use tens (hundreds?) of apps and services on a daily basis, yet we actually get back very little data from them.
What were my most productive hours according to my calendar and code commits? How did my reading habits on Kindle correlate with my running activity on Strava? What does my Uber history say about my social life? Some apps do sprinkle high-level metrics here and there, but there are mountains of personal data that are yet to be used for anything other than ads.
Getting the answers today would mean wrestling with a dozen different APIs and trying to stitch it all together myself. Unfortunately, but unsurprisingly, nobody has time for that.
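Just to illustrate the kind of stitching involved, here's a toy sketch of the Kindle-vs-Strava question. The values are invented stand-ins for what two hypothetical API clients might return once you've fought through auth, pagination, and two different schemas:

```python
import statistics  # statistics.correlation needs Python 3.10+

# Invented sample data standing in for the output of two API clients.
kindle_minutes_by_day = {"2024-06-01": 42, "2024-06-02": 15, "2024-06-03": 60}
strava_km_by_day = {"2024-06-01": 5.0, "2024-06-02": 0.0, "2024-06-03": 8.5}

# Join the two streams on the days where both services have data,
# then check whether reading and running actually move together.
shared_days = sorted(kindle_minutes_by_day.keys() & strava_km_by_day.keys())
r = statistics.correlation(
    [kindle_minutes_by_day[d] for d in shared_days],
    [strava_km_by_day[d] for d in shared_days],
)
print(f"Reading/running correlation over {len(shared_days)} days: {r:.2f}")
```

And that's the easy part - multiply it by a dozen services, each with its own auth flow and rate limits, and you see why this data product doesn't exist yet.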
This is where I think the current AI wave gets interesting. Connecting the dots across various services and generating useful insights about our habits and who we are is a very interesting "last-mile" problem for AI. The idea of a personal AI agent that can securely connect to the APIs of all the tools I use, pull the data, and just tell me something interesting about myself feels... actually useful.

It’s not about creating more dashboards. It’s about getting a coherent narrative out of the fragmented data of our own lives. That’s a data product I’d actually want to use.
(This is, after all, a data engineering newsletter, so I can’t talk about Spotify Wrapped without mentioning the fascinating article about the data engineering magic behind it.)
Finding Data Problems that are Worth Solving
For a myriad of reasons, data teams have built a reputation for being (to an extent) disconnected from the business. The narrative is that we get excited about the tech and the new tools - and sometimes lose sight of whether we’re actually having an impact (i.e., we end up building things that are technically impressive but practically useless).
Recently I came across a great article by Sven Balnojan called "Why Internal Data Teams Build the Wrong Things", in which he provides 15 extremely relevant and useful learnings to ensure you don’t walk the wrong path.
But IMO we can distill things even further. In my experience, breaking out of this cycle and building things that are valuable for the business can come down to a simple, two-step path.
First, you have to find the low-hanging fruit to win some initial momentum. There’s always a team that is obviously starved for data. A classic example is the marketing team trying to figure out which campaigns are actually working. Building them a solid attribution model is a clear, contained project with obvious business value. It’s a quick win that builds trust and opens new doors with other stakeholders who may have yet more interesting use cases.
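To show the shape of that problem: last-touch attribution is about the simplest flavor there is (a production model would be more sophisticated), but even this toy sketch, with entirely invented data, answers the "which campaigns are working" question better than nothing:

```python
# Invented sample data: (user, date, campaign) marketing touches,
# plus (user, date) conversions.
touches = [
    ("alice", "2024-05-01", "paid_search"),
    ("alice", "2024-05-10", "newsletter"),
    ("bob",   "2024-05-03", "social"),
]
conversions = [("alice", "2024-05-12"), ("bob", "2024-05-04")]

def last_touch_attribution(touches, conversions):
    """Credit each conversion to the user's most recent touch before it."""
    credit: dict[str, int] = {}
    for user, conv_date in conversions:
        prior = [t for t in touches if t[0] == user and t[1] <= conv_date]
        if prior:  # ISO date strings sort chronologically, so max() works
            campaign = max(prior, key=lambda t: t[1])[2]
            credit[campaign] = credit.get(campaign, 0) + 1
    return credit

print(last_touch_attribution(touches, conversions))
# -> {'newsletter': 1, 'social': 1}
```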
But the real magic happens with the second step: truly understanding the business by diving into the details, and finding new (relevant) data use cases. This means going beyond shipping what’s obvious/requested, and instead watching how the business operates and identifying gaps where the data can shine.
A perfect example is product analytics. Most product teams have a tool like Amplitude or Mixpanel that gives them great high-level metrics. But the real gold (the raw, granular event stream) is often just sitting untouched in an S3 bucket or a production database. The high-value data product here isn’t about replacing the existing tool, but building a much richer analytics layer on top of that raw data. By modeling it properly, the data team can unlock granular, feature-level insights that are impossible to get otherwise. This is how you empower the product team to go from asking "How many active users do we have?" to "How does the adoption of our new search filter impact 30-day retention for our enterprise customers?" while also building a foundation for a myriad of new vertical use cases for product data.
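To make that concrete, here's a toy sketch of the retention question above, computed straight from a raw event stream. The event names, the 30-day window definition, and the sample data are all invented for illustration (and I've dropped the enterprise-customer filter for brevity):

```python
from datetime import date, timedelta

# Invented raw event stream: (user, event, date). In real life this is the
# granular data sitting untouched in S3 or the production database.
events = [
    ("u1", "signup",             date(2024, 5, 1)),
    ("u1", "used_search_filter", date(2024, 5, 2)),
    ("u1", "opened_app",         date(2024, 6, 3)),  # active ~33 days in
    ("u2", "signup",             date(2024, 5, 1)),
    ("u2", "opened_app",         date(2024, 5, 5)),  # gone before day 30
]

signups = {user: d for user, event, d in events if event == "signup"}
adopters = {user for user, event, _ in events if event == "used_search_filter"}

def retained_30d(user: str, signup: date) -> bool:
    """Any activity in days 30-37 after signup counts as retained here."""
    start, end = signup + timedelta(days=30), signup + timedelta(days=37)
    return any(u == user and start <= d <= end for u, _, d in events)

for label, cohort in [("filter adopters", adopters),
                      ("non-adopters", set(signups) - adopters)]:
    rate = sum(retained_30d(u, signups[u]) for u in cohort) / len(cohort)
    print(f"30-day retention, {label}: {rate:.0%}")
```

None of this is exotic - the point is that it only becomes possible once someone models the raw events properly instead of leaving them in the bucket.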
These are the kinds of opportunities you only find when you go looking for them - by observing how the business actually runs and identifying the real-world friction that data can solve.
Hope you enjoyed this edition of Data Espresso! If you found it useful, feel free to share it with fellow data folks in your network.
As always, feedback is much appreciated, so feel free to share your thoughts in the comments or reach out directly – I’d love to hear your take on this edition’s topics!
Until next time, stay safe and caffeinated ☕