Espresso #11: AI as a "last-mile" enabler, dbt Fusion engine, and navigating the thin layer between fact and BS
How AI is finally democratizing the Data Platform’s “last-mile” layer, dbt’s new Fusion engine, and why a healthy dose of skepticism is always needed when it comes to data.
Hello data friends,
This month, we’re diving into how AI is finally democratizing the Data Platform’s “last-mile” layer, dbt’s new Fusion engine, and why a healthy dose of skepticism is always needed when it comes to data. So, without further ado, let’s talk data engineering while the espresso is still hot.
How AI is finally paving the data platform’s "last mile" for everyone
We’ve all seen the incredible advancements in the data space over the past years – powerful warehouses with limitless scale, slick ELT tools, and dbt bringing software engineering rigor to data pipelines. Yet that final layer of polish, the seamless “last mile” experience seen in Big Tech tools (like Airbnb’s data timeliness UIs or Netflix’s integrated notebooks), still feels out of reach for most data teams – a luxury only Big Tech could afford. Even with all the capabilities of the Modern Data Stack, data teams are still swamped with firefighting and ad-hoc requests, leaving little room to build these bespoke experience layers.
But what if that’s changing?
In my latest article, "How AI is Finally Democratizing the Data Platform’s Last-Mile Layer", I dive into how AI is emerging as a powerful “last-mile enabler”. I experienced this firsthand via a personal experiment that I discuss in the article: I persisted the artifacts of a dbt project using the ‘dbt-artifacts’ package, and then, using Replit's AI, I built a basic dbt run execution timeline visualizer in under an hour – a task that previously might have seemed daunting (especially when it involves something as “delicate” as D3.js). Sure, it wasn’t nearly as sophisticated as Airbnb’s internal platform, but it worked. I had a functional app showing dbt model lineage, run statuses, and execution times, built by one person, in less time than it takes to watch a movie.
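To give a flavour of what such a visualizer consumes, here’s a minimal sketch (not the app from the article, and independent of the ‘dbt-artifacts’ warehouse tables): it reads a local dbt run_results.json and prints a rough execution timeline as a text bar chart. The field names follow dbt’s standard run_results schema and may vary slightly between dbt versions.

```python
import json

# Minimal, illustrative sketch: summarize a dbt run from the local
# target/run_results.json artifact as a crude text "timeline".
# Assumes the standard artifact fields results[].unique_id, .status and
# .execution_time, which can differ slightly across dbt versions.
with open("target/run_results.json") as f:
    run_results = json.load(f)

timings = [
    (r["unique_id"], r["status"], r.get("execution_time") or 0.0)
    for r in run_results["results"]
]
timings.sort(key=lambda t: t[2], reverse=True)  # slowest models first
longest = max((t[2] for t in timings), default=0.0) or 1.0

for unique_id, status, seconds in timings:
    bar = "#" * int(40 * seconds / longest)
    print(f"{unique_id:<55} {status:<8} {seconds:6.2f}s  {bar}")
```

From there, turning the same data into a proper UI (Gantt-style bars, lineage, filters) is exactly the kind of grunt work an AI coding assistant is good at.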
This isn’t just about building cool internal tools faster. It’s about:
Democratizing Excellence: AI is lowering the barrier to building the experience layer, allowing teams of all sizes to achieve the kind of platform sophistication once reserved for tech giants.
Boosting Productivity: Enabling data professionals to focus more on insights by spending less time wrestling with clunky interfaces, hunting for metadata, or navigating complex access procedures.
A Richer Ecosystem: Potentially fostering more collaboration and a vibrant open-source landscape for these “last mile” components (imagine all the dbt packages waiting to be built!).
Of course, AI isn’t a magic wand (for now at least), and I touch upon the necessary cautions. But the potential for AI to help us finally build truly complete, user-centric data experiences is immense.
Want to explore the Big Tech examples, the “how-to” with AI, and what this means for the future of data platforms? Check out the full post on Medium here.
New dbt engine, who dis?
So, dbt Labs finally shared their detailed roadmap for the post-SDF-acquisition world, and it’s pretty exciting stuff! The star of the show was the new Rust-based dbt Fusion engine. This is a big deal for quite a few reasons: immense performance improvements compared to dbt Core (up to 30x faster parsing), smarter SQL understanding that enables things like real-time code validation, and state-aware orchestration that can actually cut down your warehouse compute costs.
The new engine will co-exist with the Python-based dbt Core engine (which dbt Labs will continue to maintain), but will understandably have the more restrictive ELv2 license (you can use it internally but can’t just take it and build a competing SaaS).
Maintaining the two engines separately is a smart strategy. The dbt language itself stays consistent, which keeps core dbt concepts universal across both engines. If you're already running dbt Core, you can continue as is, or you can adopt the ELv2-licensed Fusion engine for the performance boost without worrying about a costly migration.
dbt Labs also announced a wide range of dbt Cloud features that augment the new engine and bring impressive quality-of-life improvements to the dbt experience, such as a supercharged VS Code extension and new or improved components like dbt Canvas, dbt Insights, and an expanded dbt Catalog.
In my opinion, dbt Labs pulled off something pretty impressive. Throughout the early years of dbt, the “distance” between dbt Core and dbt Cloud was relatively small, making it difficult to justify the cost of the “upgrade” from Core to Cloud. However, with this release (and the additional updates made over the past two years to areas such as the semantic layer), the value proposition of moving to dbt Cloud is stronger than ever. They’ve finally created enough distance between Core and Cloud to make the upgrade worth a serious internal discussion for most data teams.
Although the open-source offering is still part of the picture, the big focus on dbt Cloud features means teams using dbt Core (or even the new Fusion engine) will inevitably start to wonder why they’re not using SQLMesh instead, which continues to champion a more “compelling”, feature-rich, open-source-first narrative.
A shot of skepticism: thinking critically about our data conclusions
As data professionals, we’re immersed in extracting insights and building data-driven solutions (or at least we like to think so). But this very closeness to the data can sometimes lead to a subtle trap: we might become a bit too quick to believe we’re drawing the right conclusions, simply because the data “said so”.
I recently read a fantastic book (which I bought a few years ago and forgot about) that serves as an effective reminder of the dark side of data: “Calling Bullshit: The Art of Skepticism in a Data-Driven World” by Carl T. Bergstrom and Jevin D. West. While it tackles broader themes of misinformation, its lessons are incredibly relevant for us data folks. (And it’s a fun read!)
The book highlights how easy it is for anyone—even experts—to be misled if we’re not rigorously questioning our inputs and interpretations. For us in the data world, this means constantly asking:
Are we truly asking the right questions to begin with?
Have we considered all potential confounding factors or biases in the data?
Are we diligently distinguishing between mere correlation and actual causation? (See the toy example below.)
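To make that last point concrete, here’s a toy simulation (my own illustration, not an example from the book): a hidden confounder drives two otherwise unrelated metrics, and a naive read of the correlation would suggest a causal link that simply isn’t there.

```python
import numpy as np

# Toy illustration of a confounder: temperature drives both ice cream
# sales and drowning incidents, so the two metrics end up clearly
# correlated even though neither causes the other.
rng = np.random.default_rng(42)

temperature = rng.normal(25, 5, size=1_000)          # the hidden confounder
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=1_000)
drownings = 0.3 * temperature + rng.normal(0, 2, size=1_000)

r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"Correlation between ice cream sales and drownings: {r:.2f}")
# A clearly positive correlation, yet banning ice cream would do nothing
# for swimming safety – the heat is doing all the work.
```

Swap in your own metrics and the pattern is the same: without controlling for the confounder, the dashboard will happily tell you a story that isn’t true.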
Even with sophisticated tools and vast datasets, the critical thinking we apply—the art of healthy skepticism towards our own findings—is as important as ever. If you’re looking to sharpen your internal “bullshit detector” and ensure your data narratives are as robust as your pipelines, this book is a worthwhile addition to your reading list.
Hope you enjoyed this edition of Data Espresso! If you found it useful, feel free to share it with fellow data folks in your network.
Feedback is appreciated as always, so feel free to share your thoughts in the comments or reach out directly – I’d love to hear your take on this edition’s topics!
Until next time, stay safe and caffeinated ☕