Join Shane Gibson as he chats with Roelant Vos about a number of the patterns from his new book, Data Engine Thinking.
Listen
Listen on all good podcast hosts or over at:
https://podcast.agiledata.io/e/data-engine-thinking-patterns-with-roelant-vos-episode-66/
Read
Read or download the podcast transcript at:
https://agiledata.io/podcast/agiledata-podcast/data-engine-thinking-patterns-with-roelant-vos/#read
Buy the Data Engine Thinking book
https://dataenginethinking.com/en/
Google NotebookLM Briefing
Detailed Briefing Document: "Data Engine Thinking: Automating the Data Solution"
Source: Excerpts from "Data Engine Thinking: Automating the Data Solution" – A podcast interview with Roelant Vos (co-author) by Shane Gibson.
Date: June 19th, 2025
Key Speakers:
Roelant Vos: Co-author of "Data Engine Thinking," experienced data professional (25+ years), with a strong focus on automation and technical data management.
Shane Gibson: Host of the Agile Data Podcast.
1. Core Concept: "Data Engine Thinking" and the "Information Factory"
The central theme of the book, "Data Engine Thinking," and the podcast discussion, is an end-to-end approach to building data solutions that are inherently "designed for change." This is contrasted with traditional, often manual, data practices.
Designed for Change: The fundamental goal is to create data solutions that can easily adapt to evolving business needs, data models, and technological landscapes. This is achieved primarily through automation and pattern-based design.
Information Factory vs. Information Value Stream: Shane Gibson introduces two concepts:
Information Value Stream: Focuses on the "product thinking" side – identifying problems, ideating solutions, prioritising work, and delivering value to stakeholders.
Information Factory: Focuses on the "platforms that support that work, and the way we move data through it all the way from collection through to consumption." Roelant confirms that "Data Engine Thinking" is "definitely more in the information factory area." This highlights the book's focus on the underlying architecture and automation capabilities rather than business process design.
2. The Imperative of Automation and Lowering the Cost of Change
A recurring and foundational idea is that automation is crucial for agility and innovation in data.
Historical Context: Roelant's journey over 25 years has consistently focused on automation, starting as early as 2000 with tools like Oracle Warehouse Builder and its Tcl-based scripting. He notes, "Absolutely. Yep. So I started that in 2000 actually... working with Oracle Warehouse Builder, may it rest in peace. We used this Tcl language to try to automate things."
Enabling Experimentation and Risk-Taking: Shane eloquently summarises the core benefit: "if the cost of change is lower, then we are more willing to change... We can manage more, change more often. We can iterate more often. We can take more risks. We can make earlier guesses because we know the cost of refining that guess in the future is lower than if everything was manual."
Addressing Flawed Models: Data models are "always going to be flawed" because initial interpretations of reality are incomplete. As understanding grows, models need refinement. Automation makes this refinement cost-effective: "The more you learn, the more you want to refine that model. And then you have to go back every time to update your code base. And that's why automation is such a critical thing."
Overcoming Complexity: Data solutions are described as "not complicated, but they're complex," due to "a lot of tiny moving bits and pieces." Automation is the key to managing this inherent complexity.
3. Pattern-Based Design: Design Patterns vs. Solution Patterns
A core distinction made in the book is between "design patterns" and "solution patterns," which provides a structured approach to building robust data systems.
Design Patterns (The "What"): These are conceptual, holistic, and technology-agnostic. They define "what we need to do, how it should work, what are the conceptual boxes that we tick." Examples include:
Historizing Data: The need to capture and store every historical view or instantiation of data over time. This is considered mandatory: "you always want to bring in a historicization pattern into your platform on day one... because you are gonna need it."
Bi-temporality: The complex concept of managing both "assertion time" (the technical timestamp of when data was recorded) and the "state timeline" (when the data was valid in the business). Roelant states, "you have to have these two timelines in place at all times and solve the problems associated with it," such as bi-temporal queries. A minimal sketch of both timelines follows this list of examples.
Reference Data Management: The ability to capture and iterate lookup data not originating from source systems (e.g., Excel, Google Sheets).
File Drop Capabilities: The ability to ingest ad-hoc files (CSV, JSON, XML, Excel) into the platform.
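To make the historization and bi-temporality patterns concrete, here is a minimal sketch in Python. The record layout, field names, and historize helper are illustrative assumptions for this briefing, not the book's reference implementation; the point is that every row carries both a technical assertion time and a business validity range, and history only ever grows.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

@dataclass(frozen=True)
class HistorizedRecord:
    """One bi-temporal row (hypothetical layout for illustration)."""
    business_key: str                # natural key from the source system
    attributes: tuple                # descriptive payload, e.g. (name, city)
    assertion_time: datetime         # technical timeline: when we recorded it
    state_from: date                 # business timeline: validity start
    state_to: Optional[date] = None  # business timeline: validity end (open)

def historize(history: list[HistorizedRecord],
              incoming: HistorizedRecord) -> list[HistorizedRecord]:
    """Append-only historization: never update or delete, only assert anew.

    Assumes `history` is kept in assertion-time order. If the newest version
    of this key already matches, nothing is recorded; otherwise the incoming
    record is appended and every prior view of the data stays queryable.
    """
    versions = [r for r in history if r.business_key == incoming.business_key]
    if versions and versions[-1].attributes == incoming.attributes:
        return history
    return history + [incoming]
```

Because rows are only ever appended, any earlier state of the model can be reconstructed by filtering on assertion_time, which is part of why the pattern is safe, and cheap, to adopt on day one.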
Solution Patterns (The "How"): These are specific implementations of design patterns on a given technology stack. They involve the concrete choices of how a design pattern will be realised.
Examples: For historizing data, solution patterns could be SCD Type 2 dimensions, Data Vault satellites, or specific methods in DBT, Oracle Warehouse Builder, SQL, Python, etc.
Technology Agnostic Design: The goal is that the core design pattern (e.g., historization) remains constant, while the "physical modeling can change depending on the technology, the user, the tools, all those constraints." The technology itself matters less: "Yes, you need to optimize it, but it's also not really where the IP resides."
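As a hypothetical illustration of that separation, the sketch below captures the historization intent once and renders it as different SQL per target. The template strings and dialect names are assumptions for this briefing, not the book's actual generator; the point is that the intent (the design pattern) is written once, while the generated code (the solution pattern) varies per platform.

```python
# Hypothetical pattern renderer: one design-pattern intent, several solution patterns.
HISTORIZE_TEMPLATES = {
    # Insert-only historization phrased as generic ANSI-flavoured SQL.
    "ansi": (
        "INSERT INTO {target} "
        "SELECT s.*, CURRENT_TIMESTAMP AS load_dts FROM {source} s "
        "WHERE NOT EXISTS (SELECT 1 FROM {target} t "
        "WHERE t.{key} = s.{key} AND t.row_hash = s.row_hash)"
    ),
    # The same intent phrased as a MERGE, for platforms that prefer it.
    "merge": (
        "MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        "WHEN NOT MATCHED THEN INSERT *"
    ),
}

def render_historization(dialect: str, target: str, source: str, key: str) -> str:
    """Emit the solution pattern (code) for the historization design pattern."""
    return HISTORIZE_TEMPLATES[dialect].format(target=target, source=source, key=key)

# The design intent is identical; only the generated solution pattern differs.
print(render_historization("ansi", "customer_history", "stg_customer", "customer_id"))
print(render_historization("merge", "customer_history", "stg_customer", "customer_id"))
```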
Mandatory Design Patterns: Roelant argues that certain design patterns are "mandatory" because "to truly work with data in your organization, there's no avoiding them. Sooner or later, you will run into these problems, so you might as well tackle it upfront."
Bulletproofing and Reusability: The aim is to create "bulletproof" and "reusable" code for common solution patterns, similar to how Data Vault hub/link/satellite loading code can be hardened. This reduces manual effort and increases reliability.
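One small, hedged example of such a hardened building block: a deterministic hash-key helper of the kind commonly used for Data Vault hub loading (the function name and normalization rules here are illustrative assumptions, not the book's templates). Written and tested once, every generated load can reuse it.

```python
import hashlib

def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Deterministic hub hash key: normalize, join, and hash the business key.

    Normalization (trim + uppercase) ensures the same real-world key always
    yields the same hash, regardless of source-system formatting quirks.
    """
    normalized = delimiter.join(p.strip().upper() for p in business_key_parts)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# The same key, however a source formats it, lands on the same hub row.
assert hash_key("ACME Corp ") == hash_key("acme corp")
```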
4. The "Engine" Concept and Future State
The book envisions a highly automated "Data Engine" that can intelligently manage and optimise data solutions.
Optimizer Component: A key component of the "engine" is an "optimizer." This allows the system to "calculate, based on usage patterns, what the best combination of physical and virtual objects is to deliver that," guided by directives like cost, storage, IO, and latency. In practice this means dynamically choosing whether to virtualize or physicalize data based on specific performance or cost requirements.
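A toy sketch of such an optimizer decision, with invented option characteristics and directive weights: score the virtual option (compute on read) against the physical option (persist on write) under the stated directives and take the cheaper one.

```python
# Hypothetical optimizer sketch: choose virtual vs physical per directive weights.
OPTIONS = {
    # Rough, invented characteristics of each materialization choice.
    "virtual":  {"storage": 0.0, "compute_per_query": 1.0, "latency": 1.0},
    "physical": {"storage": 1.0, "compute_per_query": 0.1, "latency": 0.2},
}

def choose_materialization(directives: dict[str, float]) -> str:
    """Pick the option with the lowest weighted score for the given directives."""
    def score(option: dict[str, float]) -> float:
        return sum(directives.get(k, 0.0) * v for k, v in option.items())
    return min(OPTIONS, key=lambda name: score(OPTIONS[name]))

# A latency-sensitive consumer pushes the engine toward physicalizing...
print(choose_materialization({"latency": 5.0, "storage": 1.0}))   # -> physical
# ...while a storage-constrained, rarely queried object stays virtual.
print(choose_materialization({"storage": 5.0, "latency": 0.5}))   # -> virtual
```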
Config-Driven Platforms: The ideal future state is a fully configuration-driven, end-to-end platform where users can "pull up the design patterns, you tick some boxes for them, pull up the constraints around which technologies it's allowed to use, and it actually writes the solution patterns and the code and deploys itself."
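Expressed as configuration rather than code, that could look something like the hypothetical snippet below; every key and value is an illustrative assumption, not the book's schema.

```python
# Hypothetical, illustrative configuration for a config-driven data platform.
solution_config = {
    "design_patterns": {
        "historization": True,    # mandatory: capture every state of the data
        "bi_temporality": True,   # keep assertion and state timelines
        "reference_data": True,   # managed lookup data outside source systems
        "file_drop": True,        # ad-hoc CSV/JSON/XML/Excel ingestion
    },
    "constraints": {
        "allowed_technologies": ["snowflake", "databricks"],
        "directives": {"cost": 1.0, "latency": 3.0},  # feeds the optimizer
    },
}
# From here, an engine would select solution patterns, generate the code,
# and deploy it, with no hand-written loading logic required.
```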
Virtualization as a Test: Roelant suggests that virtualization, even if never physically deployed, serves as a test of the robustness of the patterns: "If you can do it [virtualize], you can also physicalize... but if you can't virtualize it, then something's wrong with the patterns."
5. Narrative Approach and Practical Application
The authors employ a specific narrative strategy to make the complex concepts more accessible and actionable.
Fictitious Company ("FCOM"): The book uses a fictitious company to embed theory within a story. This "takes that emotional aspect and puts it into the book storyline so we can have that conversation about pros and cons and what it means." This approach echoes books like "The Goal" and "The Phoenix Project," where patterns are revealed through a narrative.
Practical Implementation: The book is designed to be highly practical, allowing readers to "implement a fully automated solution themselves."
GitHub Repository: Complementing the book, a GitHub repository will be launched with "all these code examples and patterns and things you can run yourself, and templates and everything." This aims to foster collaboration and provide concrete implementation examples.
6. Challenges and Learnings in Writing the Book
The podcast touches upon the significant effort involved in synthesising 25+ years of experience into a coherent framework.
Seven Years in the Making: The book has taken "almost seven years" to write, highlighting the complexity of the undertaking.
Collaborative and Harmonious Process: Despite the usual frictions of co-authorship, Roelant notes remarkable agreement between himself and his co-author Dirk, making for a "very harmonious" writing process. Disagreements were rare; instead there were moments of "didn't know something yet" that led to further exploration and coding.
Valuable Role of Training: Creating course material and training people before writing the book proved invaluable. Roelant states, "at some point I started recording the trainings for that purpose. And then after the training sessions, I was updating the notes in the slide decks to find the right words. And that all made it into the book." This iterative process of teaching, getting feedback, and refining content directly informed the book's clarity and structure.
7. Antipatterns and Dogmatism
The discussion subtly highlights issues in current data practices that "Data Engine Thinking" aims to address.
Handcrafting vs. Automation: A major antipattern identified is that data professionals, oddly, "like to handcraft things." This is contrasted with the software domain's adoption of templated, automated deployments (e.g., Terraform).
Reinventing the Wheel: When changing technology stacks (e.g., SQL Server to Snowflake to Databricks), data teams often "reinvent all the patterns again... That's just crazy." The book seeks to provide a shared, reusable library to combat this waste.
Avoidance of Complexity: Data professionals can be "a bit shy of exposing too much complexity if they can avoid it," opting for "simpler patterns" that may cause issues later. The book argues for upfront embrace of necessary complexity, managed through automation.
Dogmatic Approaches: The shift away from rigid adherence to single methodologies (e.g., Inmon vs. Kimball, Data Vault vs. Dimensional) is acknowledged, promoting flexibility based on context.
Conclusion:
"Data Engine Thinking" proposes a paradigm shift in data solution development, moving away from manual, ad-hoc, and project-specific builds towards automated, pattern-based, and inherently adaptable systems. By clearly defining "design patterns" (the what) and "solution patterns" (the how), and advocating for their codification and reusability, the book aims to lower the cost of change, increase agility, and ultimately build more trustworthy and future-proof data platforms. The authors' extensive experience and practical, code-backed approach suggest a significant contribution to standardising and industrialising data engineering practices.
Stakeholder - “That's not what I wanted!”
Data Team - “But that's what you asked for!”
Struggling to gather data requirements and constantly hearing the conversation above?
Want to learn how to capture data and information requirements in a repeatable way, so stakeholders love them and data teams can build from them, using the Information Product Canvas?
Have I got the book for you!
Start your journey to a new Agile Data Way of Working.