#04 - Track 'N' Trace
Why use a complex data lineage graph to work out where specific data goes in your data platform, when you can track n trace that data just like you track and trace a courier parcel?
What
Track n Trace provides end-to-end visibility of your data journey, allowing you to trace every step from source to consumption with ease, as if you were tracking the delivery of a physical package.
This enables Data Personas to understand where data originates, how it transforms, and where it is used, ensuring full transparency and a reliable audit trail. Whether for troubleshooting, compliance, or optimisation, Track n Trace delivers the clarity needed to manage your data flows effectively.
With its simply magical interface, Track n Trace empowers data teams to maintain trust in their data and streamline their data operations.
Feature Requirement
The system must provide capabilities to track and trace data lineage for a specific data value, enabling users to follow the flow of data through various transformations, sources, and destinations within the platform.
Requirement Rationale
Tracking and tracing data ensures transparency and accountability in data processing, supports compliance with regulatory requirements, and enhances trust in data by allowing users to understand the origin, transformation, and usage of their data.
How
Why
This feature was built out of my frustration with the typical pattern for tracking and understanding the lineage of data.
One of the benefits of working in a dedicated data team for an organisation, or of only working for one customer, is that over time you gain tacit knowledge of all the data flows. If you want to understand how the data is or isn’t flowing through the data platform, you already have an idea, and you typically just want to validate your understanding.
If you are a Fractional Data Team, you are often switching Customer context, and that increases the level of data cognition you need. You have to tune into that Customer’s data flows, so to speak, or “see the matrix”.
Often part of that cognition is the need to trace the lineage of a particular data flow for a Customer.
It might be prompted by the dreaded statement from a Stakeholder, “that number doesn’t look right”, which requires you to trace it from the Consume Tile the Last Mile tool is using, back to the Change Rules that apply business logic to the data, and potentially all the way back to the History Tile and the System of Capture that generated that data.
It might be to trace the impact of a System of Capture change that you have been told is coming, or, more likely, one that has already happened and broken the current Data Contract. So you need to trace forward to see all the transformation logic that relies on that data and the Consume Tiles that use it.
Or you need to investigate the effort to “just make a small tweak to the business logic on how we calculate xxx”, which means starting in the middle of your data flow and moving both left and right at the same time.
One of the first features we built in the AgileData App was the Data Map, our version of the typical lineage graph. You can, of course, click on a Tile in the map and see its lineage both forward and back to visualise the flow of data. From there you can simply view the Change Rules that transform the data, or view the Tiles that hold the data produced by each transformation step.
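Seeing lineage “forward and back” from a Tile is, at heart, a graph traversal in two directions. The following is a minimal sketch, not the AgileData implementation: it assumes a hypothetical `RULES` dictionary in which each Change Rule reads from one or more source Tiles and populates a single target Tile, and all Tile and rule names are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical Data Map: Change Rule name -> (source Tiles, target Tile).
RULES: Dict[str, Tuple[List[str], str]] = {
    "rule_clean_orders": (["history.orders"], "design.orders"),
    "rule_customers": (["history.customers"], "design.customers"),
    "rule_sales_summary": (["design.orders", "design.customers"], "consume.sales_summary"),
}


def build_edges(rules: Dict[str, Tuple[List[str], str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build forward (source -> targets) and backward (target -> sources) adjacency maps."""
    forward: Dict[str, Set[str]] = {}
    backward: Dict[str, Set[str]] = {}
    for sources, target in rules.values():
        for source in sources:
            forward.setdefault(source, set()).add(target)
            backward.setdefault(target, set()).add(source)
    return forward, backward


def reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search: every Tile reachable from `start`, excluding `start` itself."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        tile = queue.popleft()
        for nxt in edges.get(tile, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # guards against cycles leading back to the start Tile
    return seen


forward, backward = build_edges(RULES)
print(sorted(reachable("design.orders", backward)))  # trace back:    ['history.orders']
print(sorted(reachable("design.orders", forward)))   # trace forward: ['consume.sales_summary']
```

Tracing back from a Consume Tile walks towards the History Tiles, while tracing forward from a History Tile lists every Tile a System of Capture change could affect. Starting from a Tile in the middle and calling both directions covers the “move left and right” case.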
But if you are trying to trace an individual column this process is far from magical.
We could have implemented a column-level view of the Data Map, but that would have produced a tangle of lines and flows I like to call “mad person’s knitting”. And I would still have to click on each node or line to understand what was actually going on within that Data Flow.
One of the things we often do to understand the flow of the data is to try and trace a single data value. For example, pick the Id of an order in the System of Capture, or the name of a Supplier and trace that value through the Data Flows.
And that led us to the development of Track N Trace.
You enter a data value for a specific Tile and it returns every Tile in the Data Map that is related to that Tile and contains that value, along with the number of times the value appears in each dependent Tile.
It seems like a simple process, but as always, creating simplicity from complexity is a bit like a duck gliding on top of the water while paddling madly underneath.
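The result described here is essentially a list of (Tile name, count) pairs. A minimal sketch of producing that list, assuming the related Tiles have already been identified and that each Tile can be read as a list of rows. The matching rule (a cell counts when its text form equals the entered value exactly) and the Tile names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def count_value(rows: Iterable[Mapping[str, Any]], value: str) -> int:
    """Count cells whose text form exactly equals `value` (an assumed matching rule)."""
    return sum(1 for row in rows for cell in row.values() if str(cell) == value)


def summarise(tiles: Mapping[str, List[Dict[str, Any]]], value: str) -> List[Tuple[str, int]]:
    """Return (Tile name, count) for every Tile that contains `value` at least once."""
    results = []
    for name, rows in tiles.items():
        count = count_value(rows, value)
        if count > 0:
            results.append((name, count))
    return results


# Hypothetical Tiles, assumed to be already scoped to the starting Tile's lineage.
related_tiles = {
    "design.orders": [{"order_id": 1001, "customer_id": 7}, {"order_id": 1002, "customer_id": 7}],
    "consume.customer_sales": [{"customer_id": 7, "total": 250}],
    "consume.daily_sales": [{"day": "2024-01-01", "total": 250}],
}
print(summarise(related_tiles, "7"))
# [('design.orders', 2), ('consume.customer_sales', 1)]
```

Tiles where the value never appears are left out, so the list stays short enough to scan like a parcel’s delivery history.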
Here is what the AgileData Platform is doing under the covers to achieve this magic:
You start with a Tile in the AgileData App and you search for a specific data value.
It automagically traverses the Data Map in the background to harvest each transformation rule that is dependent on the Tile you started with.
It identifies the target Tile that each of those rules populates.
It then searches all the data in each of those Tiles for that data value.
It returns the name of each Tile and the number of times the data value appears in it, as a simple list.
You can click on any of the Tile names to drill through to the Change Rule that populates that Tile.
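Put together, the steps above amount to a downstream traversal followed by a value search. The sketch below illustrates that flow under stated assumptions and is not AgileData’s actual implementation: Change Rules are a hypothetical dictionary of rule name to (source Tiles, target Tile), each Tile is an in-memory list of rows, and a match means a cell whose text form equals the entered value. The function names `dependent_rules` and `track_n_trace` are made up for illustration.

```python
from collections import deque
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical shapes: rule -> (source Tiles, target Tile); Tile -> list of rows.
Rules = Mapping[str, Tuple[List[str], str]]
Tiles = Mapping[str, List[Dict[str, Any]]]


def dependent_rules(start: str, rules: Rules) -> List[str]:
    """Step 2: traverse downstream from `start`, harvesting every dependent rule."""
    harvested: List[str] = []
    seen_tiles = {start}
    queue = deque([start])
    while queue:
        tile = queue.popleft()
        for rule, (sources, target) in rules.items():
            if tile in sources and rule not in harvested:
                harvested.append(rule)
                if target not in seen_tiles:
                    seen_tiles.add(target)
                    queue.append(target)
    return harvested


def track_n_trace(start: str, value: str, rules: Rules, tiles: Tiles) -> List[Tuple[str, int, List[str]]]:
    """Return (Tile, count, populating rules) for each dependent Tile holding `value`."""
    # Step 3: map each target Tile to the harvested rules that populate it.
    targets: Dict[str, List[str]] = {}
    for rule in dependent_rules(start, rules):
        targets.setdefault(rules[rule][1], []).append(rule)

    results = []
    for tile, populating_rules in targets.items():
        # Step 4: search every cell of the Tile (text comparison is an assumption).
        count = sum(1 for row in tiles.get(tile, []) for cell in row.values() if str(cell) == value)
        # Step 5: keep Tiles containing the value; the rules support drill-through.
        if count:
            results.append((tile, count, populating_rules))
    return results


RULES = {
    "rule_clean_orders": (["history.orders"], "design.orders"),
    "rule_order_events": (["design.orders"], "consume.order_events"),
    "rule_products": (["history.products"], "design.products"),
}
TILES = {
    "history.orders": [{"order_id": "A-17", "amount": 40}],
    "design.orders": [{"order_id": "A-17", "amount": 40}, {"order_id": "A-18", "amount": 15}],
    "consume.order_events": [{"order_id": "A-17", "event": "paid"}, {"order_id": "A-17", "event": "shipped"}],
    "design.products": [{"sku": "A-17"}],  # same value, but outside this lineage
}

for tile, count, via in track_n_trace("history.orders", "A-17", RULES, TILES):
    print(f"{tile}: {count} (populated by {', '.join(via)})")
# design.orders: 1 (populated by rule_clean_orders)
# consume.order_events: 2 (populated by rule_order_events)
```

In this sample, `design.products` also holds `A-17` but is not reported, because no Change Rule links it to `history.orders`; the search only covers Tiles that depend on the starting Tile.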
With our Track N Trace feature we adopted the same pattern you use when you are tracking the delivery of a physical parcel. It’s just a little faster and a lot more reliable.
Again, this feature saves me minutes every time I need to understand the lineage of the data. And when you do that data task multiple times a day, those minutes eventually add up to hours.
How does it respond to the situation where the exact same data value exists in different tables?