Recording the path to your "AI Agent" responses
If you can't see what path was taken, you can't safely experiment with that path.
How did you get that answer?
As Nigel and I keep testing different use cases where our Ask ADI capability can help data professionals reduce the cognition and effort required to do complex data work, we stumbled on an interesting pattern we both use.
Building the House while living in it
As we use ADI in anger to do the data work for our AgileData.team Fractional Data Service customers, while also extending the capabilities in our AgileData.cloud data platform, we found we would often use ADI to help us with a task and then ask her “How did you get that answer?”
This would often be triggered by ADI giving us one of three responses:
Something that was right
Something that was wrong
Something that was unexpected
Something that was right
For these ones we used the answer and moved on to the next step in the data work, singing with joy that she was doing what we designed her to do.
Something that was wrong
She would give us an answer that could politely be called hallucinating; in the data world I typically think of it as just being wrong.
In this scenario we always wanted to understand the path she took to return the wrong result, so we could figure out what we would change to make her more accurate in the future.
So we would ask her “How did you get that answer”.
Something that was unexpected
She would reply with something that made us go, wow that was unexpected, but actually bloody good, how the hell did she get that response?
Which meant we would next want to understand the logic path that she took to get that response.
Hence the follow up “how did you get that answer?” question.
In hindsight the pattern we were doing was a form of eval, one we were running repeatedly but manually.
And then we started scaling it
As we tweaked the Google Gemini models we used, tweaked what and how we stored Context in the Context Plane, and tweaked the prompts and reinforcement objects we stored in the Context Plane and made accessible to ADI, we decided she was good enough to be put in the hands of our AgileData.network partners.
To be clear, they were well aware that she was still in a “discovery” mode, not to be let loose directly on their customers without the partner in the loop, and not ready to be given directly to the customers themselves.
But co-design is one of our core AgileData principles, and it's how we can scale what we do so fast with just two co-founders. We know that putting patterns and features into the hands of our talented partners is a much faster way to iterate and scale.
WTF did they just try to do?
And of course you can imagine what happens when you put a fairly permissive AskAI capability, in a data platform that covers the complete gamut from data collection to data consumption and enables every data task in between, into the hands of data practitioners who are by default very early adopters, who are working across multiple customers in multiple industries, and who are trying to be at the edge, if not bleeding on that edge ….
They started doing things that made us say, why the fook did they try and do that, what were they trying to achieve?
And more importantly, how the hell did ADI come up with that answer, and was it right or wrong, or excitingly unexpected?
Log everything
One of the patterns we apply to the AgileData.cloud is we log everything.
So of course we were already logging the question they asked ADI and the response they got.
But this didn’t help us work out the rest.
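That baseline logging can be sketched in a few lines. This is a minimal illustration, assuming a simple JSON-lines sink; the field names (`event_id`, `tenant_id` etc) are illustrative, not what AgileData.cloud actually stores:

```python
import json
import uuid
from datetime import datetime, timezone

def log_ask_adi_event(question: str, response: str, tenant_id: str) -> dict:
    """Record one question/response pair as a structured, queryable event."""
    record = {
        "event_id": str(uuid.uuid4()),  # stable id, so later feedback can join back
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "question": question,
        "response": response,
    }
    # In practice this would land in a queryable store; a JSON line is the
    # simplest thing that works and is easy to query later.
    print(json.dumps(record))
    return record
```

The key design choice is giving every event a stable id up front, so any later signal (feedback, evals, reviews) can be joined back to the original question and response.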
Manual processes don’t scale
So of course the first thing we tried was taking the question they asked and typing it into our own AgileData Tenancies.
You can imagine what happened.
The data in our tenancies is different to theirs.
You typically won't get exactly the same answer to the same question with an LLM-based model.
Epic fail and not a lot of use to help iterate ADI’s behaviour.
Not just “how many sales were there”
The natural language and non-deterministic behaviour of the Google Gemini models we use under the covers for ADI is where a lot of the value resides.
And this means she can help with a lot more data tasks than the typical “Text to SQL” use case every data vendor is chasing as table stakes these days.
And so the questions being asked and data work being done by our talented partners was a lot more than the simple “how many sales were there”.
Which meant we couldn’t just implement something like a judge pattern to make sure ADI was returning the correct number to each question.
Iterate with simplicity
As part of our Way of Working we always try to decompose the work to be done into the smallest chunk possible, starting with simplicity and adding complexity later.
Automatically log what path was used
First thing we did was to extend the logging to include the path ADI took to get the answer, so we could review it after the fact.
Originally this was just logged in the background, but we found this logic was actually useful to the Data Practitioner, so we surfaced it as part of the ADI response.
You can see the path being returned here, expressed in the language of a set of assumptions.
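Surfacing the path as assumptions can be as small as formatting them into the response body. A sketch, assuming the model call already returns the answer and a list of assumption strings (both names are illustrative):

```python
def render_response(answer: str, assumptions: list[str]) -> str:
    """Append the path taken, expressed as a set of assumptions, to the answer."""
    lines = [answer, "", "Assumptions I made:"]
    lines.extend(f"- {a}" for a in assumptions)
    return "\n".join(lines)

print(render_response(
    "There were 42 sales last month.",
    ["'sales' means rows in the sales concept",
     "'last month' means the previous calendar month"],
))
```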
Ask for feedback
We still don't know if the path used was actually the best path, or if the response helped the partner do the data work quicker and easier, so we reuse the feedback pattern from social media products: a quick thumbs up or down on each response.
As our partners know they are co-designing with us, this quick and easy feedback loop provides value, without slowing them down from doing the data work and delivering value to the customer.
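The thumbs up/down loop can be as small as one extra record joined back to the original logged event. A sketch under the same assumed schema as above, not AgileData's actual one:

```python
from datetime import datetime, timezone

def record_feedback(event_id: str, thumbs_up: bool, comment: str = "") -> dict:
    """Link a quick up/down vote back to the logged question/response event."""
    return {
        "event_id": event_id,   # joins to the original logged event
        "thumbs_up": thumbs_up,
        "comment": comment,     # optional, so it never slows the partner down
        "given_at": datetime.now(timezone.utc).isoformat(),
    }
```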
Should I trust the response
We also decided to experiment with providing a confidence score on each ADI response. We find this useful when evaluating the responses after the fact, and it will be interesting to see whether it helps our partners or not.
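One way to make a confidence score useful when reviewing responses after the fact is to bucket it, so low-confidence answers are easy to filter for. A sketch with assumed thresholds (the cut-offs are illustrative, not tuned values):

```python
def confidence_bucket(score: float) -> str:
    """Map a 0..1 confidence score to a reviewable label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```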
Another Use Case
Here is another use case where we Ask ADI to help us model the data from Google Analytics.
Again we are logging the path ADI took to respond, but with slightly different language to make it fit the Context of the question more.
And ADI is providing a suggested next step and asking if she can help.
And as mentioned before the user will always go somewhere we don’t expect.
The point of these is not that the Gemini LLM provided definitions for commonly used Google Analytics concepts; any LLM will do that, you can just use ChatGPT on its own.
But the fact is we end up with full visibility of what was asked, what the answer was, and how that answer was derived.
And with that visibility we can decide the most valuable use cases to iterate Ask ADI for, based on the things our AgileData.Network partners actually need her to help them with.
And one more thing
And you can see where we will take this.
Given ADI is embedded in the data platform that the data practitioner is doing the actual data work in, we can increase the level of assistance over time.
ADI should probably respond to the Google Analytics data questions with a response that is tailored to how you do this work in the AgileData.cloud, using the patterns and language we use in that platform (Information Product Canvas, Concepts instead of Entities etc).
ADI should probably create the Concept Model for those Concepts.
ADI should probably populate the Business Glossary with the default definitions for those Concepts.
ADI should probably use the Context we have defined before for Google Analytics to differentiate between a Pseudo User and a User.
ADI should probably rehydrate the Change Rules (data transformations) needed to populate those Pseudo Users from the GA4 event data.
ADI should probably …. [insert use case we see our partners do, that we have never thought about here]
Patterns you can adopt.
Here are some simple patterns you can adopt as you continue your “AI” journey in your organisation:
Log all questions asked and all responses given by your AskAI feature, somewhere you can see and query
Log the path your LLM took to provide the response
Find a way to gather feedback as you scale
Put it in the hands of your early adopters as soon as possible and let them help you co-design the most valuable areas to iterate with next.