Discussion about this post

Brent Brewington

I’ve been on the consumer side of this as a data analyst in a large retail organization, where the EDW team provided the current-state table alongside a historical one suffixed with _HIST. It was pretty straightforward to use and understand, and once I started using dbt and actually running data tests, I helped them discover and debug some issues.

mike oneil

This is not a new idea at all; it is at least 30 years old. Maintaining a separate current-state snapshot for the high proportion of analysts who only care about current state is a pattern consistent with minimising processing on constrained resources. That matters even more now that some cloud platforms charge for the entire set of rows in the table rather than the subset extracted. What has changed is the rise of Data Science-driven feature engineering, which is interested in variable windows applied to that history. A similar approach to creating a current snapshot can be applied to maintaining the outputs of feature engineering: if multiple Data Scientists require the same feature, why would you require them all to re-run that process?
