Nice article! I'm using mostly the same tools and agree with your assessments. I'm trying to implement a generalized ML model testing framework at my company.
The plan is to include a ModelVersionNumber=1.0 field in the API response when our app fetches recommendations.
This field would then be captured as a property on the analytics events we track (via mParticle), so I can segment results by ModelVersionNumber, either in the raw clickstream data or in Amplitude.
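A minimal sketch of that flow, for illustration only. It assumes hypothetical event names (recommendation_viewed, recommendation_clicked), a stubbed fetch_recommendations function, and an in-memory EVENT_LOG standing in for the mParticle pipeline. It does not use the real mParticle or Amplitude SDKs. The idea is that the client copies ModelVersionNumber from each API response onto every event, and analysis then groups events by that property (here, click-through rate per version).

```python
from collections import defaultdict

# Hypothetical: the model version the recommendation service is currently serving.
MODEL_VERSION = "1.0"


def fetch_recommendations(user_id: str) -> dict:
    """Hypothetical stand-in for the recommendations API; the response carries the model version."""
    return {
        "user_id": user_id,
        "items": ["item_a", "item_b", "item_c"],
        "ModelVersionNumber": MODEL_VERSION,
    }


# Hypothetical in-memory event log standing in for the analytics pipeline (not the mParticle API).
EVENT_LOG: list[dict] = []


def track_event(name: str, user_id: str, response: dict, **props) -> dict:
    """Record an event, copying ModelVersionNumber from the API response that produced it."""
    event = {
        "event_name": name,
        "user_id": user_id,
        "properties": {"ModelVersionNumber": response["ModelVersionNumber"], **props},
    }
    EVENT_LOG.append(event)
    return event


def click_through_rate_by_version(events: list[dict]) -> dict[str, float]:
    """Clicks divided by impressions, grouped by the ModelVersionNumber event property."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for e in events:
        version = e["properties"].get("ModelVersionNumber", "unknown")
        if e["event_name"] == "recommendation_viewed":
            impressions[version] += 1
        elif e["event_name"] == "recommendation_clicked":
            clicks[version] += 1
    # Only versions with at least one impression get a rate.
    return {v: clicks[v] / n for v, n in impressions.items() if n > 0}


if __name__ == "__main__":
    resp_v1 = fetch_recommendations("u1")
    # Simulate a second user being served a different (hypothetical) model version.
    resp_v2 = {**fetch_recommendations("u2"), "ModelVersionNumber": "2.0"}
    for resp in (resp_v1, resp_v2):
        track_event("recommendation_viewed", resp["user_id"], resp)
    track_event("recommendation_clicked", "u1", resp_v1, item="item_b")
    print(click_through_rate_by_version(EVENT_LOG))  # {'1.0': 1.0, '2.0': 0.0}
```

Reading the version from the response rather than hard-coding it in the app means the events stay accurate if the server switches models without a client release.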
Anyway, I know you're also just getting started with this stuff, but I'm curious whether this approach makes sense to you. Happy to chat in general, too!