Apr 26, 2021
Nice article! One point on the data-versioning section: “There are problems with these git-style solutions though. Git does not scale to store a large volume of data.”
You might be interested in the lakeFS project, which allows performant Git-style operations over large datasets.
As you mention, diffing file contents remains a limitation. However, with lakeFS commits you could still answer the time-travel question for a feature, by reading its data as of the commit that was current at the time in question.
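To make the time-travel point concrete, here is a minimal sketch of the idea in Python. It is a toy in-memory model, not the lakeFS API: `ToyRepo`, `commit`, `read_as_of`, and the `features/user_age` path are all hypothetical names made up for illustration. It assumes each commit is a full snapshot with a timestamp, so "what was this feature's value at time T" becomes "read it from the latest commit at or before T".

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of commit-based versioning (hypothetical, not the lakeFS API)."""
    _times: list = field(default_factory=list)      # commit timestamps, ascending
    _snapshots: list = field(default_factory=list)  # one path -> value dict per commit

    def commit(self, timestamp, changes):
        """Record a new snapshot: the previous one plus the given changes."""
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("commits must be added in increasing time order")
        snapshot = dict(self._snapshots[-1]) if self._snapshots else {}
        snapshot.update(changes)
        self._times.append(timestamp)
        self._snapshots.append(snapshot)

    def read_as_of(self, path, timestamp):
        """Return the value at `path` in the latest commit at or before `timestamp`."""
        idx = bisect_right(self._times, timestamp) - 1
        if idx < 0:
            raise KeyError(f"no commit at or before {timestamp}")
        return self._snapshots[idx][path]


repo = ToyRepo()
repo.commit(100, {"features/user_age": 31})
repo.commit(200, {"features/user_age": 32})
value_at_150 = repo.read_as_of("features/user_age", 150)  # served from the commit at t=100
```

Note that this answers the time-travel question without ever diffing file contents: the lookup only needs to pick the right commit.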