Paul Singman
Apr 26, 2021

Nice article! One point on the data-versioning section: “There are problems with these git-style solutions though. Git does not scale to store a large volume of data.”

You may be interested in the lakeFS project, which allows performant git-style operations over large datasets.

As you mention, showing differences between file contents remains a limitation. However, via lakeFS commits you could still answer the time-travel question for a feature.
