3 Ways to Add Data to lakeFS

Two Things to Know Before We Begin

  1. Your lakeFS endpoint URL: the address of your lakeFS installation's S3 Gateway. If you are testing locally, it will likely be http://localhost:8000. If you have a cloud-deployed lakeFS installation, you should have a DNS record pointing to the server, something like lakefs.example.com. Make a note of this value, since it is used in several places.
  2. Your lakeFS credentials: the Key ID and Secret Key generated when you first set up lakeFS and downloaded a lakectl.yaml file. If someone else administers the installation, your lakeFS administrator should set up a user for you and send you the Key ID and Secret Key.
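These values go into two files, both shown in the Configuration sections that follow: ~/.aws/credentials (for the AWS CLI) and .lakectl.yaml (for lakectl).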

Single Local File Copy (AWS CLI)

The Command

aws --profile lakefs \
--endpoint-url https://penv.lakefs.dev \
s3 cp ~/Downloads/customer_promo_2021-11.csv s3://my-repo/main/marketing/customer_promo_2021-11.csv
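
The destination in the command above follows the S3 Gateway addressing convention: the repository takes the place of the bucket, and the branch is the first component of the object key. As a rough illustration, that mapping can be sketched in Python. The helper name lakefs_s3_uri is hypothetical and not part of any lakeFS tool.

```python
def lakefs_s3_uri(repo: str, branch: str, path: str) -> str:
    """Build the s3:// URI for `path` on `branch` of `repo`.

    Hypothetical helper for illustration only: through the S3 Gateway the
    repository acts as the bucket and the branch is the first key component.
    """
    # Strip a leading slash so the result never contains "//" after the branch.
    return f"s3://{repo}/{branch}/{path.lstrip('/')}"


# The same destination used in the aws s3 cp command above.
uri = lakefs_s3_uri("my-repo", "main", "marketing/customer_promo_2021-11.csv")
print(uri)
```

Changing the second argument is all it takes to target a different branch with the same AWS CLI command.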

Configuration

[default]
aws_access_key_id=AKIAMYACTUALAWSCREDS
aws_secret_access_key=EXAMPLEj2fnHf73J9jkke/e3ea4D
[lakefs]
aws_access_key_id=AKIAJRKP6EXAMPLE
aws_secret_access_key=EXAMPLEYC5wcWOgF36peXniwEJn5kwncw32
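
If you would rather script this step than edit the file by hand, the following is a minimal sketch using Python's standard configparser module. The function name add_lakefs_profile is an assumption made for this example, and the demo writes to a temporary file rather than your real ~/.aws/credentials.

```python
import configparser
import tempfile
from pathlib import Path


def add_lakefs_profile(credentials_path: Path, key_id: str, secret_key: str) -> None:
    """Add or update a [lakefs] profile, leaving other profiles untouched.

    Hypothetical helper for illustration; not part of the AWS CLI or lakeFS.
    """
    # Disable interpolation so a '%' in a secret is kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(credentials_path)  # a missing file is silently skipped
    config["lakefs"] = {
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret_key,
    }
    credentials_path.parent.mkdir(parents=True, exist_ok=True)
    with credentials_path.open("w") as f:
        # Write key=value without spaces, matching the format shown above.
        config.write(f, space_around_delimiters=False)


# Demo against a throwaway file that already has a [default] profile.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "credentials"
    path.write_text("[default]\naws_access_key_id=AKIAMYACTUALAWSCREDS\n")
    add_lakefs_profile(path, "AKIAJRKP6EXAMPLE", "EXAMPLEYC5wcWOgF36peXniwEJn5kwncw32")
    print(path.read_text())
```

Keeping the lakeFS keys in their own profile means your regular AWS credentials stay in [default], and only commands run with --profile lakefs talk to lakeFS.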

Copy Data Without Copying Data (lakectl ingest)

The lakectl command line tool supports ingesting objects from a source object store without actually copying the data itself. This is done by listing the source bucket (and optional prefix), and creating pointers to the returned objects in lakeFS.
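
To make the idea of "pointers instead of copies" concrete, here is a small Python sketch of the bookkeeping involved. It is a conceptual model only, not how lakectl is implemented: the listing is simulated in memory, and the names SourceObject and plan_ingest, as well as the object keys in the demo, are made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SourceObject:
    """One entry from a (simulated) listing of the source bucket."""
    key: str
    size: int


def plan_ingest(bucket: str, source_prefix: str,
                listing: Iterable[SourceObject], dest_prefix: str) -> Dict[str, str]:
    """Map each listed object under source_prefix to a pointer.

    Returns {path in lakeFS: physical address in the source bucket}.
    No object data is read or copied; only addresses are recorded.
    """
    pointers = {}
    for obj in listing:
        if not obj.key.startswith(source_prefix):
            continue  # outside the requested prefix
        relative = obj.key[len(source_prefix):]
        pointers[dest_prefix + relative] = f"s3://{bucket}/{obj.key}"
    return pointers


# Hypothetical listing; the keys are invented for illustration.
listing = [
    SourceObject("customer_promos/2021-10.csv", 1024),
    SourceObject("customer_promos/2021-11.csv", 2048),
    SourceObject("other/readme.txt", 10),
]
pointers = plan_ingest("my-beautiful-s3-bucket", "customer_promos/",
                       listing, "marketing/customer_promos/")
for lakefs_path, physical_address in pointers.items():
    print(lakefs_path, "->", physical_address)
```

Because only addresses are recorded, the cost of the operation depends on the number of objects listed, not on how much data they hold.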

The Command

# Ingest a single object
lakectl ingest \
--from s3://my-beautiful-s3-bucket/customer_promo_2021-11.csv \
--to lakefs://my-repo/main/marketing/customer_promo_2021-11.csv

# Ingest every object under a prefix
lakectl ingest \
--from s3://my-beautiful-s3-bucket/customer_promos/ \
--to lakefs://my-repo/main/marketing/customer_promos/

Configuration

credentials:
  access_key_id: AKIAJRKP6EXAMPLE
  secret_access_key: EXAMPLEYC5wcWOgF36peXniwEJn5kwncw32
server:
  endpoint_url: https://penv.lakefs.dev

Large-Scale Imports (lakeFS inventory imports)

The Command

lakefs import \
lakefs://my-repo \
-m s3://my-beautiful-s3-bucket-inventory/my-beautiful-bucket/my-beautiful-inventory/2021-10-25T00-00Z/manifest.json \
--config .lakefs.yaml

Inventory (2021-10-24) Files Read 1 / 1 done
Inventory (2021-10-24) Current File 1 / 1 done
Commit progress 0 done
Objects imported 1 done
Added or changed objects: 1
Commit ref:3c1e4222cf2ac89a5c3a9fdd99d106f8bf225e2a17ac013ffae6d19f844420d0
Import to branch import-from-inventory finished successfully.
To list imported objects, run:
$ lakectl fs ls lakefs://my-repo@3c1e4222cf2ac89a5c3a9fdd99d106f8bf225e2a17ac013ffae6d19f844420d0/
To merge the changes to your main branch, run:
$ lakectl merge lakefs://my-repo@3c1e4222cf2ac89a5c3a9fdd99d106f8bf225e2a17ac013ffae6d19f844420d0 lakefs://my-repo@main
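
If you run imports from a script, you may want to pick the commit ref out of this output so the follow-up commands can be built automatically. The sketch below assumes the output keeps the "Commit ref:" line format shown above. The function name extract_commit_ref is hypothetical.

```python
import re
from typing import Optional


def extract_commit_ref(import_output: str) -> Optional[str]:
    """Return the hex commit ref from the import output, or None if absent.

    Assumes a line of the form 'Commit ref:<hex digits>' as in the sample
    output above; other output formats are not handled.
    """
    match = re.search(r"^Commit ref:\s*([0-9a-f]+)\s*$", import_output, re.MULTILINE)
    return match.group(1) if match else None


# A few lines copied from the sample output above.
sample = """Added or changed objects: 1
Commit ref:3c1e4222cf2ac89a5c3a9fdd99d106f8bf225e2a17ac013ffae6d19f844420d0
Import to branch import-from-inventory finished successfully."""

ref = extract_commit_ref(sample)
if ref is not None:
    # Build the listing command suggested by the import output.
    print(f"lakectl fs ls lakefs://my-repo@{ref}/")
```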

Creating an S3 Inventory

Configuration

./config.yaml
$HOME/lakefs/config.yaml
/etc/lakefs/config.yaml
$HOME/.lakefs.yaml
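
To check which of these files would be picked up on a given machine, a short Python sketch can walk the list and report the first one that exists. It assumes the locations are tried in the order listed above, which is an assumption of this example rather than something stated here. The function name find_config is likewise hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations, in the order listed above.
CANDIDATES = [
    "./config.yaml",
    "$HOME/lakefs/config.yaml",
    "/etc/lakefs/config.yaml",
    "$HOME/.lakefs.yaml",
]


def find_config(candidates: Iterable[str] = CANDIDATES) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None.

    Assumption: earlier entries take precedence over later ones.
    """
    for raw in candidates:
        expanded = os.path.expandvars(raw)  # resolve $HOME
        if os.path.isfile(expanded):
            return Path(expanded)
    return None


found = find_config()
print(found if found is not None else "no lakeFS configuration file found")
```

Passing --config explicitly, as in the import command above, avoids any ambiguity about which file is used.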

Wrapping Up

Still have questions about data and lakeFS?
