Paul Singman

1.3K Followers

Published in Whispering Data

·Feb 10

Scaling AWS Redshift Concurrency with PostgreSQL

The most efficient way to move data between an analytics warehouse and an OLTP data store! INTRODUCTION: On the surface, Redshift (AWS’s cloud data warehouse service, launched in 2012) and Postgres (one of the most popular open source databases, first introduced in 1989) appear to have little in common. One is optimized for analytic workloads (Redshift); the other performs better when…

Data Engineering

7 min read


Published in Whispering Data

·Jun 27, 2022

The State of Data Engineering 2022

All the latest tools and trends in data engineering. — Note: This article was originally published by Einat Orr on June 20th, 2022. Introduction A year has passed since we shared the State of Data Engineering 2021. And since we released that article last May, not much has changed in the data landscape. …

Data Engineering

10 min read


Published in Whispering Data

·Jun 13, 2022

5 Tips For a Tidy Data Warehouse

Spark joy in your data warehouse by following these data modeling best practices! — Introduction Building the right data models in your data warehouse can have a huge impact on how much value you get from your data. It can make the difference between a report or analysis taking a week of analyst time vs a day. When data models are organized and clearly reflect…

Data Science

6 min read


Published in Whispering Data

·May 6, 2022

Towards Effective DataOps

Gain the confidence to mess with your data without making a mess of your data. “If it hurts, do it more often” is a wise piece of advice that DevOps engineers often repeat. Unless you are a masochist, following this advice will naturally lead you to find ways to make the repeated process less painful. In the world of DevOps, these processes are typically…

Data Science

5 min read


Published in Whispering Data

·Mar 21, 2022

Building a Personal Data Stack to Alert on Crypto Price Fluctuations — Trying Out Hex and Delta Lake

If you’re like me, you bought your first cryptocurrency in the past year or so, right when it stopped going up in price and making random people millionaires. Now you are stuck owning this “stuff” and are in a constant state of worry you’ll wake up one morning to find…

Data Science

7 min read


Published in Whispering Data

·Feb 22, 2022

Level Up Your Data Lake

Take your data lake game to new heights with these two architecture improvements. — What is the Basic Data Lake? A data lake is primarily two things: an object store and the objects being stored. It might look something like this:

Data Engineering

4 min read


Published in Whispering Data

·Feb 7, 2022

How Easy It Is to Re-use Old Pandas Code in Spark 3.2?

In October, it was announced that the Pandas API was being integrated with Spark. This is particularly exciting news for a Pandas-baby like myself, whose first exposure to data exploration involved following tutorials using Jupyter notebooks and Pandas DataFrames. Spark 3.2 has been out for several months now and a curiosity has been building inside me: how easy is it to take existing pandas code and copy it as is into Spark? …

Data

6 min read


Published in Whispering Data

·Jan 29, 2022

The Everything Bagel II: Versioned Data Lake Tables with lakeFS and Trino

Let’s put the bagel to use by querying branched lakeFS data from Trino’s distributed engine. — Introduction Dockerize Your Data Pipeline I can remember times when my company started using a new technology — be it Redis, Kafka, or Spark — and in order to try it out I found myself staring at a screen like this: At the time I thought nothing of doing this. And even wore it as…

Data Engineering

5 min read


Published in Whispering Data

·Dec 13, 2021

The Guide to Data Versioning

Already familiar with versioning code with git? A look at how it works to version data using the same abstractions. — “I have never lied to you, I have always told you some version of the truth.” “The truth doesn’t have versions, okay?” — Something’s Gotta Give (2003)

Data Engineering

10 min read


Nov 10, 2021

3 Ways to Add Data to lakeFS

Few people start using lakeFS without first having some data collected. Consequently, one of the first things people commonly do after getting it up and running is import their existing data into lakeFS. There isn’t a one-size-fits-all approach for importing data. Instead, there are ways that work great for a single file, and some that are designed to handle millions of them. Let’s walk through, in detail, how it’s done for each situation.

Data Engineering

7 min read


Data @ Meta. Whisperer of data and productivity wisdom. Standing on the shoulders of giants.

Following
  • Lauren Balik
  • Kidong Lee
  • Scott Haines
  • Giorgos Myrianthous
  • Rodrigo Tovar Jacuinde
