A Containerized Data Lake

Pachyderm lets you store and analyze your data using containers.

Pachyderm is a data lake that offers complete version control for data and leverages the container ecosystem to provide reproducible data processing.

Version control for data

Pachyderm version-controls all your data, much as Git does for code. You can view diffs of your data and collaborate with teammates using Pachyderm commits and branches.


Unlock the power of containers for big data

Containers are awesome for data analysis. They're easy to use and easy to manage, and they let you develop locally knowing your analysis will run the same way in the cluster.


Why Pachyderm?

Because data scientists should expect more from their data infrastructure

Reproducibility

Consistently recreate results from any previous state of your data or analysis.

Data Provenance

Understand every step of the process that produced a given result.

Collaboration

Manage shared data resources and work more effectively as a team.

Incrementality

Build on past results by processing only the new data, so unchanged data is never recomputed.

Data Scientist Autonomy

Maintain complete control of your data science toolchain choices.

Infrastructure Agnostic

Run in the cloud or on-premises and integrate easily with your current infrastructure.
