Explainable. Repeatable. Scalable.


Featured Article

Announcing Pachyderm 1.10

We spent 2019 on the road, going to conference after conference and talking with thousands of data scientists, developers, and MLOps engineers. More than anything, we listened. We wanted to know the challenges that data science teams face every day. After hearing about their use cases and their struggles to make things work, we keep seeing the same thing: data science is not owned by individual contributors; it’s a collective responsibility. Pachyderm has always been a platform focused deeply on collaboration, but with version 1.10 we set out to build on that rock-solid foundation and make collaboration a way of life from end to end.

Previous Posts
Pachyderm 1.10 adds integration support for JupyterHub

Pachyderm 1.10 - Support for JupyterHub

Pachyderm 1.10 makes it easy to connect with JupyterHub. Seamless single sign-on and a scripted way to deploy JupyterHub and Pachyderm together will make your data science team more productive than ever.

Pachyderm 1.10 adds support for Kubeflow

Pachyderm 1.10 - S3 Gateway Expansion & Kubeflow Support

Pachyderm 1.10 delivers Kubeflow support you can count on. Leverage Pachyderm’s powerful data lineage platform with TFJobs (or any other Kubeflow run) directly within the Kubeflow ecosystem.

A quick intro to the Pachyderm Shell.

Pachyderm 1.10 - Introducing Pachyderm Shell

Data science is hard enough without having to pick up a complicated new command line. With the 1.10 release, we give you Pachyderm Shell, which delivers time-saving auto-completion along with helpful suggestions displayed right in the prompt.


Say Hello To Pachyderm: Hub

Hosted and managed Pachyderm for those who want data lineage without the hassle of managing infrastructure.

Diagram of Pachyderm Hub Architecture on GCP

(Kubernetes as a Service) as a Service

What it's like to build Kubernetes-as-a-Service, as a service.



For those who like to learn while doing.

OpenCV (aka "Hello World")

This tutorial walks you through deploying a Pachyderm pipeline that performs simple edge detection on a few images.


ML Pipeline for Tweet Generation (GPT-2)

In this example, we'll create a machine learning pipeline that generates tweets using OpenAI's GPT-2 text generation model.


Distributed hyperparameter tuning

This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters.
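As a rough illustration of the idea, and not the code from the linked example, the sketch below evaluates a toy objective on every combination in a small parameter grid. A local thread pool stands in for the distributed workers; the `evaluate` function, the parameter names, and the grid values are all assumptions made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """Toy objective (lower is better); stands in for training and scoring a model."""
    learning_rate, depth = params
    return (learning_rate - 0.1) ** 2 + (depth - 4) ** 2


def grid_search(learning_rates, depths, workers=4):
    """Score every parameter combination in parallel and return the best one."""
    grid = list(product(learning_rates, depths))
    # Each parameter set is independent, so the work can be fanned out freely.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, grid))
    # Pair each score with its parameters and keep the lowest score.
    return min(zip(scores, grid))


best_score, best_params = grid_search([0.01, 0.1, 1.0], [2, 4, 8])
print(best_params, best_score)
```

Because each parameter set is scored independently, the same structure carries over when the pool is replaced by workers on separate machines.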



This example connects to an IMAP mail account, collects all incoming mail, and analyzes it for positive or negative sentiment. It sorts the emails into directories in its output repo and adds scoring information to each email in the "X-Sentiment-Rating" header.


MNIST with TFJob and Pachyderm

This example uses the canonical MNIST dataset, Kubeflow, TFJobs, and Pachyderm to demonstrate an end-to-end machine learning workflow with data provenance.


Create a Join Pipeline

In this example, we will create a join pipeline. A join pipeline executes your code on files that match a specific naming pattern.
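To make "files that match a specific naming pattern" concrete, here is a minimal Python sketch of the matching idea. It is not Pachyderm's implementation or its pipeline spec: the `join_by_pattern` helper, the regular expressions, and the file names are hypothetical, chosen only to show how files from two inputs can be paired on a shared part of their names.

```python
import re
from collections import defaultdict


def join_by_pattern(left_files, right_files, left_pattern, right_pattern):
    """Pair files from two inputs whose names share the same captured key."""

    def index(files, pattern):
        regex = re.compile(pattern)
        by_key = defaultdict(list)
        for name in files:
            match = regex.fullmatch(name)
            if match:  # files that do not match the pattern are ignored
                by_key[match.group(1)].append(name)
        return by_key

    left = index(left_files, left_pattern)
    right = index(right_files, right_pattern)
    # Emit one (left, right) pair for every combination that shares a key.
    return [
        (l, r)
        for key in sorted(left.keys() & right.keys())
        for l in left[key]
        for r in right[key]
    ]


# Hypothetical file names, invented for this illustration.
purchases = ["store1_purchase.txt", "store2_purchase.txt", "notes.md"]
returns = ["store1_return.txt", "store3_return.txt"]

pairs = join_by_pattern(
    purchases, returns, r"(.+)_purchase\.txt", r"(.+)_return\.txt"
)
print(pairs)
```

Only `store1` appears in both inputs, so it is the only key that produces a pair; user code would then run once per pair.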
