Chapter 4. Installing Kubeflow On-Premise

In this chapter we take a look at the basics of installing Kubeflow on an existing on-premise Kubernetes cluster. We assume that you already have some background knowledge of Kubernetes and that you have access to an existing Kubernetes cluster, either on-premise or managed in the cloud. There are also options for learning environments, such as Minikube and kubeadm-dind.

We also assume that you’re comfortable with software infrastructure install processes and can work from a command-line interface. If you need a quick refresher, in the next section we review some basic commands for Kubernetes.

Kubernetes Operations from the Command Line

Given that Kubeflow is tightly integrated with Kubernetes, we need to know a few core commands to perform any type of install. In this section we review the following command-line tools:

  • kubectl
  • docker

This chapter will give you the specifics around what parts of Kubernetes we need to worry about for an on-premise install. Let’s start out by getting some of our core command-line tools installed.

Installing kubectl

kubectl controls the Kubernetes cluster manager and is a command-line interface for running commands against Kubernetes clusters. We use kubectl to deploy and manage applications on Kubernetes. Using kubectl, we can:

  • Inspect cluster resources
  • Create components
  • Delete components
  • Update components

For a more complete list of functions in kubectl, check out this cheat sheet.
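
As a quick sketch of these operations, the commands below show one way each might look; the manifest, deployment, and image names here are placeholders for illustration:

kubectl get nodes                          # inspect cluster resources
kubectl apply -f my-deployment.yaml        # create components from a manifest
kubectl set image deployment/my-app \
  my-container=my-image:v2                 # update a component
kubectl delete -f my-deployment.yaml       # delete components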

kubectl is a fundamental tool for Kubernetes and Kubeflow operations, and we’ll use it a lot in the course of deploying components and running jobs on Kubeflow.

Installing kubectl on macOS

An easy way to install kubectl on macOS is to use the brew command:

brew install kubernetes-cli

Once we have kubectl, we need permission for it to talk to our Kubernetes cluster.

Understanding kubectl and contexts

kubectl knows how to talk to remote clusters based on a local configuration file stored on disk. We define a kubectl context as an entry in a kubeconfig file used to identify a group of access parameters under a common ID. Each of these groups of access parameters, or contexts, has three parameters:

  • Cluster
  • Namespace
  • User

The default location for the local kubeconfig file is ~/.kube/config (or $HOME/.kube/config). We can also set this location with the KUBECONFIG environment variable or by setting the --kubeconfig flag.
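
For example, to point kubectl at an alternate configuration file, we could do either of the following (the file path here is just an illustration):

export KUBECONFIG=$HOME/.kube/other-config
kubectl config view

# or, for a single invocation:
kubectl --kubeconfig=$HOME/.kube/other-config config view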

We can also use multiple configuration files, but for now, we’ll consider the case of the default configuration file. In some cases, you will be working with multiple clusters, and their context information will also be stored in this file.

To view the current kubectl config, use the command:

kubectl config view

The output should look something like this:

apiVersion: v1
clusters:
- cluster:
    certificate-authority: /home/ec2-user/.minikube/ca.crt
    server: https://172.17.0.3:8443
  name: kubeflow
contexts:
- context:
    cluster: kubeflow
    user: kubeflow
  name: kubeflow
current-context: kubeflow
kind: Config
preferences: {}
users:
- name: kubeflow
  user:
    client-certificate: /home/ec2-user/.minikube/profiles/kubeflow/client.crt
    client-key: /home/ec2-user/.minikube/profiles/kubeflow/client.key

The output can vary depending on how many contexts are currently in your local file, but it tells us things like which clusters we have attached and what the configuration is for the current context. For more information on the operations we can perform on the context system for kubectl, check out the Kubernetes documentation.

We use kubectl context files to organize information about:

  • Clusters
  • Users
  • Namespaces
  • Authentication mechanisms

Let’s now look at a few specific ways to use the context file and kubectl.

Getting the current context

If we want to know what the current context is, we would use the command:

kubectl config current-context

The output should look similar to the following console log output:

kubeflow

This gives us the ID of the context group in our context file; kubectl will currently send Kubernetes commands to the cluster represented by that ID.
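
To see every context in the file at once, along with a marker for the active one, we can also run:

kubectl config get-contexts

For the configuration file shown earlier, the output would look something like:

CURRENT   NAME       CLUSTER    AUTHINFO   NAMESPACE
*         kubeflow   kubeflow   kubeflow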

Adding clusters to our context file

To add a new Kubernetes cluster to our local context file we use the set-cluster, set-credentials, and set-context commands, as seen in the following example:

kubectl config \
  set-cluster NAME \
  [--server=server] \
  [--certificate-authority=path/to/certificate/authority] \
  [--insecure-skip-tls-verify=true]

kubectl config \
  set-credentials NAME \
  [--client-certificate=path/to/certfile] \
  [--client-key=path/to/keyfile] \
  [--token=bearer_token] \
  [--username=basic_user] \
  [--password=basic_password] \
  [--auth-provider=provider_name] \
  [--auth-provider-arg=key=value] \
  [--exec-command=exec_command] \
  [--exec-api-version=exec_api_version] \
  [--exec-arg=arg][--exec-env=key=value] \
  [options]

kubectl config \
  set-context [NAME | --current] \
  [--cluster=cluster_nickname] \
  [--user=user_nickname] \
  [--namespace=namespace] \
  [options]

Note that in the set-context command, the --user parameter references the NAME given to the credential set with the set-credentials command.
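
To make this concrete, the following sketch adds a hypothetical on-premise cluster to the context file; every name, address, and file path here is a placeholder:

kubectl config set-cluster onprem-cluster \
  --server=https://10.0.0.50:6443 \
  --certificate-authority=/path/to/ca.crt

kubectl config set-credentials onprem-admin \
  --client-certificate=/path/to/client.crt \
  --client-key=/path/to/client.key

kubectl config set-context onprem \
  --cluster=onprem-cluster \
  --user=onprem-admin \
  --namespace=kubeflow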

In the next chapter we’ll look at how to pull credentials for a public cloud and automatically add the Kubernetes context to our local context file.

Switching contexts

To change the default context to point to another Kubernetes cluster, use the command:

kubectl config use-context [my-cluster-name] 

All commands issued via kubectl should now be routed to the cluster we previously added under the ID [my-cluster-name].

Using kubectl

Let’s get used to using kubectl by trying a few commands to get information from the cluster, such as the following:

  • The current running services
  • The cluster information
  • The current running jobs

Getting running services

To confirm our cluster is operational and the components are running, try the following command:

kubectl -n kubeflow get services

We should see a list of running services matching the components installed on our cluster (see Example 4-1).

Example 4-1. List of services from the command line
NAME                                          TYPE        PORT(S)             AGE
admission-webhook-service                     ClusterIP   443/TCP             2d6h
application-controller-service                ClusterIP   443/TCP             2d6h
argo-ui                                       NodePort    80:30643/TCP        2d6h
centraldashboard                              ClusterIP   80/TCP              2d6h
jupyter-web-app-service                       ClusterIP   80/TCP              2d6h
katib-controller                              ClusterIP   443/TCP,8080/TCP    2d6h
katib-db-manager                              ClusterIP   6789/TCP            2d6h
katib-mysql                                   ClusterIP   3306/TCP            2d6h
katib-ui                                      ClusterIP   80/TCP              2d6h
kfserving-controller-manager-metrics-service  ClusterIP   8443/TCP            2d6h
kfserving-controller-manager-service          ClusterIP   443/TCP             2d6h
kfserving-webhook-server-service              ClusterIP   443/TCP             2d6h
metadata-db                                   ClusterIP   3306/TCP            2d6h
metadata-envoy-service                        ClusterIP   9090/TCP            2d6h
metadata-grpc-service                         ClusterIP   8080/TCP            2d6h
metadata-service                              ClusterIP   8080/TCP            2d6h
metadata-ui                                   ClusterIP   80/TCP              2d6h
minio-service                                 ClusterIP   9000/TCP            2d6h
ml-pipeline                                   ClusterIP   8888/TCP,8887/TCP   2d6h
ml-pipeline-ml-pipeline-visualizationserver   ClusterIP   8888/TCP            2d6h
ml-pipeline-tensorboard-ui                    ClusterIP   80/TCP              2d6h
ml-pipeline-ui                                ClusterIP   80/TCP              2d6h
mysql                                         ClusterIP   3306/TCP            2d6h
notebook-controller-service                   ClusterIP   443/TCP             2d6h
profiles-kfam                                 ClusterIP   8081/TCP            2d6h
pytorch-operator                              ClusterIP   8443/TCP            2d6h
seldon-webhook-service                        ClusterIP   443/TCP             2d6h
tensorboard                                   ClusterIP   9000/TCP            2d6h
tf-job-operator                               ClusterIP   8443/TCP            2d6h

This lets us confirm that the services we've deployed are currently running.

Get cluster information

We can check out the status of the running cluster with the command:

kubectl cluster-info

We should see output similar to Example 4-2.

Example 4-2. kubectl cluster-info output
Kubernetes master is running at https://172.17.0.3:8443
KubeDNS is running at https://172.17.0.3:8443/api/v1/namespaces/kube-system...

To further debug and diagnose cluster problems, use kubectl cluster-info dump.

Get currently running jobs

Typically we’d run a job based on a YAML file with the kubectl command:

kubectl apply -f https://github.com/pattersonconsulting/tf_mnist_kubflow_3_5...

We should now have the job running on the Kubeflow cluster. We won't see the job's output in our console because it is running on a remote cluster, but we can check the job's status with the command:

kubectl -n kubeflow get pod

Our console output should look something like Example 4-3.

Example 4-3. kubectl output for currently running jobs
NAME                                              READY  STATUS     RESTARTS  AGE
admission-webhook-deployment-f9789b796-95rfz      1/1    Running    0         2d6h
application-controller-stateful-set-0             1/1    Running    0         2d6h
argo-ui-59f8d49b9-52kn8                           1/1    Running    0         2d6h
centraldashboard-6c548fc6dc-pzskh                 1/1    Running    0         2d6h
jupyter-web-app-deployment-657bf476db-v2xgl       1/1    Running    0         2d6h
katib-controller-5c976769d8-fcxng                 1/1    Running    1         2d6h
katib-db-manager-bf77df6d6-dgml5                  1/1    Running    0         2d6h
katib-mysql-7db488768f-cgcnj                      1/1    Running    0         2d6h
katib-ui-6d7fbfffcb-t84xl                         1/1    Running    0         2d6h
kfserving-controller-manager-0                    2/2    Running    1         2d6h
metadata-db-5d56786648-ldlzq                      1/1    Running    0         2d6h
metadata-deployment-5c7df888b9-gdm5n              1/1    Running    0         2d6h
metadata-envoy-deployment-7cc78946c9-kcmt4        1/1    Running    0         2d6h
metadata-grpc-deployment-5c8545f76f-7q47f         1/1    Running    0         2d6h
metadata-ui-665dff6f55-pbvdp                      1/1    Running    0         2d6h
minio-657c66cd9-mgxcd                             1/1    Running    0         2d6h
ml-pipeline-669cdb6bdf-vwglc                      1/1    Running    0         2d6h
ml-pipeline-ml-pipeline-visualizationserver...    1/1    Running    0         2d6h
ml-pipeline-persistenceagent-56467f8856-zllpd     1/1    Running    0         2d6h
ml-pipeline-scheduledworkflow-548b96d5fc-xkxdn    1/1    Running    0         2d6h
ml-pipeline-ui-6bd4778958-bdf2x                   1/1    Running    0         2d6h
ml-pipeline-viewer-controller-deployment...       1/1    Running    0         2d6h
mysql-8558d86476-xq2js                            1/1    Running    0         2d6h
notebook-controller-deployment-64b85fbc84...      1/1    Running    0         2d6h
profiles-deployment-647448c7dd-9gnz4              2/2    Running    0         2d6h
pytorch-operator-6bc9c99c5-gn7wm                  1/1    Running    30        2d6h
seldon-controller-manager-786775d4d9-frq9l        1/1    Running    0         2d6h
spark-operatorcrd-cleanup-xq8zb                   0/2    Completed  0         2d6h
spark-operatorsparkoperator-9c559c997-mplrh       1/1    Running    0         2d6h
spartakus-volunteer-5978bf56f-jftnh               1/1    Running    0         2d6h
tensorboard-9b4c44f45-frr76                       0/1    Pending    0         2d6h
tf-job-operator-5d7cc587c5-tvxqk                  1/1    Running    33        2d6h
workflow-controller-59ff5f7874-8w9kd              1/1    Running    0         2d6h

Given that a TensorFlow job runs under the TensorFlow operator, it shows up as a pod alongside the other Kubeflow components.
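
Because the job runs remotely, its console output lands in the pod's logs rather than on our screen. To follow it, we can tail the logs of the pod by the name reported in the get pod output:

kubectl -n kubeflow logs -f [pod-name]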

Using Docker

Docker is the most common container system used in container orchestration systems such as Kubernetes. To launch a container, we run an image. An image includes everything needed to run an application (code, runtime, libraries, etc.). In the case of our TensorFlow jobs, that includes things like the TensorFlow library dependencies and our Python training code to run in each container.

Docker Hub provides a repository for container images to be stored, searched, and retrieved. Other repositories include Google’s Container Registry and on-premise Artifactory installs.

Basic Docker install

For information on how to install Docker, check out their documentation page for the process.

For the remainder of this chapter we assume that you have Docker installed. Let’s now move on to some basic Docker commands you’ll need to know.

Basic Docker commands

For details on using the build command, see the Docker documentation page.

The command that follows builds an image from the Dockerfile contained in the local directory and gives it the tag [account]/[repository]:[tag]:

docker build -t "[account]/[repository]:[tag]" .

To push the container image we just built to a registry, we use a command of the following form:

docker push [account]/[repository]:[tag]

The following command takes the container image we built in the previous step and pushes it to the mike account in Artifactory under the Kubeflow repo. It also adds the tag dist_tf_estimator.

docker push mike/kubeflow:dist_tf_estimator
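
Note that when pushing to a private registry such as an on-premise Artifactory install, the image name typically needs to be prefixed with the registry's hostname; the hostname below is a placeholder:

docker tag mike/kubeflow:dist_tf_estimator \
  artifactory.example.com/mike/kubeflow:dist_tf_estimator
docker push artifactory.example.com/mike/kubeflow:dist_tf_estimator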

Now let’s move on to building TensorFlow containers with Docker.

Using Docker to build TensorFlow containers

When building Docker container images based on existing TensorFlow container images, be cognizant of:

  • The desired TensorFlow version
  • Whether the container image will be Python2- or Python3-based
  • Whether the container image will have CPU or GPU bindings

We're assuming here that you'll either build your own TensorFlow base container or pull an existing one from gcr.io or Docker Hub. Check out the TensorFlow repository on Docker Hub for some great examples of existing TensorFlow container images.
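
As a minimal sketch, a Dockerfile that builds on an existing TensorFlow base image might look like the following; the base image tag and the training script name are assumptions for illustration:

# Hypothetical base image tag; pick the version/bindings you need
FROM tensorflow/tensorflow:1.15.2-py3

# Add our Python training code to the image
COPY train.py /opt/train.py

# Run the training script when the container starts
ENTRYPOINT ["python", "/opt/train.py"]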

Containers, GPUs, and Python Version

Check out each container repository for its naming rules around Python 2 versus Python 3, as they can differ between repositories. For GPU bindings within the container image, be sure to use the correct base image with the -gpu tag.
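
For example, to pull a GPU-enabled, Python 3 TensorFlow image from Docker Hub (this particular tag is illustrative; check the repository for current tags):

docker pull tensorflow/tensorflow:1.15.2-gpu-py3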

Now let’s move on to the install process for Kubeflow from the command line.

Basic Install Process

The basic install process for Kubeflow is:

  1. Initialize Kubeflow artifacts.
  2. Customize any artifacts.
  3. Deploy Kubeflow artifacts to the cluster.

We break down each of these in the following sections.

Installing On-Premise

To install Kubeflow on-premise, we need to consider the following topics:

  • Considerations for building Kubernetes clusters
  • Gateway host access to the cluster
  • Active Directory integration and user management
  • Kerberos integration
  • Learning versus production environments
  • Storage integration

We start off by looking at variations of ways to set up Kubernetes clusters.

Considerations for Building Kubernetes Clusters

To frame our discussion on how we want to set up our Kubeflow installation on-premise, we’ll revisit the diagram for how clusters are broken up into logical layers (Figure 4-1).

Figure 4-1. Production environment options for Kubernetes clusters (source: Kubernetes documentation)

Kubeflow lives in the application layer for our cluster, and we’ll install it as a set of long-lived pods and services.

Kubernetes Glossary

Kubernetes has a lot of terms and concepts to know. If you ever get confused, just check out the Kubernetes standardized glossary in the documentation on the Kubernetes project website.

Given this context, we understand that we need to install Kubeflow on an existing Kubernetes cluster. The location of things such as the control plane and the cluster infrastructure may greatly impact install design decisions, such as:

  • Networking topologies
  • Active Directory integration
  • Kerberos integration

Let’s look further at what goes into setting up a gateway host to access our cluster.

Gateway Host Access to Kubernetes Cluster

In most shared multitenant enterprise systems, a gateway host is used for access to the cluster. For the purposes of installing Kubeflow on a Kubernetes system, your cluster will likely follow the same pattern.

Typically, the gateway host machine needs the following resources:

  • Network access to the Kubernetes cluster
  • kubectl installed and configured locally

Network access to the Kubernetes cluster where Kubeflow resides is required because kubectl needs to send commands across the network. There are variations where container building is done on a machine other than the gateway host, but that is typically a function of how your IT department sets things up.

It is perfectly fine for a gateway host to be a local machine that meets these requirements.

Active Directory Integration and User Management

In most organizations, users are managed by an Active Directory installation. Most enterprise software systems will need to integrate with this Active Directory installation to allow users to use systems such as Kubernetes or Kubeflow.

Let’s start off by looking at the typical user experience in an organization for accessing a Kubernetes cluster integrated with Active Directory.

Kubernetes, kubectl, and Active Directory

To access a Kubernetes cluster, users typically will have to formally request access to the cluster from their enterprise IT team. After an approval process has been successfully cleared, users will be added to the appropriate Active Directory group for access to the cluster.

Users access the gateway host (which, again, can be their local machine) using a set of credentials and, immediately after logging in with those credentials, will be granted a Kerberos ticket. That ticket can later be used to authenticate to the Kubernetes cluster.

Users will need to configure the necessary binaries (kubectl and any plug-ins, as we discuss in the following text) as well as the required kubeconfig (Kubernetes configuration) file. Once the kubeconfig has been configured, users only need to concern themselves with executing the appropriate kubectl commands.

Kerberos Integration

Enterprise IT teams commonly use Kerberos as a network authentication protocol because it is designed for client/server applications (such as Kubernetes nodes) and provides strong authentication using secret-key cryptography. As described on the Kerberos website:

Kerberos was created by MIT as a solution to these network security problems. The Kerberos protocol uses strong cryptography so that a client can prove its identity to a server (and vice versa) across an insecure network connection. After a client and server has used Kerberos to prove their identity, they can also encrypt all of their communications to assure privacy and data integrity as they go about their business.

By default, Kubernetes does not provide a method to integrate with Kerberos directly, because it relies on a more modern approach—OpenID Connect, or OIDC for short.

One method of using Kerberos with Kubernetes is to exchange an existing Kerberos ticket for an OIDC token, and to present that token to Kubernetes upon authentication. This works by first authenticating to an OIDC token provider using an existing Kerberos credential, and obtaining an OIDC token in exchange for the Kerberos authentication. This can be accomplished by using a kubectl plug-in, with an example here.

Storage Integration

Out of the box, Kubeflow does not have a notion of a specific datastore; it lets the underlying Kubernetes cluster define what storage options are available.

When setting up our Kubeflow installation and running jobs, we need to consider:

  • What kind of storage our cluster has available
  • Data access patterns that are best suited for our job
  • Security considerations around the data store

How we store data is intricately linked to how we access data, so we want to make sure that we think about how we’re going to access the data as we design our storage. The job types we’ll consider with regard to our data access patterns are:

  • Python (or other) code in a single container run on Kubernetes
  • A container run via a specific Kubeflow operator (e.g., the TensorFlow or PyTorch operator), in normal single-node execution or in distributed mode
  • Python code run from a Jupyter Notebook

There are two facets to consider across these three job variants:

  • Are we providing enough bandwidth to the job such that we’re not starving the modeling power of the code that is running?
  • Are we integrating with the storage layer via filesystem semantics or via network calls?

Let’s start off by looking at the job bandwidth and storage for Kubeflow jobs.

Thinking about Kubeflow job bandwidth

If you’ll recall, in Chapter 2 we talked about how GPUs can affect jobs, from single GPUs to multi-GPUs and even distributed GPUs. GPUs can be hungry for data, so having an extremely fast storage subsystem is critical. We don’t want the GPUs waiting on data.

If a given job is going to train on a lot of data, we can think of that job as requiring a high-bandwidth storage solution that can satiate the needs of the GPUs. On the other hand, if a job is heavily computation-bound, with a smaller dataset, the speed at which the initial data is delivered to the GPUs may not be as important. In the latter scenario, we can think of that as a lower-bandwidth job.

Common access storage patterns with Kubeflow jobs

There are two major ways Kubeflow jobs access storage:

  • Using network calls across the network or internet
  • Using filesystem semantics

If the job is going to pull data across the network/internet with the user's own credentials (which we don't mind putting in the configuration/code somewhere), then we don't have to worry about filesystem semantics and mount points at the Kubernetes level.

In this case, the code handles all network calls to get the data locally, but our cluster's hosts need external network connectivity. Examples include accessing storage from S3, SFTP, third-party systems, etc.
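
As a sketch, a container's entry script might stage its training data locally with the AWS CLI before training starts; the bucket and paths here are placeholders:

# Pull the dataset from a hypothetical S3 bucket and unpack it locally
aws s3 cp s3://my-training-bucket/datasets/mnist.tar.gz /tmp/data/
tar zxvf /tmp/data/mnist.tar.gz -C /tmp/data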

If you want to use a local mount point to access a partition of the data (e.g., in a manner similar to how a local filesystem will be used by Python and notebooks on a local laptop), then you will need to provision storage using persistent volume claims (PVCs) at the Kubernetes pod and container level.

Options for Kubeflow storage

Kubernetes itself provides a plethora of storage mechanisms; more details can be found in the Kubernetes documentation. At the most basic level, storage can be thought of as either locally attached storage on a particular Kubernetes worker node (e.g., a locally attached, ephemeral volume) or a layer of persistent storage, typically provided by a storage subsystem.

In the context of Kubeflow, a high-speed storage subsystem, such as a fiber-connected storage array, is preferred. This provides a consistent, high-bandwidth storage medium that can satiate the GPUs.

Several examples of such high-bandwidth systems include:

  • NetApp AFFA800
  • Cisco FlexPod
  • FlashBlade

In Chapters 5 through 7 we provide further details for each of the core storage systems for managed Kubernetes for the public clouds.

Persistent volume claims and Kubeflow storage

Kubeflow, by default, will use persistent volumes (PVs) and persistent volume claims (PVCs) for its storage needs. As an example, when a user deploys a notebook server, they will be given the option of dynamically allocating storage (out of a storage class), or to use an existing persistent volume claim.

The key distinction to understand between PVs and PVCs is that a PV is simply a representation of storage “somewhere,” such as an allocated “1 GB of space.” To actually utilize that storage space, a claim must be made against that storage. Once a claim is made, Kubernetes provides certain guarantees that for the lifespan of the claim, the underlying storage will not be released. Hence, in the context of Kubernetes, it’s not enough to simply have a PV, but a PVC must be acquired against that storage as well.

If a user dynamically provisions storage, Kubeflow will automatically create a PVC against the newly allocated storage, which can later be used and reused for various pods, notebooks, etc. If a user would like to provide existing storage when setting up a Kubeflow environment, such as a notebook server, it is the PVC that is provided to Kubeflow (and not the PV itself).
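
As a minimal sketch, a PVC that claims 1 GB of storage from a hypothetical storage class might look like the following manifest; the claim name and storage-class name are placeholders:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-notebook-storage
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard

Applied with kubectl apply -f my-pvc.yaml, the resulting claim name is what we would hand to Kubeflow when, for example, configuring a notebook server.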

Container Management and Artifact Repositories

Container management is key to Kubeflow and Kubernetes (or any container orchestration system) because we have to have some place for container images to live.

We should be clear here about how container images differ from the configuration files (e.g., Dockerfiles) that define them. We can push our configuration files (Dockerfiles) to a source control repository such as github.com, but we need a different type of repository to manage application binary artifacts (e.g., container images).

Specifically, we need a place to store and manage all of our container application images for later deployment to Kubernetes.

There are two types of artifact repositories for container images:

  • Public container image repositories (or registries)
  • Private (and perhaps on-premise) container image repositories (or registries)

Public repositories/registries are typically accessed across the internet and allow everyone to see your containers (at least at the free tier). The most popular public artifact repository is hub.docker.com, also known as Docker Hub.

Private repositories/registries can also be hosted on the internet or hosted on-premise. The details of creating and managing private repositories and registries are specific to each implementation.

The key thing to understand for Kubeflow is that all container images must be pulled from a container repository somewhere. By default, Kubeflow is preconfigured to pull all container images from the Google Container Registry (gcr.io), and it provides a mechanism for changing the location of the container registry.

Setting up an internal container repository

JFrog Artifactory OSS is an open source option for an on-premise container application registry (there are also commercial upgrades over the open source version).

To download Artifactory (or get a Docker image), check out their website. To install Artifactory on-premise, see their Confluence documentation. Artifactory includes support for:

  • Solaris
  • macOS
  • Windows
  • Linux

Artifactory dependencies include a local database (default is an embedded Derby database), a filestore (local FS is the default), and integration with an HTTP server.

Accessing and Interacting with Kubeflow

There are two major ways to work with Kubeflow:

  • The CLI, primarily using the kubectl tool, as well as the kfctl tool
  • With a web browser, using the Kubeflow web UI

We cover the details of each in the next subsections.

It’s important to keep in mind that the Kubeflow management operations—such as deploying a Kubeflow installation, upgrading components of Kubeflow, etc.—are done using the kfctl tool, while seeing what the cluster is currently “doing” is done via the kubectl tool.

Common Command-Line Operations

kubectl is the fundamental tool we're interested in for command-line operations on Kubeflow. In a previous section of this chapter we reviewed some of the key things we can do on a Kubernetes cluster with kubectl. The kubectl operations we're most interested in for Kubeflow are:

  • Running a basic container with some code, typically Python, on our cluster
  • Running a group of containers on a special Kubernetes operator such as TFJob

For the first case, practitioners often have some Python code they'd like to run on GPUs. In these cases, we create a container with the appropriate dependencies and run it on our Kubernetes cluster with Kubeflow.

In the second case, we need to set up our job YAML file to target a specific Kubernetes custom operator, such as TFJob, so we can leverage special container coordination modes such as TensorFlow distributed training, as sketched below.
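
As a sketch, a minimal TFJob manifest for a two-worker distributed training run might look like the following; the exact apiVersion depends on your Kubeflow release, and the image is the one we built earlier in this chapter:

apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: dist-mnist
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: mike/kubeflow:dist_tf_estimator

Submitting this with kubectl apply -f lets the TensorFlow operator coordinate the worker pods for distributed training.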

Accessible Web UIs

The key web resource Kubeflow provides is the Kubeflow Dashboard UI that has links to all the other web-accessible resources Kubeflow provides. In Figure 4-2, we can see what the dashboard looks like.

As discussed in Chapter 1, this dashboard is effectively a quick launching point for the other relevant resources available to Kubeflow users via a web browser.

Figure 4-2. Kubeflow UI

Installing Kubeflow

In this section, we will discuss the steps required to install Kubeflow.

System Requirements

As of the time of writing, the Kubernetes cluster must meet the following minimum requirements:

  • 4 CPUs
  • 50 GB storage
  • 12 GB memory

The recommended Kubernetes version is 1.14, on which Kubeflow has been validated and tested. Your cluster must run at least Kubernetes 1.11, and Kubeflow does not work on Kubernetes 1.16.

Set Up and Deploy

Installing Kubeflow requires these steps:

  1. Download the kfctl tool.
  2. Prepare the Kubeflow artifacts.
  3. Deploy the artifacts to a cluster.

Using a compatible system (such as Linux or macOS), acquire the kfctl tool by downloading it from the Kubeflow releases page on GitHub. See Example 4-4.

Example 4-4. Download and unpack the kfctl binary1
$ cd ~/bin
$ curl -LOJ https://github.com/.../kfctl_v1.0.2-0-ga476281_linux.tar.gz
$ tar zxvf kfctl_v1.0.2-0-ga476281_linux.tar.gz

Once the tool has been downloaded, create a working directory to hold the artifacts and any customizations for Kubeflow. In Example 4-5, we use the kf directory in the user's home directory (~/kf).

Example 4-5. Create working directory
$ mkdir ~/kf
$ cd ~/kf

We are now ready to prepare the Kubeflow installation. This is done by specifying an initial manifest to download and build from, as in Example 4-6.

Example 4-6. Prepare Kubeflow installation2
$ cd ~/kf
$ ~/bin/kfctl build -V -f "https://raw.githubusercontent.com/..."

This will create a kustomize directory, which will hold all the templates Kubeflow will deploy. At this stage, any additional customizations can be done.

As an example, to set a custom container registry to use, we can use the kfctl tool. The command in Example 4-7 will change the default container registry from gcr.io to hub.docker.com.

Example 4-7. Set a custom container registry
$ ~/bin/kfctl alpha set-image-name hub.docker.com 

Once we’re ready, Kubeflow can be deployed using the kfctl apply command as seen in Example 4-8.

Example 4-8. Deploy Kubeflow
$ ~/bin/kfctl apply -V -f kfctl_istio_dex.v1.0.2.yaml

Kubernetes Context

Keep in mind that Kubeflow will use the default Kubernetes context, and this will dictate to which Kubernetes cluster Kubeflow will be installed.

Summary

In this chapter we looked at the practical steps to deploying Kubeflow on-premise. While many users will want to jump to the cloud (as we will in the next chapters), on-premise installations are still relevant for many enterprise situations. As we move into the next chapter, we’ll see how we begin to build on many of the concepts introduced in this chapter while evolving the install for a cloud deployment.

1 The URL here has been shortened for space reasons; the full URL is https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz.

2 The URL here has been shortened for space reasons; the full URL is https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_istio_dex.v1.0.2.yaml.
