Chapter 4. Installing Kubeflow On-Premise
In this chapter we take a look at the basics of installing Kubeflow on an existing on-premise Kubernetes cluster. We assume that you already have some background knowledge of Kubernetes and that you have access to an existing Kubernetes cluster, either on-premise or managed in the cloud. There are also options for learning environments, such as Minikube and kubeadm-dind.
We also assume that you’re comfortable with software infrastructure install processes and can work from a command-line interface. If you need a quick refresher, in the next section we review some basic commands for Kubernetes.
Kubernetes Operations from the Command Line
Given that Kubeflow is tightly integrated with Kubernetes, we need to know a few core Kubernetes commands to perform any type of install. In this section we review the following commands:
- kubectl
- docker
This chapter will give you the specifics around what parts of Kubernetes we need to worry about for an on-premise install. Let’s start out by getting some of our core command-line tools installed.
Installing kubectl
kubectl controls the Kubernetes cluster manager and is a command-line interface for running commands against Kubernetes clusters. We use kubectl to deploy and manage applications on Kubernetes. Using kubectl, we can:
- Inspect cluster resources
- Create components
- Delete components
- Update components
For a more complete list of functions in kubectl, check out this cheat sheet.
kubectl is a fundamental tool for Kubernetes and Kubeflow operations, and we’ll use it a lot in the course of deploying components and running jobs on Kubeflow.
Installing kubectl on macOS
An easy way to install kubectl on macOS is to use the brew command:
brew install kubernetes-cli
Once we have kubectl installed, we need permission for it to talk to our Kubernetes cluster.
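Before wiring up any cluster access, a quick sanity check confirms that the binary itself is installed; this only inspects the local client, so it works with no cluster configured:

# Print just the client version; --client avoids contacting a cluster
kubectl version --client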
Understanding kubectl and contexts
kubectl knows how to talk to remote clusters based on a local context file stored on disk. We define a kubectl context as an entry in a kubectl file used to identify a group of access parameters under a common ID. Each of these groups of access parameters, or contexts, has three parameters:
- Cluster
- Namespace
- User
The default location for the local kubeconfig file is ~/.kube/config (or $HOME/.kube/config). We can also set this location with the KUBECONFIG environment variable or by setting the --kubeconfig flag.
We can also use multiple configuration files, but for now, we’ll consider the case of the default configuration file. In some cases, you will be working with multiple clusters, and their context information will also be stored in this file.
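For example, KUBECONFIG accepts a colon-separated list of paths, which is one way to work with several files at once (the second file name here is purely illustrative):

# Merge the default config with a second, hypothetical cluster config
export KUBECONFIG=~/.kube/config:~/.kube/dev-cluster-config
kubectl config view    # shows the merged view across both files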
To view the current kubectl config, use the command:
kubectl config view
The output should look something like this:
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /home/ec2-user/.minikube/ca.crt
    server: https://172.17.0.3:8443
  name: kubeflow
contexts:
- context:
    cluster: kubeflow
    user: kubeflow
  name: kubeflow
current-context: kubeflow
kind: Config
preferences: {}
users:
- name: kubeflow
  user:
    client-certificate: /home/ec2-user/.minikube/profiles/kubeflow/client.crt
    client-key: /home/ec2-user/.minikube/profiles/kubeflow/client.key
The output can vary depending on how many contexts are currently in your local file, but it tells us things like which clusters we have attached and what the configuration is for the current context. For more information on operations we can perform on the context system for kubectl, check out the online resource.
We use kubectl context files to organize information about:
- Clusters
- Users
- Namespaces
- Authentication mechanisms
Let’s now look at a few specific ways to use the context file and kubectl.
Getting the current context
If we want to know what the current context is, we would use the command:
kubectl config current-context
The output should look similar to the following console log output:
kubeflow
This gives us the ID of the context group in our context file; kubectl currently will send Kubernetes commands to the cluster represented by that ID.
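When multiple contexts are defined, we can switch among them by ID. For example, assuming a second context named staging exists in the file:

# List all contexts; the current one is marked with an asterisk
kubectl config get-contexts

# Switch the current context to the (hypothetical) staging entry
kubectl config use-context staging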
Adding clusters to our context file
To add a new Kubernetes cluster to our local context file, we use the set-cluster, set-credentials, and set-context commands, as seen in the following example:
kubectl config \
set-cluster NAME \
[--server=server] \
[--certificate-authority=path/to/certificate/authority] \
[--insecure-skip-tls-verify=true]
kubectl config \
set-credentials NAME \
[--client-certificate=path/to/certfile] \
[--client-key=path/to/keyfile] \
[--token=bearer_token] \
[--username=basic_user] \
[--password=basic_password] \
[--auth-provider=provider_name] \
[--auth-provider-arg=key=value] \
[--exec-command=exec_command] \
[--exec-api-version=exec_api_version] \
[--exec-arg=arg][--exec-env=key=value] \
[options]
kubectl config \
set-context [NAME | --current] \
[--cluster=cluster_nickname] \
[--user=user_nickname] \
[--namespace=namespace] \
[options]
Note that in the set-context command, the --user parameter refers to the NAME given to the credential set by the set-credentials command.
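Putting the three commands together, a minimal sketch looks like the following; the cluster name, server address, and file paths are all hypothetical placeholders:

# Register a cluster entry named dev-cluster
kubectl config set-cluster dev-cluster \
    --server=https://10.0.0.10:6443 \
    --certificate-authority=/path/to/ca.crt

# Register a credential set named dev-admin
kubectl config set-credentials dev-admin \
    --client-certificate=/path/to/client.crt \
    --client-key=/path/to/client.key

# Tie the two together in a context named dev, then switch to it
kubectl config set-context dev \
    --cluster=dev-cluster \
    --user=dev-admin \
    --namespace=kubeflow
kubectl config use-context dev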
In the next chapter we’ll look at how to pull credentials for a public cloud and automatically add the Kubernetes context to our local context file.
Using kubectl
Let’s get used to using kubectl by trying a few commands to get information from the cluster, such as the following:
- The current running services
- The cluster information
- The current running jobs
Getting running services
To confirm our cluster is operational and the components are running, try the following command:
kubectl -n kubeflow get services
We should see a list of components running that match the components we just installed on our cluster (see Example 4-1).
Example 4-1. List of services from the command line
NAME                                           TYPE        PORT(S)             AGE
admission-webhook-service                      ClusterIP   443/TCP             2d6h
application-controller-service                 ClusterIP   443/TCP             2d6h
argo-ui                                        NodePort    80:30643/TCP        2d6h
centraldashboard                               ClusterIP   80/TCP              2d6h
jupyter-web-app-service                        ClusterIP   80/TCP              2d6h
katib-controller                               ClusterIP   443/TCP,8080/TCP    2d6h
katib-db-manager                               ClusterIP   6789/TCP            2d6h
katib-mysql                                    ClusterIP   3306/TCP            2d6h
katib-ui                                       ClusterIP   80/TCP              2d6h
kfserving-controller-manager-metrics-service   ClusterIP   8443/TCP            2d6h
kfserving-controller-manager-service           ClusterIP   443/TCP             2d6h
kfserving-webhook-server-service               ClusterIP   443/TCP             2d6h
metadata-db                                    ClusterIP   3306/TCP            2d6h
metadata-envoy-service                         ClusterIP   9090/TCP            2d6h
metadata-grpc-service                          ClusterIP   8080/TCP            2d6h
metadata-service                               ClusterIP   8080/TCP            2d6h
metadata-ui                                    ClusterIP   80/TCP              2d6h
minio-service                                  ClusterIP   9000/TCP            2d6h
ml-pipeline                                    ClusterIP   8888/TCP,8887/TCP   2d6h
ml-pipeline-ml-pipeline-visualizationserver    ClusterIP   8888/TCP            2d6h
ml-pipeline-tensorboard-ui                     ClusterIP   80/TCP              2d6h
ml-pipeline-ui                                 ClusterIP   80/TCP              2d6h
mysql                                          ClusterIP   3306/TCP            2d6h
notebook-controller-service                    ClusterIP   443/TCP             2d6h
profiles-kfam                                  ClusterIP   8081/TCP            2d6h
pytorch-operator                               ClusterIP   8443/TCP            2d6h
seldon-webhook-service                         ClusterIP   443/TCP             2d6h
tensorboard                                    ClusterIP   9000/TCP            2d6h
tf-job-operator                                ClusterIP   8443/TCP            2d6h
This lets us confirm that the services we’ve deployed are currently running.
Get cluster information
We can check out the status of the running cluster with the command:
kubectl cluster-info
We should see output similar to Example 4-2.
Example 4-2. kubectl cluster-info output
Kubernetes master is running at https://172.17.0.3:8443
KubeDNS is running at https://172.17.0.3:8443/api/v1/namespaces/kube-system...
To further debug and diagnose cluster problems, use kubectl cluster-info dump.
Get currently running jobs
Typically we’d run a job based on a YAML file with the kubectl command:
kubectl apply -f https://github.com/pattersonconsulting/tf_mnist_kubflow_3_5...
We should now have the job running on the Kubeflow cluster. We won’t see the job running and writing to our console screen because it is running on a remote cluster. We can check the job status with the command:
kubectl -n kubeflow get pod
Our console output should look something like Example 4-3.
Example 4-3. kubectl output for currently running jobs
NAME                                             READY   STATUS      RESTARTS   AGE
admission-webhook-deployment-f9789b796-95rfz     1/1     Running     0          2d6h
application-controller-stateful-set-0            1/1     Running     0          2d6h
argo-ui-59f8d49b9-52kn8                          1/1     Running     0          2d6h
centraldashboard-6c548fc6dc-pzskh                1/1     Running     0          2d6h
jupyter-web-app-deployment-657bf476db-v2xgl      1/1     Running     0          2d6h
katib-controller-5c976769d8-fcxng                1/1     Running     1          2d6h
katib-db-manager-bf77df6d6-dgml5                 1/1     Running     0          2d6h
katib-mysql-7db488768f-cgcnj                     1/1     Running     0          2d6h
katib-ui-6d7fbfffcb-t84xl                        1/1     Running     0          2d6h
kfserving-controller-manager-0                   2/2     Running     1          2d6h
metadata-db-5d56786648-ldlzq                     1/1     Running     0          2d6h
metadata-deployment-5c7df888b9-gdm5n             1/1     Running     0          2d6h
metadata-envoy-deployment-7cc78946c9-kcmt4       1/1     Running     0          2d6h
metadata-grpc-deployment-5c8545f76f-7q47f        1/1     Running     0          2d6h
metadata-ui-665dff6f55-pbvdp                     1/1     Running     0          2d6h
minio-657c66cd9-mgxcd                            1/1     Running     0          2d6h
ml-pipeline-669cdb6bdf-vwglc                     1/1     Running     0          2d6h
ml-pipeline-ml-pipeline-visualizationserver...   1/1     Running     0          2d6h
ml-pipeline-persistenceagent-56467f8856-zllpd    1/1     Running     0          2d6h
ml-pipeline-scheduledworkflow-548b96d5fc-xkxdn   1/1     Running     0          2d6h
ml-pipeline-ui-6bd4778958-bdf2x                  1/1     Running     0          2d6h
ml-pipeline-viewer-controller-deployment...      1/1     Running     0          2d6h
mysql-8558d86476-xq2js                           1/1     Running     0          2d6h
notebook-controller-deployment-64b85fbc84...     1/1     Running     0          2d6h
profiles-deployment-647448c7dd-9gnz4             2/2     Running     0          2d6h
pytorch-operator-6bc9c99c5-gn7wm                 1/1     Running     30         2d6h
seldon-controller-manager-786775d4d9-frq9l       1/1     Running     0          2d6h
spark-operatorcrd-cleanup-xq8zb                  0/2     Completed   0          2d6h
spark-operatorsparkoperator-9c559c997-mplrh      1/1     Running     0          2d6h
spartakus-volunteer-5978bf56f-jftnh              1/1     Running     0          2d6h
tensorboard-9b4c44f45-frr76                      0/1     Pending     0          2d6h
tf-job-operator-5d7cc587c5-tvxqk                 1/1     Running     33         2d6h
workflow-controller-59ff5f7874-8w9kd             1/1     Running     0          2d6h
Given that a TensorFlow job is run as an extension of the TensorFlow operator, it shows up as a pod alongside the other Kubeflow components.
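Because the job runs remotely, its console output lives in the pod’s logs. We can tail them with the following command; the pod name is a placeholder for whatever name kubectl get pod reported:

# Stream the training job’s stdout/stderr from the remote pod
kubectl -n kubeflow logs -f <job-pod-name>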
Using Docker
Docker is the most common container system used in container orchestration systems such as Kubernetes. To launch a container, we run an image. An image includes everything needed to run an application (code, runtime, libraries, etc.) in a single executable package. In the case of our TensorFlow jobs, the image includes things like the TensorFlow library dependencies and our Python training code to run in each container.
Docker Hub provides a repository for container images to be stored, searched, and retrieved. Other repositories include Google’s Container Registry and on-premise Artifactory installs.
Basic Docker install
For information on how to install Docker, check out their documentation page for the process.
For the remainder of this chapter we assume that you have Docker installed. Let’s now move on to some basic Docker commands you’ll need to know.
Basic Docker commands
For details on using the build command, see the Docker documentation page. The command that follows builds an image from the Dockerfile contained in the local directory and gives it the tag [account]/[repository]:[tag]:
docker build -t "[account]/[repository]:[tag]" .
To push the container we built in this Docker command, we’ll use a command of the following form:
docker push [account]/[repository]:[tag]
The following command takes the container image we built in the previous step and pushes it to the mike account in Artifactory under the kubeflow repo. It also adds the tag dist_tf_estimator:
docker push mike/kubeflow:dist_tf_estimator
Now let’s move on to building TensorFlow containers with Docker.
Using Docker to build TensorFlow containers
When building Docker container images based on existing TensorFlow container images, be cognizant of:
- The desired TensorFlow version
- Whether the container image will be Python2- or Python3-based
- Whether the container image will have CPU or GPU bindings
We’re assuming here that you’ll either build your own base TensorFlow container or pull an existing one from gcr.io or Docker Hub. Check out the TensorFlow repository on Docker Hub for some great examples of existing TensorFlow container images.
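As a sketch of what such a build looks like, the following hypothetical Dockerfile extends an official TensorFlow image from Docker Hub and layers in our training code; the version tag and file names are assumptions, so substitute the TensorFlow release and CPU/GPU variant you actually need:

# Base image: official TensorFlow build (use a -gpu tag for GPU bindings)
FROM tensorflow/tensorflow:1.15.2-py3

# Install any extra Python dependencies, then add our training code
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
COPY train.py /app/train.py

WORKDIR /app
ENTRYPOINT ["python", "train.py"]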
Containers, GPUs, and Python Version
Check out each container repository for its naming rules around Python 2 versus Python 3, as they can differ per repository. For GPU bindings within the container image, be sure to use the correct base image with the -gpu tag.
Now let’s move on to the install process for Kubeflow from the command line.
Basic Install Process
The basic install process for Kubeflow is:
- Initialize Kubeflow artifacts.
- Customize any artifacts.
- Deploy Kubeflow artifacts to the cluster.
We break down each of these in the following sections.
Installing On-Premise
To install Kubeflow on-premise, we need to consider the following topics:
- Considerations for building Kubernetes clusters
- Gateway host access to the cluster
- Active Directory integration and user management
- Kerberos integration
- Learning versus production environments
- Storage integration
We start off by looking at variations of ways to set up Kubernetes clusters.
Considerations for Building Kubernetes Clusters
To frame our discussion on how we want to set up our Kubeflow installation on-premise, we’ll revisit the diagram for how clusters are broken up into logical layers (Figure 4-1).
Kubeflow lives in the application layer for our cluster, and we’ll install it as a set of long-lived pods and services.
Kubernetes Glossary
Kubernetes has a lot of terms and concepts to know. If you ever get confused, just check out the Kubernetes standardized glossary in the documentation on the Kubernetes project website.
Given this context, we understand that we need to install Kubeflow on an existing Kubernetes cluster. The location of things such as the control plane and the cluster infrastructure may greatly impact install design decisions, such as:
- Networking topologies
- Active Directory integration
- Kerberos integration
Let’s look further at what goes into setting up a gateway host to access our cluster.
Gateway Host Access to Kubernetes Cluster
In most shared multitenant enterprise systems, we have a gateway host that is used for access to the cluster. For the purposes of installing Kubeflow on a Kubernetes system, your cluster will likely need the same setup pattern.
Typically, the gateway host machine needs the following resources:
- Network access to the Kubernetes cluster
- kubectl installed and configured locally

Network access to the Kubernetes cluster where Kubeflow resides is required because kubectl needs to send commands across the network. There are variations where container building is done on a machine other than the gateway host, but that is typically a function of how your IT department sets things up.
It is perfectly fine for a gateway host to be a local machine that meets these requirements.
Active Directory Integration and User Management
In most organizations, users are managed by an Active Directory installation. Most enterprise software systems will need to integrate with this Active Directory installation to allow users to use systems such as Kubernetes or Kubeflow.
Let’s start off by looking at the typical user experience in an organization for accessing a Kubernetes cluster integrated with Active Directory.
Kubernetes, kubectl, and Active Directory
To access a Kubernetes cluster, users typically will have to formally request access to the cluster from their enterprise IT team. After an approval process has been successfully cleared, users will be added to the appropriate Active Directory group for access to the cluster.
Users access the gateway host (which, again, can be their local machine) using a set of credentials, and immediately after logging in they are granted a Kerberos ticket. That ticket can later be used to authenticate to the Kubernetes cluster.
The necessary binaries (kubectl and plug-ins, as we mention in the following text), as well as the required kubeconfig (Kubernetes configuration), will need to be set up by users. Once the kubeconfig has been configured, users only need to concern themselves with executing the appropriate kubectl commands.
Kerberos Integration
Enterprise IT teams commonly use Kerberos as their network authentication protocol because it is designed to be used by client/server applications (such as Kubernetes nodes) to provide strong authentication using secret-key cryptography. As described on the Kerberos website:
Kerberos was created by MIT as a solution to these network security problems. The Kerberos protocol uses strong cryptography so that a client can prove its identity to a server (and vice versa) across an insecure network connection. After a client and server has used Kerberos to prove their identity, they can also encrypt all of their communications to assure privacy and data integrity as they go about their business.
By default, Kubernetes does not provide a method to integrate with Kerberos directly, because it relies on a more modern approach—OpenID Connect, or OIDC for short.
One method of using Kerberos with Kubernetes is to exchange an existing Kerberos ticket for an OIDC token, and to present that token to Kubernetes upon authentication. This works by first authenticating to an OIDC token provider using an existing Kerberos credential, and obtaining an OIDC token in exchange for the Kerberos authentication. This can be accomplished by using a kubectl plug-in, with an example here.
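To make the mechanics concrete, here is a minimal sketch of the user entry such a setup might add to the kubeconfig; the plug-in binary name and issuer URL are hypothetical, but the exec credential API shown is the standard Kubernetes client authentication hook:

users:
- name: corp-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      # Hypothetical plug-in that exchanges a Kerberos ticket for an
      # OIDC token and prints it as an ExecCredential for kubectl
      command: kubectl-kerberos-oidc
      args: ["--issuer=https://oidc.example.com"]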
Storage Integration
Out of the box, Kubeflow does not have a notion of a specific datastore; it lets the underlying Kubernetes cluster define what storage options are available.
When setting up our Kubeflow installation and running jobs, we need to consider:
- What kind of storage our cluster has available
- Data access patterns that are best suited for our job
- Security considerations around the data store
How we store data is intricately linked to how we access data, so we want to make sure that we think about how we’re going to access the data as we design our storage. The job types we’ll consider with regard to our data access patterns are:
- Python (or other) code in a single container run on Kubernetes
- A container run on a specific Kubeflow operator (e.g., TFOperator or PyTorchOperator), in normal single-node execution or in distributed mode
- Python code run from a Jupyter Notebook
There are two facets to consider across these three job variants:
- Are we providing enough bandwidth to the job such that we’re not starving the modeling power of the code that is running?
- Are we integrating with the storage layer via filesystem semantics or via network calls?
Let’s start off by looking at the job bandwidth and storage for Kubeflow jobs.
Thinking about Kubeflow job bandwidth
If you’ll recall, in Chapter 2 we talked about how GPUs can affect jobs, from single GPUs to multi-GPUs and even distributed GPUs. GPUs can be hungry for data, so having an extremely fast storage subsystem is critical. We don’t want the GPUs waiting on data.
If a given job is going to be training on a lot of data, we can think of that job as requiring a high-bandwidth storage solution that can satiate the needs of the GPUs. On the other hand, if a job is heavily computation bound with a smaller dataset, the speed at which the initial data reaches the GPUs may not be that important. In the latter scenario, we can think of that as a lower-bandwidth job.
Common access storage patterns with Kubeflow jobs
There are two major ways Kubeflow jobs access storage:
- Using network calls across the network or internet
- Using filesystem semantics
If the job is going to pull data across the network/internet with the user’s own credentials (which we don’t mind putting in the configuration or code somewhere), then we don’t have to worry about filesystem semantics and mount points at the Kubernetes level.
In this case, the code handles all network calls to get the data locally, but our cluster’s hosts need external network connectivity. Such examples might be accessing storage from S3, SFTP, third-party systems, etc.
If you want to use a local mount point to access a partition of the data (e.g., in a manner similar to how a local filesystem will be used by Python and notebooks on a local laptop), then you will need to provision storage using persistent volume claims (PVCs) at the Kubernetes pod and container level.
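As a sketch of the PVC approach, the pod (or the pod template inside a job spec) mounts an existing claim as a volume; the claim name, image, and mount path below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: training-job
  namespace: kubeflow
spec:
  containers:
  - name: trainer
    image: mike/kubeflow:dist_tf_estimator   # image from the earlier push example
    volumeMounts:
    - name: training-data
      mountPath: /data                       # code reads this path with normal file I/O
  volumes:
  - name: training-data
    persistentVolumeClaim:
      claimName: training-data-pvc           # an existing PVC in the same namespace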
Options for Kubeflow storage
Kubernetes itself provides a plethora of storage mechanisms; more detail can be found in the Kubernetes storage documentation. At the most basic level, storage can be thought of as either locally attached storage on a particular Kubernetes worker node (e.g., a locally attached and ephemeral volume) or a layer of persistent storage, typically provided by a storage subsystem.
In the context of Kubeflow, a high-speed storage subsystem is preferred, such as a fiber-connected storage array. This provides a consistent high-bandwidth storage medium that can satiate the GPU needs.
Several examples of such high-bandwidth systems include:
- NetApp AFFA800
- Cisco FlexPod
- FlashBlade
In Chapters 5 through 7 we provide further details on the core storage systems for managed Kubernetes on each of the public clouds.
Persistent volume claims and Kubeflow storage
Kubeflow, by default, will use persistent volumes (PVs) and persistent volume claims (PVCs) for its storage needs. As an example, when a user deploys a notebook server, they will be given the option of dynamically allocating storage (out of a storage class) or using an existing persistent volume claim.
The key distinction to understand between PVs and PVCs is that a PV is simply a representation of storage “somewhere,” such as an allocated “1 GB of space.” To actually utilize that storage space, a claim must be made against that storage. Once a claim is made, Kubernetes provides certain guarantees that for the lifespan of the claim, the underlying storage will not be released. Hence, in the context of Kubernetes, it’s not enough to simply have a PV, but a PVC must be acquired against that storage as well.
If a user dynamically provisions storage, Kubeflow will automatically create a PVC against the newly allocated storage, which can later be used and reused for various pods, notebooks, etc. If a user would like to provide an existing PVC when setting up a Kubeflow environment, such as Notebook, it is the PVC that is provided to Kubeflow (and not the PV itself).
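To make the distinction concrete, here is a minimal sketch of a claim that could later be handed to Kubeflow (e.g., when creating a notebook server); the storage class name is an assumption about what your cluster defines:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: notebook-workspace-pvc
  namespace: kubeflow
spec:
  accessModes:
  - ReadWriteOnce            # mountable read-write by a single node at a time
  storageClassName: standard # hypothetical; use a class your cluster provides
  resources:
    requests:
      storage: 10Gi          # the claim against storage “somewhere”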
Container Management and Artifact Repositories
Container management is key to Kubeflow and Kubernetes (or any container orchestration system) because we have to have some place for container images to live.
We should be clear here about how container images differ from the configuration files (e.g., Dockerfiles) that define them. We can push our configuration files (Dockerfiles) to a source control repository such as github.com, but we need a different kind of repository to manage application binary artifacts such as container images.
Specifically, we need a place to store and manage all of our container application images for later deployment to Kubernetes.
There are two types of artifact repositories for container images:
- Public container image repositories (or registries)
- Private (and perhaps on-premise) container image repositories (or registries)
Public repositories/registries are typically accessed across the internet and allow everyone to see your containers (at least at the free tier). The most popular public artifact repository is hub.docker.com, also known as Docker Hub.
Private repositories/registries can also be hosted on the internet or hosted on-premise. The details of creating and managing private repositories and registries are specific to each implementation.
The important key for Kubeflow is to understand that all container images must be pulled from a container registry somewhere. By default, Kubeflow is preconfigured to pull all container images from the Google Container Registry (gcr.io). Kubeflow provides a mechanism for setting the location of the container registry.
Setting up an internal container repository
JFrog Artifactory OSS is an open source option for an on-premise container application registry (there are also commercial upgrades over the open source version).
To download Artifactory (or get a Docker image), check out their website. To install Artifactory on-premise, see their Confluence documentation. Artifactory includes support for:
- Solaris
- MacOS
- Windows
- Linux
Artifactory dependencies include a local database (default is an embedded Derby database), a filestore (local FS is the default), and integration with an HTTP server.
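For a quick local trial, Artifactory OSS can also be run as a container. The sketch below assumes the image coordinates JFrog published at the time of writing; check their documentation for the current image name and tag:

# Run Artifactory OSS, persisting its data outside the container
docker run -d --name artifactory \
    -p 8081:8081 \
    -v ~/artifactory:/var/opt/jfrog/artifactory \
    docker.bintray.io/jfrog/artifactory-oss:latest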
Accessing and Interacting with Kubeflow
There are two major ways to work with Kubeflow:
- The CLI, primarily using the kubectl tool, as well as the kfctl tool
- A web browser, using the Kubeflow web UI
We cover the details of each in the next subsections.
It’s important to keep in mind that Kubeflow management operations (such as deploying a Kubeflow installation, upgrading components of Kubeflow, etc.) are done using the kfctl tool, while seeing what the cluster is currently “doing” is done via the kubectl tool.
Common Command-Line Operations
kubectl is the fundamental tool we are interested in for command-line operations on Kubeflow. In a previous section of this chapter we reviewed some of the key things we can do on a Kubernetes-based cluster with kubectl. The relevant operations on Kubeflow we are interested in with kubectl are:
- Running a basic container with some code, typically Python, on our cluster
- Running a group of containers on a special Kubernetes operator such as TFJob
For the first case, many times our practitioners have some Python code they’d like to run on GPUs. In these cases we create a container with the appropriate dependencies and run it on our Kubernetes cluster with Kubeflow.
In the second case, we need to set up our job YAML file to target a Kubernetes custom operator, such as TFJob, so we can leverage special container coordination modes such as TensorFlow distributed training.
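As a minimal sketch of that second case, a hypothetical TFJob manifest for the operator that ships with Kubeflow 1.0 might look like the following; the job name, image, and replica count are placeholders:

apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-dist
  namespace: kubeflow
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                # two workers for distributed training
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: tensorflow     # the TFJob operator expects this container name
            image: mike/kubeflow:dist_tf_estimator

Submitting it uses the same kubectl apply -f pattern we saw earlier, after which the worker pods appear alongside the other Kubeflow pods.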
Accessible Web UIs
The key web resource Kubeflow provides is the Kubeflow Dashboard UI that has links to all the other web-accessible resources Kubeflow provides. In Figure 4-2, we can see what the dashboard looks like.
As discussed in Chapter 1, this dashboard is effectively a quick launching point for the other relevant resources available via a web browser for Kubeflow users.
Installing Kubeflow
In this section, we will discuss the steps required to install Kubeflow.
System Requirements
As of the time of writing, the Kubernetes cluster must meet the following minimum requirements:
- 4 CPUs
- 50 GB storage
- 12 GB memory
The recommended Kubernetes version is 1.14, which is the version Kubeflow has been validated and tested on. Your cluster must run at least Kubernetes version 1.11, and Kubeflow does not work on Kubernetes 1.16.
Set Up and Deploy
Installing Kubeflow requires these steps:
- Download the kfctl tool.
- Prepare the Kubeflow artifacts.
- Deploy the artifacts to a cluster.
Using a compatible system (such as Linux or macOS), you acquire the kfctl tool by downloading it from the Kubeflow releases page on GitHub. See Example 4-4.
Example 4-4. Download and unpack the kfctl binary[1]
$ cd ~/bin
$ curl -LOJ https://github.com/.../kfctl_v1.0.2-0-ga476281_linux.tar.gz
$ tar zxvf kfctl_v1.0.2-0-ga476281_linux.tar.gz
Once the tool has been downloaded, a working directory is created that will hold the artifacts and any customizations done for Kubeflow. In Example 4-5, we will use the kf directory in the user’s home directory (~/kf).
Example 4-5. Create working directory
$ mkdir ~/kf
$ cd ~/kf
We are now ready to prepare the Kubeflow installation. This is done by specifying an initial manifest to download and prepare from, as in Example 4-6.
Example 4-6. Prepare Kubeflow installation[2]
$ cd ~/kf
$ ~/bin/kfctl build -V -f "https://raw.githubusercontent.com/..."
This will create a kustomize directory, which will hold all the templates Kubeflow will deploy. At this stage, any additional customizations can be made.

As an example, to set a custom container registry to use, we can use the kfctl tool. The command in Example 4-7 will change the default container registry from gcr.io to hub.docker.com.
Example 4-7. Set a custom container registry
$ ~/bin/kfctl alpha set-image-name hub.docker.com
Once we’re ready, Kubeflow can be deployed using the kfctl apply command, as seen in Example 4-8.
Example 4-8. Deploy Kubeflow
$ ~/bin/kfctl apply -V -f kfctl_istio_dex.v1.0.2.yaml
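Deployment can take a while as container images are pulled. A reasonable way to watch progress, assuming the default kubeflow namespace, is to stream pod states until everything reports Running or Completed:

# -w streams updates as pod states change; press Ctrl-C to stop
kubectl -n kubeflow get pods -w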
Summary
In this chapter we looked at the practical steps to deploying Kubeflow on-premise. While many users will want to jump to the cloud (as we will in the next chapters), on-premise installations are still relevant for many enterprise situations. As we move into the next chapter, we’ll see how we begin to build on many of the concepts introduced in this chapter while evolving the install for a cloud deployment.
[1] The URL here has been shortened for space reasons; the full URL is https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz.
[2] The URL here has been shortened for space reasons; the full URL is https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_istio_dex.v1.0.2.yaml.