High-Performance TensorFlow Serving with Kubernetes and NVIDIA GPUs

Question

Pulumi · Accepted Answer

To create a high-performance TensorFlow serving setup with Kubernetes and NVIDIA GPUs, you would typically need to follow these steps:

1. Set up a Kubernetes cluster with nodes that have NVIDIA GPUs attached.
2. Install the NVIDIA device plugin for Kubernetes on your cluster nodes.
3. Deploy a TensorFlow Serving container configured to access GPUs on the cluster.

Pulumi can help automate the provisioning of the required cloud infrastructure, such as the Kubernetes cluster, and the specific configuration needed to support GPUs.

### Creating a GPU-Enabled Kubernetes Cluster

The cluster itself is created with Pulumi's cloud provider packages: `pulumi_eks` (or `pulumi_aws`) on AWS, `pulumi_gcp` on GCP, or `pulumi_azure_native` on Azure. The `pulumi_kubernetes` package does not create clusters. It manages resources inside an existing cluster and is used in the later steps. A GPU-capable cluster typically needs a GPU instance type or VM size, a node image that ships NVIDIA drivers (or a separate driver installation), and the NVIDIA device plugin.

Here is a program that creates a GPU-enabled Kubernetes cluster on AWS using Pulumi's EKS package (`pulumi_eks`), which manages both the cluster and a node group of NVIDIA GPU instances.

We'll start by setting up an EKS cluster with a managed node group that has GPU-enabled instances.

```python
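import json
import pulumi
import pulumi_aws as aws
import pulumi_eks as eks

# IAM role for the GPU worker nodes, with the standard EKS worker-node policies attached.
node_role = aws.iam.Role("gpu-node-role", assume_role_policy=json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                   "Principal": {"Service": "ec2.amazonaws.com"}}]}))
for i, policy in enumerate(["AmazonEKSWorkerNodePolicy", "AmazonEKS_CNI_Policy",
                            "AmazonEC2ContainerRegistryReadOnly"]):
    aws.iam.RolePolicyAttachment(f"gpu-node-policy-{i}", role=node_role.name,
                                 policy_arn=f"arn:aws:iam::aws:policy/{policy}")

# EKS cluster without the default (CPU) node group; nodes using `node_role` may join it.
cluster = eks.Cluster("gpu-cluster", skip_default_node_group=True, instance_roles=[node_role])

# Managed node group of p3.2xlarge instances (one NVIDIA Tesla V100 GPU each).
# The GPU AMI type ships NVIDIA drivers; newer Kubernetes versions may require "AL2023_x86_64_NVIDIA".
gpu_node_group = eks.ManagedNodeGroup("gpu-node-group", cluster=cluster, node_role=node_role,
    instance_types=["p3.2xlarge"], ami_type="AL2_x86_64_GPU",
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(desired_size=2, min_size=1, max_size=2))

# Export the cluster's kubeconfig.
pulumi.export("kubeconfig", cluster.kubeconfig)
```

This code creates an EKS cluster without the default CPU node group, plus a managed node group of GPU instances. `scaling_config.desired_size` sets how many nodes the group runs (here 2), while `min_size` and `max_size` bound the group's size. The worker nodes need an IAM role with the standard EKS worker-node policies, and listing that role in `instance_roles` allows the nodes to join the cluster. The `p3.2xlarge` instance type has one NVIDIA Tesla V100 GPU, a common choice for machine learning workloads, and the `AL2_x86_64_GPU` AMI type selects the EKS-optimized accelerated AMI, which has the NVIDIA drivers preinstalled.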
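Sizing the node group is simple arithmetic once you know how many GPUs each instance type has. The sketch below hard-codes GPU counts for a few AWS P2/P3 instance types (check the AWS instance-type documentation for others) and uses a hypothetical helper, `nodes_needed`, to compute how many nodes a set of replicas requires. It assumes each replica's GPUs must all come from one node, since a pod always runs on a single node.

```python
import math

# GPUs per instance for a few AWS GPU instance types.
GPUS_PER_INSTANCE = {
    "p2.xlarge": 1,      # 1x NVIDIA K80 GPU
    "p2.8xlarge": 8,
    "p3.2xlarge": 1,     # 1x NVIDIA Tesla V100
    "p3.8xlarge": 4,
    "p3.16xlarge": 8,
}


def nodes_needed(replicas: int, gpus_per_replica: int, instance_type: str) -> int:
    """Smallest node count that can hold every replica's GPUs.

    A pod's GPUs must come from a single node, so we count how many whole
    replicas fit on one node rather than dividing the raw GPU total.
    """
    gpus_per_node = GPUS_PER_INSTANCE[instance_type]
    if gpus_per_replica > gpus_per_node:
        raise ValueError(f"{instance_type} has only {gpus_per_node} GPU(s) per node")
    replicas_per_node = gpus_per_node // gpus_per_replica
    return math.ceil(replicas / replicas_per_node)


# Two single-GPU TensorFlow Serving replicas on p3.2xlarge nodes need two nodes.
print(nodes_needed(replicas=2, gpus_per_replica=1, instance_type="p3.2xlarge"))
# → 2
```

Packing by whole replicas matters for multi-GPU pods: three 2-GPU replicas on `p3.8xlarge` nodes (4 GPUs each) need two nodes, not `ceil(6 / 4)` computed over raw GPUs by coincidence.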

### Installing the NVIDIA Device Plugin

To allow Kubernetes to manage the GPU resources on your nodes, you need to install the NVIDIA device plugin. This plugin is responsible for advertising the NVIDIA GPU resources to the Kubernetes scheduler.

```python
import json

import pulumi
import pulumi_kubernetes as k8s

# `cluster` is the eks.Cluster created above. The Kubernetes provider expects the
# kubeconfig as a string, so serialize the kubeconfig object to JSON.
k8s_provider = k8s.Provider("k8s-provider", kubeconfig=cluster.kubeconfig.apply(json.dumps))

# Install the NVIDIA device plugin as a DaemonSet in kube-system.
nvidia_device_plugin_yaml = """
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      priorityClassName: system-node-critical
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
"""

# Deploy the DaemonSet. Inline YAML goes in `yaml=`; `files=` expects file paths or URLs.
nvidia_device_plugin = k8s.yaml.ConfigGroup(
    "nvidia-device-plugin",
    yaml=[nvidia_device_plugin_yaml],
    opts=pulumi.ResourceOptions(provider=k8s_provider),
)
```

This example pins version `v0.17.0` of the NVIDIA device plugin image from NVIDIA's registry (`nvcr.io`). Check the NVIDIA/k8s-device-plugin releases for the latest stable version compatible with your Kubernetes version.

Once the NVIDIA device plugin is installed, your Kubernetes cluster should be able to schedule GPU-enabled TensorFlow Serving workloads.
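In practice, "able to schedule" means the device plugin makes each GPU node advertise an allocatable count of `nvidia.com/gpu`, and the scheduler only places a pod on a node that still has enough whole, unallocated GPUs. The toy model below is a simplified first-fit sketch of that bookkeeping, not the real kube-scheduler algorithm (which also filters on CPU, memory, and taints and then scores nodes); the node names and the `place_pods` helper are hypothetical.

```python
from typing import Dict, List, Optional


def place_pods(allocatable: Dict[str, int], pod_gpu_requests: List[int]) -> List[Optional[str]]:
    """Assign each pod to the first node with enough free whole GPUs.

    `allocatable` maps node name -> advertised nvidia.com/gpu count
    (0 for nodes where the device plugin is not running).
    Returns the chosen node per pod, or None if the pod would stay Pending.
    """
    free = dict(allocatable)
    placements: List[Optional[str]] = []
    for request in pod_gpu_requests:
        # First fit: scan nodes in insertion order for enough unallocated GPUs.
        node = next((name for name, gpus in free.items() if gpus >= request), None)
        if node is not None:
            free[node] -= request  # GPUs are exclusively allocated, never shared
        placements.append(node)
    return placements


# Two p3.2xlarge-style nodes, each advertising one GPU; three single-GPU pods.
print(place_pods({"node-a": 1, "node-b": 1}, [1, 1, 1]))
# → ['node-a', 'node-b', None]
```

The third pod has nowhere to go because every advertised GPU is already allocated, which is why a GPU pod stays `Pending` when the node group is too small or the device plugin is missing.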

### Deploying TensorFlow Serving with GPU Support

To use TensorFlow Serving with GPU support, you need to deploy it as a Kubernetes deployment with the appropriate configuration.

```python
# Official TensorFlow Serving image built with GPU (CUDA) support.
# Pin a specific version tag in production instead of `latest-gpu`.
tensorflow_serving_gpu_image = "tensorflow/serving:latest-gpu"

# Create a Kubernetes Deployment for TensorFlow Serving.
tensorflow_serving_deployment = k8s.apps.v1.Deployment(
    "tensorflow-serving-deployment",
    spec={
        "selector": {"matchLabels": {"app": "tensorflow-serving"}},
        "replicas": 2,
        "template": {
            "metadata": {"labels": {"app": "tensorflow-serving"}},
            "spec": {
                "containers": [
                    {
                        "name": "tensorflow-serving",
                        "image": tensorflow_serving_gpu_image,
                        # The image serves the model in /models/$MODEL_NAME. "my_model" is a
                        # placeholder, and the model files must be mounted there (volume not shown).
                        "env": [{"name": "MODEL_NAME", "value": "my_model"}],
                        # TensorFlow Serving's default gRPC and REST ports.
                        "ports": [
                            {"name": "grpc", "containerPort": 8500},
                            {"name": "rest", "containerPort": 8501},
                        ],
                        # One whole GPU per replica; for GPUs the request defaults to the limit.
                        "resources": {
                            "limits": {"nvidia.com/gpu": 1}
                        },
                    }
                ]
            },
        },
    },
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)

# Export the TensorFlow Serving Deployment name (Pulumi auto-naming appends a random suffix).
pulumi.export("tensorflow_serving_deployment_name", tensorflow_serving_deployment.metadata.apply(lambda m: m.name))
```

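This TensorFlow Serving Deployment requests one GPU (`"nvidia.com/gpu": 1`) for each replica, so its two replicas need two GPUs in total, which matches two single-GPU `p3.2xlarge` nodes. GPUs are not shared between containers by default, so a replica stays `Pending` until a node with a free GPU is available. Adjust the container spec (model volume, arguments, probes) to match your TensorFlow Serving configuration.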
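The Deployment sets only a GPU limit, which is enough. Kubernetes treats `nvidia.com/gpu` as an extended resource with three rules: a limit alone is sufficient (the request defaults to it), a request and a limit that are both set must be equal, and a request without a limit is not allowed. The hypothetical helper below encodes those rules so you can sanity-check a `resources` block before deploying; it is a sketch, not a substitute for the API server's own validation.

```python
from typing import Dict, Optional

GPU = "nvidia.com/gpu"


def effective_gpu_request(resources: Dict[str, Dict[str, int]]) -> int:
    """Return the GPU count a container will be scheduled with.

    Rules for GPUs as an extended resource:
    - a limit alone is enough; the request defaults to the limit;
    - if both a request and a limit are set, they must be equal;
    - a request without a limit is rejected.
    """
    limit: Optional[int] = resources.get("limits", {}).get(GPU)
    request: Optional[int] = resources.get("requests", {}).get(GPU)
    if request is not None and limit is None:
        raise ValueError("GPU requests require a matching GPU limit")
    if request is not None and request != limit:
        raise ValueError("GPU requests and limits must be equal")
    return limit or 0


# The container resources used above: one GPU per replica, two replicas.
container_resources = {"limits": {GPU: 1}}
replicas = 2
print(replicas * effective_gpu_request(container_resources))  # total GPUs needed
# → 2
```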

This Pulumi program provides the infrastructure and configuration to deploy a high-performance TensorFlow Serving setup with Kubernetes and NVIDIA GPUs. The use of specific images and versions may vary based on your TensorFlow, CUDA, and Kubernetes versions, so ensure you're using compatible versions for your environment.
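Once a replica is running, you can query it through TensorFlow Serving's REST API, which listens on port 8501 and accepts `POST /v1/models/<model_name>:predict` with a JSON body of the form `{"instances": [...]}`. The sketch below only builds the request with Python's standard library. The host, the model name `my_model`, the input values, and the `build_predict_request` helper are placeholders, and it assumes you first forward the port, for example with `kubectl port-forward deployment/<deployment-name> 8501:8501`.

```python
import json
import urllib.request


def build_predict_request(host: str, model_name: str, instances: list,
                          port: int = 8501) -> urllib.request.Request:
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Placeholder host and model name; assumes a port-forward to a serving pod.
req = build_predict_request("localhost", "my_model", [[1.0, 2.0, 5.0]])
print(req.full_url)
# → http://localhost:8501/v1/models/my_model:predict
```

To send it, call `urllib.request.urlopen(req)` and read the `"predictions"` field from the JSON response; the shape of `instances` must match your model's serving signature.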