1. Resilience Testing of AI Applications on Kubernetes


    Resilience testing of AI applications on Kubernetes typically involves deliberately creating disruptive scenarios to observe how the system responds and recovers. However, setting up the infrastructure for such a testing environment requires several steps. You'll first need a Kubernetes cluster where your AI applications can be deployed. Once you have that, you can use various Kubernetes resources that help manage the workload and ensure availability even during disruptions.

    To set up a Kubernetes environment optimized for resilience testing, we're going to perform the following:

    1. Provision a managed Kubernetes cluster where you can deploy your AI applications. Managed Kubernetes services like Amazon EKS, Google Kubernetes Engine (GKE), or Azure Kubernetes Service (AKS) simplify the process of creating and maintaining a Kubernetes cluster.

    2. Introduce resources like PodDisruptionBudgets which help ensure that a certain minimum number of pods remain available during voluntary disruptions.

    3. Use a PriorityLevelConfiguration resource to control how the API server prioritizes and queues requests from different workloads, and optionally LimitRange resources to bound per-container resource consumption; both help you test the resilience of your applications systematically.
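    The guarantee in step 2 is simple arithmetic: with N replicas and a `minAvailable` of M, the budget leaves N - M pods of voluntary-disruption headroom. A small illustrative sketch (the helper function is made up for illustration, not part of any Kubernetes API):

```python
# Illustrative only: mirrors how a PodDisruptionBudget computes the
# number of voluntary disruptions it will admit at any one time.
def allowed_disruptions(replicas: int, min_available: int) -> int:
    """How many pods may be evicted at once without dropping below minAvailable."""
    return max(0, replicas - min_available)

# For example, with 3 replicas and minAvailable=2, only one pod may be
# voluntarily evicted at a time.
print(allowed_disruptions(3, 2))  # → 1
print(allowed_disruptions(2, 2))  # → 0: no voluntary evictions permitted
```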

    Let's create a simple Pulumi program in Python that sets up a managed Kubernetes cluster on Google Kubernetes Engine (GKE), which provides a robust, production-ready environment.

    Next, for resilience testing, we will add a PodDisruptionBudget, which limits how many pods of a replicated application can be down simultaneously due to voluntary disruptions. We will also add a PriorityLevelConfiguration resource so that the API server distinguishes between different classes of requests and provides a level of Quality of Service (QoS).

    Here is how you could set up such an environment:

```python
import pulumi
import pulumi_gcp as gcp
import pulumi_kubernetes as kubernetes

# Set up a GKE cluster
cluster = gcp.container.Cluster(
    "gke-cluster",
    initial_node_count=3,
    node_version="latest",
    min_master_version="latest",
)

# Once the cluster is created, build a kubeconfig for it
kubeconfig = pulumi.Output.all(cluster.name, cluster.endpoint, cluster.master_auth).apply(
    lambda args: """apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: {0}
    server: https://{1}
  name: gke-cluster
contexts:
- context:
    cluster: gke-cluster
    user: gke-cluster
  name: gke-cluster
current-context: gke-cluster
kind: Config
preferences: {{}}
users:
- name: gke-cluster
  user:
    auth-provider:
      config:
        cmd-args: config config-helper --format=json
        cmd-path: gcloud
        expiry-key: '{{.credential.token_expiry}}'
        token-key: '{{.credential.access_token}}'
      name: gcp
""".format(args[2]["clusterCaCertificate"], args[1]))

# Create a Kubernetes provider instance using the kubeconfig obtained from the cluster
k8s_provider = kubernetes.Provider("gke-k8s", kubeconfig=kubeconfig)

# Use a PodDisruptionBudget (policy/v1; the older policy/v1beta1 API was
# removed in Kubernetes 1.25) to maintain availability during disruptions
example_pdb = kubernetes.policy.v1.PodDisruptionBudget(
    "example-pdb",
    spec=kubernetes.policy.v1.PodDisruptionBudgetSpecArgs(
        min_available=2,
        selector=kubernetes.meta.v1.LabelSelectorArgs(
            match_labels={"app": "my-app"},
        ),
    ),
    opts=pulumi.ResourceOptions(provider=k8s_provider))

# API Priority and Fairness must be enabled on your cluster; the v1alpha1
# flowcontrol API used here is only served by older Kubernetes versions
example_plc = kubernetes.flowcontrol.v1alpha1.PriorityLevelConfiguration(
    "example-plc",
    metadata=kubernetes.meta.v1.ObjectMetaArgs(
        name="priority-level",
    ),
    spec=kubernetes.flowcontrol.v1alpha1.PriorityLevelConfigurationSpecArgs(
        type="Limited",
        limited=kubernetes.flowcontrol.v1alpha1.LimitedPriorityLevelConfigurationArgs(
            assured_concurrency_shares=10,
            limit_response=kubernetes.flowcontrol.v1alpha1.LimitResponseArgs(
                type="Queue",
                queuing=kubernetes.flowcontrol.v1alpha1.QueuingConfigurationArgs(
                    queues=1,
                    queue_length_limit=10,
                    hand_size=1,
                ),
            ),
        ),
    ),
    opts=pulumi.ResourceOptions(provider=k8s_provider))

# Export the kubeconfig to be used by external applications
pulumi.export('kubeconfig', kubeconfig)
```


    • We start by defining a GKE cluster resource that consists of 3 nodes.
    • Once the GKE cluster is provisioned, a kubeconfig for it is generated, which allows us to interact with the cluster using kubectl or other Kubernetes tools.
    • We then create a Kubernetes Provider that will use the kubeconfig of the created GKE cluster. This is needed for Pulumi to interact with our Kubernetes cluster.
    • With the Kubernetes provider set up, we can now declare our Kubernetes resources:
      • PodDisruptionBudget to ensure that a minimum number of pods remains available during voluntary disruptions such as node maintenance.
      • PriorityLevelConfiguration to manage concurrency levels and request queuing for the Kubernetes API server, which can help simulate different load conditions.
    • Finally, we export the kubeconfig, which will be useful if you need to run kubectl commands against your cluster from your local machine.
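    Once everything is deployed, one way to exercise the PodDisruptionBudget is to request voluntary evictions through the Eviction subresource and observe which ones the API server admits. Below is a hedged sketch; it assumes the official `kubernetes` Python client, and the pod and namespace names are placeholders for your own deployment:

```python
# Hedged sketch: request a voluntary eviction and let the API server's
# PodDisruptionBudget accounting decide whether it may proceed.
def evict_pod(api, name: str, namespace: str) -> bool:
    """Return True if the eviction was admitted, False if the
    PodDisruptionBudget refused it (HTTP 429)."""
    eviction = {  # Eviction subresource body (policy/v1)
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": name, "namespace": namespace},
    }
    try:
        api.create_namespaced_pod_eviction(name=name, namespace=namespace, body=eviction)
        return True
    except Exception as exc:  # kubernetes.client.ApiException carries .status
        if getattr(exc, "status", None) == 429:
            return False
        raise

# Against a real cluster (requires the official `kubernetes` client and a
# working kubeconfig; the pod and namespace names below are placeholders):
#   from kubernetes import client, config
#   config.load_kube_config()
#   evict_pod(client.CoreV1Api(), "my-app-<pod-suffix>", "default")
```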

    Please note that the PriorityLevelConfiguration above uses the v1alpha1 flowcontrol API, which is only served by older Kubernetes versions; API Priority and Fairness later graduated through beta to flowcontrol.apiserver.k8s.io/v1. You should confirm that the specific API versions you intend to use are served and enabled on your cluster before deploying.
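    If you are unsure which flowcontrol versions your cluster serves, you can ask the API server's discovery endpoint. A sketch, again assuming the official `kubernetes` Python client and a working kubeconfig; the helper function is illustrative, not part of any API:

```python
# Illustrative helper: pick out the API Priority and Fairness group/versions
# from a list of served group/version strings.
def served_flowcontrol_versions(group_versions):
    """E.g. returns ['flowcontrol.apiserver.k8s.io/v1beta3'] if that is served."""
    prefix = "flowcontrol.apiserver.k8s.io/"
    return [gv for gv in group_versions if gv.startswith(prefix)]

# Against a real cluster (requires the official `kubernetes` client):
#   from kubernetes import client, config
#   config.load_kube_config()
#   groups = client.ApisApi().get_api_versions().groups
#   served = [v.group_version for g in groups for v in g.versions]
#   print(served_flowcontrol_versions(served) or "flowcontrol API not served")
```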