Scheduled Model Inference Jobs on Kubernetes

Pulumi · Accepted Answer

To create scheduled model inference jobs on Kubernetes, we will use a `CronJob` resource. A `CronJob` creates Jobs on a time-based schedule, which is perfect for running recurring tasks such as model inference.

Here's how you can achieve this with Pulumi and Kubernetes:

1. **Define the cron schedule**: The schedule follows the standard cron format. For example, `"*/5 * * * *"` would represent every 5 minutes.
2. **Create a Job Template**: This template specifies the Pods that will be created. It should contain the details about the container, including the image to use and the command to run.
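3. **Configure the CronJob Spec**: This includes the job template, the schedule, and other settings such as `concurrencyPolicy`, which determines whether a new run may overlap with one that is still in progress, and `successfulJobsHistoryLimit`, which specifies how many successfully completed Jobs are kept. In the Pulumi Python SDK these fields are written in snake case (`concurrency_policy`, `successful_jobs_history_limit`).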
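To make the schedule format in step 1 concrete, here is a minimal sketch that splits a five-field cron expression into its named positions. The helper name `describe_schedule` and the field labels are illustrative only and are not part of Pulumi or Kubernetes; the sketch does no range checking and does not handle macros such as `@hourly`.

```python
from typing import Dict

# Order of the five fields in a standard cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_schedule(expression: str) -> Dict[str, str]:
    """Map each field of a five-field cron expression to its position name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # "*/5" in the minute field means every 5 minutes;
    # "0" in the minute field with "*/1" in the hour field means the top of every hour.
    for expr in ("*/5 * * * *", "0 */1 * * *"):
        print(expr, "->", describe_schedule(expr))
```

Reading a schedule this way is a quick check that a value such as `0` has landed in the minute position and not the hour position before the expression goes into the `CronJob` spec.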

Now, let's write a Pulumi program that defines a `CronJob` for model inference using a container image:

```python
import pulumi
import pulumi_kubernetes as k8s

# Define the container to use for the job.
# Replace 'your-container-image' with the actual image you intend to use and
# provide the necessary commands to carry out the model inference.
container = k8s.core.v1.ContainerArgs(
    name="model-inference",
    image="your-container-image",
    command=["/bin/sh", "-c", "command-to-run-model-inference"]
)

# Define the job spec that will be used by the CronJob.
job_spec = k8s.batch.v1.JobSpecArgs(
    template=k8s.core.v1.PodTemplateSpecArgs(
        spec=k8s.core.v1.PodSpecArgs(
            containers=[container],
            restart_policy="OnFailure",  # Restart policy for all containers within the pod
        )
    )
)

# Define the CronJob resource, establishing the schedule and the job template.
model_inference_cronjob = k8s.batch.v1.CronJob(
    "model-inference-cronjob",
    metadata=k8s.meta.v1.ObjectMetaArgs(name="model-inference"),
    spec=k8s.batch.v1.CronJobSpecArgs(
        schedule="0 */1 * * *",  # This will run the job at the top of every hour.
        job_template=k8s.batch.v1.JobTemplateSpecArgs(
            spec=job_spec
        ),
        # Additional optional configuration:
        # starting_deadline_seconds: how many seconds after its scheduled time a missed run may still be started.
        # concurrency_policy: how to treat overlapping runs ("Allow", "Forbid", or "Replace").
        # successful_jobs_history_limit: how many successfully completed Jobs to keep.
    )
)

# Export the name of the cronjob
pulumi.export('cronjob_name', model_inference_cronjob.metadata["name"])
```

This program creates a Kubernetes `CronJob` resource using Pulumi. At the top of every hour, the CronJob creates a Job whose Pod runs the specified container, so the container image must include your model inference code.

Adjust the `schedule` to the frequency you need. With `restart_policy` set to `"OnFailure"`, a container that exits with an error is restarted in the same Pod; a container that exits successfully is not restarted. You might also want to set `concurrency_policy` (`concurrencyPolicy` in the Kubernetes API) and `successful_jobs_history_limit` (`successfulJobsHistoryLimit`) to control how overlapping runs are handled and how many successfully completed Jobs are retained. `starting_deadline_seconds` (`startingDeadlineSeconds`) sets how long after its scheduled time a missed run may still be started.
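As a rough sketch of how those optional settings could be collected, the helper below builds a dictionary of keyword arguments for `CronJobSpecArgs`. The function name `cronjob_options` and the chosen default values (`"Forbid"`, a 300-second deadline) are assumptions made for illustration, not recommendations from Pulumi. The three accepted policy strings are the ones the Kubernetes API defines: `Allow` lets runs overlap, `Forbid` skips a new run while the previous one is still active, and `Replace` cancels the active run in favor of the new one.

```python
from typing import Any, Dict, Optional

# Values accepted by the Kubernetes CronJob concurrencyPolicy field.
VALID_CONCURRENCY_POLICIES = ("Allow", "Forbid", "Replace")


def cronjob_options(
    concurrency_policy: str = "Forbid",
    successful_jobs_history_limit: int = 3,
    failed_jobs_history_limit: int = 1,
    starting_deadline_seconds: Optional[int] = 300,
) -> Dict[str, Any]:
    """Return optional CronJob settings as snake_case keyword arguments."""
    if concurrency_policy not in VALID_CONCURRENCY_POLICIES:
        raise ValueError(
            f"concurrency_policy must be one of {VALID_CONCURRENCY_POLICIES}, "
            f"got {concurrency_policy!r}"
        )
    options: Dict[str, Any] = {
        "concurrency_policy": concurrency_policy,
        "successful_jobs_history_limit": successful_jobs_history_limit,
        "failed_jobs_history_limit": failed_jobs_history_limit,
    }
    # Leave the deadline out entirely when the caller does not want one.
    if starting_deadline_seconds is not None:
        options["starting_deadline_seconds"] = starting_deadline_seconds
    return options


# Hypothetical usage inside the Pulumi program (requires pulumi_kubernetes):
# spec = k8s.batch.v1.CronJobSpecArgs(
#     schedule="0 */1 * * *",
#     job_template=k8s.batch.v1.JobTemplateSpecArgs(spec=job_spec),
#     **cronjob_options(),
# )
```

For inference runs that can take longer than the interval between schedules, `Forbid` is one way to avoid two runs competing for the same resources; whether that suits your workload depends on how the jobs behave.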

Remember to point `image` to your container image and replace `"command-to-run-model-inference"` with the actual command that triggers the inference in your container.
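If the inference command does not need shell features such as pipes or variable expansion, one option is to pass it as an argument list instead of wrapping it in `/bin/sh -c`. The sketch below uses the standard library's `shlex.split` to turn a command line into that list form. The script path `/app/run_inference.py` and the `--batch-size` flag are hypothetical placeholders, and `to_exec_form` is an illustrative helper name.

```python
import shlex
from typing import List


def to_exec_form(command_line: str) -> List[str]:
    """Split a shell-style command line into an argument list."""
    return shlex.split(command_line)


# Hypothetical inference entry point; replace with the command your image provides.
inference_command = to_exec_form("python /app/run_inference.py --batch-size 32")

if __name__ == "__main__":
    print(inference_command)
    # ['python', '/app/run_inference.py', '--batch-size', '32']
```

The resulting list can be passed as the `command` argument of `ContainerArgs` in place of the `["/bin/sh", "-c", ...]` form shown in the program.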

After you deploy this program with the Pulumi CLI (`pulumi up`), Pulumi creates the CronJob in your Kubernetes cluster, and the inference job runs at the scheduled times.