1. Batch Inference Workloads Using GCP Cloud Scheduler


    To create a batch inference workload using Google Cloud Scheduler on GCP, we would use several GCP services:

    • Cloud Scheduler: Triggers jobs on a schedule. It lets you schedule virtually any job, including batch jobs, big data jobs, and cloud infrastructure operations, with configurable retries in case of failure.
    • Cloud Functions: Executes code in response to a trigger, which in this case is a message from Cloud Scheduler.
    • Pub/Sub: A messaging queue that acts as a middleman, taking messages from Cloud Scheduler and passing them to Cloud Functions.
    • AI Platform (now Vertex AI) or other services: Runs the batch inference itself. For this example, we'll assume the inference infrastructure is already in place and that the Cloud Function will call the respective service to initiate the workload.

    Here's a high-level overview of the process:

    1. Cloud Scheduler triggers a job on your predefined schedule.
    2. The job publishes a message to a Pub/Sub topic.
    3. A Cloud Function is subscribed to this topic, and it executes when a message is published.
    4. The Cloud Function triggers the inference workload (for which we assume there's an existing endpoint or mechanism).
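    The message flow in steps 2–4 can be sketched locally before wiring anything up in the cloud. The event dict below mirrors the payload shape a Pub/Sub-triggered Cloud Function receives (a base64-encoded `data` field); the handler name and payload here are illustrative, not part of any GCP API.

```python
import base64

def handle_pubsub_event(event, context=None):
    """Mimics a Pub/Sub-triggered Cloud Function handler:
    decodes the base64 payload and returns it as a string."""
    if 'data' not in event:
        return None
    return base64.b64decode(event['data']).decode('utf-8')

# Simulate what Cloud Scheduler -> Pub/Sub would deliver to the function:
event = {'data': base64.b64encode(b'test-message').decode('utf-8')}
print(handle_pubsub_event(event))  # -> test-message
```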

    Now, let's create a Pulumi program that sets up Cloud Scheduler to trigger a job every day. Below is the program, written in Python; afterwards I'll explain step by step what it does:

    import pulumi
    import pulumi_gcp as gcp

    # Define a Pub/Sub topic where the Cloud Scheduler message will be sent
    topic = gcp.pubsub.Topic("scheduler-topic")

    # Define a Pub/Sub subscription to the previously defined topic
    subscription = gcp.pubsub.Subscription("scheduler-subscription",
        topic=topic.name)

    # Define the Cloud Scheduler job.
    # This job will publish to the Pub/Sub topic on a schedule; the schedule
    # is in UNIX crontab format, and here it runs daily. The job also includes
    # some arbitrary data that might be useful for the inference workload.
    job = gcp.cloudscheduler.Job("inferencing-job",
        description="Trigger batch inferencing daily",
        schedule="0 0 * * *",  # This means "at 00:00 (midnight) daily"
        time_zone="Etc/UTC",   # Set the time zone accordingly
        pubsub_target=gcp.cloudscheduler.JobPubsubTargetArgs(
            topic_name=topic.id,
            data="dGVzdC1tZXNzYWdl",  # Base64-encoded string "test-message"
        ))

    # Define a Cloud Function to process the message from Pub/Sub and trigger
    # the inference workload. For this example, we're using inline source code
    # for simplicity, but in practice you'd deploy from a local directory or a
    # source repository. The actual submission of the workload isn't included
    # here, as it's assumed the logic to do that is in place.
    cloud_function_source = """
    def hello_pubsub(event, context):
        '''Triggered from a message on a Cloud Pub/Sub topic.
        Args:
            event (dict): Event payload.
            context (google.cloud.functions.Context): Metadata for the event.
        '''
        import base64
        if 'data' in event:
            message = base64.b64decode(event['data']).decode('utf-8')
            # The payload in this example is the plain string "test-message";
            # if you publish JSON instead, parse it with json.loads(message).
            # Here you would add the code to trigger the batch workload, e.g.
            # a call to AI Platform or another service that runs the inference job.
            print(f'Inferencing job triggered with message: {message}')
    """

    # Boilerplate for deploying a Cloud Function is not shown here. It typically
    # involves creating a zip file of your source, uploading it to a GCS bucket,
    # and referencing that archive as the source of the Cloud Function resource.
    # You would also configure the function's trigger to be the Pub/Sub topic
    # defined above. To keep this example focused on the Cloud Scheduler part,
    # please refer to the Pulumi documentation for Google Cloud Functions:
    # https://www.pulumi.com/registry/packages/gcp/api-docs/cloudfunctions/function/

    # pulumi.export('scheduler_topic', topic.name)
    # pulumi.export('scheduler_subscription', subscription.name)
    # pulumi.export('inferencing_job_name', job.name)

    Here's what each part of the program does:

    • Pub/Sub Topic: Created as a medium for the Cloud Scheduler to send trigger messages.
    • Pub/Sub Subscription: Subscribes to the topic so messages can be collected and passed to a downstream consumer. (Note that a Pub/Sub-triggered Cloud Function manages its own subscription; this explicit one is useful for debugging or for other consumers.)
    • Cloud Scheduler Job: Defines the job that will send a message to the Pub/Sub topic on a scheduled basis.
    • Cloud Function: Implied in the code but not fully written out; it listens for messages published to the topic and processes them to kick off the batch workload.

    The data parameter in the Cloud Scheduler job definition is base64-encoded because the underlying API expects the Pub/Sub payload as base64-encoded binary data; this example just sends the string "test-message". In real-world usage, you would encode whatever data the batch process needs.
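    For instance, to pass structured parameters to the inference job, you can base64-encode a JSON payload before setting it as the data value. The field names below are illustrative assumptions, not part of any GCP API:

```python
import base64
import json

# Hypothetical parameters for the batch inference job
payload = {"model": "my-model", "input_path": "gs://my-bucket/inputs/"}

# Cloud Scheduler's Pub/Sub target expects the data field base64-encoded
data = base64.b64encode(json.dumps(payload).encode('utf-8')).decode('utf-8')

# The Cloud Function reverses the process:
decoded = json.loads(base64.b64decode(data).decode('utf-8'))
assert decoded == payload
```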

    The schedule and time_zone should be adjusted to meet your batch workload's timing requirements; the schedule uses standard UNIX cron syntax (for example, "0 6 * * 1-5" runs at 06:00 on weekdays).

    Lastly, the pulumi.export calls (commented out at the bottom of the example) output the names or identifiers of created resources. These exports are handy when you need an output elsewhere, or simply for verifying the deployment.

    To deploy a Cloud Function with Pulumi, you typically package your function code into an archive and upload it to a Cloud Storage bucket; the Function resource then references that archive. The packaging and uploading steps are not shown above in detail, but they are an essential part of deploying a Cloud Function.
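    As a rough sketch of that packaging step, assuming your handler lives in a local ./function directory whose main.py exposes a hello_pubsub entry point (the directory name, bucket name, and runtime are assumptions, not taken from the example above):

```python
import pulumi
import pulumi_gcp as gcp

# Assumed to be defined elsewhere in the program
topic = gcp.pubsub.Topic("scheduler-topic")

# Bucket to hold the zipped function source
bucket = gcp.storage.Bucket("function-source", location="US")

# Zip the local ./function directory and upload it to the bucket
archive = gcp.storage.BucketObject("function-archive",
    bucket=bucket.name,
    source=pulumi.asset.FileArchive("./function"))

# Cloud Function (1st gen) triggered by messages on the Pub/Sub topic
function = gcp.cloudfunctions.Function("inference-trigger",
    runtime="python310",
    entry_point="hello_pubsub",
    source_archive_bucket=bucket.name,
    source_archive_object=archive.name,
    event_trigger=gcp.cloudfunctions.FunctionEventTriggerArgs(
        event_type="google.pubsub.topic.publish",
        resource=topic.id,
    ))
```

    With this trigger configuration, no separate Pub/Sub subscription is needed for the function itself; GCP manages one on its behalf.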

    To implement the Cloud Function, follow detailed guides and reference the Pulumi registry documentation for Cloud Functions.

    This program creates the infrastructure needed to trigger batch workloads at scheduled intervals on GCP, using Pulumi to define that infrastructure as code.