Automated Batch Prediction Jobs with Cloud Scheduler
To automate batch prediction jobs using Google Cloud Scheduler, you need to create a job that triggers your prediction code at a specified frequency. The prediction code itself could be hosted in various forms, such as a Google Cloud Function, a container on Google Kubernetes Engine, or an application running on a Compute Engine instance.
For the sake of this example, let's assume that your batch prediction code is deployed as a Cloud Function. You want to schedule this function to run every day at a specific time.
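For concreteness, here is a minimal sketch of what such a function's entry point could look like. The bucket names, file names, and scoring logic below are placeholders, not part of the Pulumi program that follows; the function name `run_batch_prediction` is what you would supply as the entry point later on.

```python
# main.py -- a minimal sketch of the batch prediction logic (placeholder names).
import csv
import io

from google.cloud import storage

INPUT_BUCKET = "your-input-bucket"    # placeholder: bucket holding rows to score
OUTPUT_BUCKET = "your-output-bucket"  # placeholder: bucket receiving predictions


def score_row(row):
    # Placeholder scoring logic; replace with your real model inference.
    return float(row["feature"]) * 2.0


def run_batch_prediction(request):
    """HTTP entry point: reads input.csv, scores each row, writes predictions.csv."""
    client = storage.Client()

    # Download the batch input.
    data = client.bucket(INPUT_BUCKET).blob("input.csv").download_as_text()
    rows = list(csv.DictReader(io.StringIO(data)))

    # Score every row and serialize the results.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["prediction"])
    for row in rows:
        writer.writerow([score_row(row)])

    # Upload the predictions alongside the input data.
    client.bucket(OUTPUT_BUCKET).blob("predictions.csv").upload_from_string(out.getvalue())
    return f"scored {len(rows)} rows", 200
```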
Here's what we need to do:
- Create a Google Cloud Function with the batch prediction logic.
- Set up a Cloud Scheduler job that triggers the function on a regular schedule.
Below is a Pulumi program in Python that sets up such a scenario:
```python
import pulumi
import pulumi_gcp as gcp

# First, we'll create a Google Cloud Function that contains our batch prediction logic.
# Replace 'source_archive_bucket' and 'source_archive_object' with the location of your
# zipped Cloud Function code.
batch_prediction_function = gcp.cloudfunctions.Function(
    "batchPredictionFunction",
    entry_point="YOUR_FUNCTION_ENTRYPOINT",  # The name of the entry point function in your code
    runtime="python39",                      # Adjust the runtime as needed for your function
    source_archive_bucket="YOUR_SOURCE_ARCHIVE_BUCKET",
    source_archive_object="YOUR_SOURCE_ARCHIVE_OBJECT",
    trigger_http=True,                       # Indicates that the function can be triggered via HTTP requests
    project="YOUR_GCP_PROJECT_ID",           # Replace with your GCP Project ID
    region="YOUR_FUNCTION_REGION",           # Replace with the region of your Cloud Function
)

# After the function is deployed, it exposes an HTTPS trigger URL. The URL is not
# available until the function is created, so Pulumi models it as an output that
# resolves later; we can pass it to the Cloud Scheduler job as-is.
function_url = batch_prediction_function.https_trigger_url

# Second, we'll create a Cloud Scheduler job that calls the Cloud Function at a
# regular interval. In this example, we run the job every day at midnight
# ('0 0 * * *'). Adjust the schedule as needed for your use case.
batch_prediction_job = gcp.cloudscheduler.Job(
    "batchPredictionJob",
    description="Daily batch prediction job",
    schedule="0 0 * * *",  # Run at 00:00 (midnight) every day
    time_zone="UTC",       # Set the timezone for the scheduler
    http_target=gcp.cloudscheduler.JobHttpTargetArgs(
        uri=function_url,
        http_method="GET",
    ),
    project="YOUR_GCP_PROJECT_ID",  # Replace with your GCP Project ID
    region="YOUR_FUNCTION_REGION",  # Replace with the region of your Cloud Scheduler job
)

# Export the function URL and the name of the scheduler job for reference.
pulumi.export("batch_prediction_function_url", function_url)
pulumi.export("batch_prediction_job_name", batch_prediction_job.name)
```
In this code, `batchPredictionFunction` represents a Google Cloud Function resource. Your function code needs to be packed as a zip archive and uploaded to a GCS bucket (`YOUR_SOURCE_ARCHIVE_BUCKET`); the path of that zip file within the bucket goes in `YOUR_SOURCE_ARCHIVE_OBJECT`.

The `batchPredictionJob` is a Cloud Scheduler job resource that invokes the Google Cloud Function on a schedule you define. In this example, it's set to trigger every day at midnight UTC. You can customize the `schedule` string using the standard cron format to suit your timing needs; for example, `0 */6 * * *` runs every six hours.

Make sure to replace `YOUR_FUNCTION_ENTRYPOINT`, `YOUR_SOURCE_ARCHIVE_BUCKET`, `YOUR_SOURCE_ARCHIVE_OBJECT`, `YOUR_GCP_PROJECT_ID`, and `YOUR_FUNCTION_REGION` with values that correspond to your setup. Alternatively, you can have Pulumi create the source bucket and archive for you, as sketched below.
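A minimal sketch of that variant, assuming your function code lives in a local `./function` directory (the directory name, bucket name, and resource names here are illustrative, not part of the program above):

```python
import pulumi
import pulumi_gcp as gcp

# Bucket to hold the function's source archive (name is illustrative).
source_bucket = gcp.storage.Bucket(
    "batch-prediction-source",
    location="US",
)

# Zip the local ./function directory (assumed to contain main.py and
# requirements.txt) and upload it as the source archive object.
source_archive = gcp.storage.BucketObject(
    "batch-prediction-source-archive",
    bucket=source_bucket.name,
    source=pulumi.FileArchive("./function"),
)

# These outputs can then stand in for the placeholders above:
#   source_archive_bucket=source_bucket.name,
#   source_archive_object=source_archive.name,
```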
The `function_url` is the HTTPS URL that triggers the function. Since the URL only becomes available after the function is created, `https_trigger_url` is a Pulumi output; Pulumi resolves it automatically when it is passed to the Cloud Scheduler job as the HTTP target, and you can call its `apply` method if you need to transform the value first.

The `pulumi.export` lines at the bottom output the function URL and the scheduler job name so you can reference them outside of Pulumi, for example in your CI/CD system.

This program sets up an automated pipeline: Google Cloud Functions runs the batch prediction job, and Google Cloud Scheduler triggers it on a schedule, offering a fully serverless and automated solution.
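One caveat worth noting: as written, the scheduler calls the function anonymously, so the function must allow unauthenticated invocations. If you want to keep the function private, Cloud Scheduler can authenticate with an OIDC token instead. A minimal sketch, assuming the `batch_prediction_function` and `function_url` from the program above (the service account and resource names are illustrative):

```python
import pulumi_gcp as gcp

# Service account that Cloud Scheduler will use to call the function (name is illustrative).
invoker = gcp.serviceaccount.Account(
    "batch-prediction-invoker",
    account_id="batch-prediction-invoker",
)

# Allow that service account to invoke the Cloud Function.
gcp.cloudfunctions.FunctionIamMember(
    "batch-prediction-invoker-binding",
    project=batch_prediction_function.project,
    region=batch_prediction_function.region,
    cloud_function=batch_prediction_function.name,
    role="roles/cloudfunctions.invoker",
    member=invoker.email.apply(lambda email: f"serviceAccount:{email}"),
)

# In the scheduler job, attach an OIDC token instead of calling anonymously:
#   http_target=gcp.cloudscheduler.JobHttpTargetArgs(
#       uri=function_url,
#       http_method="GET",
#       oidc_token=gcp.cloudscheduler.JobHttpTargetOidcTokenArgs(
#           service_account_email=invoker.email,
#       ),
#   ),
```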