Serverless ML Inference with GitLab Runner on Cloud Functions
Serverless architecture and machine learning (ML) inference are a potent combination for scaling workloads efficiently. Cloud Functions let you run inference code without managing servers, and GitLab Runners automate the execution of ML inference jobs that call those functions. Here's what we need to set up:
- GitLab Runner: This is a standalone application that works with GitLab CI/CD to run jobs in a pipeline. The GitLab Runner will need to be registered with your GitLab instance.
- Cloud Functions: These are the actual serverless functions that will execute the ML inference code. On Google Cloud, these can be set up with the `gcp.cloudfunctions.Function` resource.
In this example, we'll set up a GitLab Runner on Google Cloud and deploy an ML inference function as a Cloud Function. We'll assume you have the GitLab Runner Docker image ready for deployment and have a GitLab project where the CI/CD pipeline will be configured to trigger ML inference jobs on the cloud function.
Please ensure that you have the necessary permissions and configurations set up in your GitLab project, Google Cloud Platform, and Pulumi before proceeding.
Let's start building the infrastructure with the Pulumi Python program:
```python
import pulumi
import pulumi_gcp as gcp

# Read the project and region from the GCP provider configuration
project = gcp.config.project
region = gcp.config.region

# Define the Cloud Function for ML inference
ml_inference_function = gcp.cloudfunctions.Function(
    "ml-inference-function",
    runtime='python311',  # Choose a currently supported runtime that suits your ML model's environment
    entry_point='handler',  # The name of the function handler
    source_repository=gcp.cloudfunctions.FunctionSourceRepositoryArgs(
        # URL of the repo where your ML inference code is stored
        url='https://source.developers.google.com/projects/your-project/repos/your-repo-name/moveable-aliases/main/paths/inference'
    ),
    trigger_http=True,  # Make the function triggerable via HTTP requests
    region=region,
    project=project,
    labels={
        "function": "ml-inference"
    }
)

# Output the HTTPS trigger URL
pulumi.export("function_https_trigger_url", ml_inference_function.https_trigger_url)

# Define GitLab Runner on Google Cloud
# (the zone is taken from the GCP provider configuration)
gitlab_runner = gcp.compute.Instance(
    "gitlab-runner",
    machine_type='e2-small',  # You can choose a machine type that fits your requirements
    boot_disk=gcp.compute.InstanceBootDiskArgs(
        initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
            # Use an Ubuntu LTS image for GitLab Runner
            image='projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts'
        )
    ),
    network_interfaces=[gcp.compute.InstanceNetworkInterfaceArgs(
        network='default',
        # An empty access config assigns an ephemeral external IP
        access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()]
    )],
    # Startup scripts run without a TTY, so `docker exec` is called without -it
    # and the runner is registered with --non-interactive.
    metadata_startup_script="""#!/bin/bash
# Install Docker
apt-get update
apt-get install -y docker.io

# Start the GitLab Runner container
docker run -d --name gitlab-runner --restart always \
    -v /srv/gitlab-runner/config:/etc/gitlab-runner \
    -v /var/run/docker.sock:/var/run/docker.sock \
    gitlab/gitlab-runner:latest

# Register GitLab Runner
docker exec gitlab-runner gitlab-runner register \
    --non-interactive \
    --url 'https://gitlab.com/' \
    --registration-token 'Your_Gitlab_Registration_Token' \
    --executor 'docker' \
    --docker-image 'docker:19.03.12' \
    --description 'serverless-ml-inference' \
    --tag-list 'ml,inference' \
    --run-untagged='true' \
    --locked='false'
"""
)

# Output the GitLab Runner instance details
pulumi.export("gitlab_runner_name", gitlab_runner.name)
pulumi.export("gitlab_runner_zone", gitlab_runner.zone)
```
This Pulumi program does the following:
- It creates a Google Cloud Function, `ml-inference-function`, that acts as the ML inference service. You must edit the `source_repository` attribute to point to the repository holding your ML inference code.
- It sets the Cloud Function to trigger via HTTP requests, and outputs the HTTPS trigger URL, which you can use to send inference requests.
- It provisions a GCP Compute Engine instance named `gitlab-runner`, which automatically installs Docker and registers itself as a GitLab Runner.
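The function's entry point is named `handler`, but the inference code itself lives in your repository and is not shown here. As a rough illustration only, a handler could look like the sketch below. The fixed-weight linear "model", the `predict` helper, and the `{"instances": [...]}` payload shape are all assumptions made for this example; a real function would load your trained model instead.

```python
import json

# Hypothetical stand-in for a real model: fixed weights for a linear scorer.
# A real deployment would load a trained model once at import time instead.
WEIGHTS = [0.5, -0.25, 1.0]
BIAS = 0.1

JSON_HEADERS = {"Content-Type": "application/json"}


def predict(features):
    """Return the linear score for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handler(request):
    """HTTP entry point. Cloud Functions passes in a Flask request object."""
    payload = request.get_json(silent=True)
    instances = payload.get("instances") if isinstance(payload, dict) else None
    if not isinstance(instances, list):
        error = {"error": "body must be JSON with an 'instances' list"}
        return (json.dumps(error), 400, JSON_HEADERS)
    try:
        predictions = [predict(row) for row in instances]
    except (TypeError, ValueError) as exc:
        # Malformed rows (wrong length or non-numeric values) end up here.
        return (json.dumps({"error": str(exc)}), 400, JSON_HEADERS)
    return (json.dumps({"predictions": predictions}), 200, JSON_HEADERS)
```

Returning a `(body, status, headers)` tuple keeps the sketch free of a direct Flask import; the Python runtime accepts that form as an HTTP response.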
Remember to replace placeholders like `your-project`, `your-repo-name`, and `Your_Gitlab_Registration_Token` with appropriate values for your GitLab and GCP setup. Also, configure the runner to match your preferred region and machine type depending on the workload.

Finally, the `metadata_startup_script` on the Compute Engine instance installs Docker, starts the GitLab Runner container, and registers the runner with GitLab using your registration token. Adjust the Docker image used by the GitLab Runner according to your needs.

The program will output the GitLab Runner instance details and the HTTPS URL of the Cloud Function, which will be the endpoint for triggering ML inference jobs.
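To make the role of the HTTPS trigger URL concrete, here is a minimal sketch of a client that a CI job might run to call the function. It uses only the standard library. The `{"instances": [...]}` payload shape and both function names are assumptions for this example, and the sketch assumes the function accepts unauthenticated calls; a function that requires authentication would also need an identity token in the request headers.

```python
import json
import urllib.request


def build_inference_request(trigger_url, instances):
    """Build a POST request that carries the feature rows as JSON."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        trigger_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(trigger_url, instances, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_inference_request(trigger_url, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In a pipeline, the URL would typically come from the `function_https_trigger_url` stack output, for example passed to the job as a CI/CD variable.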
Bear in mind that this is a simple setup and you might need to manage permissions and network configurations more securely for a production environment.
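One small step in that direction is to keep the registration token out of the source file. The sketch below builds the startup script from a token passed in at deploy time and shell-quotes every argument. The `render_startup_script` helper and the `gitlabRegistrationToken` config key are hypothetical names chosen for this example.

```python
import shlex


def render_startup_script(registration_token, gitlab_url="https://gitlab.com/",
                          tags=("ml", "inference")):
    """Return a bash startup script that installs Docker and registers the runner."""
    register_args = [
        "gitlab-runner", "register", "--non-interactive",
        "--url", gitlab_url,
        "--registration-token", registration_token,
        "--executor", "docker",
        "--docker-image", "docker:19.03.12",
        "--description", "serverless-ml-inference",
        "--tag-list", ",".join(tags),
        "--run-untagged=true",
        "--locked=false",
    ]
    # Quote each argument so an unusual token cannot break the shell command.
    register_cmd = " ".join(shlex.quote(arg) for arg in register_args)
    lines = [
        "#!/bin/bash",
        "apt-get update",
        "apt-get install -y docker.io",
        "docker run -d --name gitlab-runner --restart always "
        "-v /srv/gitlab-runner/config:/etc/gitlab-runner "
        "-v /var/run/docker.sock:/var/run/docker.sock "
        "gitlab/gitlab-runner:latest",
        f"docker exec gitlab-runner {register_cmd}",
    ]
    return "\n".join(lines) + "\n"


# Possible wiring inside the Pulumi program, assuming the secret was stored with
# `pulumi config set --secret gitlabRegistrationToken <token>`:
#   token = pulumi.Config().require_secret("gitlabRegistrationToken")
#   metadata_startup_script = token.apply(render_startup_script)
```

Pulumi encrypts secret config values in its state, but the rendered script is still stored in the instance metadata on GCP, so access to that metadata should be restricted as well.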