Serverless ML Inference with GitLab Runner on Cloud Functions

Handling Serverless ML Inference with GitLab and GCP Cloud Functions

Serverless architecture and machine learning (ML) inference are a potent combination for scaling workloads efficiently. Cloud Functions let you run inference code without managing servers, and GitLab Runners automate the execution of ML inference jobs that call those functions. Here's what we need to set up:

GitLab Runner: This is a standalone application that works with GitLab CI/CD to run jobs in a pipeline. The GitLab Runner will need to be registered with your GitLab instance.
Cloud Functions: These are the serverless functions that execute the ML inference code. On Google Cloud, they can be defined with Pulumi's gcp.cloudfunctions.Function resource.

In this example, we'll set up a GitLab Runner on a Google Compute Engine instance and deploy an ML inference function as a Cloud Function. We'll assume you have a GitLab project whose CI/CD pipeline will be configured to trigger ML inference jobs on the Cloud Function. The runner itself is started from the public gitlab/gitlab-runner Docker image.

Please ensure that you have the necessary permissions and configurations set up in your GitLab project, Google Cloud Platform, and Pulumi before proceeding.

Let's start building the infrastructure with the Pulumi Python program:

import pulumi
import pulumi_gcp as gcp

# Initialize GCP configuration
project = gcp.config.project
region = gcp.config.region

# Define the Cloud Function for ML inference
ml_inference_function = gcp.cloudfunctions.Function(
    "ml-inference-function",
    runtime='python311',  # Choose a supported runtime that matches your ML model's environment (python37 is past end of support)
    entry_point='handler',  # The name of the function handler
    source_repository=gcp.cloudfunctions.FunctionSourceRepositoryArgs(
        url='https://source.developers.google.com/projects/your-project/repos/your-repo-name/moveable-aliases/main/paths/inference'  # URL of the repo where your ML inference code is stored
    ),
    trigger_http=True,  # Make the function triggerable via HTTP requests
    region=region,
    project=project,
    labels={
        "function": "ml-inference"
    }
)

# Output the https trigger URL
pulumi.export("function_https_trigger_url", ml_inference_function.https_trigger_url)

# Define GitLab Runner on Google Cloud
gitlab_runner = gcp.compute.Instance(
    "gitlab-runner",
    machine_type='e2-small',  # You can choose a machine type that fits your requirements
    boot_disk=gcp.compute.InstanceBootDiskArgs(
        initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
            image='projects/ubuntu-os-cloud/global/images/family/ubuntu-2204-lts'  # Ubuntu LTS image for the GitLab Runner host (18.04 is end-of-life)
        )
    ),
    network_interfaces=[gcp.compute.InstanceNetworkInterfaceArgs(
        network='default',
        access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()]
    )],
    metadata_startup_script="""#!/bin/bash
    # Install Docker
    apt-get update
    apt-get install -y docker.io
    # Start the GitLab Runner container
    docker run -d --name gitlab-runner --restart always \
        -v /srv/gitlab-runner/config:/etc/gitlab-runner \
        -v /var/run/docker.sock:/var/run/docker.sock \
        gitlab/gitlab-runner:latest
    # Register the runner (no -it: a startup script has no TTY to attach to)
    docker exec gitlab-runner gitlab-runner register --non-interactive \
        --url 'https://gitlab.com/' \
        --registration-token 'Your_Gitlab_Registration_Token' \
        --executor 'docker' \
        --docker-image 'docker:19.03.12' \
        --description 'serverless-ml-inference' \
        --tag-list 'ml,inference' \
        --run-untagged='true' \
        --locked='false'
    """
)

# Output the GitLab Runner instance details
pulumi.export("gitlab_runner_name", gitlab_runner.name)
pulumi.export("gitlab_runner_zone", gitlab_runner.zone)

This Pulumi program does the following:

It creates a Google Cloud Function ml-inference-function that acts as the ML inference service. You must edit the source_repository attribute to point to the repository holding your ML inference code.
It sets the Cloud Function to trigger via HTTP requests, and outputs the HTTPS trigger URL, which you can use to send inference requests.
It provisions a GCP Compute Engine instance named gitlab-runner which automatically installs Docker and registers itself as a GitLab Runner.
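
The program sets entry_point='handler', so the code in the source repository has to expose a function with that name. As a rough illustration only, a minimal handler might look like the sketch below. The linear "model" (WEIGHTS, BIAS), the predict helper, and the {"instances": [...]} request shape are all hypothetical placeholders, not part of the original setup; a real function would load a trained model instead.

```python
import json

# Hypothetical stand-in for a real model: fixed weights for a linear score.
# A real function would load a trained model once at import time instead.
WEIGHTS = [0.5, -0.25, 1.0]
BIAS = 0.1


def predict(features):
    """Return a linear score for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handler(request):
    """HTTP entry point; Cloud Functions passes in a Flask request object."""
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    instances = payload.get("instances")
    headers = {"Content-Type": "application/json"}
    if not isinstance(instances, list):
        body = json.dumps({"error": "body must contain an 'instances' list"})
        return body, 400, headers
    try:
        predictions = [predict(row) for row in instances]
    except (TypeError, ValueError) as exc:
        # Malformed rows (wrong length or non-numeric values) are client errors.
        return json.dumps({"error": str(exc)}), 400, headers
    return json.dumps({"predictions": predictions}), 200, headers
```

The handler returns a (body, status, headers) tuple, which the Python runtime converts into an HTTP response. Loading the model at module level rather than inside handler lets warm instances reuse it across requests.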

Remember to replace placeholders like your-project, your-repo-name, and Your_Gitlab_Registration_Token with appropriate values for your GitLab and GCP setup. Also, set the runner's zone and machine type to suit your workload.

Finally, the metadata_startup_script on the Compute Engine instance installs Docker, starts the GitLab Runner container, and registers the runner with GitLab using your registration token. Adjust the Docker image the runner uses for its jobs (the --docker-image flag) according to your needs.
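
Hardcoding the registration token in the program is convenient for a demo but easy to leak. One possible alternative, sketched here under assumptions, is to build the startup script from a function that takes the token as an argument and shell-quotes it. The function name render_startup_script is hypothetical, and the script it produces simply mirrors the steps described above.

```python
import shlex


def render_startup_script(token, gitlab_url="https://gitlab.com/"):
    """Return a bash startup script that installs Docker and registers a runner."""
    return "\n".join([
        "#!/bin/bash",
        "# Install Docker",
        "apt-get update",
        "apt-get install -y docker.io",
        "# Start the GitLab Runner container",
        "docker run -d --name gitlab-runner --restart always \\",
        "    -v /srv/gitlab-runner/config:/etc/gitlab-runner \\",
        "    -v /var/run/docker.sock:/var/run/docker.sock \\",
        "    gitlab/gitlab-runner:latest",
        "# Register the runner; values are shell-quoted to avoid injection",
        "docker exec gitlab-runner gitlab-runner register --non-interactive \\",
        f"    --url {shlex.quote(gitlab_url)} \\",
        f"    --registration-token {shlex.quote(token)} \\",
        "    --executor docker \\",
        "    --docker-image docker:19.03.12 \\",
        "    --description serverless-ml-inference \\",
        "    --tag-list ml,inference \\",
        "    --run-untagged=true \\",
        "    --locked=false",
        "",
    ])
```

In the Pulumi program, the token could then come from encrypted stack configuration, for example pulumi.Config().require_secret("gitlabRegistrationToken").apply(render_startup_script), with the result passed as metadata_startup_script. The config key name is an assumption. Note that this only keeps the token out of source control; it still ends up in the instance's metadata.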

The program will output the GitLab Runner instance details and the HTTPS URL of the Cloud Function, which will be the endpoint for triggering ML inference jobs.
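
To show how a CI job might use the exported URL, here is a small client sketch using only the standard library. The {"instances": [...]} payload shape, the function names, and the optional identity token parameter are assumptions for illustration; adjust them to whatever your inference function actually expects.

```python
import json
import urllib.request


def build_inference_request(url, instances, identity_token=None):
    """Build a POST request carrying the inference inputs as JSON."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if identity_token:
        # Needed only if the function does not allow unauthenticated calls.
        headers["Authorization"] = f"Bearer {identity_token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_inference(url, instances, identity_token=None, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_inference_request(url, instances, identity_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A pipeline job could read the URL from a CI/CD variable and call run_inference(url, [[1.0, 2.0, 3.0]]). Splitting request construction from sending keeps the payload logic testable without network access.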

Bear in mind that this is a simple setup; for a production environment you will likely need to tighten permissions and network configuration.