Serverless ML Model APIs with Cloud Run Service Accounts

Question

Pulumi · Accepted Answer

To deploy a serverless Machine Learning (ML) model on Google Cloud Run with appropriate service accounts, you would typically follow these steps:

1. **Containerize your ML model**: Build a container image that includes your ML model and a web server capable of handling API requests. This often involves writing a Dockerfile that specifies the environment, dependencies, and the command to start the server.

2. **Push the container image to a registry**: Once your container image is built, push it to Google Container Registry (GCR) or to Artifact Registry, which Google recommends as GCR's successor, so that Cloud Run can pull it.

3. **Create a service account**: Service accounts are used in Google Cloud to provide identity to applications and VMs. These accounts can be given specific roles and permissions that control what resources the service account can access.

4. **Deploy to Cloud Run**: Create a Cloud Run service using the container image from GCR. Specify the service account to be used by the Cloud Run service.

5. **Expose the ML API**: Once deployed, Google Cloud Run will provide you with an endpoint URL. You can use this URL to interact with your ML model via HTTP requests.
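
As a rough illustration of the web server mentioned in step 1, here is a minimal sketch that uses only the Python standard library. The `predict` function is a hypothetical stand-in for a real model, and the `{"features": [...]}` request shape is an assumption made for this example. The one Cloud Run-specific detail is that the server listens on the port given in the `PORT` environment variable.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


def handle_payload(body: bytes) -> dict:
    """Parse a JSON request body and return the prediction as a dict."""
    payload = json.loads(body)
    return {"prediction": predict(payload["features"])}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client declared.
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = handle_payload(self.rfile.read(length))
            status = 200
        except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
            # Malformed JSON, a missing "features" key, or unusable inputs.
            result = {"error": str(exc)}
            status = 400
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server():
    # Cloud Run passes the port to listen on in the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

The container's start command would then call `run_server()`. A production image would more likely load a serialized model at startup and use a framework such as Flask or FastAPI, but the request handling has the same shape.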

These steps can be automated using infrastructure as code, specifically with Pulumi, which allows you to define resources in a programming language like Python.

Here's a Pulumi program in Python that accomplishes these tasks:

```python
import pulumi
import pulumi_gcp as gcp

# Replace 'gcr.io/my-project/my-model:v1' with your actual container image URL.
container_image_url = 'gcr.io/my-project/my-model:v1'

# Create a dedicated service account for the Cloud Run service to run as
service_account = gcp.serviceaccount.Account("ml-model-service-account",
    account_id="ml-model-service-account",
    display_name="ML Model Service Account")

# Deploy the container to Cloud Run, running as the service account above
cloud_run_service = gcp.cloudrun.Service("ml-model-cloud-run-service",
    location="us-central1",
    template=gcp.cloudrun.ServiceTemplateArgs(
        spec=gcp.cloudrun.ServiceTemplateSpecArgs(
            containers=[gcp.cloudrun.ServiceTemplateSpecContainerArgs(
                image=container_image_url,
            )],
            service_account_name=service_account.email,
        ),
    ))

# Grant the service account permission to invoke the service.
# The service name, location, and project come from the resource above, so the
# binding always targets the deployed service. The member string is built with
# Output.concat because service_account.email is a Pulumi Output, not a plain str.
invoker_binding = gcp.cloudrun.IamMember("ml-model-service-account-invoker",
    service=cloud_run_service.name,
    location=cloud_run_service.location,
    project=cloud_run_service.project,
    role="roles/run.invoker",
    member=pulumi.Output.concat("serviceAccount:", service_account.email))

# Exposing the Cloud Run application
# This will give us the URL to invoke the ML model API
cloud_run_invoke_url = cloud_run_service.statuses.apply(lambda status: status[0].url)

# Export the URL so it can be easily accessed
pulumi.export("invoke_url", cloud_run_invoke_url)
```

Make sure Pulumi and the Google Cloud CLI (`gcloud`) are installed and authenticated on your machine. Replace the placeholder values in the program, such as `my-project`, `us-central1`, and the container image URL, with your actual project ID, region, and image.

This program creates a service account, deploys the container to Cloud Run so that it runs as that account, grants the account the `roles/run.invoker` role on the service, and exports the URL that you can use to access your ML API.
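
Because the program does not grant `roles/run.invoker` to `allUsers`, callers have to authenticate. The following is a minimal sketch of a client request, assuming the caller already holds an identity token for an identity that has the invoker role (for example, one printed by `gcloud auth print-identity-token`). The `build_predict_request` helper and the `{"features": [...]}` payload are hypothetical and depend on how your container parses requests.

```python
import json
import urllib.request


def build_predict_request(invoke_url: str, id_token: str, features: list) -> urllib.request.Request:
    """Build an authenticated POST request for the exported invoke URL."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        invoke_url,
        data=body,
        method="POST",
        headers={
            # Cloud Run checks the bearer token against the service's IAM policy.
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. Use the URL shown by `pulumi stack output invoke_url` as `invoke_url`.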