Secure Serverless API Deployment for ML Models on Cloud Run

Question

Pulumi · Accepted Answer

To deploy a machine learning (ML) model as a secure serverless API on Google Cloud Run, you need to containerize your ML model and create a service that defines the characteristics of the deployment. When creating the service, you can specify settings such as the amount of memory each instance receives, the maximum number of instances, the concurrency level, and other settings to control auto-scaling.
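To see how the concurrency level and the instance limit interact, here is a rough sizing sketch in plain Python. It applies Little's law (requests in flight = arrival rate × average latency). The function name, the workload figures, and the headroom factor are illustrative assumptions, not values prescribed by Cloud Run.

```python
import math


def estimate_instances(requests_per_second: float,
                       avg_latency_seconds: float,
                       concurrency: int,
                       headroom: float = 1.2) -> int:
    """Estimate how many instances a steady load needs.

    concurrency is the number of requests one instance handles at a time;
    headroom is a hypothetical safety factor for bursts.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Little's law: average number of requests in flight at any moment.
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight * headroom / concurrency))


# Hypothetical workload: 50 requests/s, 0.4 s per inference,
# 8 concurrent requests per instance.
print(estimate_instances(50, 0.4, 8))
```

An estimate like this can serve as a starting point for the maximum instance count. Measured latency under load should replace the guessed figures before you rely on it.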

Google Cloud Run is a fully managed platform for running stateless containers that are invoked via HTTP requests. Because it is fully managed, it handles infrastructure tasks such as provisioning, configuring, and scaling servers for you.

Pulumi can be used to define, deploy, and manage Cloud Run services using infrastructure as code. The code below demonstrates how to deploy an ML model as a service with a basic configuration: it creates a Cloud Run service and uses an IAM binding to control who may invoke the API.

### Explanation and Pulumi Program

Firstly, you need a Dockerized ML model, which should expose a RESTful API (e.g., using Flask or FastAPI in Python). This API should be the entry point of your Docker container. Make sure that your Docker image is hosted on a container registry. For this example, we assume that the Docker image is available at `gcr.io/my-project/my-ml-model:latest`.
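As a minimal sketch of such an entry point, the example below uses only the Python standard library instead of Flask or FastAPI so that it has no dependencies. Cloud Run expects the container to listen on the port given in the `PORT` environment variable (8080 by default). The `predict` function, the `features` payload field, and the handler names are placeholders assumed for illustration; a real service would load and call its own model.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model: returns the mean of the inputs."""
    return sum(features) / len(features)


def handle_payload(body: bytes):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        features = json.loads(body)["features"]
        if not features:
            raise ValueError("features must be non-empty")
        return 200, {"prediction": predict([float(x) for x in features])}
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or non-numeric input.
        return 400, {"error": str(exc)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def main():
    # Listen on all interfaces, on the port Cloud Run provides.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

The container's entry point would call `main()`. Keeping the request parsing in `handle_payload` lets you test the logic without starting a server.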

Next, we will write a Pulumi program to deploy this Docker image to Cloud Run. We will use the `google-native` provider, which supports Google Cloud resources.

The `google_native.run.v2.Service` resource defines the Cloud Run service. In the service definition, `location` specifies the region that hosts the service, `template` describes how each revision is configured, and `traffic` controls how requests are split across revisions.

A `google_native.run.v2.ServiceIamBinding` grants the `roles/run.invoker` role to `allUsers`, which allows unauthenticated access to the service. This is convenient for testing, but for production use you should restrict the binding to specific users, groups, or service accounts through Google Cloud IAM.

A stack export at the end of the program prints the URL at which the service can be reached after a successful deployment.

The following program accomplishes the above:

```python
import pulumi
import pulumi_google_native as google_native

# Google Cloud project, region, service name, and container image to deploy.
project = 'my-project'
location = 'us-central1'
service_name = 'my-ml-service'
image_url = 'gcr.io/my-project/my-ml-model:latest'

# Create a Cloud Run service.
# The type and property names follow the google-native run/v2 schema as
# understood here; confirm them against your installed provider version.
ml_service = google_native.run.v2.Service(
    service_name,
    args=google_native.run.v2.ServiceArgs(
        project=project,
        location=location,
        service_id=service_name,
        template=google_native.run.v2.GoogleCloudRunV2RevisionTemplateArgs(
            containers=[
                google_native.run.v2.GoogleCloudRunV2ContainerArgs(
                    image=image_url,
                ),
            ],
            # Memory, CPU, environment variables, scaling limits, and
            # concurrency are also configured on the template.
        ),
        traffic=[
            google_native.run.v2.GoogleCloudRunV2TrafficTargetArgs(
                percent=100,
                # Send all traffic to the latest ready revision.
                type='TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST',
            ),
        ],
    ),
)

# Allow unauthenticated access to the service. In a production environment,
# replace 'allUsers' with the specific principals that may invoke it.
iam_binding = google_native.run.v2.ServiceIamBinding(
    'service-iam-binding',
    args=google_native.run.v2.ServiceIamBindingArgs(
        # Full resource name of the service:
        # projects/{project}/locations/{location}/services/{service}
        name=ml_service.name,
        role='roles/run.invoker',
        members=['allUsers'],
    ),
)

# Export the URL at which the service can be accessed.
pulumi.export('service_url', ml_service.uri)
```

Make sure to replace `'my-project'` with your actual Google Cloud project ID and `'gcr.io/my-project/my-ml-model:latest'` with the path to your Docker image.

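After you run this program with Pulumi, your ML model is deployed to Cloud Run, which scales instances automatically, and you get an endpoint URL where you can send requests to your ML API. Because the example grants `roles/run.invoker` to `allUsers`, the endpoint is publicly reachable until you tighten that binding.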

To run the program, ensure you have Google Cloud credentials configured and the Pulumi CLI installed. Then execute the program from the terminal:

```sh
pulumi up
```

This command shows a preview of the deployment and, upon confirmation, deploys the ML model to Cloud Run. Once the deployment is complete, the endpoint URL is displayed as a stack output.
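
Once you have the URL, a client request could be built as sketched below, using only the Python standard library. If you later restrict the service to authenticated callers, Cloud Run expects an identity token in the `Authorization: Bearer` header. The URL, the `features` payload shape, and the function name are assumptions for illustration, and obtaining the token (for example with `gcloud auth print-identity-token`) is left out.

```python
import json
import urllib.request
from typing import Optional


def build_predict_request(service_url: str,
                          features: list,
                          id_token: Optional[str] = None) -> urllib.request.Request:
    """Build a JSON POST request, adding a bearer token when one is given."""
    headers = {"Content-Type": "application/json"}
    if id_token:
        headers["Authorization"] = f"Bearer {id_token}"
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(service_url, data=body, headers=headers, method="POST")


# Hypothetical URL; use the value of the `service_url` stack output instead.
request = build_predict_request("https://my-ml-service-abc123-uc.a.run.app", [1.0, 2.0, 3.0])

# To send the request:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```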