1. Serverless Inference Endpoints with Cloud Run

    To create serverless endpoints for inference with Google Cloud Run, you would typically need to do the following:

    1. Deploy a Container: Package your machine learning model and inference code into a container and deploy it to Cloud Run. Google Cloud Run is a fully managed compute platform that automatically scales your stateless containers. (A minimal sketch of such inference code follows this list.)

    2. Expose Endpoints: Once deployed, Cloud Run will provide you with a URL to access your service.

    3. Manage Permissions: Optionally, manage IAM permissions for your Cloud Run service to control who can invoke your endpoints (a sketch of this appears after the program walkthrough below).
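
    As a reference for step 1, here is a minimal sketch of the kind of inference code you might package into the container. Flask, the model.joblib file name, and the /predict route are assumptions for illustration, not requirements of Cloud Run.

    # app.py -- a hypothetical inference server for the container
    import os

    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load the model once at startup so each request only runs prediction.
    model = joblib.load('model.joblib')

    @app.route('/predict', methods=['POST'])
    def predict():
        # Expect a JSON body like {"instances": [[...feature values...], ...]}.
        payload = request.get_json(force=True)
        predictions = model.predict(payload['instances'])
        return jsonify({'predictions': predictions.tolist()})

    if __name__ == '__main__':
        # Cloud Run injects the PORT environment variable; default to 8080 locally.
        app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

    In a production image you would typically run this behind a WSGI server such as gunicorn, but the routing and PORT handling stay the same.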

    For this program, we will:

    • Use the google-native.run/v1.Service resource to deploy a new service to Cloud Run.
    • Assume you have a Docker container image ready with your inference code and model, hosted on Google Container Registry or any other container image registry that Cloud Run can access (if you still need to build and push the image, see the sketch after this list).
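
    If you still need to build and push that image, one option is to do it from the same Pulumi program. The sketch below assumes the pulumi_docker provider (v4) and a Dockerfile under ./app, and it relies on your local Docker client being authenticated to gcr.io (for example via gcloud auth configure-docker); all of these are assumptions, not part of the program shown further down.

    import pulumi_docker as docker

    # Build the inference image locally and push it to Google Container Registry.
    inference_image = docker.Image(
        'inference-image',
        image_name='gcr.io/my-gcp-project/my-inference-image',
        build=docker.DockerBuildArgs(
            context='./app',         # directory containing the Dockerfile and inference code
            platform='linux/amd64',  # Cloud Run runs amd64 containers
        ),
    )

    # inference_image.image_name could then replace the hard-coded image string below.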

    Here's a Pulumi program for deploying a serverless inference endpoint using Cloud Run:

    import pulumi
    import pulumi_google_native as google_native

    # Configuration for the Cloud Run service
    project = 'my-gcp-project'
    location = 'us-central1'
    image = 'gcr.io/my-gcp-project/my-inference-image'  # Replace with your container image
    service_name = 'my-inference-service'

    # Create a Cloud Run service
    inference_service = google_native.run.v1.Service(
        service_name,
        metadata=google_native.run.v1.ObjectMetaArgs(
            name=service_name,
            namespace=project,
        ),
        spec=google_native.run.v1.ServiceSpecArgs(
            template=google_native.run.v1.RevisionTemplateArgs(
                spec=google_native.run.v1.RevisionSpecArgs(
                    containers=[google_native.run.v1.ContainerArgs(
                        image=image,
                    )],
                ),
            ),
        ),
        location=location,
        project=project,
    )

    # Export the URL of the service
    pulumi.export('service_url', inference_service.status.url)

    In this program:

    • We import the required Pulumi modules for Google Cloud.
    • We define the configuration, such as the project ID, location, image for the inference server, and the service name.
    • We create a Cloud Run service by defining a Service object with the necessary metadata and specifications, including the container image to deploy.
    • Lastly, we export the URL of the deployed service, which can be used to make inference requests.
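
    By default, the service created above only accepts authenticated invocations. If the endpoint should be publicly callable (step 3 above), you can grant the Cloud Run invoker role to allUsers. The sketch below uses the classic pulumi_gcp provider's cloudrun.IamMember resource rather than the google-native provider used above; treat it as one possible approach.

    import pulumi_gcp as gcp

    # Allow unauthenticated invocations of the inference service.
    # Bind specific identities instead of allUsers to keep the endpoint private.
    public_invoker = gcp.cloudrun.IamMember(
        'public-invoker',
        service=service_name,
        location=location,
        project=project,
        role='roles/run.invoker',
        member='allUsers',
    )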

    After running pulumi up with this program, the output will include the service URL. You can then send HTTP requests to this URL to perform inference using your machine learning model.
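
    For example, a client could call the deployed endpoint with a small script like the one below. The /predict path and the JSON shape match the hypothetical inference server sketched earlier and are assumptions about your own container's contract.

    import requests

    # Replace with the service_url value exported by pulumi up.
    service_url = 'https://my-inference-service-xxxxxxxx-uc.a.run.app'

    response = requests.post(
        f'{service_url}/predict',
        json={'instances': [[5.1, 3.5, 1.4, 0.2]]},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())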