1. Serverless Machine Learning Inference with OCI Functions on Kubernetes


    Serverless computing is a cloud computing execution model where the cloud provider manages the infrastructure and dynamically allocates resources based on incoming requests. Oracle Cloud Infrastructure (OCI) Functions is a serverless platform that lets you run code without managing infrastructure. It's typically used for executing short-lived functions in response to triggers such as HTTP requests.

    To set up serverless machine learning inference with OCI, you will use several resources:

    1. OCI Functions Application: A logical grouping of functions within OCI. This is akin to a project that holds related functions together.
    2. OCI Functions Function: Each function within the application is a piece of code that performs a specific task. This is where you'd deploy your machine learning model inference code.
    3. OCI API Gateway: To expose the function via HTTP so that it can be triggered over the internet; using an API gateway is the standard approach.

    Here's a Python program that automates the setup of these resources using Pulumi. This program does not deploy a specific machine learning model but sets up the infrastructure required to deploy such a model. You would need to provide the Docker image containing your serverless function code and any additional configurations specific to your use case.

    import json

    import pulumi
    import pulumi_oci as oci

    # Configuration variables for the function application and function deployment
    compartment_id = 'ocid1.compartment.oc1..your_compartment_id'  # Replace with your compartment OCID
    image_uri = 'your_image_uri'  # URI of the container image in OCI Registry or another registry
    function_memory_in_mbs = 128  # Amount of memory in megabytes allocated to your function

    # Create an Application on the OCI Functions service
    app = oci.functions.Application("app",
        compartment_id=compartment_id,
        display_name="my-functions-app",
        subnet_ids=["subnet_ocid1", "subnet_ocid2"],  # List of subnet OCIDs for the application
    )

    # Deploy a Function within the created Application
    func = oci.functions.Function("func",
        application_id=app.id,
        display_name="my-model-inference-function",
        image=image_uri,
        memory_in_mbs=function_memory_in_mbs,
        timeout_in_seconds=30,  # Max function execution time
        config={
            # Environment variables can be provided here
            "MODEL_URL": "oci://bucket_name@namespace/path/to/model",
        },
    )

    # Expose the function via API Gateway
    # First, create the API Gateway
    api_gw = oci.apigateway.Gateway("api_gw",
        compartment_id=compartment_id,
        display_name="my-api-gateway",
        endpoint_type="PUBLIC",  # Public endpoint so the API is reachable over the internet
        subnet_id="subnet_ocid3",  # Replace with your subnet OCID
    )

    # Define an API deployment that routes requests to our function
    api_deployment = oci.apigateway.Deployment("api_deployment",
        compartment_id=compartment_id,
        display_name="my-api-deployment",
        gateway_id=api_gw.id,
        path_prefix="/infer",
        specification=json.load(open("api_specification.json")),  # Deployment spec (routes and backends) loaded as a dict
    )

    # Export the function OCID and the API Gateway endpoint
    pulumi.export('function_ocid', func.id)  # The OCID of the function, to invoke it using the OCI SDK or CLI
    pulumi.export('api_endpoint', api_deployment.endpoint)  # The URL to invoke the function via the API Gateway

    In the above program:

    • We create an application to hold our functions using oci.functions.Application.
    • We define a function backed by a Docker image containing our ML model inference code, along with deployment-specific configuration.
    • We create an API Gateway and define a deployment with a path prefix /infer that forwards requests to our function.

    The function's configuration, including memory allocation, timeout, and environment variables like MODEL_URL, will depend on your specific model and use case.
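
    For context, here is a minimal sketch of what the inference handler baked into that image might look like, using the Fn Project Python FDK (the fdk package) that OCI Functions is built on. The MODEL_URL value comes from the function's config block above; the model loading and prediction logic are placeholders to replace with your framework-specific code.

    import io
    import json
    import os

    from fdk import response

    MODEL_URL = os.environ.get("MODEL_URL")  # Injected via the function's config block

    # Placeholder: a real function would load the model once at import time,
    # e.g. download it from Object Storage via MODEL_URL and deserialize it.

    def handler(ctx, data: io.BytesIO = None):
        payload = json.loads(data.getvalue()) if data else {}
        # Placeholder prediction: replace with your model's predict(...) call
        prediction = {"received": payload, "model_url": MODEL_URL}
        return response.Response(
            ctx,
            response_data=json.dumps(prediction),
            headers={"Content-Type": "application/json"},
        )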

    Note that this code assumes that you have already:

    • Created the necessary compartments (compartment_id) in OCI.
    • Set up the necessary networking, including subnets.
    • Pushed a Docker image (image_uri) that contains your machine learning inference code to OCI Registry or another container registry.

    Remember to replace placeholder values like ocid1.compartment.oc1..your_compartment_id, your_image_uri, subnet_ocid1, etc., with actual values from your OCI setup.
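
    Rather than hardcoding those values, you can also pull them from Pulumi configuration, which keeps environment-specific settings out of the source. A small sketch (the config key names here are arbitrary choices):

    import pulumi

    # Set values once per stack, e.g.:
    #   pulumi config set compartment_id ocid1.compartment.oc1..xxxx
    #   pulumi config set image_uri <region>.ocir.io/<namespace>/<repo>:<tag>
    config = pulumi.Config()
    compartment_id = config.require("compartment_id")
    image_uri = config.require("image_uri")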

    The api_specification.json file should contain the deployment specification that OCI API Gateway expects: a set of routes, each mapping a path and HTTP methods to a backend (for OCI Functions, a backend of type ORACLE_FUNCTIONS_BACKEND that references the function's OCID).
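
    If you would rather not maintain a separate file, the same specification can be passed inline as a Python dict in place of the file-based version above. A minimal sketch (the key names follow the OCI deployment specification; the exact casing accepted for dict inputs may vary across pulumi_oci versions):

    api_deployment = oci.apigateway.Deployment("api_deployment",
        compartment_id=compartment_id,
        display_name="my-api-deployment",
        gateway_id=api_gw.id,
        path_prefix="/infer",
        specification={
            "routes": [{
                "path": "/predict",
                "methods": ["POST"],
                "backend": {
                    "type": "ORACLE_FUNCTIONS_BACKEND",
                    "function_id": func.id,  # Route requests to the function created above
                },
            }],
        },
    )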

    This program lays the foundation for running serverless machine learning inference on OCI, using Pulumi to automate resource provisioning and deployment.
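
    Once everything is deployed, the exported api_endpoint can be called like any HTTP endpoint. A hypothetical invocation (the URL shape and payload are illustrative; substitute the value reported by pulumi stack output api_endpoint):

    import requests

    # Placeholder endpoint; use the api_endpoint value exported by the Pulumi program
    endpoint = "https://<gateway-host>.apigateway.<region>.oci.customer-oci.com/infer/predict"

    resp = requests.post(endpoint, json={"features": [1.0, 2.0, 3.0]})
    print(resp.json())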