1. Monitoring ML Model Serving on Kubernetes with Prometheus


    To monitor Machine Learning (ML) model serving on Kubernetes with Prometheus, you'll want to set up the following components:

    1. Kubernetes Cluster: This serves as the environment for your ML model. You'll need access to a Kubernetes cluster.
    2. ML Model Serving Application: The application will need to be containerized and deployed to your Kubernetes cluster. It usually includes a web server that exposes an API for model inference.
    3. Prometheus: This is an open-source systems monitoring and alerting toolkit. Kubernetes has good support for Prometheus. You can use Prometheus to scrape metrics from your ML model serving application.
    4. Prometheus Operator: This is a way of managing the lifecycle of Prometheus and related monitoring components on Kubernetes.
    5. ServiceMonitors / PodMonitors: These are custom resources provided by the Prometheus Operator to specify how groups of Kubernetes services or pods should be monitored.
    6. Grafana (Optional): An open-source platform for monitoring and observability, Grafana allows you to query, visualize, alert on, and understand your metrics.
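    For Prometheus to scrape your model server, the Kubernetes Service in front of it must carry labels that a ServiceMonitor can select and a named port for metrics. A minimal sketch of such a Service follows; the `ml-model-serving` label and the `metrics` port name are illustrative and must match whatever your ServiceMonitor selects:

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: ml-model-serving
      namespace: default
      labels:
        app: ml-model-serving    # must match the ServiceMonitor's matchLabels
    spec:
      selector:
        app: ml-model-serving    # pods backing the model server
      ports:
        - name: metrics          # ServiceMonitor endpoints reference this port name
          port: 8000
          targetPort: 8000
    ```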

    Here is how you can use Pulumi to create these components:

    • Kubernetes Cluster: You can use the Pulumi Kubernetes provider to provision a cluster or use an existing one.
    • Prometheus Operator: Use Helm to deploy the Prometheus Operator on your Kubernetes cluster.
    • Service/Pod Monitors: Define these custom resources to instruct Prometheus on what endpoints to scrape.

    Below is a sample Pulumi program to set up Prometheus on a Kubernetes cluster:

    import pulumi
    import pulumi_kubernetes as kubernetes
    from pulumi_kubernetes.helm.v3 import Chart, ChartOpts, FetchOpts

    # Create a Kubernetes provider instance based on an existing context, `demo` for instance.
    k8s_provider = kubernetes.Provider('k8s-provider', kubeconfig='~/.kube/config-demo')

    # Deploy the Prometheus Operator with Helm.
    # This will take care of installing the Prometheus server, Alertmanager, and related components.
    prometheus_operator = Chart(
        'prometheus-operator',
        ChartOpts(
            chart='kube-prometheus-stack',
            version='13.13.1',
            fetch_opts=FetchOpts(
                repo='https://prometheus-community.github.io/helm-charts'
            ),
        ),
        opts=pulumi.ResourceOptions(provider=k8s_provider)
    )

    # Assuming your ML model serving application is already deployed and has a service
    # exposing metrics, you can define a ServiceMonitor to scrape metrics from that service.
    service_monitor = kubernetes.apiextensions.CustomResource(
        'ml-model-service-monitor',
        api_version='monitoring.coreos.com/v1',
        kind='ServiceMonitor',
        metadata={
            'namespace': 'default',  # Ensure this is the namespace where your service resides
            # By default, kube-prometheus-stack only discovers ServiceMonitors
            # labelled with its Helm release name.
            'labels': {'release': 'prometheus-operator'},
        },
        spec={
            'selector': {
                'matchLabels': {
                    'app': 'ml-model-serving',  # Replace with the actual labels of your application
                }
            },
            'endpoints': [{
                'port': 'metrics',  # The named port in your service that exposes metrics
            }],
        },
        opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[prometheus_operator])
    )

    # Export the endpoint of Prometheus to access the Prometheus Expression Browser & API.
    # Note: the Prometheus service is ClusterIP by default, so a load-balancer hostname
    # is only present if the chart is configured to expose Prometheus via a LoadBalancer.
    prometheus_service = prometheus_operator.get_resource(
        'v1/Service', 'prometheus-operator-kube-p-prometheus')
    prometheus_endpoint = prometheus_service.status.load_balancer.apply(
        lambda lb: lb.ingress[0].hostname if lb.ingress else None)
    pulumi.export('prometheus_endpoint', prometheus_endpoint)

    This script performs the following steps:

    1. It sets up a Kubernetes provider associated with your cluster.
    2. It then uses Helm to deploy the kube-prometheus-stack, which includes Prometheus, Alertmanager, Grafana, and other monitoring components.
    3. Next, it sets up a ServiceMonitor custom resource that Prometheus discovers. This resource specifies your model serving service as the target to scrape metrics from.
    4. Finally, it exports the endpoint of Prometheus so that you can access the Prometheus Expression Browser and API.

    This Pulumi program assumes that you already have an ML model serving application deployed to Kubernetes and that it exposes a metrics endpoint (/metrics is common). If not, containerize your model serving application (for example, a Flask app in Python) and instrument it to export Prometheus metrics using a client library such as prometheus_client. Then deploy it to Kubernetes and make sure its service carries labels that match the ServiceMonitor selector in the script.
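    The instrumentation step above can be sketched without external dependencies. The snippet below hand-rolls a /metrics endpoint in the Prometheus text exposition format using only the Python standard library; the metric names (ml_inference_requests_total, ml_inference_latency_seconds_sum) are illustrative, and in a real application you would use the official prometheus_client library instead:

    ```python
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hand-rolled metric state for illustration only; the prometheus_client
    # library provides Counter/Histogram types and thread-safe registries.
    REQUEST_COUNT = 0
    LATENCY_SUM = 0.0

    def record_request(duration_seconds):
        """Record one inference request and its latency."""
        global REQUEST_COUNT, LATENCY_SUM
        REQUEST_COUNT += 1
        LATENCY_SUM += duration_seconds

    def render_metrics():
        """Render current metrics in the Prometheus text exposition format."""
        return (
            '# HELP ml_inference_requests_total Total inference requests served.\n'
            '# TYPE ml_inference_requests_total counter\n'
            f'ml_inference_requests_total {REQUEST_COUNT}\n'
            '# HELP ml_inference_latency_seconds_sum Cumulative inference latency.\n'
            '# TYPE ml_inference_latency_seconds_sum counter\n'
            f'ml_inference_latency_seconds_sum {LATENCY_SUM}\n'
        )

    class MetricsHandler(BaseHTTPRequestHandler):
        """Serves GET /metrics for Prometheus to scrape."""
        def do_GET(self):
            if self.path == '/metrics':
                body = render_metrics().encode()
                self.send_response(200)
                self.send_header('Content-Type', 'text/plain; version=0.0.4')
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.end_headers()

    # To expose the endpoint on the 'metrics' port referenced by the ServiceMonitor:
    #     HTTPServer(('', 8000), MetricsHandler).serve_forever()
    ```

    In practice the endpoint would run inside the same process as your inference server, with record_request called around each prediction.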

    After running this script with Pulumi, your ML model serving application will be monitored by Prometheus, and you can add Grafana dashboards to visualize the metrics or configure alerts based on them.
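    Once metrics are being scraped, you can explore them in the Prometheus Expression Browser or wire them into Grafana panels and alerting rules. The queries below are a sketch using the illustrative metric names from this guide; substitute whatever names your application actually exports:

    ```promql
    # Inference request rate over the last 5 minutes
    rate(ml_inference_requests_total[5m])

    # 95th-percentile inference latency, if the app exports a latency histogram
    histogram_quantile(0.95, rate(ml_inference_latency_seconds_bucket[5m]))
    ```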