AI Workload Audit Trails with GCP Logging

Question

Pulumi · Accepted Answer

To set up audit trails for AI workloads on Google Cloud Platform (GCP), we'll use Cloud Logging to route, measure, and filter the logs generated by your AI services and applications. Specifically, we'll work with the following Pulumi resources:

1. **ProjectSink**: This resource creates a sink that exports log entries matching a filter from the logs in a particular project to a destination such as a Pub/Sub topic, a BigQuery dataset, or a Cloud Storage bucket.

2. **Metric**: A log-based metric (`gcp.logging.Metric`) derived from log entries. It lets you count occurrences of specific events, or extract values from log entries, so they can be monitored or alerted on.

3. **ProjectExclusion**: This resource excludes log entries that match a filter from being stored by the project's `_Default` sink, which can help manage costs and reduce noise. A project-level exclusion does not affect other sinks, so to drop entries from a specific export, use that sink's own `exclusions` argument.

We'll set up a basic audit trail using these resources as follows:
- A log sink to export relevant audit logs to a BigQuery dataset for analysis.
- A log metric to count the occurrences of a certain event type in the logs.
- An exclusion to filter out log entries that are not needed.

Below is a Pulumi program written in Python which demonstrates how to set up this infrastructure. Please read through the comments to understand what each section is doing.

```python
import pulumi
import pulumi_gcp as gcp

# Before this code, you would have set up a GCP project and configured the Pulumi GCP provider to use it.

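# Project Sink: Export relevant audit logs from your GCP project to a BigQuery dataset.
# Replace "my-project" with your project ID and "my_dataset" with your dataset name.
project_sink = gcp.logging.ProjectSink("my-project-sink",
    filter="logName:\"logs/cloudaudit.googleapis.com\" AND protoPayload.methodName:\"google.cloud.aiplatform.v1.PredictionService.Predict\"",
    destination="bigquery.googleapis.com/projects/my-project/datasets/my_dataset",
    # Use a dedicated service account for this sink; needed when bigquery_options is set.
    unique_writer_identity=True,
    # Write to date-partitioned tables instead of a separate date-sharded table per day.
    bigquery_options=gcp.logging.ProjectSinkBigqueryOptionsArgs(
        use_partitioned_tables=True
    )
)

# The sink's writer identity must be allowed to write to the dataset, or the export will fail.
sink_writer = gcp.bigquery.DatasetIamMember("my-project-sink-writer",
    dataset_id="my_dataset",
    role="roles/bigquery.dataEditor",
    member=project_sink.writer_identity
)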

# Log Metric: Use a log-based metric to count the occurrences of predictions made by AI Platform services.
log_metric = gcp.logging.Metric("my-log-metric",
    description="Metric for counting AI Platform predictions",
    # Match on the audited method name (the same condition the sink uses), not on the status message.
    filter="logName:\"logs/cloudaudit.googleapis.com\" AND protoPayload.methodName:\"google.cloud.aiplatform.v1.PredictionService.Predict\"",
    # DELTA + INT64 makes this a counter metric: one increment per matching log entry.
    metric_descriptor=gcp.logging.MetricMetricDescriptorArgs(
        metric_kind="DELTA",
        value_type="INT64"
    )
)

# Project Exclusion: Exclude health-check logs or other noisy log entries.
project_exclusion = gcp.logging.ProjectExclusion("my-project-exclusion",
    filter="resource.type=\"k8s_pod\" AND jsonPayload.health_check=true",
    # Set to True to keep the exclusion defined but temporarily inactive.
    disabled=False,
    description="Exclude k8s health checks from logs"
)
```

Here's what each section does:

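- The `ProjectSink` matches audit log entries for `Predict` calls to the AI Platform (`aiplatform`) API and exports them to the BigQuery dataset named in `destination`. Adjust the `filter` to the events you are interested in. These calls are typically recorded as Data Access audit logs, which are disabled by default for most services, so the filter matches nothing until they are enabled for the API.
- The `Metric` (a log-based metric) measures the frequency of a specific event, in this case AI predictions. Its `filter` should match exactly the log entries you want to count.
- The `ProjectExclusion` drops unwanted entries (here, Kubernetes health checks) before the project's `_Default` sink stores them, which saves cost and reduces noise. It does not change what the `ProjectSink` exports.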
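
Each `filter` is a plain Cloud Logging query string, so when several resources should match the same events, one option is to build the string once and reuse it. The sketch below is only an illustration: `has`, `eq`, and `all_of` are hypothetical helper names, not part of Pulumi or any GCP library, and it covers only the `:` (has), `=` (equals), and `AND` operators used in the program above.

```python
def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def has(field: str, value: str) -> str:
    """Build a `field:"value"` clause (substring match)."""
    return f"{field}:{_quote(value)}"


def eq(field: str, value: str) -> str:
    """Build a `field="value"` clause (exact match)."""
    return f"{field}={_quote(value)}"


def all_of(*clauses: str) -> str:
    """Join clauses with AND; at least one clause is required."""
    if not clauses:
        raise ValueError("all_of() needs at least one clause")
    return " AND ".join(clauses)


# The same condition the sink in the program above uses.
PREDICT_AUDIT_FILTER = all_of(
    has("logName", "logs/cloudaudit.googleapis.com"),
    has("protoPayload.methodName",
        "google.cloud.aiplatform.v1.PredictionService.Predict"),
)
```

The resulting string can be passed as `filter=PREDICT_AUDIT_FILTER` to any resource that should see the same entries, which avoids two hand-written filters drifting apart.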

In this example, replace `"my-project"`, `"my_dataset"`, and the filter criteria with the appropriate values for your use case. This configuration provides an audit trail for AI workloads that you can analyze in BigQuery, and the log-based metric can drive alerts or further analysis.
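
As a rough idea of what the BigQuery analysis could look like, the sketch below builds a query that counts prediction calls per caller. It rests on assumptions you should verify against your own dataset: that the entries land in a table named `cloudaudit_googleapis_com_data_access`, and that the audit payload is exposed under `protopayload_auditlog`. The function name `prediction_calls_query` is hypothetical.

```python
import re

# Conservative check so project/dataset names cannot break out of the backticks.
_NAME = re.compile(r"[A-Za-z0-9_\-]+")


def prediction_calls_query(project: str, dataset: str, days: int = 7) -> str:
    """Return a BigQuery SQL string counting Predict calls per caller."""
    for name in (project, dataset):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected characters in {name!r}")
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")

    # Assumed table name for exported Data Access audit logs.
    table = f"`{project}.{dataset}.cloudaudit_googleapis_com_data_access`"
    return (
        "SELECT\n"
        "  protopayload_auditlog.authenticationInfo.principalEmail AS caller,\n"
        "  COUNT(*) AS predict_calls\n"
        f"FROM {table}\n"
        "WHERE timestamp >= "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)\n"
        "  AND protopayload_auditlog.methodName LIKE '%PredictionService.Predict'\n"
        "GROUP BY caller\n"
        "ORDER BY predict_calls DESC"
    )


query = prediction_calls_query("my-project", "my_dataset")
```

The string in `query` can be pasted into the BigQuery console or passed to a BigQuery client. Filtering on `timestamp` also keeps the scan limited to recent partitions when partitioned tables are in use.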

Running `pulumi up` creates the sink, metric, and exclusion defined above. Before running it, make sure your Pulumi and GCP setup is configured correctly and that you have the permissions needed to create logging resources.