Access Control for Data Processing Jobs using GCP IAM Workload Identity Providers

Pulumi · Accepted Answer

Access control ensures that only authorized users or services can perform operations on your cloud resources. In Google Cloud Platform (GCP), access control for data processing jobs is managed with Identity and Access Management (IAM). Specifically, you can use IAM Workload Identity Pools and Providers (Workload Identity Federation) to let workloads running outside GCP exchange their native credentials for short-lived Google credentials and impersonate a GCP service account, without distributing service account keys. This is useful when your workloads run on a different cloud provider (e.g., AWS or Azure) or on-premises but need to access GCP resources.

Below is a Pulumi program that demonstrates how to set up a Workload Identity Provider in GCP to facilitate cross-cloud access control for data processing jobs.

First, you set up a Workload Identity Pool, which acts as a container for Workload Identity Providers. Then, you create a Workload Identity Provider within that pool. To allow AWS workloads to assume a GCP service account identity, for instance, you configure the provider with the ID of the AWS account whose identities should be trusted.

The program is explained in detail through comments:

```python
import pulumi
import pulumi_gcp as gcp

# Create a new identity pool. The identity pool is a container for identity providers.
identity_pool = gcp.iam.WorkloadIdentityPool("my-identity-pool",
    workload_identity_pool_id="my-identity-pool",  # Required short ID; replace with your own
    display_name="My Identity Pool for Data Jobs",
    disabled=False)

# Create a workload identity provider for an AWS account.
# This provider will allow AWS workloads to impersonate a GCP service account.
workload_identity_provider = gcp.iam.WorkloadIdentityPoolProvider("my-workload-identity-provider",
    # Pass the pool's short ID here, not identity_pool.id (which is the full resource path).
    workload_identity_pool_id=identity_pool.workload_identity_pool_id,
    workload_identity_pool_provider_id="my-aws-provider",  # Required short ID; replace with your own
    display_name="AWS Provider for Data Jobs",  # Display names are limited to 32 characters
    description="This Workload Identity Provider allows workloads in AWS to access GCP resources",
    # Configure the AWS account whose identities may federate through this provider.
    aws=gcp.iam.WorkloadIdentityPoolProviderAwsArgs(
        account_id="123456789012",  # Replace with your AWS Account ID
    ),
    attribute_mapping={
        "google.subject": "assertion.arn",  # The caller's AWS ARN becomes the Google subject
        "attribute.aws_account": "assertion.account",  # Expose the AWS account ID as a custom attribute
    })

# Export the identity pool ID and the provider name, which might be needed for other operations (like updating IAM policies)
pulumi.export('identity_pool_id', identity_pool.id)
pulumi.export('workload_identity_provider_name', workload_identity_provider.name)
```

In this program, we create an IAM Workload Identity Pool and a Workload Identity Pool Provider for AWS. The provider lets applications running in the configured AWS account federate into GCP, impersonate a Google service account, and obtain short-lived credentials for Google Cloud resources, so no long-lived service account keys are needed.

Here are a few steps that you would typically follow after creating these resources:
1. **IAM Policy Update**: Update the service account's IAM policy to grant the `roles/iam.workloadIdentityUser` role to the federated identities in the pool (expressed as a `principal://` or `principalSet://` member), thus allowing them to impersonate the service account.
2. **Service Account Impersonation**: On AWS, configure your workload with a credential configuration that points at the provider. Google's client libraries use it to exchange the workload's AWS credentials for a federated token through the Security Token Service, and then impersonate the service account to obtain short-lived credentials.
3. **Resource Access**: Your data processing jobs on AWS can now use these temporary GCP credentials to interact with GCP resources like Pub/Sub, GCS, or BigQuery, as if they were the service account.
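
To make the impersonation step more concrete, here is a minimal sketch of the credential configuration an AWS workload would use. It mirrors the kind of file that `gcloud iam workload-identity-pools create-cred-config` generates; the field values below are written from memory of that format and should be checked against a generated file. The function name `aws_credential_config` and all argument values are hypothetical placeholders.

```python
import json


def aws_credential_config(project_number: str, pool_id: str,
                          provider_id: str, service_account_email: str) -> dict:
    """Build an external-account credential configuration for an AWS workload.

    Assumption: the workload runs on EC2 and reads its AWS credentials
    from the instance metadata server.
    """
    # The audience is the provider's full resource name, prefixed with the IAM API host.
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:aws:token-type:aws4_request",
        "token_url": "https://sts.googleapis.com/v1/token",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            f"serviceAccounts/{service_account_email}:generateAccessToken"
        ),
        "credential_source": {
            "environment_id": "aws1",
            "region_url": "http://169.254.169.254/latest/meta-data/placement/availability-zone",
            "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials",
            # "{region}" is intentionally left as a literal placeholder;
            # the client library substitutes the detected AWS region.
            "regional_cred_verification_url": (
                "https://sts.{region}.amazonaws.com"
                "?Action=GetCallerIdentity&Version=2011-06-15"
            ),
        },
    }


if __name__ == "__main__":
    config = aws_credential_config(
        project_number="123456789",          # hypothetical GCP project number
        pool_id="my-identity-pool",          # hypothetical pool ID
        provider_id="my-aws-provider",       # hypothetical provider ID
        service_account_email="data-jobs@my-project.iam.gserviceaccount.com",
    )
    print(json.dumps(config, indent=2))
```

You would typically save the resulting JSON to a file on the AWS workload and point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at it, so that Google client libraries pick it up automatically.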

For advanced use cases, you can apply finer-grained policies (for example, granting access only to identities that carry a particular mapped attribute), or add other identity providers such as OIDC in a similar manner, depending on the workload's originating platform.
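
One way to think about fine-grained policies is in terms of the IAM member string that receives `roles/iam.workloadIdentityUser`. The sketch below builds the three common forms: a single subject, every identity sharing a mapped attribute value, or every identity in the pool. The helper name `pool_member` and the example values are hypothetical; the attribute name must match a custom attribute that your provider actually maps.

```python
from typing import Optional, Tuple


def pool_member(project_number: str, pool_id: str, *,
                subject: Optional[str] = None,
                attribute: Optional[Tuple[str, str]] = None) -> str:
    """Return an IAM member string for identities in a workload identity pool."""
    if subject is not None and attribute is not None:
        raise ValueError("pass either subject or attribute, not both")

    base = (
        f"iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}"
    )
    if subject is not None:
        # Narrowest grant: exactly one federated identity (its google.subject value).
        return f"principal://{base}/subject/{subject}"
    if attribute is not None:
        # Every identity whose mapped custom attribute has the given value.
        name, value = attribute
        return f"principalSet://{base}/attribute.{name}/{value}"
    # Broadest grant: every identity in the pool.
    return f"principalSet://{base}/*"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(pool_member("123456789", "my-identity-pool",
                      attribute=("aws_account", "123456789012")))
```

The returned string is what you would pass as the `member` of an IAM binding on the service account, for example through `gcp.serviceaccount.IAMMember` in a Pulumi program.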

Remember to replace placeholders such as `123456789012` with your actual AWS Account ID and adjust the attribute mappings as per your use case.
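
When adjusting attribute mappings, a small local check can catch malformed keys before you deploy. The sketch below encodes the key rules as I understand them from the API documentation (a `google.subject` mapping is required, and custom keys take the form `attribute.<name>` using only lowercase letters, digits, and underscores); treat these rules as assumptions, since the GCP API remains the authority. The function `check_attribute_mapping` is a hypothetical helper, not part of Pulumi or any Google library, and it does not validate the CEL expressions on the right-hand side.

```python
import re
from typing import Dict, List

# Assumed rule: custom keys look like "attribute.<name>" with [a-z0-9_] only.
_CUSTOM_KEY = re.compile(r"^attribute\.[a-z0-9_]+$")
# Assumed built-in target attributes for workload identity pool providers.
_BUILTIN_KEYS = {"google.subject", "google.groups"}


def check_attribute_mapping(mapping: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the mapping keys (empty if none)."""
    problems = []
    if "google.subject" not in mapping:
        problems.append("google.subject must be mapped")
    for key in mapping:
        if key in _BUILTIN_KEYS:
            continue
        if not _CUSTOM_KEY.match(key):
            problems.append(f"unsupported key: {key}")
    return problems


if __name__ == "__main__":
    mapping = {
        "google.subject": "assertion.arn",
        "attribute.aws_account": "assertion.account",
    }
    print(check_attribute_mapping(mapping))
```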