Controlling Access to OCI Object Storage for AI Datasets

Question

Pulumi · Accepted Answer

To control access to an Oracle Cloud Infrastructure (OCI) Object Storage bucket for AI datasets, you'll typically need to:

1. Create an OCI Object Storage bucket.
2. Apply access control policies to regulate who and what can access your AI datasets stored in that bucket.

Below we will write a Pulumi program to create a new OCI Object Storage bucket and control access to it. This program assumes you have the Oracle Cloud Infrastructure (OCI) provider set up and configured with the appropriate credentials to create and manage resources in your OCI account.

We will create the storage bucket using the `oci.objectstorage.Bucket` resource. OCI controls access at two levels: the bucket's own public access type, and IAM policies that grant groups or dynamic groups permissions on buckets and objects within a compartment. IAM policies can be managed in the OCI console or CLI, or in Pulumi with the `oci.identity.Policy` resource. This answer focuses on creating the bucket and setting its access type.

Here's a simple program that creates an OCI Object Storage bucket with private access type, meaning the content is only accessible to the bucket owner or others based on the policies that you would define and attach separately:

```python
import pulumi
import pulumi_oci as oci

# Specify the compartment ID where you want to create the bucket
compartment_id = "ocid1.compartment.oc1..exampleuniqueID"

# Create an Object Storage bucket with no public access (private)
ai_dataset_bucket = oci.objectstorage.Bucket("aiDatasetBucket",
    compartment_id=compartment_id,
    namespace="your_namespace",  # Replace with your Object Storage namespace
    name="ai-datasets-bucket",
    access_type="NoPublicAccess",  # No anonymous access; IAM policies decide who gets in
    storage_tier="Standard",
    versioning="Enabled",
    auto_tiering="Disabled",
)

# Export the name and the URI of the bucket (oci://<bucket>@<namespace> form)
pulumi.export("bucket_name", ai_dataset_bucket.name)
pulumi.export("bucket_uri", pulumi.Output.concat(
    "oci://", ai_dataset_bucket.name, "@", ai_dataset_bucket.namespace))
```

In this program:

- We create a new bucket called `ai-datasets-bucket`.
- We set the `access_type` to `NoPublicAccess`, which means only authenticated principals authorized by IAM policies can list or read objects. The other values, `ObjectRead` and `ObjectReadWithoutList`, allow anonymous downloads (with and without listing, respectively) and would expose the datasets to unauthenticated users.
- `versioning` is enabled to keep a history of object changes, which can be helpful for datasets that evolve over time.
- `storage_tier` is set to `Standard`, which is a good default for data that needs to be accessed frequently.
- The required arguments `name`, `compartment_id`, and `namespace` are specified with placeholder values.

Replace `"your_namespace"` with your tenancy's Object Storage namespace (the OCI CLI command `oci os ns get` prints it) and `compartment_id` with the OCID of the compartment where the bucket should live.
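Because the bucket settings are plain strings, a typo such as `"Private"` is only caught when the provider calls the OCI API. A minimal sketch of a pre-flight check is shown here; the helper name `validate_bucket_settings` and the table of allowed values are illustrative assumptions and should be verified against the provider documentation for your version.

```python
# Allowed values for a new bucket, as understood at the time of writing.
# This table is an assumption to check against the provider docs.
ALLOWED_BUCKET_VALUES = {
    "access_type": {"NoPublicAccess", "ObjectRead", "ObjectReadWithoutList"},
    "storage_tier": {"Standard", "Archive"},
    "versioning": {"Enabled", "Disabled"},
    "auto_tiering": {"Disabled", "InfrequentAccess"},
}


def validate_bucket_settings(settings: dict, allow_public: bool = False) -> dict:
    """Return settings unchanged if valid, otherwise raise ValueError.

    Hypothetical helper: it is not part of pulumi_oci.
    """
    for key, allowed in ALLOWED_BUCKET_VALUES.items():
        value = settings.get(key)
        if value is not None and value not in allowed:
            raise ValueError(
                f"{key}={value!r} is not one of {sorted(allowed)}"
            )
    # Refuse anonymous access unless the caller opts in explicitly.
    access = settings.get("access_type", "NoPublicAccess")
    if access != "NoPublicAccess" and not allow_public:
        raise ValueError(f"access_type={access!r} would make the bucket public")
    return settings


settings = validate_bucket_settings({
    "access_type": "NoPublicAccess",
    "storage_tier": "Standard",
    "versioning": "Enabled",
    "auto_tiering": "Disabled",
})
```

The validated dictionary could then be unpacked into the bucket resource's keyword arguments alongside `compartment_id`, `namespace`, and `name`.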

To grant more granular access, write IAM policies that allow specific groups or dynamic groups to read or manage the bucket and its objects. You can define these policies in the OCI console, through the OCI CLI or API, or in the same Pulumi program using the `oci.identity.Policy` resource.
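As a rough illustration of what such a policy might contain, the sketch below builds OCI policy statements scoped to a single bucket. The function name `dataset_policy_statements`, the group name, and the compartment name are hypothetical placeholders; only the statement syntax (`Allow group ... to <verb> <resource> in compartment ... where target.bucket.name='...'`) follows the OCI policy language.

```python
def dataset_policy_statements(group: str, compartment: str, bucket: str,
                              allow_write: bool = False) -> list:
    """Build IAM policy statements limited to one bucket.

    Hypothetical helper: it only formats strings and calls no OCI API.
    """
    condition = f"where target.bucket.name='{bucket}'"
    statements = [
        # Lets the group see the bucket's metadata.
        f"Allow group {group} to read buckets in compartment {compartment} {condition}",
        # Lets the group list and download objects in the bucket.
        f"Allow group {group} to read objects in compartment {compartment} {condition}",
    ]
    if allow_write:
        # Adds upload, overwrite, and delete rights on the bucket's objects.
        statements.append(
            f"Allow group {group} to manage objects in compartment {compartment} {condition}"
        )
    return statements


# Placeholder names; substitute your own group, compartment, and bucket.
statements = dataset_policy_statements("DataScientists", "ai-projects", "ai-datasets-bucket")

# In a Pulumi program, this list could be passed as the `statements`
# argument of oci.identity.Policy, together with compartment_id and
# description.
```

Scoping each statement with `target.bucket.name` keeps the grant from applying to every bucket in the compartment.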

Additionally, consider how the data is encrypted. OCI Object Storage encrypts all objects at rest on the server side by default using Oracle-managed keys. To use a key that you control in OCI Vault instead, pass the key's OCID as the bucket's `kms_key_id` argument when creating the bucket in Pulumi. You can also encrypt data client-side before uploading it.
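One way to keep the customer-managed key optional is to assemble the bucket arguments in a small function and add the key only when one is supplied. This is a sketch, assuming the bucket resource accepts a `kms_key_id` argument; the function name `bucket_args` and the key OCID are hypothetical placeholders.

```python
from typing import Optional


def bucket_args(compartment_id: str, namespace: str, name: str,
                kms_key_id: Optional[str] = None) -> dict:
    """Assemble keyword arguments for a private dataset bucket.

    Hypothetical helper: it returns a plain dict and calls no OCI API.
    """
    args = {
        "compartment_id": compartment_id,
        "namespace": namespace,
        "name": name,
        "access_type": "NoPublicAccess",
        "versioning": "Enabled",
    }
    if kms_key_id:
        # Without this key, OCI falls back to Oracle-managed encryption.
        args["kms_key_id"] = kms_key_id
    return args


# Placeholder OCIDs; substitute real values from your tenancy.
args = bucket_args(
    "ocid1.compartment.oc1..exampleuniqueID",
    "your_namespace",
    "ai-datasets-bucket",
    kms_key_id="ocid1.key.oc1..examplekeyID",
)

# In a Pulumi program, the dict could be unpacked into the bucket
# resource, e.g. oci.objectstorage.Bucket("aiDatasetBucket", **args).
```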

Remember to refer to the [Pulumi OCI provider documentation](https://www.pulumi.com/registry/packages/oci/api-docs/objectstorage/bucket/) for more details on each attribute and how you can further customize the bucket creation and access control settings.