OCI Data Science VMs for Model Development

Pulumi · Accepted Answer

To provision a data science environment for model development on Oracle Cloud Infrastructure (OCI) with Pulumi, use the Data Science resources in the OCI Pulumi provider (`pulumi_oci`). The central resource is `NotebookSession`, which provides the managed, VM-backed working environment. Supporting resources are `Project` to organize resources, `Model` and `ModelDeployment` to store and serve models, and `Job` and `JobRun` to define and execute computation tasks.

The resources play the following roles in a basic data science environment built with Pulumi and Python:

1. **Notebook Session**: This represents the data science environment where you will do your analysis and model development. It is comparable to a VM but preconfigured for data science work: you can run notebooks, create visualizations, and run analyses.

2. **Model**: This represents the machine learning model you will train and test within the OCI Data Science service.

3. **Model Deployment**: Once you're happy with your model, you would use the `ModelDeployment` resource to deploy the model so that it can serve predictions.

4. **Project**: The OCI Data Science service lets you create projects to organize your work. A `Project` acts as a container for your notebook sessions, models, and jobs.

5. **Job** and **JobRun**: These resources can be used to run automated tasks, such as training machine learning models or running batch predictions.
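The relationships between these resources imply a creation order. The sketch below models them as a dependency graph and sorts it with the standard library. The dependency map is only a reading of the descriptions above, not an official OCI statement, and Pulumi works out this ordering on its own from references such as `project_id=data_science_project.id`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each resource to the resources it depends on.
# This mapping is an illustrative assumption based on the list above.
dependencies = {
    "Project": set(),
    "NotebookSession": {"Project"},
    "Model": {"Project"},
    "ModelDeployment": {"Project", "Model"},
    "Job": {"Project"},
    "JobRun": {"Job"},
}

# static_order() yields each resource only after all of its dependencies.
creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)
```

The exact order among independent resources can vary, but `Project` always comes first, `Model` precedes `ModelDeployment`, and `Job` precedes `JobRun`.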

Here is a simple Pulumi program that provisions a data science environment in OCI. It creates a project and a notebook session, and includes placeholders for a model and a model deployment.

```python
import pulumi
import pulumi_oci as oci

# OCID of an existing compartment in which all resources will be provisioned.
compartment_id = 'ocid1.compartment.oc1..aaaaaaa...'  # replace with your Compartment's OCID

# Create a Data Science Project to organize resources.
data_science_project = oci.datascience.Project("dataScienceProject",
    compartment_id=compartment_id,
    description="Project for Data Science Model Development",
    display_name="ModelDevProject"
)

# Create a Notebook Session within the Data Science Project for model development.
notebook_session = oci.datascience.NotebookSession("notebookSession",
    compartment_id=compartment_id,
    project_id=data_science_project.id,
    display_name="DataScienceNotebookSession",
    # Define the configuration of the notebook session.
    # Here shape is a parameter that indicates the type of VM to be used.
    notebook_session_config_details=oci.datascience.NotebookSessionNotebookSessionConfigDetailsArgs(
        shape="VM.Standard2.1",  # Select a VM shape that suits your needs. For example, a standard VM.
        # If other configurations like subnet are needed, they can be specified here.
    ),
    # Optionally, you can add more configuration like Git integration or environment variables.
)

# Output the URL to access the notebook session.
pulumi.export('notebook_session_url', notebook_session.notebook_session_url)

# Placeholder for a Data Science Model; no training code is shown here.
# In real usage, point `model_artifact` at the archive produced by your training run.
model = oci.datascience.Model("model",
    compartment_id=compartment_id,
    project_id=data_science_project.id,
    display_name="DataScienceModel",
    description="Placeholder for data science model",
    model_artifact="path/to/trained-model.zip",
    artifact_content_disposition="attachment; filename=\"trained-model.zip\"",
    # Assumption: the provider also expects the artifact size in bytes, passed as a string.
    # Replace this placeholder and check the argument against your pulumi_oci version.
    artifact_content_length="<artifact-size-in-bytes>",
)

# Placeholder for a Model Deployment that serves the model created above.
# Argument names follow the pulumi_oci provider schema; verify them against your installed version.
model_deployment = oci.datascience.ModelDeployment("modelDeployment",
    compartment_id=compartment_id,
    project_id=data_science_project.id,
    display_name="ModelDeployment",
    description="Deployment of the data science model",
    model_deployment_configuration_details=oci.datascience.ModelDeploymentModelDeploymentConfigurationDetailsArgs(
        deployment_type="SINGLE_MODEL",
        model_configuration_details=oci.datascience.ModelDeploymentModelDeploymentConfigurationDetailsModelConfigurationDetailsArgs(
            model_id=model.id,  # Deploy the model defined above.
            instance_configuration=oci.datascience.ModelDeploymentModelDeploymentConfigurationDetailsModelConfigurationDetailsInstanceConfigurationArgs(
                instance_shape_name="VM.Standard2.1",  # Example shape. Choose according to your needs.
            ),
        ),
    ),
)

# Output the Model Deployment URL.
pulumi.export('model_deployment_url', model_deployment.model_deployment_url)
```
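The `Model` resource takes an artifact path and a content-disposition string, and the provider's `artifact_content_length` argument (assumed here to be the size in bytes, passed as a string) can be derived from the same file. The following is a minimal sketch using only the standard library. The helper name `artifact_metadata` is hypothetical, and the demo file merely stands in for a real trained-model archive.

```python
import os
import tempfile


def artifact_metadata(path: str) -> dict:
    """Return artifact arguments derived from a local file (hypothetical helper)."""
    filename = os.path.basename(path)
    return {
        "model_artifact": path,
        # Assumption: the size is passed as a string of bytes.
        "artifact_content_length": str(os.path.getsize(path)),
        "artifact_content_disposition": f'attachment; filename="{filename}"',
    }


# Demonstrate with a throwaway 128-byte file standing in for a real artifact.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "trained-model.zip")
    with open(demo_path, "wb") as f:
        f.write(b"\x00" * 128)
    metadata = artifact_metadata(demo_path)

print(metadata["artifact_content_length"])
print(metadata["artifact_content_disposition"])
```

If the argument names match your provider version, the resulting dictionary could be unpacked into the `Model` call with `**artifact_metadata("path/to/trained-model.zip")`.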

In this program:

- We first reference an existing compartment and create a Data Science project, which together organize and group related resources within OCI.
- We then create a notebook session, which is effectively a VM but with data science-specific setup like a Jupyter notebook environment.
- We define placeholders for the model and model deployment. In a real scenario, you'd replace these placeholders with actual implementations that point to the trained model artifacts and include specific configuration for deployment.

From this foundation, you can adapt the program to your use case, for example by connecting it to your data sources, installing custom libraries, and setting environment variables for model training.
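One small adaptation is to stop hard-coding the compartment OCID. The sketch below reads it from an environment variable and applies a loose format check before the value reaches any resource. The variable name `OCI_COMPARTMENT_OCID` and both helper functions are hypothetical, and the check is only a rough sanity test based on the general OCID layout (`ocid1.<resource type>.<realm>.[region].<unique id>`), not an official validator.

```python
import os


def looks_like_compartment_ocid(value: str) -> bool:
    """Loosely check that a string is shaped like a compartment or tenancy OCID."""
    parts = value.split(".")
    return (
        len(parts) >= 5
        and parts[0] == "ocid1"
        and parts[1] in {"compartment", "tenancy"}
        and parts[-1] != ""
    )


def compartment_id_from_env(var_name: str = "OCI_COMPARTMENT_OCID") -> str:
    """Read the compartment OCID from the environment, failing early if it looks wrong."""
    value = os.environ.get(var_name, "")
    if not looks_like_compartment_ocid(value):
        raise ValueError(f"{var_name} does not look like a compartment OCID: {value!r}")
    return value


# Example with a made-up OCID; a real program would rely on the caller's environment.
os.environ["OCI_COMPARTMENT_OCID"] = "ocid1.compartment.oc1..exampleuniqueid"
compartment_id = compartment_id_from_env()
print(compartment_id)
```

A check like this rejects an unedited placeholder ending in `...`, so a forgotten substitution fails before Pulumi contacts OCI.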