DigitalOcean Droplets as Training Environments for ML Models


Pulumi · Accepted Answer

To set up DigitalOcean Droplets to serve as training environments for machine learning (ML) models, we will need the following components:

1. **DigitalOcean Droplet**: This will be the virtual server (VM) that runs our training jobs. We can choose its size based on the computing requirements of our ML models.
2. **SSH Key**: For secure access to the Droplet, we should add an SSH key.
3. **Firewall**: A firewall controls which traffic may reach and leave our Droplet.
4. **Custom Image (optional)**: If you have a specific environment setup that you want to replicate across Droplets, a custom image can be used to create Droplets with this environment pre-configured.

Here's how you can use Pulumi to create such an environment.

First, let's define our Droplet. We need to specify the size, region, image, and other properties that best match our requirements. For ML workloads, choosing a size that offers an adequate amount of CPU and memory is crucial.
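To make the size choice concrete, here is a minimal sketch that picks the smallest slug meeting a vCPU and memory requirement. The `SIZES` table is an illustrative subset of DigitalOcean's Basic plan slugs that we are assuming here (the live catalogue can be listed with `doctl compute size list`), and `pick_size` is a hypothetical helper, not part of Pulumi or the DigitalOcean API.

```python
# Illustrative subset of Basic Droplet sizes: (slug, vCPUs, memory in GB).
# Assumption: verify current slugs and specs with `doctl compute size list`.
SIZES = [
    ("s-1vcpu-1gb", 1, 1),
    ("s-1vcpu-2gb", 1, 2),
    ("s-2vcpu-2gb", 2, 2),
    ("s-2vcpu-4gb", 2, 4),
    ("s-4vcpu-8gb", 4, 8),
    ("s-8vcpu-16gb", 8, 16),
]


def pick_size(min_vcpus: int, min_memory_gb: int) -> str:
    """Return the smallest listed slug with at least the requested vCPUs and memory."""
    candidates = [
        (vcpus, mem, slug)
        for slug, vcpus, mem in SIZES
        if vcpus >= min_vcpus and mem >= min_memory_gb
    ]
    if not candidates:
        raise ValueError("no listed size meets the requested vCPUs and memory")
    # Tuples compare by vCPUs first, then memory, so min() gives the smallest match.
    return min(candidates)[2]
```

For example, `size = pick_size(2, 4)` would select `"s-2vcpu-4gb"`, which could then be passed as the Droplet's `size`.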

For the image, if you're using common ML frameworks like TensorFlow or PyTorch, you might opt for an image that comes with them pre-installed, or you can set up the environment yourself with the `user_data` argument: a bash script passed there is run once by cloud-init on first boot and can install the software you need.
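Rather than hand-writing the script, we can generate it. This is a minimal sketch assuming an Ubuntu image with `apt-get`; `build_user_data` and the `/opt/ml-venv` location are hypothetical choices for illustration. Installing into a virtual environment keeps the ML packages separate from the system Python.

```python
import shlex


def build_user_data(pip_packages: list[str], venv_dir: str = "/opt/ml-venv") -> str:
    """Build a first-boot bash script that installs Python packages into a venv.

    Assumes an Ubuntu image; cloud-init runs the script once, as root.
    """
    if not pip_packages:
        raise ValueError("pass at least one package to install")
    # Quote each spec so characters such as '>' or '=' are not interpreted by the shell.
    packages = " ".join(shlex.quote(p) for p in pip_packages)
    venv = shlex.quote(venv_dir)
    lines = [
        "#!/bin/bash",
        "set -euxo pipefail",  # stop on the first error and echo each command
        "export DEBIAN_FRONTEND=noninteractive",  # avoid interactive apt prompts
        "apt-get update",
        "apt-get install -y python3-pip python3-venv",
        f"python3 -m venv {venv}",
        f"{venv}/bin/pip install {packages}",
    ]
    return "\n".join(lines) + "\n"
```

The result can be passed as `user_data=build_user_data(["torch", "scikit-learn>=1.3"])`. Because of `set -x`, each command is echoed, which makes failures easier to spot in `/var/log/cloud-init-output.log` on the Droplet.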

SSH keys are crucial for secure remote access. We'll create an SSH key resource and associate it with our Droplet.
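One detail worth handling when reading the key: Python's `open()` does not expand `~`, so a path like `~/.ssh/id_rsa.pub` must be expanded first. The sketch below does that and adds a basic sanity check; `read_public_key` is a hypothetical helper, and the list of accepted key types is an assumption covering the common OpenSSH formats, not an exhaustive validator.

```python
import os

# Common OpenSSH public key types (assumption: extend if you use others).
ALLOWED_KEY_TYPES = (
    "ssh-rsa",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
)


def read_public_key(path: str) -> str:
    """Read an OpenSSH public key, expanding '~' and checking its basic shape."""
    # open() does not expand '~', so expand it explicitly.
    with open(os.path.expanduser(path), "r", encoding="utf-8") as f:
        key = f.read().strip()
    # A public key line looks like "<type> <base64-data> [comment]".
    # This catches mistakes such as pointing at the private key file.
    parts = key.split()
    if len(parts) < 2 or parts[0] not in ALLOWED_KEY_TYPES:
        raise ValueError(f"{path} does not look like an OpenSSH public key")
    return key
```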

Firewalls help secure our Droplet by allowing only the traffic we explicitly permit. We will define an inbound rule that allows SSH (port 22) and outbound rules that let the Droplet reach package repositories.
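Allowing SSH from `0.0.0.0/0` is convenient but exposes port 22 to the whole internet. A tighter option is to allow only known networks. This minimal sketch builds an inbound rule dict with the same keys the Firewall's inbound rule uses in the program (`protocol`, `port_range`, `source_addresses`); `ssh_inbound_rule` is a hypothetical helper, and `203.0.113.7` is a documentation address standing in for your own IP.

```python
import ipaddress


def ssh_inbound_rule(cidrs: list[str]) -> dict:
    """Build an inbound firewall rule that allows SSH only from the given networks."""
    if not cidrs:
        raise ValueError("at least one source CIDR is required")
    normalized = []
    for cidr in cidrs:
        # ip_network() raises ValueError on malformed input; strict=False turns
        # a host address with a prefix (e.g. "203.0.113.7/24") into its network.
        normalized.append(str(ipaddress.ip_network(cidr, strict=False)))
    return {
        "protocol": "tcp",
        "port_range": "22",
        "source_addresses": normalized,
    }
```

It could replace the open rule with `inbound_rules=[ssh_inbound_rule(["203.0.113.7/32"])]`. Validating the CIDRs locally surfaces typos before `pulumi up` sends them to DigitalOcean.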

Now let's write the Pulumi program:

```python
import os

import pulumi
import pulumi_digitalocean as digitalocean

# Replace the following placeholders with your own information
droplet_name = "ml-droplet"
region = "nyc3"  # Choose a region close to you or your data
size = "s-1vcpu-1gb"  # Example size; choose one that matches your ML workload
image = "ubuntu-22-04-x64"  # Example image; you can also use other images or a snapshot ID
ssh_key_public_path = "~/.ssh/id_rsa.pub"  # Path to your SSH public key
firewall_name = "ml-firewall"

# Read the SSH public key; open() does not expand "~", so expand it first
with open(os.path.expanduser(ssh_key_public_path), "r") as key_file:
    ssh_key_public = key_file.read().strip()

# Create an SSH key resource to associate with the Droplet
ssh_key = digitalocean.SshKey("ml-ssh-key",
    public_key=ssh_key_public)

# Define a Droplet for machine learning workloads
ml_droplet = digitalocean.Droplet(droplet_name,
    image=image,
    size=size,
    region=region,
    ssh_keys=[ssh_key.id],
    tags=["ml"],
    # 'user_data' runs a script once on first boot (via cloud-init), which is
    # useful for installing ML frameworks and dependencies.
    # e.g., uncomment and edit the following lines to install Miniconda.
    # cloud-init runs the script as root, so install to a fixed path such as /opt:
    # user_data="""#!/bin/bash
    # wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
    # bash /tmp/miniconda.sh -b -p /opt/miniconda
    # """
)

# Create a Firewall that allows SSH traffic to the Droplet
ml_firewall = digitalocean.Firewall(firewall_name,
    droplet_ids=[ml_droplet.id],
    inbound_rules=[{
        "protocol": "tcp",
        "port_range": "22",
        "source_addresses": ["0.0.0.0/0", "::/0"], # Allows SSH from any IP - Be cautious with this setting
    }],
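    outbound_rules=[{
        "protocol": "tcp",
        "port_range": "80",
        "destination_addresses": ["0.0.0.0/0", "::/0"],  # Allows HTTP traffic to any IP
    }, {
        "protocol": "tcp",
        "port_range": "443",
        "destination_addresses": ["0.0.0.0/0", "::/0"],  # Allows HTTPS traffic to any IP
    }, {
        "protocol": "udp",
        "port_range": "53",
        "destination_addresses": ["0.0.0.0/0", "::/0"],  # Allows DNS lookups; without it, apt/pip cannot resolve hostnames
    }]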
)

# Export the IP address of the Droplet
pulumi.export('droplet_ip', ml_droplet.ipv4_address)
```

Before running the above program:

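- Ensure you've installed Pulumi and created a Python project with the `pulumi` and `pulumi-digitalocean` packages installed (for example, `pip install pulumi pulumi-digitalocean`).
- Give Pulumi access to your DigitalOcean account, either by setting the `DIGITALOCEAN_TOKEN` environment variable to your DigitalOcean personal access token or by running `pulumi config set digitalocean:token --secret`.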
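If you use the environment-variable route, a missing token otherwise only shows up as an authentication error partway through `pulumi up`. A small preflight check at the top of the program fails earlier with a clearer message. This is a minimal sketch; `require_do_token` is a hypothetical helper and only applies when the token comes from `DIGITALOCEAN_TOKEN`, not from Pulumi config.

```python
import os
from typing import Mapping, Optional


def require_do_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DIGITALOCEAN_TOKEN, or fail early with a readable message."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    token = env.get("DIGITALOCEAN_TOKEN", "").strip()
    if not token:
        raise RuntimeError("DIGITALOCEAN_TOKEN is not set or is empty")
    return token
```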

The program starts by reading your SSH public key and creating a DigitalOcean SSH key resource. The key will then be associated with the Droplet.

Next, we create the Droplet itself with the given size, region, and image. The `tags` parameter helps us manage and identify this Droplet as part of our ML infrastructure.

The `user_data` script is commented out in the example but can be used to automatically set up the environment when the Droplet is provisioned.

Finally, we set up a Firewall to secure our Droplet. Inbound, only SSH is allowed for remote connections; outbound, the rules permit the traffic the Droplet needs to download packages. The closing `pulumi.export` call publishes the Droplet's public IPv4 address as a stack output, which we will use to access our new machine learning environment.
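DigitalOcean cloud firewalls drop outbound traffic that no rule matches, so it is easy to forget something a setup script needs, such as DNS. This minimal sketch checks which required `(protocol, port)` pairs are not covered by a list of outbound rule dicts shaped like the ones in the program. `missing_outbound` and `port_in_range` are hypothetical helpers; they handle single ports and `"low-high"` ranges only, and ignore destination addresses.

```python
def port_in_range(port: int, port_range: str) -> bool:
    """Check a port against a rule's port_range, given as "443" or "8000-9000"."""
    if "-" in port_range:
        low, high = (int(p) for p in port_range.split("-", 1))
        return low <= port <= high
    return port == int(port_range)


def missing_outbound(rules: list[dict], required: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return the (protocol, port) pairs that no outbound rule allows."""
    return [
        (proto, port)
        for proto, port in required
        if not any(
            # Rules without a port_range (e.g. ICMP) are skipped.
            r["protocol"] == proto and "port_range" in r and port_in_range(port, r["port_range"])
            for r in rules
        )
    ]
```

Run against rules that only allow TCP 80 and 443, `missing_outbound(rules, [("tcp", 443), ("udp", 53)])` reports `[("udp", 53)]`, flagging that hostname lookups would fail.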

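After running the Pulumi program with `pulumi up`, Pulumi prints the `droplet_ip` stack output (you can read it again later with `pulumi stack output droplet_ip`). You can then SSH into your new Droplet (as `root` on DigitalOcean's Ubuntu images) and begin setting up your machine learning environment or start training your models directly.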
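To avoid copying the IP by hand, a short script can read the stack output through the Pulumi CLI and build the SSH command. This is a sketch assuming `pulumi` is on your `PATH`, the stack defines the `droplet_ip` output, and the private key sits next to the public key used above; `stack_output` and `ssh_command` are hypothetical helpers.

```python
import os
import subprocess


def stack_output(name: str) -> str:
    """Read a stack output via `pulumi stack output <name>` (requires the Pulumi CLI)."""
    result = subprocess.run(
        ["pulumi", "stack", "output", name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def ssh_command(ip: str, user: str = "root", key_path: str = "~/.ssh/id_rsa") -> list[str]:
    """Build an ssh argv; 'root' is the default login on DigitalOcean Ubuntu images."""
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{ip}"]
```

From the project directory, `subprocess.run(ssh_command(stack_output("droplet_ip")))` would open the session. Keeping the argv as a list avoids shell quoting issues.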