How Do I Create Deep Learning VMs on GCP?

Introduction

In this tutorial, we will use Pulumi to create a Deep Learning VM on Google Cloud Platform (GCP). A Deep Learning VM is a Compute Engine instance booted from one of Google's Deep Learning VM images, which come with popular deep learning frameworks such as TensorFlow or PyTorch preinstalled. The image family you choose determines which framework and GPU software you get. Besides the instance itself, we will create the VPC network, subnetwork, and firewall rule it needs.

Step-by-Step Guide

Follow these steps to create a Deep Learning VM on GCP:

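Define the Project and Region: Specify the GCP project and region where your resources will be deployed, for example by running pulumi config set gcp:project <your-project-id> and pulumi config set gcp:region us-central1 for your stack.
Create a VPC Network and Subnetwork:
- Use gcp.compute.Network (the google_compute_network resource) to define a Virtual Private Cloud (VPC) network. Set autoCreateSubnetworks to false so you control the subnet layout.
- Create a regional subnetwork inside this VPC with gcp.compute.Subnetwork (google_compute_subnetwork).
Define the Compute Engine Instance:
- Use gcp.compute.Instance (google_compute_instance) to create the instance, in a zone that belongs to the subnetwork's region.
- Set the boot disk image to a Deep Learning VM image so the deep learning frameworks are preinstalled. If you pick a GPU image family, also attach a GPU to the instance.
Set Up Firewall Rules:
- Create a gcp.compute.Firewall rule that allows SSH (TCP port 22), so you can connect to the instance.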
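The project, region, zone, and image settings in these steps are plain strings, so a mistake such as a zone outside the subnetwork's region is typically only caught when the resources are created. As a small sketch, the hypothetical helpers below (imageFamilyPath, regionOfZone, and zoneMatchesRegion are illustrative names, not part of Pulumi or the GCP SDK) build the Deep Learning VM image path and check that a zone belongs to a region, assuming the usual "<region>-<letter>" zone naming.

```typescript
// Hypothetical helpers (not part of Pulumi or the GCP SDK) for sanity-checking
// the strings passed to gcp.compute resources.

// Deep Learning VM images are published in this public image project.
const DLVM_IMAGE_PROJECT = "deeplearning-platform-release";

// Build the image-family path used in bootDisk.initializeParams.image.
function imageFamilyPath(family: string): string {
    return `projects/${DLVM_IMAGE_PROJECT}/global/images/family/${family}`;
}

// A Compute Engine zone name is its region name plus "-<letter>",
// e.g. "us-central1-a" lies in "us-central1".
function regionOfZone(zone: string): string {
    const match = /^(.+)-[a-z]$/.exec(zone);
    if (!match) {
        throw new Error(`Unexpected zone format: ${zone}`);
    }
    return match[1];
}

// Subnetworks are regional, so the instance's zone must be in the same region.
function zoneMatchesRegion(zone: string, region: string): boolean {
    return regionOfZone(zone) === region;
}

const exampleRegion = "us-central1";
const exampleZone = "us-central1-a";
console.log(imageFamilyPath("tf-latest-gpu"));
// projects/deeplearning-platform-release/global/images/family/tf-latest-gpu
console.log(zoneMatchesRegion(exampleZone, exampleRegion)); // true
```

Keeping region and zone in one place like this also makes it harder for the subnetwork and the instance to drift apart when you move the stack to another region.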

Key Points

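The gcp.compute.Network resource (google_compute_network) creates the VPC network.
The gcp.compute.Subnetwork resource creates a regional subnetwork (10.0.0.0/16 in us-central1) inside that network.
The gcp.compute.Instance boots from a Deep Learning VM image set in bootDisk.initializeParams.image.
The firewall rule is what allows inbound SSH: without it, the VPC's implied deny-ingress rule blocks connections to the instance.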
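The subnetwork's ipCidrRange and the firewall rule's sourceRanges are both IPv4 CIDR ranges. The hypothetical helpers below (cidrContains and cidrSize are illustrative, not GCP or Pulumi APIs) sketch how range membership is decided, which helps when narrowing the SSH source range from 0.0.0.0/0 to your own addresses. The sample addresses come from the reserved documentation blocks.

```typescript
// Hypothetical helpers (not GCP or Pulumi APIs) for reasoning about IPv4 CIDR
// ranges such as ipCidrRange and sourceRanges.

// Convert dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
    if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
        throw new Error(`Invalid IPv4 address: ${ip}`);
    }
    const parts = ip.split(".").map(Number);
    if (parts.some((p) => p > 255)) {
        throw new Error(`Invalid IPv4 address: ${ip}`);
    }
    return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` lies inside `cidr` (e.g. "10.0.0.0/16").
function cidrContains(cidr: string, ip: string): boolean {
    const [base, prefixText] = cidr.split("/");
    const prefix = Number(prefixText);
    if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
        throw new Error(`Invalid prefix length in ${cidr}`);
    }
    // JS masks shift counts to 5 bits, so a /0 prefix needs a special case.
    const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
    return ((ipv4ToInt(base) & mask) >>> 0) === ((ipv4ToInt(ip) & mask) >>> 0);
}

// Total number of addresses in a range.
function cidrSize(cidr: string): number {
    return 2 ** (32 - Number(cidr.split("/")[1]));
}

console.log(cidrContains("10.0.0.0/16", "10.0.42.7"));       // true: inside the subnetwork
console.log(cidrContains("0.0.0.0/0", "198.51.100.9"));      // true: /0 admits every address
console.log(cidrContains("203.0.113.0/24", "198.51.100.9")); // false: outside a narrowed SSH range
console.log(cidrSize("10.0.0.0/16"));                        // 65536
```

A /16 holds 65,536 addresses, and Google Cloud reserves four of them in each subnet's primary range, so 10.0.0.0/16 leaves plenty of room for more instances later.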

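```typescript
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

// The GCP project comes from the stack's gcp:project setting
// (pulumi config set gcp:project <your-project-id>).
const region = "us-central1";
const zone = "us-central1-a"; // must be a zone inside `region`

// Create a custom-mode VPC network (we define the subnetwork ourselves)
const vpcNetwork = new gcp.compute.Network("vpc_network", {
    name: "dl-vpc-network",
    autoCreateSubnetworks: false,
});

// Create a regional subnetwork inside the VPC
const subnetwork = new gcp.compute.Subnetwork("subnetwork", {
    name: "dl-subnetwork",
    network: vpcNetwork.id,
    ipCidrRange: "10.0.0.0/16",
    region: region,
});

// Allow inbound SSH (TCP 22). "0.0.0.0/0" admits any IPv4 address;
// narrow sourceRanges to your own address range where possible.
const firewallRule = new gcp.compute.Firewall("firewall_rule", {
    name: "allow-ssh",
    network: vpcNetwork.id,
    allows: [{
        protocol: "tcp",
        ports: ["22"],
    }],
    sourceRanges: ["0.0.0.0/0"],
});

// Define the Compute Engine instance that boots from a Deep Learning VM image
const dlVmInstance = new gcp.compute.Instance("dl_vm_instance", {
    name: "dl-vm-instance",
    machineType: "n1-standard-4",
    zone: zone,
    networkInterfaces: [{
        accessConfigs: [{}], // an empty access config assigns an ephemeral external IP
        network: vpcNetwork.id,
        subnetwork: subnetwork.id,
    }],
    bootDisk: {
        initializeParams: {
            image: "projects/deeplearning-platform-release/global/images/family/tf-latest-gpu",
        },
    },
    // The image above is a GPU image, so attach a GPU. Check that the
    // accelerator type is offered in this zone and that you have GPU quota.
    guestAccelerators: [{
        type: "nvidia-tesla-t4",
        count: 1,
    }],
    // Instances with GPUs cannot live-migrate during host maintenance
    scheduling: {
        onHostMaintenance: "TERMINATE",
        automaticRestart: true,
    },
    metadata: {
        // Ask the Deep Learning VM image to install the NVIDIA driver on first boot
        "install-nvidia-driver": "True",
        // Replace with one "USERNAME:PUBLIC_KEY" entry per line
        "ssh-keys": "your-ssh-keys-content",
    },
});

export const instanceName = dlVmInstance.name;
export const instanceZone = dlVmInstance.zone;
export const instancePublicIp = dlVmInstance.networkInterfaces.apply(
    networkInterfaces => networkInterfaces[0].accessConfigs?.[0]?.natIp);
```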
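The "ssh-keys" metadata value in the program is only a placeholder. Compute Engine reads that key as one USERNAME:PUBLIC_KEY entry per line. The sketch below uses a hypothetical formatSshKeysMetadata helper (not a Pulumi or GCP API) to assemble that string from public keys you already have; its validation checks are an illustrative assumption, not Google's complete rule set, and the keys shown are fake.

```typescript
// Hypothetical helper (not part of Pulumi or the GCP SDK) that builds the
// value for an instance's "ssh-keys" metadata entry.
interface SshKeyEntry {
    user: string;      // Linux username to create on the VM
    publicKey: string; // contents of a .pub file, e.g. "ssh-ed25519 AAAA... alice"
}

function formatSshKeysMetadata(entries: SshKeyEntry[]): string {
    return entries
        .map(({ user, publicKey }) => {
            const key = publicKey.trim(); // drop the trailing newline of .pub files
            // Illustrative checks: an empty name, a colon, or whitespace would break the format.
            if (user === "" || /[:\s]/.test(user)) {
                throw new Error(`Invalid username: "${user}"`);
            }
            if (key === "" || key.includes("\n")) {
                throw new Error(`Public key for ${user} must be a single non-empty line`);
            }
            return `${user}:${key}`;
        })
        .join("\n");
}

const sshKeys = formatSshKeysMetadata([
    { user: "alice", publicKey: "ssh-ed25519 AAAAexampleKeyAlice alice\n" },
    { user: "bob", publicKey: "ssh-ed25519 AAAAexampleKeyBob bob" },
]);
console.log(sshKeys);
// alice:ssh-ed25519 AAAAexampleKeyAlice alice
// bob:ssh-ed25519 AAAAexampleKeyBob bob
```

In the Pulumi program, a string built this way (for example from key files or stack config) would take the place of "your-ssh-keys-content".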

Conclusion

In this example, we created a Deep Learning VM on Google Cloud Platform using Pulumi: we targeted a GCP project and region, set up a custom VPC network and subnetwork, opened SSH access with a firewall rule, and launched a Compute Engine instance from a Deep Learning VM image. The result is a machine with the deep learning frameworks already installed, ready for training and experimentation; adjust the machine type and accelerators to match the size of your workloads.
