1. OCI Core Instances as Inference Servers for LLMs

    To deploy OCI Core Instances as inference servers for Large Language Models (LLMs), you need a compute instance in Oracle Cloud Infrastructure (OCI) that is properly configured for the task. This typically means choosing the right compute shape (which defines the number of CPUs, the amount of memory, and other resources), adding GPU capability if the model needs accelerated inference, and configuring the network settings that let clients reach the inference service.
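
    If you do need GPU acceleration, you would pick a GPU shape rather than a general-purpose one. As a hedged sketch, not a definitive recommendation: the shape name VM.GPU.A10.1 below is only an example (GPU shape names and availability vary by region and by tenancy service limits), and every OCID is a placeholder. The complete program appears further below.

        import pulumi_oci as oci

        # Illustrative GPU-backed variant; VM.GPU.A10.1 is an example shape name --
        # check which GPU shapes your region and tenancy limits actually offer.
        gpu_instance = oci.core.Instance(
            "gpuInferenceInstance",
            compartment_id="ocid1.compartment.oc1..exampleuniqueID",
            availability_domain="example-availability-domain",
            shape="VM.GPU.A10.1",
            source_details=oci.core.InstanceSourceDetailsArgs(
                source_type="image",
                # Prefer an image with NVIDIA drivers preinstalled where available.
                image_id="ocid1.image.oc1..exampleuniqueID",
            ),
            create_vnic_details=oci.core.InstanceCreateVnicDetailsArgs(
                subnet_id="ocid1.subnet.oc1..exampleuniqueID",
                assign_public_ip=True,
            ),
        )

    For CPU-only experimentation, a flexible shape such as VM.Standard.E4.Flex with an explicit shape_config (OCPUs and memory) is a cheaper starting point.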

    The Pulumi Python program below demonstrates how to create such an OCI Core Instance using the Pulumi OCI provider. In this example, we set up an instance with a shape suited to inference tasks, attach it to a subnet within a Virtual Cloud Network (VCN), and pass user data that configures the instance on launch.

    Please review the following program, a complete Pulumi script you can run to deploy an OCI Core Instance configured for LLM inference. Replace the placeholders with values from your own OCI tenancy, such as the compartment OCID, subnet OCID, and the image OCID for your instance.

        import pulumi
        import pulumi_oci as oci

        # Create an instance representing our inference server for LLMs.
        inference_instance = oci.core.Instance(
            "inferenceInstance",
            # The compartment where the instance will be created.
            compartment_id="ocid1.compartment.oc1..exampleuniqueID",
            # The availability domain in which to create the instance.
            availability_domain="example-availability-domain",
            # The shape of the instance - choose one that can handle the inference workload.
            shape="VM.Standard2.1",
            # The OCID of an image capable of running the inference server software,
            # such as a TensorFlow or PyTorch environment with the necessary GPU drivers (if needed).
            source_details=oci.core.InstanceSourceDetailsArgs(
                source_type="image",
                image_id="ocid1.image.oc1..exampleuniqueID",
            ),
            # The existing subnet's OCID where the instance will reside.
            create_vnic_details=oci.core.InstanceCreateVnicDetailsArgs(
                subnet_id="ocid1.subnet.oc1..exampleuniqueID",
                # Whether the VNIC should be assigned a public IP address.
                assign_public_ip=True,
            ),
            # Metadata key/value pairs, including user data to configure the instance.
            metadata={
                "ssh_authorized_keys": "ssh-rsa EXAMPLE",
                # User data for instance initialization (base64-encoded cloud-init or shell script).
                "user_data": pulumi.Output.secret("base64_encoded_cloud_init_data"),
            },
        )

        # Export the instance's public and private IP addresses.
        pulumi.export("inference_instance_public_ip", inference_instance.public_ip)
        pulumi.export("inference_instance_private_ip", inference_instance.private_ip)

    Explanation of the components:

    • oci.core.Instance: The main resource used to create a virtual machine within Oracle Cloud Infrastructure. This will be your inference server running LLMs.

    • compartment_id: The unique identifier of the compartment where the resource will be created. In OCI, a compartment is a logical collection of related resources.

    • availability_domain: Specifies which of the region's isolated data centers the instance is created in. Distributing instances across availability domains improves availability and can reduce latency to nearby users.

    • shape: Determines the number of CPUs, the amount of memory, and (for GPU shapes) the accelerators allocated to the instance. Choose a shape appropriate for running Large Language Models.

    • source_details: Specifies the OS image (or an existing boot volume) from which the instance boots. You can also look up an image OCID programmatically, as sketched after this list.

    • create_vnic_details: Defines the Virtual Network Interface Card (VNIC) details, such as the subnet to which the instance attaches. Each instance has at least one VNIC.

    • metadata: Supplies SSH keys and user data (commonly a base64-encoded cloud-init script) used to bootstrap the instance; see the sketch after this list.
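
    To make the metadata and image selection concrete, here is a minimal sketch. The cloud-init content (a hypothetical inference-server install and launch command) and the operating system values passed to the get_images data source are assumptions; adjust them to your actual stack.

        import base64
        import textwrap

        import pulumi_oci as oci

        # Hypothetical cloud-init payload -- the package name and launch command
        # are placeholders for whatever inference stack you actually deploy.
        cloud_init = textwrap.dedent("""\
            #cloud-config
            runcmd:
              - pip install some-inference-server   # placeholder package
              - some-inference-server --model example/model --port 8000 &
            """)

        # metadata["user_data"] must be base64-encoded.
        user_data_b64 = base64.b64encode(cloud_init.encode("utf-8")).decode("utf-8")

        # Look up an image OCID with the get_images data source instead of
        # hard-coding one (the OS name and version here are assumptions).
        images = oci.core.get_images(
            compartment_id="ocid1.compartment.oc1..exampleuniqueID",
            operating_system="Canonical Ubuntu",
            operating_system_version="22.04",
        )
        image_id = images.images[0].id  # first match, for illustration only

    You would then pass user_data_b64 as the metadata "user_data" value and image_id as source_details.image_id in the program above.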

    Once the Pulumi program is written, follow these steps to create the infrastructure:

    1. Ensure you have the Pulumi CLI and OCI CLI installed and configured.
    2. Initialize a new Pulumi Python project (for example, with pulumi new python).
    3. Replace the generated __main__.py with the code above.
    4. Run pulumi up from the command line to preview and deploy the OCI instance.
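
    Concretely, the sequence typically looks like this (the directory name is arbitrary):

        mkdir llm-inference && cd llm-inference
        pulumi new python                # scaffold the project and virtual environment
        # ...replace the generated __main__.py with the program above...
        pulumi up                        # preview, confirm, and deploy
        pulumi stack output inference_instance_public_ip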

    It's important to note that to manage OCI resources with Pulumi, you must have the appropriate credentials configured in your environment. The script also assumes familiarity with cloud-init and with base64-encoding the user data passed to the instance; that user data can carry the initial setup scripts that install and configure the inference server software on the OCI instance.
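
    One common way to supply those credentials, assuming API-key authentication, is through the Pulumi OCI provider's configuration settings; the key names below follow the provider's documentation, and every value is a placeholder:

        pulumi config set oci:tenancyOcid ocid1.tenancy.oc1..exampleuniqueID
        pulumi config set oci:userOcid ocid1.user.oc1..exampleuniqueID
        pulumi config set oci:fingerprint 12:34:56:78:90:ab:cd:ef:12:34:56:78:90:ab:cd:ef
        cat ~/.oci/oci_api_key.pem | pulumi config set oci:privateKey --secret
        pulumi config set oci:region us-ashburn-1

    Consult the provider's documentation for alternative authentication methods, such as instance principals or environment-based configuration.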