Fully Automated AI Inference on AWS, Azure, and Google Cloud with Pulumi

Putting Ollama on a cloud GPU is something I keep coming back to. A while ago I wrote up running open-source LLMs on an AWS EC2 box with Ollama and Pulumi, and the shape never really changes: a GPU instance, a model server, and a firewall rule in front. Infrastructure as code earned its place by making that kind of setup predictable and repeatable, and AI infrastructure is no exception. A GPU box serving a model is still a VM, a disk, and a firewall rule, and it should be declared like one.
Thorsten Hans made exactly that case in his Akamai post, Fully Automated AI Infrastructures with Terraform and Akamai Cloud, which stands up a single GPU instance on Linode, installs the drivers, runs Ollama, and pulls a model, with no manual steps after terraform apply.
I liked the shape of it, so this post ports the same idea to Pulumi and runs it across AWS, Azure, and Google Cloud instead of one. The result is one program shape per cloud: a single pulumi up brings up a GPU box that installs its own driver, runs Ollama, and pulls a model with no manual steps, and a single pulumi destroy takes it back down. Along the way it drops the two imperative bits the Terraform version leans on: a static access token sitting in an environment variable, and a null_resource running a shell loop to wait for the model. The first becomes an OIDC login from a Pulumi ESC environment, so no long-lived key lives anywhere. The second turns out not to be a resource at all.