At Pulumi, we understand that Pulumi Cloud plays an important role in how our customers address their infrastructure management challenges. As a result, we strive for the highest levels of availability and performance in Pulumi Cloud. Unfortunately, on Friday, October 6, 2023, Pulumi Cloud suffered a 24-minute outage during which we failed to process 74.7% of received requests. In this post, we’d like to share our findings on the root cause of this outage and the steps we are taking to ensure this sort of outage doesn’t happen again.
Each Pulumi Stack you deploy manages a key set of cloud infrastructure for your organization. The Pulumi Console includes a variety of features that expose key information about your stack to other users within your organization: configuration, outputs, resources under management, links to cloud providers, and a graph of all resources. However, it’s often useful to let the author of a Pulumi Stack describe its key elements in their own words, so future viewers can quickly understand the components and cloud resources it manages.