Easier IaC adoption with improved `pulumi import` experience

Last year, we introduced a Pulumi feature that lets you import existing infrastructure into your Pulumi program. Not only did it bring the resource into the Pulumi state file, it could also generate the source code for your Pulumi program. Today, we’re excited to announce that we’ve listened to feedback and delivered a plethora of updates and fixes that streamline the import experience and make it more useful, more convenient, and more powerful.

At Pulumi, we understand that many cloud engineers and platform teams around the world don’t have the luxury of greenfield projects. More often than not, we’re stuck with the impossible task of “refactoring” or “migrating” existing projects to more modern stacks to help increase team productivity, velocity, and stability. These projects aren’t trivial, and we want to make it easier for teams and organizations to bring their infrastructure into a cloud engineering world. Oh, and worry not, you lucky greenfielders … even if you just wanna ClickOps your way through some resources and import them into your program, that’s gonna work just fine too. We won’t tell if you don’t 😉.

To help you understand the changes we’ve made to help you on your journey, let’s take a look at some side-by-side examples of how pulumi import used to work vs. what we’re releasing today in Pulumi 3.26.

Example: AWS S3 Bucket

Previously, when importing an S3 bucket, we would set the default values for properties like acl and forceDestroy. The new behavior does not include properties with default values in the generated code, keeping your code in line with how you’d write it yourself. Artisanal codegen for the artisanal cloud engineer.

Code Comparison

This code would have been generated by the older implementation:

const my_bucket = new aws.s3.Bucket("my-bucket", {
    acl: "private",
    bucket: "my-bucket-3f85c54",
    forceDestroy: false,
}, {
    protect: true,
});

This code will be generated by the newer implementation:

const my_bucket = new aws.s3.Bucket("my-bucket", {
    arn: "arn:aws:s3:::my-bucket-3f85c54",
    bucket: "my-bucket-3f85c54",
    hostedZoneId: "Z3BJ6K6RIION7M",
    requestPayer: "BucketOwner",
}, {
    protect: true,
});

The above accurately reflects what the provider’s Read function has told us about the inputs set for this bucket. However, due to the way the AWS provider works, we could elide all of these properties: the provider uses the outputs saved in the state file together with the new inputs to calculate the update diff, and a property that exists in the old outputs but is missing from the new inputs is simply left at its old output value. As none of the properties for Bucket are marked required, the following works as well and is equivalent:

const my_bucket = new aws.s3.Bucket("my-bucket", {}, { protect: true });
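
The fallback described above can be pictured as a simple merge. The following is a minimal sketch under that assumption, not the AWS provider’s actual code; `effectiveProps` and `savedOutputs` are hypothetical names introduced purely for illustration.

```typescript
// Hypothetical model of the behavior described above; not the AWS provider's real code.
type Props = Record<string, unknown>;

// Any property missing from the new inputs keeps its value from the old outputs.
function effectiveProps(oldOutputs: Props, newInputs: Props): Props {
    return { ...oldOutputs, ...newInputs };
}

// Values taken from the imported bucket shown earlier in this post.
const savedOutputs: Props = {
    arn: "arn:aws:s3:::my-bucket-3f85c54",
    bucket: "my-bucket-3f85c54",
    hostedZoneId: "Z3BJ6K6RIION7M",
    requestPayer: "BucketOwner",
};

// An empty input set resolves to the saved values, so nothing is reported as changed.
const resolved = effectiveProps(savedOutputs, {});
const changedKeys = Object.keys(resolved).filter((k) => resolved[k] !== savedOutputs[k]);

// An explicitly set input still wins over the saved output.
const overridden = effectiveProps(savedOutputs, { requestPayer: "Requester" });
```

In this model, `changedKeys` is empty for the elided program, which is why the short form and the long form behave the same given the saved state.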

Example: AWS EC2 Instances

Previously, when trying to import an EC2 instance with pulumi import, you’d be presented with a list of required properties that weren’t satisfied. Like above, the new behavior will read from the provider and inherit the missing properties, providing everything required to correctly import. This provides a cleaner and more intuitive experience for the user.

Old Behavior

$  pulumi import aws:ec2/instance:Instance test i-085d780737c600c7e
aws:ec2:Instance (test):
  error: aws:ec2/instance:Instance resource 'test' has a problem: Missing required argument: "instance_type": one of `instance_type,launch_template` must be specified. Examine values at 'Instance.InstanceType'.
  error: aws:ec2/instance:Instance resource 'test' has a problem: Missing required argument: "launch_template": one of `ami,instance_type,launch_template` must be specified. Examine values at 'Instance.LaunchTemplate'.
  error: aws:ec2/instance:Instance resource 'test' has a problem: Missing required argument: "ami": one of `ami,launch_template` must be specified. Examine values at 'Instance.Ami'.

New Behavior

The import now succeeds, and you’re presented with a rich resource definition in your language of choice. The same imported instance is shown below in TypeScript, Python, Go, and C#.

const test = new aws.ec2.Instance("test", {
    ami: "ami-082b5a644766e0e6f",
    associatePublicIpAddress: true,
    availabilityZone: "us-west-2c",
    capacityReservationSpecification: {
        capacityReservationPreference: "open",
    },
    cpuCoreCount: 1,
    cpuThreadsPerCore: 1,
    creditSpecification: {
        cpuCredits: "standard",
    },
    iamInstanceProfile: "",
    instanceInitiatedShutdownBehavior: "stop",
    instanceType: "t2.micro",
    metadataOptions: {
        httpEndpoint: "enabled",
        httpPutResponseHopLimit: 1,
        httpTokens: "optional",
    },
    privateIp: "172.31.0.188",
    rootBlockDevice: {
        iops: 100,
        volumeSize: 8,
        volumeType: "gp2",
    },
    securityGroups: ["default"],
    subnetId: "subnet-43f43a1e",
    tenancy: "default",
    vpcSecurityGroupIds: ["sg-4d436f12"],
}, {
    protect: true,
});

Python:

test = aws.ec2.Instance("test",
    ami="ami-082b5a644766e0e6f",
    associate_public_ip_address=True,
    availability_zone="us-west-2c",
    capacity_reservation_specification=aws.ec2.InstanceCapacityReservationSpecificationArgs(
        capacity_reservation_preference="open",
    ),
    cpu_core_count=1,
    cpu_threads_per_core=1,
    credit_specification=aws.ec2.InstanceCreditSpecificationArgs(
        cpu_credits="standard",
    ),
    iam_instance_profile="",
    instance_initiated_shutdown_behavior="stop",
    instance_type="t2.micro",
    metadata_options=aws.ec2.InstanceMetadataOptionsArgs(
        http_endpoint="enabled",
        http_put_response_hop_limit=1,
        http_tokens="optional",
    ),
    private_ip="172.31.0.188",
    root_block_device=aws.ec2.InstanceRootBlockDeviceArgs(
        iops=100,
        volume_size=8,
        volume_type="gp2",
    ),
    security_groups=["default"],
    subnet_id="subnet-43f43a1e",
    tenancy="default",
    vpc_security_group_ids=["sg-4d436f12"],
    opts=pulumi.ResourceOptions(protect=True))

Go:

_, err := ec2.NewInstance(ctx, "test", &ec2.InstanceArgs{
    Ami:                      pulumi.String("ami-082b5a644766e0e6f"),
    AssociatePublicIpAddress: pulumi.Bool(true),
    AvailabilityZone:         pulumi.String("us-west-2c"),
    CapacityReservationSpecification: &ec2.InstanceCapacityReservationSpecificationArgs{
            CapacityReservationPreference: pulumi.String("open"),
    },
    CpuCoreCount:      pulumi.Int(1),
    CpuThreadsPerCore: pulumi.Int(1),
    CreditSpecification: &ec2.InstanceCreditSpecificationArgs{
            CpuCredits: pulumi.String("standard"),
    },
    IamInstanceProfile:                pulumi.Any(""),
    InstanceInitiatedShutdownBehavior: pulumi.String("stop"),
    InstanceType:                      pulumi.String("t2.micro"),
    MetadataOptions: &ec2.InstanceMetadataOptionsArgs{
            HttpEndpoint:            pulumi.String("enabled"),
            HttpPutResponseHopLimit: pulumi.Int(1),
            HttpTokens:              pulumi.String("optional"),
    },
    PrivateIp: pulumi.String("172.31.0.188"),
    RootBlockDevice: &ec2.InstanceRootBlockDeviceArgs{
            Iops:       pulumi.Int(100),
            VolumeSize: pulumi.Int(8),
            VolumeType: pulumi.String("gp2"),
    },
    SecurityGroups: pulumi.StringArray{
            pulumi.String("default"),
    },
    SubnetId: pulumi.String("subnet-43f43a1e"),
    Tenancy: pulumi.String("default"),
    VpcSecurityGroupIds: pulumi.StringArray{
            pulumi.String("sg-4d436f12"),
    },
}, pulumi.Protect(true))

C#:

var test = new Aws.Ec2.Instance("test", new Aws.Ec2.InstanceArgs
    {
        Ami = "ami-082b5a644766e0e6f",
        AssociatePublicIpAddress = true,
        AvailabilityZone = "us-west-2c",
        CapacityReservationSpecification = new Aws.Ec2.Inputs.InstanceCapacityReservationSpecificationArgs
        {
            CapacityReservationPreference = "open",
        },
        CpuCoreCount = 1,
        CpuThreadsPerCore = 1,
        CreditSpecification = new Aws.Ec2.Inputs.InstanceCreditSpecificationArgs
        {
            CpuCredits = "standard",
        },
        IamInstanceProfile = "",
        InstanceInitiatedShutdownBehavior = "stop",
        InstanceType = "t2.micro",
        MetadataOptions = new Aws.Ec2.Inputs.InstanceMetadataOptionsArgs
        {
            HttpEndpoint = "enabled",
            HttpPutResponseHopLimit = 1,
            HttpTokens = "optional",
        },
        PrivateIp = "172.31.0.188",
        RootBlockDevice = new Aws.Ec2.Inputs.InstanceRootBlockDeviceArgs
        {
            Iops = 100,
            VolumeSize = 8,
            VolumeType = "gp2",
        },
        SecurityGroups =
        {
            "default",
        },
        SubnetId = "subnet-43f43a1e",
        Tenancy = "default",
        VpcSecurityGroupIds =
        {
            "sg-4d436f12",
        },
    }, new CustomResourceOptions
    {
        Protect = true,
    });

Again, this reflects what Read has told us about the inputs set for this instance, and again, due to the way the AWS provider works, many of these inputs could be elided and the provider would pick up the values from the saved output set. The following is equivalent to the above given the saved state:

const test = new aws.ec2.Instance("test", {
    ami: "ami-082b5a644766e0e6f",
    instanceType: "t2.micro"
}, {
    protect: true,
});

Everything is Better, Yes?

To quote our old friend, Harlan, “Everything is better, yes?”

While we’ve only shown you a couple of examples of this new behaviour, rest assured that the benefits of it stretch far and wide across the majority of Pulumi resources.

If you’ve ever had a problem importing a resource before, we encourage you to try it again and let us know if you run into any problems.

Curious how this all works? Let’s dive in.

Technical details

We’ve never really explained how the import system works before, which has led to a lot of confusion among users who encountered errors with it. Given the changes we’re making here, we felt it was a good time to give a more detailed write-up of how the engine works, to help users understand why their import results are the way they are.

This section requires some understanding of the Pulumi architecture, but that understanding is not necessary to be able to use pulumi import.

The importance of Provider.Read

Our import system depends on a resource provider being able to read the existing state of a resource and report back to Pulumi the current values of its inputs and outputs.

It isn’t always possible to return this accurately. For example, if a resource’s inputs don’t match up 1-to-1 with its output state and the underlying provider can only read the current output state, there’s no way for it to always construct the correct input state. There could also be bugs in the provider’s Read method that result in inaccurate reads. We’ve designed the import system to be tolerant of these cases, but it does mean we don’t expect perfect import results for every resource.

The current import system

To import resources, the engine currently performs the following steps:

  1. Use the provider’s Read function to get the inputs and outputs for the resource.
  2. Remove all properties from the inputs except for those flagged as required.
  3. Pass that reduced input property set to the provider’s Check method.
  4. If there are any check failures, return an error and fail the import.
  5. Take the result of Check, which will have filled in default properties, and diff it with the original input set returned from Read.
  6. If there are any diffs, copy in the values from the original Read to the resulting input set.
  7. Save the updated input set and the original output set into the stack’s state.
  8. Pass the resulting input set to the code generator to generate a new Resource call to match the imported state.

There are a number of problems with the above. The biggest is trying to strip the input set down to only properties that are marked as required. The required flag is a fairly blunt tool that can’t capture that some fields are only required sometimes based on the value of other fields. As such this stripping to just required properties causes many of the following calls to Check to raise check failures.

The next major issue is that we hard fail if Check returns failures. There is always a risk of this happening due to the limitations of Read, and simply throwing an error doesn’t help the user.

Finally, in the cases where the above two issues weren’t hit, this data flow had the odd result of setting optional fields to their default values. This isn’t needed, and these fields would be better left blank.
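
To make the first two problems concrete, here is a rough sketch of the old flow. It is a simplified model under stated assumptions, not the engine’s real code: the `Provider` interface, `oldImport`, and `fakeInstanceProvider` are hypothetical names, and the handling of steps 5 and 6 is only one plausible reading of the list above.

```typescript
// Hypothetical types for illustration only; not the Pulumi engine's real interfaces.
type Props = Record<string, unknown>;

interface Provider {
    required: string[]; // properties flagged as required in the schema
    read(id: string): { inputs: Props; outputs: Props };
    check(inputs: Props): { inputs: Props; failures: string[] };
}

function oldImport(provider: Provider, id: string): Props {
    // Step 1: read the live resource.
    const readInputs = provider.read(id).inputs;

    // Step 2: keep only the properties flagged as required.
    const reduced: Props = {};
    for (const key of provider.required) {
        if (key in readInputs) reduced[key] = readInputs[key];
    }

    // Steps 3-4: check the reduced set and hard fail on any failure.
    const checked = provider.check(reduced);
    if (checked.failures.length > 0) {
        throw new Error(checked.failures.join("; "));
    }

    // Steps 5-6 (one plausible reading): where Check's result differs from
    // what Read returned, copy the value from Read back in.
    const result: Props = { ...checked.inputs };
    for (const key of Object.keys(result)) {
        if (key in readInputs && readInputs[key] !== result[key]) {
            result[key] = readInputs[key];
        }
    }

    // Steps 7-8 (saving state and generating code) would consume `result`.
    return result;
}

// A made-up provider: nothing is flagged required, yet Check insists on
// either `ami` or `launchTemplate`, mirroring the EC2 errors shown earlier.
const fakeInstanceProvider: Provider = {
    required: [],
    read: () => ({
        inputs: { ami: "ami-082b5a644766e0e6f", instanceType: "t2.micro" },
        outputs: {},
    }),
    check: (inputs) => ({
        inputs,
        failures:
            "ami" in inputs || "launchTemplate" in inputs
                ? []
                : ["one of ami, launchTemplate must be specified"],
    }),
};

// Read returned a valid `ami`, but it was stripped before Check ran.
let oldImportError = "";
try {
    oldImport(fakeInstanceProvider, "i-085d780737c600c7e");
} catch (e) {
    oldImportError = e instanceof Error ? e.message : String(e);
}
```

In this model the import fails even though Read returned everything Check needed, because the value was discarded in step 2.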

The new import system aims to address the issues described above.

The new import system

The engine will now issue the following steps to import resources:

  1. Use the provider’s Read function to get the inputs and outputs for the resource.
  2. Pass the input set to the provider’s Check method.
  3. Warn if there are any check failures.
  4. Save the input and output set from Read into the stack’s state.
  5. Pass the input set to the code generator to generate a new Resource call to match the imported state.
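
The steps above can be sketched roughly as follows. As before, this is a simplified model rather than the engine’s actual code: `Provider`, `ImportResult`, `newImport`, and the two fake providers are hypothetical names introduced for illustration.

```typescript
// Hypothetical types for illustration only; not the Pulumi engine's real interfaces.
type Props = Record<string, unknown>;

interface Provider {
    read(id: string): { inputs: Props; outputs: Props };
    check(inputs: Props): { inputs: Props; failures: string[] };
}

interface ImportResult {
    inputs: Props; // saved to state and passed to the code generator
    outputs: Props; // saved to state
    warnings: string[]; // check failures, reported but not fatal
}

function newImport(provider: Provider, id: string): ImportResult {
    // Step 1: read the live resource.
    const { inputs, outputs } = provider.read(id);
    // Steps 2-3: check the full input set; failures become warnings only.
    const warnings = provider.check(inputs).failures;
    // Steps 4-5: the values from Read are what get saved and turned into code.
    return { inputs, outputs, warnings };
}

// A made-up Check that insists on either `ami` or `launchTemplate`.
const checkInstance = (inputs: Props) => ({
    inputs,
    failures:
        "ami" in inputs || "launchTemplate" in inputs
            ? []
            : ["one of ami, launchTemplate must be specified"],
});

// A provider whose Read is complete and accurate.
const accurateRead: Provider = {
    read: () => ({
        inputs: { ami: "ami-082b5a644766e0e6f", instanceType: "t2.micro" },
        outputs: {},
    }),
    check: checkInstance,
};

// A provider whose Read misses a property that Check needs.
const incompleteRead: Provider = {
    read: () => ({ inputs: { instanceType: "t2.micro" }, outputs: {} }),
    check: checkInstance,
};

const good = newImport(accurateRead, "i-085d780737c600c7e");
const degraded = newImport(incompleteRead, "i-085d780737c600c7e");
```

Here `good` imports with no warnings, while `degraded` still produces an input set to generate code from and carries the check failure along as a warning instead of aborting.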

Note that we no longer end up with filled-in default values in our final result, and we carry on in the face of check failures, so we can still generate some code. If providers can give complete and accurate results from Read, the above flow will result in correctly generated code. However, it doesn’t result in quite the correct stack state. This is due to how the data flow of our import system currently works. We currently generate code based on the value of inputs in the state file, but normally the value stored at inputs in the state file is the result from Check, not what is directly in the user’s program. As such, when you run the first pulumi up after an import, your state file will change slightly as the engine calls Check and saves the inputs with defaults filled in. This should normally be transparent to the user, but there may be some cases where providers think this is a trivial diff.
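
A small sketch of that state change, using the S3 bucket from earlier. This is only a model: `checkWithDefaults` is a hypothetical stand-in for a provider’s Check, and the two defaults it fills in are the acl and forceDestroy values mentioned at the start of this post.

```typescript
// Hypothetical model of the state change described above; not the engine's real code.
type Props = Record<string, unknown>;

// Stand-in for a provider's Check: fills in defaults for anything the program omits.
function checkWithDefaults(programInputs: Props): Props {
    return { acl: "private", forceDestroy: false, ...programInputs };
}

// Inputs as returned by Read during import.
const inputsFromRead: Props = {
    arn: "arn:aws:s3:::my-bucket-3f85c54",
    bucket: "my-bucket-3f85c54",
    hostedZoneId: "Z3BJ6K6RIION7M",
    requestPayer: "BucketOwner",
};

// Import stores Read's inputs directly in the state file.
const stateAfterImport = inputsFromRead;

// The generated program mirrors those inputs; the first update stores Check's result instead.
const stateAfterFirstUp = checkWithDefaults(inputsFromRead);

// Properties that appear in state only after the first update.
const addedByCheck = Object.keys(stateAfterFirstUp).filter((k) => !(k in stateAfterImport));
```

In this model, `addedByCheck` contains only the two defaulted properties, which is the slight state change the paragraph above refers to.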

There are two complications with fixing the above. The first is that during import, if Read doesn’t return valid results, then Check can fail, and so we won’t have a property set returned from it that we can save to the state file. This case will always have to fall back to just storing the properties returned by Read and triggering a check failure during pulumi up that will need to be manually resolved. The second complication is that the code generation for import works off the state file rather than a separate data flow specifically for imported resources. We plan on improving this, but in the spirit of agility we felt that a slightly better import feature today would be more useful to our users than a perfect import feature later.

Terraform Read

Many of Pulumi’s resource providers make use of Terraform providers. We call these bridged providers and all of them use our “terraform-bridge” library to translate between the Terraform API and Pulumi API.

Terraform currently has a slightly odd behavior in which missing values for some properties are zero-initialized by its Refresh method (which we use to implement Pulumi’s Read method in the Terraform bridge). As a result, when Pulumi asks to read a given Terraform resource, some top-level properties (such as the S3 bucket’s acl property) come back as missing, while other top-level properties and all nested properties (such as the S3 bucket’s versioning.mfaDelete) come back with some zero value.

For example reading an AWS S3 bucket object with the current Terraform bridge results in the following property set:

{
    "__defaults": [],
    "accelerationStatus": "",
    "arn": "arn:aws:s3:::my-bucket-3f85c54",
    "bucket": "my-bucket-3f85c54",
    "corsRules": [],
    "grants": [],
    "hostedZoneId": "Z3BJ6K6RIION7M",
    "lifecycleRules": [],
    "loggings": [],
    "objectLockConfiguration": null,
    "replicationConfiguration": null,
    "requestPayer": "BucketOwner",
    "serverSideEncryptionConfiguration": null,
    "tags": {
        "__defaults": []
    },
    "versioning": {
        "__defaults": [],
        "enabled": false,
        "mfaDelete": false
    },
    "website": null
}

We’ve made a change to the Terraform bridge to try and elide these zero properties that should just be missing. Reading the same S3 bucket with these changes gives the following much smaller property set:

{
    "__defaults": [],
    "arn": "arn:aws:s3:::my-bucket-3f85c54",
    "bucket": "my-bucket-3f85c54",
    "hostedZoneId": "Z3BJ6K6RIION7M",
    "requestPayer": "BucketOwner",
    "website": null
}

We’ll continue to work on the Terraform bridge to give better results here. For example the website property on the bucket should probably be missing rather than null.
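
To illustrate the general idea of eliding zero values, here is a deliberately simplified sketch. It is not the Terraform bridge’s actual logic: `isZero` and `elideZeroValues` are hypothetical names, and the rule used here (drop null, false, 0, empty strings, empty arrays, and objects containing only zero values, while keeping the `__defaults` bookkeeping key) is an assumption. Note that it also drops the null website property, which the bridge output above still includes.

```typescript
// Simplified, hypothetical model of zero-value elision; not the Terraform bridge's real code.
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };

// A value counts as "zero" if it carries no information beyond its type.
function isZero(v: Value): boolean {
    if (v === null || v === false || v === 0 || v === "") return true;
    if (Array.isArray(v)) return v.length === 0;
    if (typeof v === "object") {
        // Ignore the "__defaults" bookkeeping key when deciding if an object is empty.
        for (const key of Object.keys(v)) {
            if (key !== "__defaults" && !isZero(v[key])) return false;
        }
        return true;
    }
    return false;
}

// Drop zero-valued top-level properties, keeping the "__defaults" key itself.
function elideZeroValues(props: { [key: string]: Value }): { [key: string]: Value } {
    const out: { [key: string]: Value } = {};
    for (const key of Object.keys(props)) {
        if (key === "__defaults" || !isZero(props[key])) out[key] = props[key];
    }
    return out;
}

// The property set from the first Read example above.
const bucketFromRead: { [key: string]: Value } = {
    __defaults: [],
    accelerationStatus: "",
    arn: "arn:aws:s3:::my-bucket-3f85c54",
    bucket: "my-bucket-3f85c54",
    corsRules: [],
    grants: [],
    hostedZoneId: "Z3BJ6K6RIION7M",
    lifecycleRules: [],
    loggings: [],
    objectLockConfiguration: null,
    replicationConfiguration: null,
    requestPayer: "BucketOwner",
    serverSideEncryptionConfiguration: null,
    tags: { __defaults: [] },
    versioning: { __defaults: [], enabled: false, mfaDelete: false },
    website: null,
};

const elided = elideZeroValues(bucketFromRead);
```

A rule this blunt would be wrong for any resource where an explicit false or 0 is meaningful, which is part of why getting this right in the bridge takes ongoing work.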

Feedback

We’re aware this could have a large impact on some of our users, so we wanted to make sure we explained these changes well. If you have questions or concerns about this change, let us know in our community Slack or in GitHub Discussions.