1. Data orchestration using AWS Data Pipeline


    The Pulumi TypeScript program below creates an AWS Data Pipeline with a basic pipeline definition that moves data from a source S3 bucket to a destination S3 bucket, serving as a simple example of data orchestration.

    ```typescript
    import * as pulumi from "@pulumi/pulumi";
    import * as aws from "@pulumi/aws";

    // Source and destination buckets for the copy.
    const sourceBucket = new aws.s3.Bucket("sourceBucket", { acl: "private" });
    const destinationBucket = new aws.s3.Bucket("destinationBucket", { acl: "private" });

    // IAM role that the Data Pipeline service assumes.
    const pipelineRole = new aws.iam.Role("pipelineRole", {
        assumeRolePolicy: JSON.stringify({
            Version: "2012-10-17",
            Statement: [{
                Action: "sts:AssumeRole",
                Effect: "Allow",
                Principal: { Service: "datapipeline.amazonaws.com" },
            }],
        }),
    });

    // Grant the role access to both buckets and the objects inside them. The
    // policy document contains outputs (the bucket ARNs), so it is serialized
    // to JSON inside an apply rather than passed as a raw object.
    new aws.iam.RolePolicy("pipelineRolePolicy", {
        role: pipelineRole.id,
        policy: pulumi.all([sourceBucket.arn, destinationBucket.arn]).apply(([srcArn, dstArn]) =>
            JSON.stringify({
                Version: "2012-10-17",
                Statement: [{
                    Effect: "Allow",
                    Action: ["s3:*"],
                    Resource: [srcArn, `${srcArn}/*`, dstArn, `${dstArn}/*`],
                }],
            })),
    });

    const myPipeline = new aws.datapipeline.Pipeline("myPipeline", {
        name: "my_pipeline",
        description: "My first data pipeline",
        tags: {
            Name: "my_pipeline",
            Purpose: "Data orchestration using AWS Data Pipeline",
        },
    });

    const myPipelineDefinition = new aws.datapipeline.PipelineDefinition("myPipelineDefinition", {
        pipelineId: myPipeline.id,
        pipelineObjects: [{
            // The Default object: settings inherited by every other object,
            // including the IAM role the service uses when running the pipeline.
            id: "Default",
            name: "Default",
            fields: [
                { key: "scheduleType", stringValue: "ondemand" },
                { key: "failureAndRerunMode", stringValue: "CASCADE" },
                { key: "role", stringValue: pipelineRole.name },
            ],
        }, {
            // An S3 data node for the source bucket.
            id: "s3InputNodeId",
            name: "S3InputNode",
            fields: [
                { key: "type", stringValue: "S3DataNode" },
                { key: "directoryPath", stringValue: pulumi.interpolate`s3://${sourceBucket.bucket}/input` },
            ],
        }, {
            // An S3 data node for the destination bucket.
            id: "s3OutputNodeId",
            name: "S3OutputNode",
            fields: [
                { key: "type", stringValue: "S3DataNode" },
                { key: "directoryPath", stringValue: pulumi.interpolate`s3://${destinationBucket.bucket}/output` },
            ],
        }, {
            // A CopyActivity that copies from the input node to the output node.
            // It runs on a task runner polling this worker group; alternatively,
            // use a runsOn reference to an Ec2Resource object.
            id: "copyActivityId",
            name: "CopyActivity",
            fields: [
                { key: "type", stringValue: "CopyActivity" },
                { key: "workerGroup", stringValue: "myWorkerGroup" },
                { key: "input", refValue: "s3InputNodeId" },
                { key: "output", refValue: "s3OutputNodeId" },
            ],
        }],
    });

    // Export the ID of the pipeline as a stack output.
    export const pipelineId = myPipeline.id;
    ```

    The program first creates two private S3 buckets using the aws.s3.Bucket resource. It then creates an IAM role that the Data Pipeline service can assume, along with an inline policy that grants the role access to both buckets.

    Next, it creates the pipeline itself using aws.datapipeline.Pipeline, providing a name, description, and tags.

    Finally, it supplies the pipeline's configuration through aws.datapipeline.PipelineDefinition. The definition includes an input S3 data node for the source bucket, an output S3 data node for the destination bucket, and a CopyActivity that takes the input node as its input and the output node as its output.
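    Each pipeline object's `fields` array mirrors the key/value and key/reference pairs of the classic Data Pipeline JSON definition format. As a plain-TypeScript illustration of that shape, a small helper (hypothetical, not part of the Pulumi API) could build such arrays from ordinary objects:

    ```typescript
    // A single field of a pipeline object: either a literal string value or a
    // reference to another pipeline object's id.
    type PipelineField = { key: string; stringValue?: string; refValue?: string };

    // Hypothetical convenience helper: turn plain maps of literal values and
    // object references into the { key, stringValue } / { key, refValue } list
    // that PipelineDefinition objects expect.
    function toFields(
        strings: Record<string, string>,
        refs: Record<string, string> = {},
    ): PipelineField[] {
        const fields: PipelineField[] = Object.entries(strings).map(
            ([key, stringValue]) => ({ key, stringValue }),
        );
        for (const [key, refValue] of Object.entries(refs)) {
            fields.push({ key, refValue });
        }
        return fields;
    }

    // The CopyActivity object's fields could then be written as:
    const copyActivityFields = toFields(
        { type: "CopyActivity" },
        { input: "s3InputNodeId", output: "s3OutputNodeId" },
    );
    ```
    
    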

    The program exports the ID of the created pipeline as a stack output.
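    To try the program, the usual Pulumi workflow applies; note that a Data Pipeline does nothing until it is activated. The commands below are a sketch that assumes the Pulumi and AWS CLIs are installed and credentials are configured:

    ```shell
    # Preview and deploy the stack.
    pulumi up

    # Read the exported pipeline ID from the stack outputs.
    pulumi stack output pipelineId

    # Activate the pipeline so it starts running.
    aws datapipeline activate-pipeline \
        --pipeline-id "$(pulumi stack output pipelineId)"
    ```
    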

    To learn more about the resources used in this program, you can visit the following links: