Data orchestration using AWS Data Pipeline
The Pulumi TypeScript program below creates an AWS Data Pipeline with a basic pipeline definition that copies data from a source S3 bucket to a destination S3 bucket, a simple example of data orchestration.
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Source and destination buckets for the copy.
const source_bucket = new aws.s3.Bucket("sourceBucket", { acl: "private" });
const destination_bucket = new aws.s3.Bucket("destinationBucket", { acl: "private" });

// An IAM role that the Data Pipeline service can assume.
const pipelineRole = new aws.iam.Role("pipelineRole", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Action: "sts:AssumeRole",
            Effect: "Allow",
            Principal: {
                Service: "datapipeline.amazonaws.com",
            },
        }],
    }),
});

// Grant the role access to both buckets and the objects within them.
new aws.iam.RolePolicy("pipelineRolePolicy", {
    role: pipelineRole.id,
    policy: pulumi
        .all([source_bucket.arn, destination_bucket.arn])
        .apply(([srcArn, dstArn]) => JSON.stringify({
            Version: "2012-10-17",
            Statement: [{
                Effect: "Allow",
                Action: ["s3:*"],
                Resource: [srcArn, `${srcArn}/*`, dstArn, `${dstArn}/*`],
            }],
        })),
});

const my_pipeline = new aws.datapipeline.Pipeline("myPipeline", {
    name: "my_pipeline",
    description: "My first data pipeline",
    tags: {
        Name: "my_pipeline",
        Purpose: "Data orchestration using AWS Data Pipeline",
    },
});

const my_pipeline_definition = new aws.datapipeline.PipelineDefinition("myPipelineDefinition", {
    pipelineId: my_pipeline.id,
    pipelineObjects: [
        {
            // The `Default` object supplies settings inherited by all other
            // objects, including the IAM role the pipeline runs under.
            id: "Default",
            name: "Default",
            fields: [
                { key: "scheduleType", stringValue: "ONDEMAND" },
                { key: "failureAndRerunMode", stringValue: "CASCADE" },
                { key: "role", stringValue: pipelineRole.name },
                { key: "resourceRole", stringValue: pipelineRole.name },
            ],
        },
        {
            // An `S3DataNode` for the source bucket.
            id: "s3InputNodeId",
            name: "S3InputNode",
            fields: [
                { key: "type", stringValue: "S3DataNode" },
                { key: "directoryPath", stringValue: pulumi.interpolate`s3://${source_bucket.bucket}` },
            ],
        },
        {
            // An `S3DataNode` for the destination bucket.
            id: "s3OutputNodeId",
            name: "S3OutputNode",
            fields: [
                { key: "type", stringValue: "S3DataNode" },
                { key: "directoryPath", stringValue: pulumi.interpolate`s3://${destination_bucket.bucket}` },
            ],
        },
        {
            // A `CopyActivity` that copies the input node to the output node.
            // The worker group name is an arbitrary example; a task runner
            // polling this worker group performs the copy.
            id: "copyActivityId",
            name: "CopyActivity",
            fields: [
                { key: "type", stringValue: "CopyActivity" },
                { key: "workerGroup", stringValue: "myWorkerGroup" },
                { key: "input", refValue: "s3InputNodeId" },
                { key: "output", refValue: "s3OutputNodeId" },
            ],
        },
    ],
});

// Export the ID of the pipeline.
export const pipelineId = my_pipeline.id;
```
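After deployment, an on-demand pipeline still has to be activated before the copy activity runs. As a rough sketch (assuming the Pulumi and AWS CLIs are both installed and configured for the same account and region), the deploy-and-activate flow might look like this:

```bash
# Deploy the stack.
pulumi up

# Activate the pipeline using the exported pipeline ID.
aws datapipeline activate-pipeline --pipeline-id "$(pulumi stack output pipelineId)"
```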
The program first creates two private buckets using the aws.s3.Bucket resource. It then creates an IAM role that the Data Pipeline service can assume, along with an inline policy that lets the role access both buckets and the objects inside them.

Next, it creates the pipeline itself using aws.datapipeline.Pipeline, providing a name, description, and tags.

Finally, it supplies the pipeline's configuration using aws.datapipeline.PipelineDefinition. The definition consists of a Default object, which sets the schedule type and the IAM role the pipeline runs under; a source S3 bucket (the input data node); a destination S3 bucket (the output data node); and a CopyActivity that takes the input node as its input and the output node as its output. The program exports the ID of the created pipeline.
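The definition above uses the ONDEMAND schedule type, so the copy runs each time the pipeline is activated. To run it on a recurring schedule instead, the Default object can reference a Schedule pipeline object. Here is a minimal sketch of the two objects that would change; the object IDs, role name, and one-day period are illustrative assumptions rather than values taken from the program above:

```typescript
// Hypothetical scheduled variant of the Default and Schedule objects.
const scheduledObjects = [
    {
        id: "Default",
        name: "Default",
        fields: [
            // "cron" runs objects according to the schedule they reference.
            { key: "scheduleType", stringValue: "cron" },
            { key: "schedule", refValue: "DefaultSchedule" },
            { key: "role", stringValue: "my-pipeline-role" },         // assumed role name
            { key: "resourceRole", stringValue: "my-pipeline-role" }, // assumed role name
        ],
    },
    {
        // A `Schedule` object that fires once a day, starting at activation.
        id: "DefaultSchedule",
        name: "EveryDay",
        fields: [
            { key: "type", stringValue: "Schedule" },
            { key: "period", stringValue: "1 Day" },
            { key: "startAt", stringValue: "FIRST_ACTIVATION_DATE_TIME" },
        ],
    },
    // ...the data nodes and the copy activity follow unchanged.
];
```

These objects would replace the Default object in the pipelineObjects list of the main program.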
To learn more about the resources used in this program, see the Pulumi Registry documentation for aws.s3.Bucket, aws.iam.Role, aws.datapipeline.Pipeline, and aws.datapipeline.PipelineDefinition.