
How do I build an AWS Glue CatalogTable with Pulumi?

In this guide, we will create an AWS Glue CatalogTable using Pulumi in TypeScript. AWS Glue is a fully managed extract, transform, and load (ETL) service for preparing and loading data for analytics. The CatalogTable resource registers a table's metadata (its schema, storage location, and data format) in the AWS Glue Data Catalog; it does not create or move the underlying data.

Key Points

  • We will define an AWS Glue CatalogDatabase, which will contain our CatalogTable.
  • The CatalogTable will be defined with its name, database name, table type, storage descriptor, and partition keys.
  • The storage descriptor will describe how the data is stored: the columns (the table schema), the S3 location, the input and output formats, and the SerDe used to parse each row.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create an AWS Glue Catalog Database
const glueDatabase = new aws.glue.CatalogDatabase("myDatabase", {
    name: "my_database",
});

// Define the storage descriptor for the Glue Catalog Table
const storageDescriptor = {
    columns: [
        { name: "id", type: "int", comment: "Identifier" },
        { name: "name", type: "string", comment: "Name of the entity" },
        { name: "age", type: "int", comment: "Age of the entity" },
    ],
    location: "s3://my-bucket/data/",
    inputFormat: "org.apache.hadoop.mapred.TextInputFormat",
    outputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    serDeInfo: {
        serializationLibrary: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
        parameters: {
            "field.delim": ",",
        },
    },
};

// Create an AWS Glue Catalog Table
const glueCatalogTable = new aws.glue.CatalogTable("myCatalogTable", {
    name: "my_table",
    databaseName: glueDatabase.name,
    storageDescriptor: storageDescriptor,
    tableType: "EXTERNAL_TABLE",
    parameters: {
        "classification": "csv",
    },
    partitionKeys: [
        { name: "year", type: "int" },
        { name: "month", type: "int" },
    ],
});
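
The table above declares `year` and `month` as partition keys, but the catalog entry alone does not say where each partition's files live. A common convention is the Hive-style layout, where every partition sits under a `key=value` prefix below the table location. The sketch below shows how such a prefix could be derived, assuming that layout; `partitionLocation` is a hypothetical helper written for illustration and is not part of Pulumi or the AWS SDK.

```typescript
// Hypothetical helper (not a Pulumi or AWS API): builds the Hive-style
// S3 prefix for one partition, assuming a key=value directory layout.
type PartitionValues = Record<string, string | number>;

function partitionLocation(
    baseLocation: string,
    partitionKeys: string[],
    values: PartitionValues,
): string {
    // Strip trailing slashes so exactly one "/" separates base and partition path.
    const base = baseLocation.replace(/\/+$/, "");
    // Emit segments in the order the partition keys are declared on the table.
    const segments = partitionKeys.map((key) => {
        const value = values[key];
        if (value === undefined) {
            throw new Error(`Missing value for partition key "${key}"`);
        }
        return `${key}=${value}`;
    });
    return `${base}/${segments.join("/")}/`;
}

// Same location and partition keys as the table defined above.
const mayPartition = partitionLocation("s3://my-bucket/data/", ["year", "month"], {
    year: 2024,
    month: 5,
});
console.log(mayPartition); // s3://my-bucket/data/year=2024/month=5/
```

Files written under that prefix would belong to the `year=2024, month=5` partition once the partition itself is registered in the catalog, for example by a crawler or a separate partition resource.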

Summary

In this guide, we created an AWS Glue CatalogTable using Pulumi in TypeScript. We started by defining a Glue CatalogDatabase and then set up a storage descriptor for our table. Finally, we created the CatalogTable with the necessary properties, including its partition keys, and linked it to our database. With this setup, the table's schema and storage details live in the AWS Glue Data Catalog and are managed as code alongside the rest of your infrastructure.
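
As the schema grows, writing each column object by hand gets repetitive. One option is to keep the schema as a compact map and expand it into the `columns` array that the storage descriptor expects. The following is a minimal sketch of that idea; `GlueColumn` and `toGlueColumns` are hypothetical names introduced here and are not part of the Pulumi AWS provider.

```typescript
// Shape of one column entry, mirroring the objects in the columns array above.
interface GlueColumn {
    name: string;
    type: string;
    comment?: string;
}

// Hypothetical helper (not a Pulumi API): expands a compact
// name -> [type, comment] map into a list of column definitions.
function toGlueColumns(schema: Record<string, [string, string?]>): GlueColumn[] {
    return Object.keys(schema).map((name) => {
        const [type, comment] = schema[name];
        // Only attach a comment when one was supplied.
        return comment === undefined ? { name, type } : { name, type, comment };
    });
}

// Reproduces the three columns used in this guide.
const columns = toGlueColumns({
    id: ["int", "Identifier"],
    name: ["string", "Name of the entity"],
    age: ["int", "Age of the entity"],
});
console.log(columns.map((c) => `${c.name}:${c.type}`).join(", "));
```

The resulting array could be passed as `columns` in the storage descriptor in place of the literal list.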
