How do I build an AWS Glue Catalog Table with Pulumi?
To build an AWS Glue Catalog Table, we need to define a few main components:
- AWS Glue Catalog Database: This is where the table will reside.
- AWS Glue Catalog Table: The actual table definition including its schema, storage descriptor, and other configuration settings.
Together, these components define a metadata record of your dataset that Glue can manage. Below, you will find the steps and code to create an AWS Glue Catalog Table. Replace the placeholder values (such as <database_name>, <table_name>, and <bucket_name>) with actual values for your environment.
AWS Glue Catalog Table Components
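- Provider: The AWS provider, which determines the region. The code below relies on the default provider, so the region comes from the stack configuration (aws:region) or your AWS environment settings.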
- Database: The Glue Database where your table will reside.
- Table: The actual Glue Table including schema and storage details.
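If you would rather pin the region in code than rely on the default provider, one option is an explicit provider passed through resource options. The following is a minimal sketch, not the only way to do it: the logical name "aws-provider" and the region "us-west-2" are assumed example values, and the database declaration here is a variation of the one in the main program, not an additional resource.

```typescript
import * as aws from "@pulumi/aws";

// Explicit AWS provider pinned to one region.
// "us-west-2" is an assumed example value; substitute your own region.
const awsProvider = new aws.Provider("aws-provider", {
    region: "us-west-2",
});

// Passing the provider in the resource options makes this database use
// the provider's region instead of the stack's default AWS configuration.
const regionalDatabase = new aws.glue.CatalogDatabase("example", {
    name: "<database_name>",
}, { provider: awsProvider });

export const regionalDatabaseName = regionalDatabase.name;
```

The same third argument can be given to the table so that both resources are created through the same provider.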
Here’s how you can do it:
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create Glue Catalog Database
const example = new aws.glue.CatalogDatabase("example", {name: "<database_name>"});

// Create Glue Catalog Table
const exampleCatalogTable = new aws.glue.CatalogTable("example", {
    name: "<table_name>",
    databaseName: example.name,
    description: "An example Glue Table",
    storageDescriptor: {
        // Column schema of the data files
        columns: [
            {
                name: "id",
                type: "int",
            },
            {
                name: "name",
                type: "string",
            },
        ],
        // S3 prefix that holds the table's data
        location: "s3://<bucket_name>/data/",
        inputFormat: "org.apache.hadoop.mapred.TextInputFormat",
        outputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        compressed: false,
        // SerDe (serializer/deserializer) used to read and write rows
        serDeInfo: {
            name: "example_ser_de",
            serializationLibrary: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            parameters: {
                "serialization.format": "1",
            },
        },
        parameters: {
            EXTERNAL: "TRUE",
        },
    },
    // Partition columns; these are declared separately from the data columns
    partitionKeys: [{
        name: "partition_key",
        type: "string",
    }],
    parameters: {
        // Data format label; set it to match your actual files and SerDe
        classification: "json",
    },
    tableType: "EXTERNAL_TABLE",
});

// Export the names of the created resources
export const databaseName = example.name;
export const tableName = exampleCatalogTable.name;
Code Explanation
- Provider: No explicit provider is declared, so Pulumi uses the default AWS provider and its configured region.
- aws.glue.CatalogDatabase "example": Creates a Glue database.
- aws.glue.CatalogTable "example": Defines the Glue table with column schema, storage descriptor, SerDe (serializer/deserializer) info, and partitioning details.
- Outputs: Exports the names of the created database and table.
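The table declares partition_key under partitionKeys, but the program does not show how partitioned data is usually laid out in S3. A common convention is the Hive-style layout, where each partition lives under a key=value/ prefix beneath the table location. The sketch below illustrates that convention only; partitionLocation is a hypothetical helper, not part of Pulumi or AWS, and it assumes partition values are already safe to use in an S3 key.

```typescript
// Hypothetical helper (not a Pulumi or AWS API): builds the Hive-style
// S3 prefix for one partition of a table.
function partitionLocation(
    tableLocation: string,
    partition: Record<string, string>,
): string {
    // Make sure the base location ends with a slash.
    const base = tableLocation.endsWith("/") ? tableLocation : `${tableLocation}/`;
    // Each partition key becomes one "key=value/" path segment,
    // in the order the keys were given.
    const segments = Object.entries(partition)
        .map(([key, value]) => `${key}=${value}/`)
        .join("");
    return base + segments;
}

// Example: the prefix for one value of the partition key declared on the table.
const examplePrefix = partitionLocation("s3://<bucket_name>/data/", {
    partition_key: "2024-01-01",
});
console.log(examplePrefix);
```

Writing files under such a prefix does not by itself register the partition in the catalog; that is a separate step not covered here.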
Summary
In this configuration, we set up an AWS Glue Catalog Database and a table within it. We defined the schema, the storage descriptor (including the S3 location), and the other settings the table needs. With this metadata in place, Glue jobs and other services that read the Glue Data Catalog can locate and query the data at that S3 location.