Building secure data lakes with AWS QuickSight
Given your request, we will use an Amazon S3 bucket to store the data lake, an AWS Glue Data Catalog database and table to manage its metadata, and Amazon QuickSight to analyze the data.
Note: This program does not create any QuickSight resources. At the time of writing, automated creation and integration of QuickSight resources with Pulumi was not supported, so you will have to manually create a QuickSight analysis or dashboard on top of your AWS Glue Data Catalog.
The program will be as follows:
    import pulumi
    from pulumi_aws import glue, s3

    # Create the Amazon S3 bucket that backs the data lake
    datalake_bucket = s3.Bucket("datalake")

    # Create a Glue Catalog Database where metadata will be stored
    # (Glue catalog names should be lowercase)
    glue_database = glue.CatalogDatabase("my_database", name="my_database")

    # Create a Glue Catalog Table so query tools can locate and read the data
    glue_table = glue.CatalogTable(
        "my_table",
        name="my_table",
        database_name=glue_database.name,
        storage_descriptor=glue.CatalogTableStorageDescriptorArgs(
            # Glue expects an s3:// URI here, not the bucket ARN
            location=pulumi.Output.concat("s3://", datalake_bucket.bucket, "/"),
            input_format="org.apache.hadoop.mapred.TextInputFormat",
            output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            stored_as_sub_directories=False,
        ),
    )

    # Export the names of the bucket, Glue database, and Glue table
    pulumi.export("bucket_name", datalake_bucket.bucket)
    pulumi.export("glue_database_name", glue_database.name)
    pulumi.export("glue_table_name", glue_table.name)
This program will:
- Create an Amazon S3 bucket to store your data lake.
- Create a Glue Catalog Database.
- Create a Glue Catalog Table that specifies the format and location of the data.
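The table defined above records a location and file formats but no columns or SerDe, which query tools generally need before they can read rows. The following is a minimal sketch of a fuller storage descriptor for comma-delimited text files. The helper name `csv_storage_descriptor`, the bucket name, the `sales` prefix, and the two columns are all hypothetical. The keys are meant to mirror the snake_case argument names of `glue.CatalogTableStorageDescriptorArgs`; treat that mapping as an assumption and check it against your installed `pulumi_aws` version.

```python
from typing import Dict, List, Tuple


def csv_storage_descriptor(
    bucket_name: str,
    prefix: str,
    columns: List[Tuple[str, str]],
) -> Dict[str, object]:
    """Build a Glue storage descriptor for comma-delimited text files.

    `columns` is a list of (name, hive_type) pairs; all values here are
    illustrative, not taken from the original program.
    """
    clean_prefix = prefix.strip("/")
    # Glue locations are s3:// URIs, not bucket ARNs
    if clean_prefix:
        location = f"s3://{bucket_name}/{clean_prefix}/"
    else:
        location = f"s3://{bucket_name}/"
    return {
        "location": location,
        "input_format": "org.apache.hadoop.mapred.TextInputFormat",
        "output_format": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "ser_de_info": {
            # SerDe that parses delimited text rows into columns
            "serialization_library": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "parameters": {"field.delim": ","},
        },
        "columns": [{"name": name, "type": hive_type} for name, hive_type in columns],
        "stored_as_sub_directories": False,
    }


# Hypothetical bucket, prefix, and schema
descriptor = csv_storage_descriptor(
    "example-datalake-bucket",
    "sales",
    [("order_id", "string"), ("amount", "double")],
)
print(descriptor["location"])
```

In a Pulumi program the bucket name is an output value rather than a plain string, so the helper would likely be called inside `datalake_bucket.bucket.apply(...)`, or the same fields passed as keyword arguments to the args class.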
After running this program:
- Upload your data into the S3 bucket created.
- Log into the AWS Management Console and navigate to the QuickSight console.
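- Manually create a new dataset, then an analysis or dashboard. QuickSight does not read the Glue Data Catalog directly, so when selecting the data source, choose Amazon Athena, which queries through the catalog. You should find the Glue database and table that were created by this program. QuickSight will use these to analyze your data.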
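The upload step above can also be scripted. The sketch below assumes `boto3` is installed and AWS credentials are configured; the helper names `object_key` and `upload_encrypted`, the local file path, and the `sales` prefix are hypothetical. It requests server-side encryption on upload, which fits the goal of a secure data lake, though the bucket may already enforce default encryption.

```python
from pathlib import Path


def object_key(prefix: str, local_path: str) -> str:
    """Return the S3 object key for a local file under an optional prefix."""
    clean_prefix = prefix.strip("/")
    filename = Path(local_path).name
    return f"{clean_prefix}/{filename}" if clean_prefix else filename


def upload_encrypted(local_path: str, bucket_name: str, key: str) -> None:
    """Upload one file and ask S3 to encrypt it at rest (SSE-S3)."""
    # Imported lazily so object_key() works without boto3 installed
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        local_path,
        bucket_name,
        key,
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )


# Hypothetical local file and prefix; the prefix should match the table location
key = object_key("sales", "/tmp/orders.csv")
print(key)

# Left commented out because it needs credentials and a real bucket:
# upload_encrypted("/tmp/orders.csv", "<your bucket name>", key)
```

The bucket name to pass in is the `bucket_name` value exported by the program, which `pulumi stack output bucket_name` prints.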