Create AWS Athena Data Catalogs

The aws:athena/dataCatalog:DataCatalog resource, part of the Pulumi AWS provider, registers external data catalogs with Athena, enabling queries against Glue, Hive, or custom Lambda-based metadata sources. This guide focuses on three capabilities: Glue Data Catalog integration, external Hive metastore federation, and custom Lambda-based connectors.

Data catalogs reference existing Lambda functions, Glue catalogs, or Hive metastores rather than creating them. The examples are intentionally small. Combine them with your own Lambda functions, IAM roles, and connector implementations.

Connect to AWS Glue Data Catalog

Teams using AWS Glue for ETL often need Athena to query tables managed by Glue’s centralized metadata store.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.athena.DataCatalog("example", {
    name: "glue-data-catalog",
    description: "Glue based Data Catalog",
    type: "GLUE",
    parameters: {
        "catalog-id": "123456789012",
    },
});
import pulumi
import pulumi_aws as aws

example = aws.athena.DataCatalog("example",
    name="glue-data-catalog",
    description="Glue based Data Catalog",
    type="GLUE",
    parameters={
        "catalog-id": "123456789012",
    })
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("glue-data-catalog"),
			Description: pulumi.String("Glue based Data Catalog"),
			Type:        pulumi.String("GLUE"),
			Parameters: pulumi.StringMap{
				"catalog-id": pulumi.String("123456789012"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "glue-data-catalog",
        Description = "Glue based Data Catalog",
        Type = "GLUE",
        Parameters = 
        {
            { "catalog-id", "123456789012" },
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("glue-data-catalog")
            .description("Glue based Data Catalog")
            .type("GLUE")
            .parameters(Map.of("catalog-id", "123456789012"))
            .build());

    }
}
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: glue-data-catalog
      description: Glue based Data Catalog
      type: GLUE
      parameters:
        catalog-id: '123456789012'

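Setting type to “GLUE” tells Athena to read database and table metadata from an AWS Glue Data Catalog. The parameters map requires a catalog-id key whose value is the ID of the AWS account that owns that Glue Data Catalog. This can be a different account from the one running Athena, which is how a Glue catalog in another account is made queryable. Once the catalog is registered, Athena queries can reference its databases and tables.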
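As a minimal sketch, assuming you want to reject a malformed account ID before deployment rather than at the Athena API, the parameters map for a GLUE catalog can be built by a small helper. The helper name glueCatalogParameters and the 12-digit format check are illustrative assumptions, not part of the Pulumi AWS provider.

```typescript
// Hypothetical helper (not part of @pulumi/aws): builds the parameters map
// for a GLUE-type Athena data catalog.
// Assumption: AWS account IDs are exactly 12 decimal digits.
function glueCatalogParameters(accountId: string): Record<string, string> {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error(`catalog-id must be a 12-digit AWS account ID, got "${accountId}"`);
  }
  // catalog-id is the only key this helper emits for a GLUE catalog.
  return { "catalog-id": accountId };
}
```

In a TypeScript program, the result could be passed as parameters: glueCatalogParameters("123456789012") to aws.athena.DataCatalog, since that input accepts a plain string map.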

Connect to external Hive metastore

Organizations running on-premises Hadoop or self-managed Hive can federate their existing metastore into Athena without migrating metadata.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.athena.DataCatalog("example", {
    name: "hive-data-catalog",
    description: "Hive based Data Catalog",
    type: "HIVE",
    parameters: {
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
    },
});
import pulumi
import pulumi_aws as aws

example = aws.athena.DataCatalog("example",
    name="hive-data-catalog",
    description="Hive based Data Catalog",
    type="HIVE",
    parameters={
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
    })
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("hive-data-catalog"),
			Description: pulumi.String("Hive based Data Catalog"),
			Type:        pulumi.String("HIVE"),
			Parameters: pulumi.StringMap{
				"metadata-function": pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "hive-data-catalog",
        Description = "Hive based Data Catalog",
        Type = "HIVE",
        Parameters = 
        {
            { "metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function" },
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("hive-data-catalog")
            .description("Hive based Data Catalog")
            .type("HIVE")
            .parameters(Map.of("metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function"))
            .build());

    }
}
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: hive-data-catalog
      description: Hive based Data Catalog
      type: HIVE
      parameters:
        metadata-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function

Setting type to “HIVE” federates an external Hive metastore into Athena. The parameters map requires a metadata-function key whose value is the ARN of a Lambda function acting as the Hive metastore connector: it receives Athena’s metadata requests and answers them by calling your Hive metastore.
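Because the metadata-function value is a raw ARN string, a typo only surfaces when Athena tries to invoke it. The sketch below parses the ARN before building the HIVE parameters. parseLambdaArn and hiveCatalogParameters are hypothetical helpers, and the accepted shape (arn:partition:lambda:region:account-id:function:name, optionally followed by a version or alias) is an assumption about Lambda ARNs rather than a complete validator.

```typescript
// Hypothetical types and helpers, not part of @pulumi/aws.
interface LambdaArn {
  partition: string;
  region: string;
  accountId: string;
  functionName: string;
  qualifier?: string; // optional version or alias suffix
}

// Assumed shape: arn:<partition>:lambda:<region>:<account-id>:function:<name>[:<qualifier>]
function parseLambdaArn(arn: string): LambdaArn {
  const parts = arn.split(":");
  if (
    (parts.length !== 7 && parts.length !== 8) ||
    parts[0] !== "arn" ||
    parts[2] !== "lambda" ||
    parts[5] !== "function" ||
    !/^\d{12}$/.test(parts[4]) ||
    parts[6] === ""
  ) {
    throw new Error(`not a Lambda function ARN: ${arn}`);
  }
  return {
    partition: parts[1],
    region: parts[3],
    accountId: parts[4],
    functionName: parts[6],
    qualifier: parts[7], // undefined when the ARN has no qualifier
  };
}

// Builds the parameters map for a HIVE-type catalog after checking the ARN.
function hiveCatalogParameters(metadataFunctionArn: string): Record<string, string> {
  parseLambdaArn(metadataFunctionArn); // throws on malformed input
  return { "metadata-function": metadataFunctionArn };
}
```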

Build custom federated catalog with Lambda

Some data sources require custom query federation logic that isn’t covered by built-in connectors.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.athena.DataCatalog("example", {
    name: "lambda-data-catalog",
    description: "Lambda based Data Catalog",
    type: "LAMBDA",
    parameters: {
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1",
        "record-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2",
    },
});
import pulumi
import pulumi_aws as aws

example = aws.athena.DataCatalog("example",
    name="lambda-data-catalog",
    description="Lambda based Data Catalog",
    type="LAMBDA",
    parameters={
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1",
        "record-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2",
    })
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("lambda-data-catalog"),
			Description: pulumi.String("Lambda based Data Catalog"),
			Type:        pulumi.String("LAMBDA"),
			Parameters: pulumi.StringMap{
				"metadata-function": pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1"),
				"record-function":   pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "lambda-data-catalog",
        Description = "Lambda based Data Catalog",
        Type = "LAMBDA",
        Parameters = 
        {
            { "metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1" },
            { "record-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2" },
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("lambda-data-catalog")
            .description("Lambda based Data Catalog")
            .type("LAMBDA")
            .parameters(Map.ofEntries(
                Map.entry("metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1"),
                Map.entry("record-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2")
            ))
            .build());

    }
}
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: lambda-data-catalog
      description: Lambda based Data Catalog
      type: LAMBDA
      parameters:
        metadata-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1
        record-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2

Setting type to “LAMBDA” enables fully custom federation. This example supplies two Lambda functions: metadata-function handles schema discovery and table listings, while record-function reads the actual data rows. Alternatively, a single composite Lambda that handles both roles can be supplied under the function key. In either form, the functions must implement Athena’s federation protocol, translating Athena’s requests into operations against the underlying data source.
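The FAQ later in this guide notes that a LAMBDA catalog accepts either a single function key or the metadata-function/record-function pair. One way to make that either/or rule hard to get wrong in TypeScript is a discriminated union. The type and function names below are hypothetical; this is a sketch, not something the provider defines.

```typescript
// Hypothetical types, not part of @pulumi/aws. They encode the rule that a
// LAMBDA catalog takes either one composite function or a metadata/record pair.
type LambdaCatalogFunctions =
  | { kind: "composite"; functionArn: string }
  | { kind: "split"; metadataFunctionArn: string; recordFunctionArn: string };

function lambdaCatalogParameters(fns: LambdaCatalogFunctions): Record<string, string> {
  switch (fns.kind) {
    case "composite":
      // One Lambda handles both metadata and record requests.
      return { function: fns.functionArn };
    case "split":
      // Separate Lambdas for schema discovery and for reading rows.
      return {
        "metadata-function": fns.metadataFunctionArn,
        "record-function": fns.recordFunctionArn,
      };
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new kind is added.
      const unreachable: never = fns;
      throw new Error(`unknown kind: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The never check in the default branch makes the compiler report an error if a third variant is added to the union without a matching case.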

Beyond these examples

These snippets cover the three catalog types (GLUE, HIVE, and LAMBDA) and the parameters each type expects. They are intentionally minimal rather than full federation deployments.

The examples reference pre-existing infrastructure, such as Lambda functions that implement connector logic, an AWS Glue Data Catalog, or an external Hive metastore. They focus on registering the catalog rather than building the underlying connectors.

To keep things focused, common catalog patterns are omitted, including:

  • Resource tagging (tags property)
  • Cross-region catalog access
  • IAM permissions for Lambda invocation
  • Connector deployment and testing

These omissions are intentional: the goal is to illustrate how each catalog type is wired, not to provide drop-in federation modules. See the Athena DataCatalog resource reference for all available configuration options.


Frequently Asked Questions

Catalog Types & Configuration
What types of data catalogs can I create in Athena?
You can create three types: LAMBDA for federated catalogs, GLUE for AWS Glue Catalog integration, or HIVE for external Hive metastores.
What parameters are required for each catalog type?

Parameters vary by type:

  • LAMBDA: Either function (single Lambda ARN), or both metadata-function and record-function (two Lambda ARNs)
  • GLUE: catalog-id (the ID of the AWS account that owns the Glue Data Catalog)
  • HIVE: metadata-function (Lambda ARN for metadata operations)
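When parameters come from stack configuration rather than literals, the required keys listed above can be checked at runtime. The sketch below is a hypothetical validator: it checks only for the required keys from this list, treats the two LAMBDA forms as mutually exclusive (an assumption drawn from the either/or wording), and ignores any additional keys.

```typescript
// Hypothetical validator, not part of @pulumi/aws. Checks only the required
// keys per catalog type; extra keys are ignored.
type CatalogType = "LAMBDA" | "GLUE" | "HIVE";

function checkCatalogParameters(type: CatalogType, params: Record<string, string>): string[] {
  const has = (key: string): boolean => typeof params[key] === "string" && params[key] !== "";
  const problems: string[] = [];
  switch (type) {
    case "GLUE":
      if (!has("catalog-id")) problems.push("GLUE requires catalog-id");
      break;
    case "HIVE":
      if (!has("metadata-function")) problems.push("HIVE requires metadata-function");
      break;
    case "LAMBDA": {
      const single = has("function");
      const split = has("metadata-function") && has("record-function");
      // Assumption: the single-function form and the split form must not be mixed.
      if (single && (has("metadata-function") || has("record-function"))) {
        problems.push("LAMBDA takes either function or metadata-function/record-function, not both");
      } else if (!single && !split) {
        problems.push("LAMBDA requires function, or both metadata-function and record-function");
      }
      break;
    }
  }
  return problems;
}
```

An empty result means the required keys are present; it says nothing about whether the referenced Lambda functions or Glue account actually exist.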

Naming & Constraints
What are the naming requirements for data catalogs?
Catalog names must be unique within your AWS account and can be up to 128 characters long, using only alphanumeric characters, underscores, at signs (@), and hyphens.
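The character and length rule can be expressed as a regular expression. This sketch assumes "alphanumeric" means ASCII letters and digits and that an empty name is invalid. Uniqueness within the account cannot be checked locally, so the hypothetical isValidCatalogName helper covers only the character and length constraints.

```typescript
// Hypothetical helper mirroring the naming rule: 1 to 128 characters, each an
// ASCII letter, digit, underscore, at sign (@), or hyphen.
const CATALOG_NAME_PATTERN = /^[A-Za-z0-9_@-]{1,128}$/;

function isValidCatalogName(name: string): boolean {
  return CATALOG_NAME_PATTERN.test(name);
}
```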
Can I rename my data catalog after creation?
No, the name property is immutable. Renaming requires deleting and recreating the resource.
Which properties can't be changed after creation?
Only the name property is immutable. You can modify description, parameters, and tags after creation.
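The immutability rule decides whether an edit is applied in place or triggers a replacement. The following is a simplified, hypothetical model of that decision, not Pulumi's diff engine: following the answer above, a changed name means replace, and changes to description, parameters, or tags mean an in-place update. The CatalogArgs interface and classifyChange function are illustrative names only.

```typescript
// Hypothetical model, not Pulumi's actual diff logic. Only the fields
// discussed in this FAQ are compared.
interface CatalogArgs {
  name: string;
  description?: string;
  parameters?: Record<string, string>;
  tags?: Record<string, string>;
}

type ChangeKind = "none" | "update" | "replace";

// Order-insensitive comparison of two optional string maps (undefined acts as {}).
function sameMap(a: Record<string, string> = {}, b: Record<string, string> = {}): boolean {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((k) => b[k] === a[k]);
}

function classifyChange(oldArgs: CatalogArgs, newArgs: CatalogArgs): ChangeKind {
  // name is immutable, so any change forces delete-and-recreate.
  if (oldArgs.name !== newArgs.name) return "replace";
  const changed =
    oldArgs.description !== newArgs.description ||
    !sameMap(oldArgs.parameters, newArgs.parameters) ||
    !sameMap(oldArgs.tags, newArgs.tags);
  return changed ? "update" : "none";
}
```

Running pulumi preview against a real stack shows the provider's actual plan, which remains the authoritative answer.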
