The aws:athena/dataCatalog:DataCatalog resource, part of the Pulumi AWS provider, registers external data catalogs with Athena, enabling queries against Glue, Hive, or custom Lambda-backed metadata sources. This guide focuses on three capabilities: Glue catalog integration, external Hive metastore connectivity, and Lambda-based federated catalogs.
Data catalogs reference Lambda functions or existing Glue/Hive catalogs that must exist separately. The examples are intentionally small. Combine them with your own Lambda functions, IAM roles, and metadata sources.
Connect to AWS Glue Data Catalog
Teams using AWS Glue for ETL often need Athena to query tables managed by Glue’s centralized metadata store.
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.athena.DataCatalog("example", {
    name: "glue-data-catalog",
    description: "Glue based Data Catalog",
    type: "GLUE",
    parameters: {
        "catalog-id": "123456789012",
    },
});
```
```python
import pulumi
import pulumi_aws as aws

example = aws.athena.DataCatalog("example",
    name="glue-data-catalog",
    description="Glue based Data Catalog",
    type="GLUE",
    parameters={
        "catalog-id": "123456789012",
    })
```
```go
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("glue-data-catalog"),
			Description: pulumi.String("Glue based Data Catalog"),
			Type:        pulumi.String("GLUE"),
			Parameters: pulumi.StringMap{
				"catalog-id": pulumi.String("123456789012"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
```
```csharp
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() =>
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "glue-data-catalog",
        Description = "Glue based Data Catalog",
        Type = "GLUE",
        Parameters =
        {
            { "catalog-id", "123456789012" },
        },
    });
});
```
```java
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("glue-data-catalog")
            .description("Glue based Data Catalog")
            .type("GLUE")
            .parameters(Map.of("catalog-id", "123456789012"))
            .build());
    }
}
```
```yaml
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: glue-data-catalog
      description: Glue based Data Catalog
      type: GLUE
      parameters:
        catalog-id: '123456789012'
```
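The type property set to “GLUE” tells Athena to use the AWS Glue Data Catalog as the metadata source. The parameters object requires a catalog-id: the 12-digit ID of the AWS account that owns the Glue catalog. Because your own account’s Glue catalog is already available in Athena as AwsDataCatalog, this registration is most commonly used to query a Glue catalog that lives in another account. Once registered, Athena queries can reference that catalog’s databases and tables.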
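Because catalog-id is a bare 12-digit account ID, a small guard can catch typos (an ARN pasted by mistake, or an ID that lost its leading zeros after being treated as a number) before Pulumi calls Athena. This is a minimal sketch assuming a hypothetical helper named glueCatalogParameters; it is not part of @pulumi/aws, and it only checks the format, not whether the account exists or has shared its catalog.

```typescript
// Hypothetical helper (not part of @pulumi/aws): builds the parameters
// object for a GLUE-type data catalog and rejects malformed account IDs.
function glueCatalogParameters(accountId: string): Record<string, string> {
    // AWS account IDs are exactly 12 decimal digits and may start with 0,
    // so they are handled as strings rather than numbers.
    if (!/^\d{12}$/.test(accountId)) {
        throw new Error(`catalog-id must be a 12-digit AWS account ID, got "${accountId}"`);
    }
    return { "catalog-id": accountId };
}

// Usage: pass the result as the `parameters` argument of aws.athena.DataCatalog.
const params = glueCatalogParameters("123456789012");
console.log(params["catalog-id"]); // prints 123456789012
```

The same string-not-number concern is why the YAML example quotes '123456789012'.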
Connect to external Hive metastore
Organizations migrating from on-premises Hadoop or running hybrid platforms often maintain Hive metastores outside AWS.
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.athena.DataCatalog("example", {
    name: "hive-data-catalog",
    description: "Hive based Data Catalog",
    type: "HIVE",
    parameters: {
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
    },
});
```
```python
import pulumi
import pulumi_aws as aws

example = aws.athena.DataCatalog("example",
    name="hive-data-catalog",
    description="Hive based Data Catalog",
    type="HIVE",
    parameters={
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
    })
```
```go
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("hive-data-catalog"),
			Description: pulumi.String("Hive based Data Catalog"),
			Type:        pulumi.String("HIVE"),
			Parameters: pulumi.StringMap{
				"metadata-function": pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
```
```csharp
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() =>
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "hive-data-catalog",
        Description = "Hive based Data Catalog",
        Type = "HIVE",
        Parameters =
        {
            { "metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function" },
        },
    });
});
```
```java
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("hive-data-catalog")
            .description("Hive based Data Catalog")
            .type("HIVE")
            .parameters(Map.of("metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function"))
            .build());
    }
}
```
```yaml
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: hive-data-catalog
      description: Hive based Data Catalog
      type: HIVE
      parameters:
        metadata-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function
```
The type property set to “HIVE” enables external metastore connectivity. The metadata-function parameter points to a Lambda function that translates Athena’s metadata requests into calls your Hive metastore understands. Your Lambda must implement Athena’s metadata protocol to handle schema lookups and table discovery.
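Athena reaches the metastore only through the Lambda function named by ARN, so it is worth confirming that the ARN is well-formed and points at the region and account you expect before registering the catalog. This is a minimal sketch assuming a hypothetical parseLambdaArn helper; the regular expression is a loose check of the documented arn:partition:lambda:region:account-id:function:name[:qualifier] shape, not a full validator of every naming rule.

```typescript
// Parts of a Lambda function ARN of the form
// arn:<partition>:lambda:<region>:<account-id>:function:<name>[:<qualifier>]
interface LambdaArn {
    partition: string;
    region: string;
    accountId: string;
    functionName: string;
    qualifier?: string; // optional version or alias
}

// Hypothetical helper: parses and loosely validates a Lambda function ARN.
function parseLambdaArn(arn: string): LambdaArn {
    const match =
        /^arn:(aws[a-z-]*):lambda:([a-z0-9-]+):(\d{12}):function:([A-Za-z0-9_-]+)(?::([A-Za-z0-9_$-]+))?$/.exec(arn);
    if (match === null) {
        throw new Error(`not a Lambda function ARN: ${arn}`);
    }
    const [, partition, region, accountId, functionName, qualifier] = match;
    return { partition, region, accountId, functionName, qualifier };
}

// Usage with the ARN from the Hive example above.
const arn = parseLambdaArn(
    "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
);
console.log(arn.region, arn.functionName); // prints eu-central-1 not-important-lambda-function
```

Comparing arn.region and arn.accountId against your stack configuration is one way to catch a catalog pointed at the wrong environment.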
Build custom federated catalog with Lambda
Some data sources require custom metadata translation or live in systems without native Athena connectors.
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.athena.DataCatalog("example", {
    name: "lambda-data-catalog",
    description: "Lambda based Data Catalog",
    type: "LAMBDA",
    parameters: {
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1",
        "record-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2",
    },
});
```
```python
import pulumi
import pulumi_aws as aws

example = aws.athena.DataCatalog("example",
    name="lambda-data-catalog",
    description="Lambda based Data Catalog",
    type="LAMBDA",
    parameters={
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1",
        "record-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2",
    })
```
```go
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("lambda-data-catalog"),
			Description: pulumi.String("Lambda based Data Catalog"),
			Type:        pulumi.String("LAMBDA"),
			Parameters: pulumi.StringMap{
				"metadata-function": pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1"),
				"record-function":   pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
```
```csharp
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() =>
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "lambda-data-catalog",
        Description = "Lambda based Data Catalog",
        Type = "LAMBDA",
        Parameters =
        {
            { "metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1" },
            { "record-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2" },
        },
    });
});
```
```java
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("lambda-data-catalog")
            .description("Lambda based Data Catalog")
            .type("LAMBDA")
            .parameters(Map.ofEntries(
                Map.entry("metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1"),
                Map.entry("record-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2")
            ))
            .build());
    }
}
```
```yaml
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: lambda-data-catalog
      description: Lambda based Data Catalog
      type: LAMBDA
      parameters:
        metadata-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1
        record-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2
```
A LAMBDA catalog names a metadata-function, as the Hive example does, and adds a record-function for data retrieval. The metadata-function handles schema and table discovery, while the record-function fetches the actual data rows. Both functions must implement Athena’s federated query protocol, which is what lets you query arbitrary data sources such as REST APIs, NoSQL databases, or custom file formats.
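To make the division of labor concrete, here is a toy, in-process model of the two roles. It is not the Athena federated query protocol or the Athena Query Federation SDK (real connectors are deployed Lambda functions that speak that protocol); the TableSource type and the metadataFunction and recordFunction names are hypothetical and exist only to show which question each function answers.

```typescript
// Toy model only: a hypothetical in-memory "data source" standing in for a
// REST API or NoSQL store that Athena cannot read natively.
type Row = Record<string, string | number>;

interface TableSource {
    columns: Record<string, "string" | "number">; // column name -> type
    rows: Row[];
}

const source: Record<string, Record<string, TableSource>> = {
    sales: {
        orders: {
            columns: { id: "number", region: "string" },
            rows: [
                { id: 1, region: "eu" },
                { id: 2, region: "us" },
            ],
        },
    },
};

// Role of the metadata-function: answer schema questions (which databases
// and tables exist, and what columns a table has) without reading rows.
const metadataFunction = {
    listSchemas: (): string[] => Object.keys(source),
    listTables: (schema: string): string[] => Object.keys(source[schema] ?? {}),
    getTable: (schema: string, table: string) => source[schema]?.[table]?.columns,
};

// Role of the record-function: return the actual rows for one table. The
// optional filter is a stand-in for a predicate Athena might push down.
function recordFunction(
    schema: string,
    table: string,
    predicate: (r: Row) => boolean = () => true,
): Row[] {
    const t = source[schema]?.[table];
    if (t === undefined) {
        throw new Error(`unknown table ${schema}.${table}`);
    }
    return t.rows.filter(predicate);
}

console.log(metadataFunction.listTables("sales")); // prints [ 'orders' ]
console.log(recordFunction("sales", "orders", (r) => r.region === "eu").length); // prints 1
```

Keeping the roles separate mirrors the two-parameter form of the LAMBDA catalog: discovery can stay cheap while row retrieval does the heavy lifting.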
Beyond these examples
These snippets focus on specific data catalog features: Glue catalog integration, external Hive metastore connectivity, and custom Lambda-based federation. They’re intentionally minimal rather than full query federation solutions.
The examples reference pre-existing infrastructure such as Lambda functions implementing Athena’s metadata/record protocols, and AWS Glue catalogs or external Hive metastores. They focus on catalog registration rather than provisioning the underlying metadata sources.
To keep things focused, common catalog patterns are omitted, including:
- Resource tagging (tags property)
- Cross-region catalog access
- IAM permissions for Lambda invocation
- Error handling and retry configuration
These omissions are intentional: the goal is to illustrate how each catalog type is wired, not provide drop-in federation modules. See the Athena DataCatalog resource reference for all available configuration options.
Frequently Asked Questions
Catalog Types & Configuration
The type property accepts LAMBDA for federated catalogs, GLUE for AWS Glue Data Catalog integration, or HIVE for external Hive metastores. Required parameters vary by type:
- LAMBDA: use function for a single Lambda that handles both metadata and records, or metadata-function and record-function to split those roles across two functions
- GLUE: use catalog-id with the ID of the AWS account that owns the Glue catalog
- HIVE: use metadata-function with a Lambda function ARN
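The per-type rules above are easy to check before deployment. This is a minimal sketch assuming a hypothetical validateCatalogParameters helper; it encodes only the required-key rules listed here, not every constraint Athena enforces (for example, it does not verify that the ARNs or account IDs exist).

```typescript
type CatalogType = "LAMBDA" | "GLUE" | "HIVE";

// Hypothetical pre-deployment check mirroring the required-parameter rules
// above. Returns a list of problems; an empty list means the keys look right.
function validateCatalogParameters(type: CatalogType, params: Record<string, string>): string[] {
    const has = (key: string): boolean =>
        typeof params[key] === "string" && params[key].length > 0;
    const problems: string[] = [];
    switch (type) {
        case "GLUE":
            if (!has("catalog-id")) problems.push("GLUE requires catalog-id");
            break;
        case "HIVE":
            if (!has("metadata-function")) problems.push("HIVE requires metadata-function");
            break;
        case "LAMBDA":
            // Either one combined function, or both split functions.
            if (!has("function") && !(has("metadata-function") && has("record-function"))) {
                problems.push("LAMBDA requires function, or both metadata-function and record-function");
            }
            break;
    }
    return problems;
}

console.log(validateCatalogParameters("HIVE", {})); // prints [ 'HIVE requires metadata-function' ]
```

For instance, the LAMBDA example's parameters pass this check, while a LAMBDA catalog that sets only metadata-function does not.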
Naming & Constraints
The name property is immutable, so changing it forces Pulumi to replace the data catalog.