The aws:athena/dataCatalog:DataCatalog resource, part of the Pulumi AWS provider, registers external data catalogs with Athena, enabling queries against Glue, Hive, or custom Lambda-based metadata sources. This guide focuses on three capabilities: Glue Data Catalog integration, external Hive metastore federation, and custom Lambda-based connectors.
Data catalogs reference existing Lambda functions, Glue catalogs, or Hive metastores rather than creating them. The examples are intentionally small. Combine them with your own Lambda functions, IAM roles, and connector implementations.
Connect to AWS Glue Data Catalog
Teams using AWS Glue for ETL often need Athena to query tables managed by Glue’s centralized metadata store.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const example = new aws.athena.DataCatalog("example", {
    name: "glue-data-catalog",
    description: "Glue based Data Catalog",
    type: "GLUE",
    parameters: {
        "catalog-id": "123456789012",
    },
});
import pulumi
import pulumi_aws as aws
example = aws.athena.DataCatalog("example",
    name="glue-data-catalog",
    description="Glue based Data Catalog",
    type="GLUE",
    parameters={
        "catalog-id": "123456789012",
    })
package main
import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("glue-data-catalog"),
			Description: pulumi.String("Glue based Data Catalog"),
			Type:        pulumi.String("GLUE"),
			Parameters: pulumi.StringMap{
				"catalog-id": pulumi.String("123456789012"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using Pulumi;
using Aws = Pulumi.Aws;
return await Deployment.RunAsync(() =>
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "glue-data-catalog",
        Description = "Glue based Data Catalog",
        Type = "GLUE",
        Parameters =
        {
            { "catalog-id", "123456789012" },
        },
    });
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;
public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }
    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("glue-data-catalog")
            .description("Glue based Data Catalog")
            .type("GLUE")
            .parameters(Map.of("catalog-id", "123456789012"))
            .build());
    }
}
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: glue-data-catalog
      description: Glue based Data Catalog
      type: GLUE
      parameters:
        catalog-id: '123456789012'
Setting type to “GLUE” tells Athena to use the AWS Glue Data Catalog as its metadata store. The parameters map requires a catalog-id key whose value is the 12-digit ID of the AWS account that owns the Glue Data Catalog. Once the catalog is registered, Athena queries can reference the databases and tables it contains.
Connect to external Hive metastore
Organizations running on-premises Hadoop or self-managed Hive can federate their existing metastore into Athena without migrating metadata.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const example = new aws.athena.DataCatalog("example", {
    name: "hive-data-catalog",
    description: "Hive based Data Catalog",
    type: "HIVE",
    parameters: {
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
    },
});
import pulumi
import pulumi_aws as aws
example = aws.athena.DataCatalog("example",
    name="hive-data-catalog",
    description="Hive based Data Catalog",
    type="HIVE",
    parameters={
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function",
    })
package main
import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("hive-data-catalog"),
			Description: pulumi.String("Hive based Data Catalog"),
			Type:        pulumi.String("HIVE"),
			Parameters: pulumi.StringMap{
				"metadata-function": pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using Pulumi;
using Aws = Pulumi.Aws;
return await Deployment.RunAsync(() =>
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "hive-data-catalog",
        Description = "Hive based Data Catalog",
        Type = "HIVE",
        Parameters =
        {
            { "metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function" },
        },
    });
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;
public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }
    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("hive-data-catalog")
            .description("Hive based Data Catalog")
            .type("HIVE")
            .parameters(Map.of("metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function"))
            .build());
    }
}
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: hive-data-catalog
      description: Hive based Data Catalog
      type: HIVE
      parameters:
        metadata-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function
Setting type to “HIVE” enables Hive metastore federation. The parameters map requires a metadata-function key set to the ARN of a Lambda function that acts as the Hive connector. This function translates Athena’s metadata requests into calls against your Hive metastore.
Build custom federated catalog with Lambda
Some data sources require custom query federation logic that isn’t covered by built-in connectors.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const example = new aws.athena.DataCatalog("example", {
    name: "lambda-data-catalog",
    description: "Lambda based Data Catalog",
    type: "LAMBDA",
    parameters: {
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1",
        "record-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2",
    },
});
import pulumi
import pulumi_aws as aws
example = aws.athena.DataCatalog("example",
    name="lambda-data-catalog",
    description="Lambda based Data Catalog",
    type="LAMBDA",
    parameters={
        "metadata-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1",
        "record-function": "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2",
    })
package main
import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/athena"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := athena.NewDataCatalog(ctx, "example", &athena.DataCatalogArgs{
			Name:        pulumi.String("lambda-data-catalog"),
			Description: pulumi.String("Lambda based Data Catalog"),
			Type:        pulumi.String("LAMBDA"),
			Parameters: pulumi.StringMap{
				"metadata-function": pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1"),
				"record-function":   pulumi.String("arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2"),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using Pulumi;
using Aws = Pulumi.Aws;
return await Deployment.RunAsync(() =>
{
    var example = new Aws.Athena.DataCatalog("example", new()
    {
        Name = "lambda-data-catalog",
        Description = "Lambda based Data Catalog",
        Type = "LAMBDA",
        Parameters =
        {
            { "metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1" },
            { "record-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2" },
        },
    });
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.aws.athena.DataCatalog;
import com.pulumi.aws.athena.DataCatalogArgs;
import java.util.Map;
public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }
    public static void stack(Context ctx) {
        var example = new DataCatalog("example", DataCatalogArgs.builder()
            .name("lambda-data-catalog")
            .description("Lambda based Data Catalog")
            .type("LAMBDA")
            .parameters(Map.ofEntries(
                Map.entry("metadata-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1"),
                Map.entry("record-function", "arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2")
            ))
            .build());
    }
}
resources:
  example:
    type: aws:athena:DataCatalog
    properties:
      name: lambda-data-catalog
      description: Lambda based Data Catalog
      type: LAMBDA
      parameters:
        metadata-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-1
        record-function: arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2
Setting type to “LAMBDA” enables fully custom federation. This example splits the connector across two Lambda functions:

- metadata-function handles schema discovery and table listings.
- record-function retrieves the actual data rows.

A connector packaged as a single Lambda function can instead be passed under the function key, as noted in the FAQ below. In either form, the functions must implement Athena’s federation protocol to translate queries into source-specific operations.
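The FAQ at the end of this page notes that a LAMBDA catalog takes either a single function key or the metadata-function and record-function pair. This Python sketch builds either form from ARN parts. It rejects a mix of the two, which is a conservative choice of the sketch rather than a documented Athena rule. The helpers lambda_function_arn and lambda_catalog_parameters are hypothetical. The ARN layout assumes an unqualified function ARN in the aws partition unless another partition is passed.

```python
from __future__ import annotations


def lambda_function_arn(region: str, account_id: str, name: str, partition: str = "aws") -> str:
    """Assemble an unqualified Lambda function ARN from its parts."""
    return f"arn:{partition}:lambda:{region}:{account_id}:function:{name}"


def lambda_catalog_parameters(
    *,
    function: str | None = None,
    metadata_function: str | None = None,
    record_function: str | None = None,
) -> dict[str, str]:
    """Build the parameters map for a LAMBDA-type catalog (hypothetical helper).

    Pass either `function` (one Lambda handles metadata and records) or both
    `metadata_function` and `record_function` (split connector). Rejecting a
    mix is this sketch's own conservative choice.
    """
    if function is not None:
        if metadata_function is not None or record_function is not None:
            raise ValueError("use either function or the metadata/record pair, not both")
        return {"function": function}
    if metadata_function is None or record_function is None:
        raise ValueError("a split connector needs both metadata_function and record_function")
    return {"metadata-function": metadata_function, "record-function": record_function}


# Rebuild the two ARNs used in the example above.
region, account = "eu-central-1", "123456789012"
params = lambda_catalog_parameters(
    metadata_function=lambda_function_arn(region, account, "not-important-lambda-function-1"),
    record_function=lambda_function_arn(region, account, "not-important-lambda-function-2"),
)
print(params["record-function"])
# arn:aws:lambda:eu-central-1:123456789012:function:not-important-lambda-function-2
```

Deriving both ARNs from one region and account value keeps the metadata and record functions from drifting apart when the stack moves between environments.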
Beyond these examples
These snippets focus on specific data catalog features: Glue, Hive, and Lambda catalog types, and catalog-specific parameter mapping. They’re intentionally minimal rather than full federation deployments.
The examples reference pre-existing infrastructure: the Lambda functions that implement connector logic, and the AWS Glue Data Catalog or external Hive metastore being registered. They focus on catalog registration rather than on building the underlying connectors.
To keep things focused, common catalog patterns are omitted, including:
- Resource tagging (tags property)
- Cross-region catalog access
- IAM permissions for Lambda invocation
- Connector deployment and testing
These omissions are intentional: the goal is to illustrate how each catalog type is wired, not to provide drop-in federation modules. See the Athena DataCatalog resource reference for all available configuration options.
Frequently Asked Questions
Catalog Types & Configuration
The type property accepts LAMBDA for federated catalogs, GLUE for AWS Glue Data Catalog integration, or HIVE for external Hive metastores. Parameters vary by type:
- LAMBDA: either function (a single Lambda ARN) or both metadata-function and record-function (two Lambda ARNs)
- GLUE: catalog-id (your AWS account ID)
- HIVE: metadata-function (a Lambda ARN for metadata operations)
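The per-type rules in the list above can be encoded as a small lookup table and checked before deployment. In this Python sketch, REQUIRED_KEY_SETS and catalog_parameter_error are hypothetical names. The sketch only checks that the required keys listed in this FAQ are present. It does not reject extra keys or validate ARN formats, and Athena or the provider may enforce rules it does not capture.

```python
from __future__ import annotations

from collections.abc import Mapping

# Required parameter keys per catalog type, taken from the FAQ list.
# LAMBDA accepts either of two key sets; extra keys are not rejected here.
REQUIRED_KEY_SETS: dict[str, list[frozenset[str]]] = {
    "GLUE": [frozenset({"catalog-id"})],
    "HIVE": [frozenset({"metadata-function"})],
    "LAMBDA": [
        frozenset({"function"}),
        frozenset({"metadata-function", "record-function"}),
    ],
}


def catalog_parameter_error(catalog_type: str, parameters: Mapping[str, str]) -> str | None:
    """Return a description of the problem, or None if the required keys are present."""
    key_sets = REQUIRED_KEY_SETS.get(catalog_type)
    if key_sets is None:
        return f"unknown catalog type {catalog_type!r}"
    present = set(parameters)
    # Accept the map if it contains every key of at least one allowed set.
    if any(required <= present for required in key_sets):
        return None
    options = " or ".join("+".join(sorted(ks)) for ks in key_sets)
    return f"{catalog_type} catalog needs {options}"


print(catalog_parameter_error("LAMBDA", {"metadata-function": "arn:..."}))
# LAMBDA catalog needs function or metadata-function+record-function
```

Returning a message instead of raising lets a caller collect problems from several catalogs before failing. For example, a Pulumi program could gather all messages and raise once.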
Naming & Constraints
The name property is immutable: changing it forces the resource to be deleted and recreated. The description, parameters, and tags can be modified after creation.