Create AWS Glue Data Quality Rulesets

The aws:glue/dataQualityRuleset:DataQualityRuleset resource, part of the Pulumi AWS provider, defines data quality rules using DQDL syntax that Glue evaluates against catalog tables. This guide focuses on two capabilities: DQDL rule syntax and catalog table binding.

Rulesets can reference Glue Data Catalog tables for automatic evaluation during ETL jobs or crawls. The examples are intentionally small. Combine them with your own Glue jobs, crawlers, and catalog tables.

Define data quality rules with DQDL syntax

Data quality monitoring starts by writing rules that check column completeness, uniqueness, or statistical properties.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.glue.DataQualityRuleset("example", {
    name: "example",
    ruleset: "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
});
import pulumi
import pulumi_aws as aws

example = aws.glue.DataQualityRuleset("example",
    name="example",
    ruleset="Rules = [Completeness \"colA\" between 0.4 and 0.8]")
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/glue"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := glue.NewDataQualityRuleset(ctx, "example", &glue.DataQualityRulesetArgs{
			Name:    pulumi.String("example"),
			Ruleset: pulumi.String("Rules = [Completeness \"colA\" between 0.4 and 0.8]"),
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Glue.DataQualityRuleset("example", new()
    {
        Name = "example",
        Ruleset = "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.aws.glue.DataQualityRuleset;
import com.pulumi.aws.glue.DataQualityRulesetArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataQualityRuleset("example", DataQualityRulesetArgs.builder()
            .name("example")
            .ruleset("Rules = [Completeness \"colA\" between 0.4 and 0.8]")
            .build());

    }
}
resources:
  example:
    type: aws:glue:DataQualityRuleset
    properties:
      name: example
      ruleset: Rules = [Completeness "colA" between 0.4 and 0.8]

The ruleset property contains DQDL (Data Quality Definition Language) expressions that define your quality checks. In this case, the rule verifies that column “colA” has completeness between 40% and 80%. Glue evaluates these rules during ETL jobs or on-demand scans, flagging violations for review.

Associate rules with a specific catalog table

When rules apply to a specific table, you can bind the ruleset directly to that table for automatic evaluation.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.glue.DataQualityRuleset("example", {
    name: "example",
    ruleset: "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
    targetTable: {
        databaseName: exampleAwsGlueCatalogDatabase.name,
        tableName: exampleAwsGlueCatalogTable.name,
    },
});
import pulumi
import pulumi_aws as aws

example = aws.glue.DataQualityRuleset("example",
    name="example",
    ruleset="Rules = [Completeness \"colA\" between 0.4 and 0.8]",
    target_table={
        "database_name": example_aws_glue_catalog_database["name"],
        "table_name": example_aws_glue_catalog_table["name"],
    })
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/glue"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := glue.NewDataQualityRuleset(ctx, "example", &glue.DataQualityRulesetArgs{
			Name:    pulumi.String("example"),
			Ruleset: pulumi.String("Rules = [Completeness \"colA\" between 0.4 and 0.8]"),
			TargetTable: &glue.DataQualityRulesetTargetTableArgs{
				DatabaseName: pulumi.Any(exampleAwsGlueCatalogDatabase.Name),
				TableName:    pulumi.Any(exampleAwsGlueCatalogTable.Name),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Glue.DataQualityRuleset("example", new()
    {
        Name = "example",
        Ruleset = "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
        TargetTable = new Aws.Glue.Inputs.DataQualityRulesetTargetTableArgs
        {
            DatabaseName = exampleAwsGlueCatalogDatabase.Name,
            TableName = exampleAwsGlueCatalogTable.Name,
        },
    });

});
package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.aws.glue.DataQualityRuleset;
import com.pulumi.aws.glue.DataQualityRulesetArgs;
import com.pulumi.aws.glue.inputs.DataQualityRulesetTargetTableArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataQualityRuleset("example", DataQualityRulesetArgs.builder()
            .name("example")
            .ruleset("Rules = [Completeness \"colA\" between 0.4 and 0.8]")
            .targetTable(DataQualityRulesetTargetTableArgs.builder()
                .databaseName(exampleAwsGlueCatalogDatabase.name())
                .tableName(exampleAwsGlueCatalogTable.name())
                .build())
            .build());

    }
}
resources:
  example:
    type: aws:glue:DataQualityRuleset
    properties:
      name: example
      ruleset: Rules = [Completeness "colA" between 0.4 and 0.8]
      targetTable:
        databaseName: ${exampleAwsGlueCatalogDatabase.name}
        tableName: ${exampleAwsGlueCatalogTable.name}

The targetTable property links the ruleset to a specific database and table in your Glue Data Catalog. When Glue crawlers or jobs process this table, they automatically evaluate the ruleset. The databaseName and tableName must reference existing catalog resources.

Beyond these examples

These snippets focus on specific ruleset features: DQDL rule definition and catalog table association. They’re intentionally minimal rather than full data quality pipelines.

The examples may reference pre-existing infrastructure such as Glue Data Catalog databases and tables. They focus on defining the ruleset rather than provisioning the catalog or jobs that evaluate it.

To keep things focused, common ruleset patterns are omitted, including:

  • Description and tagging (metadata organization)
  • Recommendation run integration (recommendationRunId)
  • Multi-rule rulesets with complex DQDL expressions
  • Integration with Glue jobs or crawlers for automatic evaluation

These omissions are intentional: the goal is to illustrate how ruleset features are wired, not provide drop-in data quality modules. See the Glue Data Quality Ruleset resource reference for all available configuration options.

Let's create AWS Glue Data Quality Rulesets

Get started with Pulumi Cloud, then follow our quick setup guide to deploy this infrastructure.

Try Pulumi Cloud for FREE

Frequently Asked Questions

Configuration & Immutability
What properties can't I change after creating a data quality ruleset?
The name and targetTable properties are immutable. Changing either will force replacement of the ruleset.
What's the minimum configuration needed to create a data quality ruleset?
You need to provide name and ruleset properties. The ruleset must be a valid DQDL string, such as Rules = [Completeness "colA" between 0.4 and 0.8].
Ruleset Configuration
What is DQDL and how do I write ruleset syntax?
DQDL (Data Quality Definition Language) is AWS Glue’s ruleset syntax. The examples show rules like Rules = [Completeness "colA" between 0.4 and 0.8]. For complete syntax details, refer to the AWS Glue developer guide.
How do I associate a ruleset with a specific Glue Catalog table?
Configure the targetTable property with databaseName and tableName fields pointing to your Glue Catalog table. Note that targetTable is immutable after creation.
Outputs & Metadata
What is the recommendationRunId output used for?
When a ruleset is created from a recommendation run, recommendationRunId links the ruleset back to that recommendation run.
What's the difference between tags and tagsAll?
tags contains the tags you explicitly set on the resource. tagsAll includes both your tags and any tags inherited from the provider’s defaultTags configuration.

Using a different cloud?

Explore analytics guides for other cloud providers: