Create AWS Glue Data Quality Rulesets

The aws:glue/dataQualityRuleset:DataQualityRuleset resource, part of the Pulumi AWS provider, defines data quality rules using DQDL (Data Quality Definition Language) syntax that Glue evaluates against catalog tables. This guide focuses on two capabilities: DQDL rule syntax and catalog table binding.

Rulesets can reference Glue catalog tables for automatic evaluation during ETL jobs. The examples are intentionally small. Combine them with your own Glue jobs and data quality evaluation workflows.

Define data quality rules with DQDL syntax

Data quality monitoring starts with rules that check column completeness, uniqueness, or statistical properties.

TypeScript

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.glue.DataQualityRuleset("example", {
    name: "example",
    ruleset: "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
});

Python

import pulumi
import pulumi_aws as aws

example = aws.glue.DataQualityRuleset("example",
    name="example",
    ruleset="Rules = [Completeness \"colA\" between 0.4 and 0.8]")

Go

package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/glue"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := glue.NewDataQualityRuleset(ctx, "example", &glue.DataQualityRulesetArgs{
			Name:    pulumi.String("example"),
			Ruleset: pulumi.String("Rules = [Completeness \"colA\" between 0.4 and 0.8]"),
		})
		if err != nil {
			return err
		}
		return nil
	})
}

C#

using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Glue.DataQualityRuleset("example", new()
    {
        Name = "example",
        Ruleset = "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
    });

});

Java

package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.aws.glue.DataQualityRuleset;
import com.pulumi.aws.glue.DataQualityRulesetArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataQualityRuleset("example", DataQualityRulesetArgs.builder()
            .name("example")
            .ruleset("Rules = [Completeness \"colA\" between 0.4 and 0.8]")
            .build());

    }
}

YAML

resources:
  example:
    type: aws:glue:DataQualityRuleset
    properties:
      name: example
      ruleset: Rules = [Completeness "colA" between 0.4 and 0.8]

The ruleset property contains DQDL expressions that define quality checks. In this example, the rule verifies that column “colA” has completeness between 40% and 80%. Glue evaluates these rules during ETL jobs or on-demand scans, flagging violations for review.
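Hand-writing DQDL inside escaped string literals becomes error-prone as rules accumulate. As a minimal sketch, the helpers below assemble the same ruleset string in plain Python. The names `completeness_between` and `build_ruleset` are hypothetical and not part of Pulumi or Glue; the resulting string is what you would pass as the `ruleset` argument.

```python
def completeness_between(column: str, low: float, high: float) -> str:
    """Return a DQDL Completeness rule with the given lower and upper bounds."""
    # Completeness is a ratio, so both bounds must lie within 0..1.
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("bounds must satisfy 0 <= low <= high <= 1")
    return f'Completeness "{column}" between {low} and {high}'


def build_ruleset(rules: list[str]) -> str:
    """Wrap individual DQDL rules in the `Rules = [...]` envelope."""
    if not rules:
        raise ValueError("a ruleset needs at least one rule")
    return "Rules = [" + ", ".join(rules) + "]"


ruleset = build_ruleset([completeness_between("colA", 0.4, 0.8)])
print(ruleset)  # Rules = [Completeness "colA" between 0.4 and 0.8]
```

Building the string this way avoids the backslash-escaped quotes seen in the examples above and catches out-of-range bounds before deployment.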

Associate rules with a specific catalog table

When rules apply to a single table, binding the ruleset to that table makes the relationship explicit.

TypeScript

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

const example = new aws.glue.DataQualityRuleset("example", {
    name: "example",
    ruleset: "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
    targetTable: {
        databaseName: exampleAwsGlueCatalogDatabase.name,
        tableName: exampleAwsGlueCatalogTable.name,
    },
});

Python

import pulumi
import pulumi_aws as aws

example = aws.glue.DataQualityRuleset("example",
    name="example",
    ruleset="Rules = [Completeness \"colA\" between 0.4 and 0.8]",
    target_table={
        "database_name": example_aws_glue_catalog_database["name"],
        "table_name": example_aws_glue_catalog_table["name"],
    })

Go

package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/glue"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		_, err := glue.NewDataQualityRuleset(ctx, "example", &glue.DataQualityRulesetArgs{
			Name:    pulumi.String("example"),
			Ruleset: pulumi.String("Rules = [Completeness \"colA\" between 0.4 and 0.8]"),
			TargetTable: &glue.DataQualityRulesetTargetTableArgs{
				DatabaseName: pulumi.Any(exampleAwsGlueCatalogDatabase.Name),
				TableName:    pulumi.Any(exampleAwsGlueCatalogTable.Name),
			},
		})
		if err != nil {
			return err
		}
		return nil
	})
}

C#

using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;

return await Deployment.RunAsync(() => 
{
    var example = new Aws.Glue.DataQualityRuleset("example", new()
    {
        Name = "example",
        Ruleset = "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
        TargetTable = new Aws.Glue.Inputs.DataQualityRulesetTargetTableArgs
        {
            DatabaseName = exampleAwsGlueCatalogDatabase.Name,
            TableName = exampleAwsGlueCatalogTable.Name,
        },
    });

});

Java

package generated_program;

import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.aws.glue.DataQualityRuleset;
import com.pulumi.aws.glue.DataQualityRulesetArgs;
import com.pulumi.aws.glue.inputs.DataQualityRulesetTargetTableArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class App {
    public static void main(String[] args) {
        Pulumi.run(App::stack);
    }

    public static void stack(Context ctx) {
        var example = new DataQualityRuleset("example", DataQualityRulesetArgs.builder()
            .name("example")
            .ruleset("Rules = [Completeness \"colA\" between 0.4 and 0.8]")
            .targetTable(DataQualityRulesetTargetTableArgs.builder()
                .databaseName(exampleAwsGlueCatalogDatabase.name())
                .tableName(exampleAwsGlueCatalogTable.name())
                .build())
            .build());

    }
}

YAML

resources:
  example:
    type: aws:glue:DataQualityRuleset
    properties:
      name: example
      ruleset: Rules = [Completeness "colA" between 0.4 and 0.8]
      targetTable:
        databaseName: ${exampleAwsGlueCatalogDatabase.name}
        tableName: ${exampleAwsGlueCatalogTable.name}

The targetTable property binds the ruleset to a specific Glue catalog table by database and table name. The association records which table the rules are meant for. Evaluation itself is still triggered separately, for example through a data quality evaluation run or a Glue job.
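The binding only makes sense when both the database and the table are supplied. The sketch below shows one way to assemble the Python arguments so that a half-specified target is rejected early. The helper `ruleset_args` and the names `sales_db` and `orders` are hypothetical; the dictionary keys mirror the Python example above.

```python
from typing import Any, Optional


def ruleset_args(
    name: str,
    ruleset: str,
    database_name: Optional[str] = None,
    table_name: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for a ruleset, optionally bound to a table."""
    # Reject a target that names only the database or only the table.
    if (database_name is None) != (table_name is None):
        raise ValueError("database_name and table_name must be given together")
    args: dict[str, Any] = {"name": name, "ruleset": ruleset}
    if database_name is not None:
        args["target_table"] = {
            "database_name": database_name,
            "table_name": table_name,
        }
    return args


args = ruleset_args(
    "example",
    'Rules = [Completeness "colA" between 0.4 and 0.8]',
    database_name="sales_db",  # hypothetical catalog database
    table_name="orders",       # hypothetical catalog table
)
print(args["target_table"])
```

In a Pulumi program these arguments could be unpacked into the resource constructor, for instance `aws.glue.DataQualityRuleset("example", **args)`.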

Beyond these examples

These snippets focus on specific ruleset features: DQDL rule definition and catalog table binding. They’re intentionally minimal rather than full data quality monitoring solutions.

The examples may reference pre-existing infrastructure such as Glue catalog databases and tables for targetTable binding. They focus on ruleset configuration rather than provisioning the catalog or orchestrating evaluation runs.

To keep things focused, common ruleset patterns are omitted, including:

  • Description and tagging (description, tags properties)
  • Recommendation run integration (recommendationRunId)
  • Multi-rule rulesets with complex DQDL expressions
  • Integration with Glue jobs and data quality evaluation runs

These omissions are intentional: the goal is to illustrate how ruleset features are wired, not to provide drop-in data quality modules. See the Glue Data Quality Ruleset resource reference for all available configuration options.
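As a rough illustration of two of the omitted patterns, the sketch below combines several rules into one multi-line DQDL string and adds the description and tags properties. The column name `order_id`, the description text, and the tag values are hypothetical placeholders.

```python
# Hypothetical rules; adjust the column names to match your table.
rules = [
    'IsComplete "order_id"',
    'IsUnique "order_id"',
    'Completeness "colA" between 0.4 and 0.8',
    "RowCount > 0",
]

# One rule per line inside the Rules = [...] envelope, separated by commas.
ruleset = "Rules = [\n    " + ",\n    ".join(rules) + "\n]"

# Keyword arguments for the resource, including the optional properties.
args = {
    "name": "example",
    "description": "Basic checks for the orders table",  # hypothetical text
    "ruleset": ruleset,
    "tags": {"team": "data-platform"},  # hypothetical tag
}

print(ruleset)
```

Keeping the rules in a list makes it straightforward to add or remove checks without touching the surrounding string syntax.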

Frequently Asked Questions

Configuration & Setup

What's required to create a data quality ruleset?
You need a name and a ruleset string written in DQDL (Data Quality Definition Language) format. For example: Rules = [Completeness "colA" between 0.4 and 0.8].

What is DQDL?
DQDL (Data Quality Definition Language) is the ruleset format used by AWS Glue Data Quality. Refer to the AWS Glue Developer Guide for full syntax details.

Which region will my ruleset be created in?
The region property determines where the ruleset is managed, defaulting to the region configured in your Pulumi provider.

Immutability & Updates

What happens if I change the ruleset name?
The name property is immutable; changing it forces replacement of the entire resource.

Can I change the target table after creation?
No, targetTable is immutable. Changing the database or table name forces resource replacement.

Table Association & Features

How do I associate a ruleset with a Glue Catalog table?
Configure targetTable with databaseName and tableName pointing to your Glue Catalog resources.

What is recommendationRunId used for?
When a ruleset is created from a recommendation run, recommendationRunId links the ruleset to that run.

How do I import an existing data quality ruleset?
Use pulumi import aws:glue/dataQualityRuleset:DataQualityRuleset <resource-name> <ruleset-name>. For example: pulumi import aws:glue/dataQualityRuleset:DataQualityRuleset example exampleName.
