The aws:glue/dataQualityRuleset:DataQualityRuleset resource, part of the Pulumi AWS provider, defines data quality rules using DQDL (Data Quality Definition Language) syntax that Glue evaluates against catalog tables. This guide focuses on two capabilities: DQDL rule syntax and catalog table binding.
Rulesets can reference Glue catalog tables for automatic evaluation during ETL jobs. The examples are intentionally small. Combine them with your own Glue jobs and data quality evaluation workflows.
Define data quality rules with DQDL syntax
Data quality monitoring starts by defining rules that check column completeness, uniqueness, or statistical properties.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const example = new aws.glue.DataQualityRuleset("example", {
name: "example",
ruleset: "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
});
import pulumi
import pulumi_aws as aws
example = aws.glue.DataQualityRuleset("example",
name="example",
ruleset="Rules = [Completeness \"colA\" between 0.4 and 0.8]")
package main
import (
"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/glue"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
_, err := glue.NewDataQualityRuleset(ctx, "example", &glue.DataQualityRulesetArgs{
Name: pulumi.String("example"),
Ruleset: pulumi.String("Rules = [Completeness \"colA\" between 0.4 and 0.8]"),
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;
return await Deployment.RunAsync(() =>
{
var example = new Aws.Glue.DataQualityRuleset("example", new()
{
Name = "example",
Ruleset = "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.aws.glue.DataQualityRuleset;
import com.pulumi.aws.glue.DataQualityRulesetArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var example = new DataQualityRuleset("example", DataQualityRulesetArgs.builder()
.name("example")
.ruleset("Rules = [Completeness \"colA\" between 0.4 and 0.8]")
.build());
}
}
resources:
example:
type: aws:glue:DataQualityRuleset
properties:
name: example
ruleset: Rules = [Completeness "colA" between 0.4 and 0.8]
The ruleset property contains DQDL expressions that define quality checks. In this example, the rule verifies that column “colA” has completeness between 40% and 80%. Glue evaluates these rules during ETL jobs or on-demand scans, flagging violations for review.
Associate rules with a specific catalog table
When rules apply to a single table, binding the ruleset to that table makes the relationship explicit and enables automatic evaluation.
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
const example = new aws.glue.DataQualityRuleset("example", {
name: "example",
ruleset: "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
targetTable: {
databaseName: exampleAwsGlueCatalogDatabase.name,
tableName: exampleAwsGlueCatalogTable.name,
},
});
import pulumi
import pulumi_aws as aws
example = aws.glue.DataQualityRuleset("example",
name="example",
ruleset="Rules = [Completeness \"colA\" between 0.4 and 0.8]",
target_table={
"database_name": example_aws_glue_catalog_database["name"],
"table_name": example_aws_glue_catalog_table["name"],
})
package main
import (
"github.com/pulumi/pulumi-aws/sdk/v7/go/aws/glue"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
_, err := glue.NewDataQualityRuleset(ctx, "example", &glue.DataQualityRulesetArgs{
Name: pulumi.String("example"),
Ruleset: pulumi.String("Rules = [Completeness \"colA\" between 0.4 and 0.8]"),
TargetTable: &glue.DataQualityRulesetTargetTableArgs{
DatabaseName: pulumi.Any(exampleAwsGlueCatalogDatabase.Name),
TableName: pulumi.Any(exampleAwsGlueCatalogTable.Name),
},
})
if err != nil {
return err
}
return nil
})
}
using System.Collections.Generic;
using System.Linq;
using Pulumi;
using Aws = Pulumi.Aws;
return await Deployment.RunAsync(() =>
{
var example = new Aws.Glue.DataQualityRuleset("example", new()
{
Name = "example",
Ruleset = "Rules = [Completeness \"colA\" between 0.4 and 0.8]",
TargetTable = new Aws.Glue.Inputs.DataQualityRulesetTargetTableArgs
{
DatabaseName = exampleAwsGlueCatalogDatabase.Name,
TableName = exampleAwsGlueCatalogTable.Name,
},
});
});
package generated_program;
import com.pulumi.Context;
import com.pulumi.Pulumi;
import com.pulumi.core.Output;
import com.pulumi.aws.glue.DataQualityRuleset;
import com.pulumi.aws.glue.DataQualityRulesetArgs;
import com.pulumi.aws.glue.inputs.DataQualityRulesetTargetTableArgs;
import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
public class App {
public static void main(String[] args) {
Pulumi.run(App::stack);
}
public static void stack(Context ctx) {
var example = new DataQualityRuleset("example", DataQualityRulesetArgs.builder()
.name("example")
.ruleset("Rules = [Completeness \"colA\" between 0.4 and 0.8]")
.targetTable(DataQualityRulesetTargetTableArgs.builder()
.databaseName(exampleAwsGlueCatalogDatabase.name())
.tableName(exampleAwsGlueCatalogTable.name())
.build())
.build());
}
}
resources:
example:
type: aws:glue:DataQualityRuleset
properties:
name: example
ruleset: Rules = [Completeness "colA" between 0.4 and 0.8]
targetTable:
databaseName: ${exampleAwsGlueCatalogDatabase.name}
tableName: ${exampleAwsGlueCatalogTable.name}
The targetTable property binds the ruleset to a specific Glue catalog table by database and table name. This association allows Glue to automatically evaluate the rules during job runs that process the target table, without requiring manual configuration in each job.
Beyond these examples
These snippets focus on specific ruleset features: DQDL rule definition and catalog table binding. They’re intentionally minimal rather than full data quality monitoring solutions.
The examples may reference pre-existing infrastructure such as Glue catalog databases and tables for targetTable binding. They focus on ruleset configuration rather than provisioning the catalog or orchestrating evaluation runs.
To keep things focused, common ruleset patterns are omitted, including:
- Description and tagging (description, tags properties)
- Recommendation run integration (recommendationRunId)
- Multi-rule rulesets with complex DQDL expressions
- Integration with Glue jobs and data quality evaluation runs
These omissions are intentional: the goal is to illustrate how ruleset features are wired, not provide drop-in data quality modules. See the Glue Data Quality Ruleset resource reference for all available configuration options.
Let's create AWS Glue Data Quality Rulesets
Get started with Pulumi Cloud, then follow our quick setup guide to deploy this infrastructure.
Try Pulumi Cloud for FREEFrequently Asked Questions
Configuration & Setup
name and a ruleset string written in DQDL (Data Quality Definition Language) format. For example: Rules = [Completeness "colA" between 0.4 and 0.8].region property determines where the ruleset is managed, defaulting to the region configured in your Pulumi provider.Immutability & Updates
name property is immutable; changing it forces replacement of the entire resource.targetTable is immutable. Changing the database or table name forces resource replacement.Table Association & Features
targetTable with databaseName and tableName pointing to your Glue Catalog resources.recommendationRunId links the ruleset to that run.pulumi import aws:glue/dataQualityRuleset:DataQualityRuleset <resource-name> <ruleset-name>. For example: pulumi import aws:glue/dataQualityRuleset:DataQualityRuleset example exampleName.Using a different cloud?
Explore analytics guides for other cloud providers: