1. New Relic AlertCondition for AI Workload Thresholds

    Python

    To create a New Relic alert condition for AI workload thresholds using Pulumi, you define conditions that trigger alerts when metric thresholds related to your AI workloads are reached or exceeded.

    For this task, you typically use a New Relic AlertPolicy to group your conditions and an NrqlAlertCondition to define the specific NRQL (New Relic Query Language) query and the thresholds that, when exceeded, trigger an alert. This lets you monitor individual metrics or complex queries relevant to your AI workloads.

    Below is a Pulumi program that sets up an alert policy and an NRQL alert condition. The condition in this example fires when the average duration of a transaction exceeds a threshold over a given period.

import pulumi
import pulumi_newrelic as newrelic

# Create a New Relic alert policy.
# Alert policies group alert conditions, including NRQL alert conditions.
policy = newrelic.AlertPolicy("ai-workload-policy",
    name="AI Workload Policy")

# Define a NRQL alert condition for AI workload thresholds.
# This example triggers an alert when the average duration (response time)
# of a transaction stays above a threshold within a time window, which could
# indicate performance issues in an AI service.
nrql_condition = newrelic.NrqlAlertCondition("high-response-time",
    policy_id=policy.id,
    # The NRQL query selects the data that is evaluated to decide whether to alert.
    nrql={
        # Example query; customize it for your actual service and metric.
        "query": "SELECT average(duration) FROM Transaction WHERE appName = 'AI Service'",
        # Number of minutes to offset the evaluation window, allowing for data ingestion delay.
        "evaluation_offset": 3,
    },
    # A "static" condition compares the query result to a fixed threshold value.
    type="static",
    # Fire when the query returns a value above the defined threshold.
    terms=[
        {
            "operator": "above",
            "priority": "critical",
            # Placeholder threshold; replace it with a value that suits your workload.
            "threshold": 0.5,
            # Seconds the condition must be true before the alert fires (here, ten minutes).
            "threshold_duration": 600,
            # The condition must be true for all aggregation windows within that duration.
            "threshold_occurrences": "ALL",
        },
    ],
    # Automatically close violations after this many seconds of continued violation.
    violation_time_limit_seconds=3600,
)

# Export the IDs of the policy and the condition.
pulumi.export("policy_id", policy.id)
pulumi.export("nrql_condition_id", nrql_condition.id)

    In this Pulumi program:

    • newrelic.AlertPolicy creates a new policy under which you can add multiple alert conditions. This policy acts as a container for your AI workload-related alerts.

    • newrelic.NrqlAlertCondition defines a condition based on a NRQL query. This condition uses the SELECT average(duration) FROM Transaction query to monitor the average duration of a transaction, and it raises a critical alert if the result stays above 0.5 for ten minutes. The evaluation_offset of 3 minutes gives some allowance for data ingestion and processing.
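    If you want more than one severity tier, the same term shape can carry a warning entry alongside the critical one. The sketch below is plain Python (no Pulumi engine needed) showing one way to build the dicts passed to the terms argument; the threshold_term helper is hypothetical, not part of the Pulumi API, and the threshold values are placeholders.

```python
def threshold_term(priority: str, threshold: float, duration_seconds: int = 600) -> dict:
    """Build one threshold term for NrqlAlertCondition; duration is in seconds."""
    # New Relic requires threshold_duration to be a multiple of 60 seconds.
    if duration_seconds % 60 != 0:
        raise ValueError("threshold_duration must be a multiple of 60 seconds")
    return {
        "operator": "above",
        "priority": priority,
        "threshold": threshold,
        "threshold_duration": duration_seconds,
        "threshold_occurrences": "ALL",
    }

# A critical term at 0.5, plus an earlier warning at 0.3 (illustrative values).
terms = [threshold_term("critical", 0.5), threshold_term("warning", 0.3)]
```

    You would then pass terms=terms into the NrqlAlertCondition from the program above.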

    Remember that when setting up alert conditions, the query field should precisely target the metrics you want to monitor; the right query depends on your specific AI application's performance indicators and the data available in New Relic.
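    As a sketch of that customization, here are two query variants you might substitute into the nrql block: average duration and an error-rate percentage. The nrql_for helper and the attribute names (appName, error) are illustrative assumptions; match them to the event data your service actually reports.

```python
def nrql_for(select_expr: str, app_name: str) -> str:
    """Compose a simple NRQL query scoped to one application."""
    return f"SELECT {select_expr} FROM Transaction WHERE appName = '{app_name}'"

# Average response time, as used in the condition above.
avg_duration = nrql_for("average(duration)", "AI Service")

# Share of failing transactions, a common alternative signal for an AI service.
error_rate = nrql_for("percentage(count(*), WHERE error IS true)", "AI Service")
```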

    Lastly, the pulumi.export statements output the IDs of the created resources in your Pulumi stack, which is useful for reference or for integration with other systems and tools.

    This is a basic setup, and you should adjust the NRQL query, thresholds, and other parameters according to the specifics of your AI workloads and the metrics you need to monitor.
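    One way to keep those per-environment adjustments out of the program body is a small lookup table keyed by stack name. This is a sketch under the assumption that you run separate dev and prod stacks; the table, stack names, and threshold values are all illustrative.

```python
# Illustrative per-environment thresholds (seconds of average duration).
THRESHOLDS = {
    "dev": {"critical": 2.0, "warning": 1.0},
    "prod": {"critical": 0.5, "warning": 0.3},
}

def thresholds_for(stack: str) -> dict:
    """Look up thresholds for a stack, falling back to the stricter prod values."""
    return THRESHOLDS.get(stack, THRESHOLDS["prod"])
```

    In a real program you could feed the result into the terms of the NrqlAlertCondition, selecting the stack via pulumi.get_stack().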