Time Series Anomaly Detection with Azure Kusto

Question

Pulumi · Accepted Answer

To set up time series anomaly detection with Azure Kusto, you need to deploy the resources that handle data ingestion, processing, and analysis in Azure Data Explorer (Kusto): a Kusto cluster, one or more databases, and data connections for ingesting data. You may also want a front-end service for visualization and monitoring.

Below is a Pulumi program written in Python that creates a Kusto cluster and a database in that cluster, and configures a principal assignment on the database for access control. Kusto is a big data analytics service that lets you perform time series analysis with its query language, KQL (Kusto Query Language).

I will walk you through the following steps in the provided Pulumi program:

Set up the Azure Kusto Cluster: A cluster combines an engine service and a data management service. It is where data is ingested and queries are processed.
Create a Kusto Database within the cluster: Databases hold the data you ingest and define the namespace for KQL queries.
Assign a principal to the database: Principal assignments manage access permissions to the database.

Pulumi program for Azure Kusto Cluster deployment

import pulumi
from pulumi_azure_native import resources, kusto

# Create an Azure Resource Group
resource_group = resources.ResourceGroup('rg')

# Provision an Azure Kusto Cluster.
# Cluster names must be lowercase alphanumeric, so the Pulumi resource name
# (used as the prefix of the generated Azure name) is kept lowercase.
kusto_cluster = kusto.Cluster('kustocluster',
    resource_group_name=resource_group.name,
    location=resource_group.location,
    sku=kusto.AzureSkuArgs(
        name='Standard_D13_v2',
        tier='Standard',
    ),
    # Additional parameters can be provided based on requirements,
    # such as the `identity` for using Managed Service Identity.
)

# Create a read-write Azure Kusto Database in the provisioned cluster
kusto_database = kusto.ReadWriteDatabase('kustoDatabase',
    resource_group_name=resource_group.name,
    cluster_name=kusto_cluster.name,
    location=resource_group.location,
    kind='ReadWrite',
    # Additional parameters can be provided, e.g., `soft_delete_period` or `hot_cache_period`.
)

# Database Principal Assignment (This is for example purposes; you will need to specify valid principal IDs)
# Principal assignments manage access permissions to the database. This is critical for controlling who can perform what operations on your Kusto Database.
principal_assignment = kusto.DatabasePrincipalAssignment(
    'kustoPrincipalAssignment',
    resource_group_name=resource_group.name,
    cluster_name=kusto_cluster.name,
    database_name=kusto_database.name,
    principal_id='<YourPrincipalId>', # Replace with your actual principal ID (e.g., user or service principal)
    principal_type='App', # Change as needed based on your principal type ('App', 'Group', or 'User')
    role='Admin', # Database-level role; choose a narrower one such as 'Viewer' where possible
    # You would typically obtain TenantId and PrincipalId from your Azure AD setup.
    tenant_id='<YourTenantId>', # Replace with your actual tenant ID
    principal_assignment_name='principalAssignmentName' # This must be unique within the database
)

# Export the Kusto Cluster URI to be used by clients
pulumi.export('kustoClusterUri', kusto_cluster.uri)

# Export the Kusto Database name
pulumi.export('kustoDatabaseName', kusto_database.name)

In the program above, you first define an Azure Resource Group, which acts as a container for the related resources of an Azure application. In it, you create a Kusto Cluster by specifying a SKU name and tier, which together determine the capacity and performance level of the cluster.

Then, you create a Kusto Database inside the cluster. This is where your time-series data will reside and be queried against.
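As a minimal sketch of what that data could look like, the snippet below builds a KQL `.create-merge table` management command for a time-series table. The table name `Telemetry`, its columns, and the helper `create_table_command` are assumptions made for illustration and are not part of the program above. You would run the resulting command against the database yourself, for example from the Azure Data Explorer web UI.

```python
from typing import Dict


def create_table_command(table: str, columns: Dict[str, str]) -> str:
    """Build a KQL `.create-merge table` command from a column-name -> KQL-type map."""
    schema = ", ".join(f"{name}: {kql_type}" for name, kql_type in columns.items())
    return f".create-merge table {table} ({schema})"


# Hypothetical schema: one reading per device per point in time.
TELEMETRY_COLUMNS = {
    "Timestamp": "datetime",
    "DeviceId": "string",
    "Value": "real",
}

print(create_table_command("Telemetry", TELEMETRY_COLUMNS))
```

A timestamp column, a numeric value column, and a key column to group by are the minimum a time-series query needs.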

Lastly, you handle access control by creating a principal assignment for the database with a specified role and principal type. This ensures that only authorized entities can access your data and execute queries.
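Database-level assignments accept a fixed set of roles and principal types, and cluster-level roles such as AllDatabasesAdmin are not among them. The sketch below checks both values before they reach a deployment. The helper name `check_assignment` is hypothetical, and the allowed values reflect my understanding of the Azure Data Explorer database principal assignment API, so verify them against the current API reference.

```python
# Assumed allowed values for a database principal assignment.
DATABASE_ROLES = {"Admin", "Ingestor", "Monitor", "User", "UnrestrictedViewer", "Viewer"}
PRINCIPAL_TYPES = {"App", "Group", "User"}


def check_assignment(role: str, principal_type: str) -> None:
    """Raise ValueError if the role or principal type is not valid at database level."""
    if role not in DATABASE_ROLES:
        raise ValueError(
            f"{role!r} is not a database role; expected one of {sorted(DATABASE_ROLES)}"
        )
    if principal_type not in PRINCIPAL_TYPES:
        raise ValueError(
            f"{principal_type!r} is not a principal type; expected one of {sorted(PRINCIPAL_TYPES)}"
        )


check_assignment("Viewer", "App")  # passes silently

try:
    check_assignment("AllDatabasesAdmin", "App")  # cluster-level role
except ValueError as err:
    print(f"rejected: {err}")
```

For a service that only reads data to run anomaly queries, a read-only role such as Viewer is usually enough.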

To use this Pulumi program:

Replace the placeholder values, such as <YourPrincipalId> and <YourTenantId>, with actual values from your Azure setup.
Install Pulumi, set up your Azure credentials, and deploy the program with the Pulumi CLI by running pulumi up.

This program provisions the cluster, database, and access control that anomaly detection on Azure Data Explorer depends on, so you can focus on ingesting data and writing the queries and logic for your anomaly detection use case.
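Once data is flowing, the anomaly detection itself is a KQL query. The following is a minimal sketch, not a tuned detector: it builds a query that aggregates values into a regular series with `make-series` and flags outliers with `series_decompose_anomalies`. The helper `build_anomaly_query`, the table and column names, and the default lookback, step, and threshold are all assumptions for illustration.

```python
def build_anomaly_query(
    table: str,
    timestamp_col: str,
    value_col: str,
    key_col: str,
    lookback: str = "7d",
    step: str = "1h",
    threshold: float = 1.5,
) -> str:
    """Return a KQL query that flags anomalies per key over a lookback window."""
    lines = [
        table,
        f"| where {timestamp_col} > ago({lookback})",
        # Turn raw rows into one evenly spaced series per key.
        f"| make-series Series = avg({value_col}) default = 0 "
        f"on {timestamp_col} from ago({lookback}) to now() step {step} by {key_col}",
        # Decompose each series and mark points that deviate from the baseline.
        f"| extend (Anomalies, Score, Baseline) = "
        f"series_decompose_anomalies(Series, {threshold})",
    ]
    return "\n".join(lines)


# Hypothetical table and columns.
print(build_anomaly_query("Telemetry", "Timestamp", "Value", "DeviceId"))
```

The `Anomalies` column holds one flag per time bin: a nonzero value marks a spike or dip relative to the baseline. Raising the threshold makes the detector less sensitive.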