Partitioning output data to Blob Storage in Stream Analytics
To partition output data to Blob Storage in Azure Stream Analytics, you would typically perform several steps within your Pulumi program:
- Set up an Azure Blob Storage container where the output data will be stored.
- Set up an Azure Stream Analytics job to process your input data.
- Define the output settings in the Stream Analytics job to partition and write the processed results to the Blob Storage container.
The Azure Blob Storage container will store the output in a structured folder layout. The Stream Analytics job will have an output definition that specifies how the incoming stream is separated into different partitions, typically based on the event's date and time or on a field in the data.
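For instance, if you want to partition by a field in the data rather than only by time, Stream Analytics allows the blob path pattern to reference a column emitted by the job's query. The fragment below is a minimal sketch assuming a hypothetical deviceId column and an illustrative container name; the full program that follows partitions by date and time only.

```typescript
// Sketch of a blob output datasource partitioned by a hypothetical "deviceId"
// column from the query output, in addition to date and time.
const partitionedDatasource = {
    type: "Microsoft.Storage/Blob",
    container: "output",                     // illustrative container name
    pathPattern: "{deviceId}/{date}/{time}", // deviceId must be a column produced by the job's query
    dateFormat: "yyyy/MM/dd",
    timeFormat: "HH",
};
```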
Below is a Pulumi program, written in TypeScript, that demonstrates how to set up Blob Storage and a Stream Analytics job with an output that partitions data:
```typescript
import * as pulumi from "@pulumi/pulumi";
import * as storage from "@pulumi/azure-native/storage";
import * as streamanalytics from "@pulumi/azure-native/streamanalytics";
import * as resources from "@pulumi/azure-native/resources";

// Create an Azure Resource Group
const resourceGroup = new resources.ResourceGroup("resourceGroup");

// Create an Azure Storage Account
const storageAccount = new storage.StorageAccount("storageaccount", {
    resourceGroupName: resourceGroup.name,
    sku: {
        name: "Standard_LRS",
    },
    kind: "StorageV2",
});

// Create a Storage Container to store the output blobs
const container = new storage.BlobContainer("blobcontainer", {
    accountName: storageAccount.name,
    resourceGroupName: resourceGroup.name,
});

// Look up the storage account keys so the Stream Analytics output can authenticate
const storageAccountKeys = storage.listStorageAccountKeysOutput({
    resourceGroupName: resourceGroup.name,
    accountName: storageAccount.name,
});
const primaryStorageKey = storageAccountKeys.keys[0].value;

// Create an Azure Stream Analytics job
const streamAnalyticsJob = new streamanalytics.StreamingJob("streamAnalyticsJob", {
    resourceGroupName: resourceGroup.name,
    location: resourceGroup.location,
    sku: {
        name: "Standard",
    },
    eventsOutOfOrderPolicy: "Adjust",
    outputErrorPolicy: "Drop",
    eventsOutOfOrderMaxDelayInSeconds: 5,
    eventsLateArrivalMaxDelayInSeconds: 16,
});

// Define the output for the Stream Analytics job to write to the Blob Storage
// container, partitioned by date and time via the path pattern
const blobOutput = new streamanalytics.Output("blobOutput", {
    resourceGroupName: resourceGroup.name,
    jobName: streamAnalyticsJob.name,
    datasource: {
        type: "Microsoft.Storage/Blob",
        storageAccounts: [{
            accountKey: primaryStorageKey,
            accountName: storageAccount.name,
        }],
        container: container.name,
        // Define the path pattern for the blob output
        pathPattern: "{date}/{time}",
        dateFormat: "yyyy/MM/dd",
        timeFormat: "HH",
    },
    serialization: {
        type: "Csv",
        fieldDelimiter: ",",
        encoding: "UTF8",
    },
});

// Export the primary blob endpoint
export const primaryBlobEndpoint = storageAccount.primaryEndpoints.apply(endpoints => endpoints.blob);
```
In the code above:
- We start by importing required Pulumi libraries for working with resources in Azure.
- We create a resource group to manage related resources collectively.
- A storage account is set up, which we'll use to create a Blob Storage container.
- The Blob Storage container will serve as the destination for output data.
- We define a Stream Analytics job, which is the core processing engine for streaming data.
- The output settings of the Stream Analytics job point at the Blob Storage container, authenticating with a key retrieved from the storage account. Placeholders in pathPattern structure the output blobs by date and time, and dateFormat and timeFormat define the folder hierarchy used when writing the output data to blobs (see the sketch after this list).
- The serialization format of the output data is set to CSV.
- We export the primary endpoint of the Blob Storage account for further use or reference.
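As a concrete illustration of the folder structure these settings produce (the timestamp below is only an example):

```typescript
// pathPattern "{date}/{time}" with dateFormat "yyyy/MM/dd" and timeFormat "HH"
// means an event processed at 09:30 UTC on 15 June 2024 lands under a path like:
//   <container>/2024/06/15/09/<generated-blob-name>.csv
const exampleBlobPrefix = "2024/06/15/09"; // {date}/{time} expanded with the formats above
```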
Please note that this Pulumi program assumes you have already set up the required input data sources and the query logic within your Stream Analytics job to perform the streaming data analysis and partitioning. Additionally, you would need to handle authentication and permissions to ensure that the Stream Analytics job has appropriate rights to write to the Blob Storage container.
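For completeness, here is a minimal sketch of the query side. It assumes an input named eventHubInput has already been defined on the job; that name, the streaming-unit count, and the PARTITION BY clause are illustrative placeholders rather than part of the program above.

```typescript
// Minimal sketch of the job's query (transformation). "eventHubInput" is a
// hypothetical input that you would define separately; PARTITION BY is only
// meaningful when the input itself is partitioned (for example, an Event Hub).
const transformation = new streamanalytics.Transformation("transformation", {
    resourceGroupName: resourceGroup.name,
    jobName: streamAnalyticsJob.name,
    streamingUnits: 1,
    query: pulumi.interpolate`
        SELECT *
        INTO [${blobOutput.name}]
        FROM [eventHubInput]
        PARTITION BY PartitionId
    `,
});
```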