1. Blob Storage for Training Datasets in Azure ML


    To create Blob Storage for storing training datasets in Azure Machine Learning, we need to provision several resources within Azure:

    1. Resource Group: A container that holds related resources for an Azure solution.
    2. Storage Account: Azure Storage provides scalable cloud storage for objects, file data, and other types of data. For machine learning datasets, we generally use Blob Storage provided by Azure Storage accounts.
    3. Blob Container: This is a specific storage space within the Storage Account where blobs (or datasets in this case) are stored.
    4. Machine Learning Workspace: A foundational resource in the cloud that you use to experiment, train, and deploy machine learning models with Azure Machine Learning.
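    Azure imposes strict naming rules on some of these resources; storage account names in particular must be 3–24 characters of lowercase letters and digits only, and globally unique. A small helper (hypothetical, not part of any SDK) to sanitize a candidate name before provisioning:

```python
import re

def sanitize_storage_account_name(candidate: str) -> str:
    """Normalize a name to Azure's storage-account rules:
    3-24 characters, lowercase letters and digits only."""
    cleaned = re.sub(r'[^a-z0-9]', '', candidate.lower())
    if len(cleaned) < 3:
        raise ValueError(f'cannot derive a valid name from {candidate!r}')
    return cleaned[:24]

print(sanitize_storage_account_name('Training_Datasets-Storage'))
# trainingdatasetsstorage
```

    Global uniqueness still has to be handled separately, for example by letting Pulumi append its random suffix to the logical name.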

    Below is a Python program using the Pulumi SDK that sets up these resources in Azure.

```python
import pulumi
import pulumi_azure_native as azure_native

# Create a resource group to hold all the resources
resource_group = azure_native.resources.ResourceGroup('training_datasets_rg')

# Create an Azure Storage Account for Blob Storage.
# Keep the logical name short: Pulumi appends a random suffix, and the
# final storage account name must be at most 24 lowercase alphanumerics.
storage_account = azure_native.storage.StorageAccount(
    'trainingdata',
    resource_group_name=resource_group.name,
    sku=azure_native.storage.SkuArgs(name=azure_native.storage.SkuName.STANDARD_LRS),
    kind=azure_native.storage.Kind.STORAGE_V2,
)

# Create a Blob Container within the Storage Account to house training datasets
blob_container = azure_native.storage.BlobContainer(
    'trainingdatasetscontainer',
    resource_group_name=resource_group.name,
    account_name=storage_account.name,
)

# Create an Azure Machine Learning Workspace.
# A system-assigned managed identity is required; most API versions also
# expect ARM IDs for an associated key vault and Application Insights
# instance, which are omitted here for brevity.
ml_workspace = azure_native.machinelearningservices.Workspace(
    'mlworkspace',
    resource_group_name=resource_group.name,
    sku=azure_native.machinelearningservices.SkuArgs(name='Basic'),
    identity=azure_native.machinelearningservices.IdentityArgs(type='SystemAssigned'),
    storage_account=storage_account.id,
    # Replace `location` with your desired region for the workspace
    location='eastus',
)

# Register the Blob container as an Azure Machine Learning datastore.
# The workspace also has a built-in 'default' datastore backed by the
# account associated with the workspace; additional datastores such as
# this one can be connected for training data.
# Note: argument class names vary across azure-native provider versions.
datastore = azure_native.machinelearningservices.Datastore(
    'trainingdatastore',
    name='trainingdatastore',
    resource_group_name=resource_group.name,
    workspace_name=ml_workspace.name,
    datastore_properties=azure_native.machinelearningservices.AzureBlobDatastoreArgs(
        datastore_type='AzureBlob',
        account_name=storage_account.name,
        container_name=blob_container.name,
        credentials=azure_native.machinelearningservices.NoneDatastoreCredentialsArgs(
            credentials_type='None',
        ),
    ),
)

# Export outputs that you might need elsewhere
pulumi.export('resource_group_name', resource_group.name)
pulumi.export('storage_account_name', storage_account.name)
pulumi.export('blob_container_name', blob_container.name)
pulumi.export('workspace_name', ml_workspace.name)
```


    • ResourceGroup: We create a new resource group named training_datasets_rg to manage all our Azure resources logically.

    • StorageAccount: This resource represents the Azure Storage account with STANDARD_LRS SKU, which indicates it's using standard locally-redundant storage.
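      For reference, the most common storage SKU names and the redundancy each selects can be captured in a plain lookup (illustrative only; the authoritative list is in the Azure Storage documentation):

```python
# Common Azure Storage SKU names and the redundancy model each selects.
STORAGE_SKUS = {
    'Standard_LRS': 'locally-redundant storage (copies within one datacenter)',
    'Standard_ZRS': 'zone-redundant storage (copies across availability zones)',
    'Standard_GRS': 'geo-redundant storage (replicated to a paired region)',
    'Premium_LRS': 'premium locally-redundant storage (SSD-backed)',
}

def describe_sku(name: str) -> str:
    """Look up the redundancy model for a storage SKU name."""
    try:
        return STORAGE_SKUS[name]
    except KeyError:
        raise ValueError(f'unknown storage SKU: {name}') from None
```

      Standard_LRS is the cheapest option and is usually adequate for reproducible training data that can be re-uploaded if lost.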

    • BlobContainer: Within our storage account, we create a single Blob Container where our datasets will reside.

      For the arguments under BlobContainer, we reference resource_group.name and storage_account.name to indicate the association with the previously created resources.
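      Once the container exists, individual blobs are addressable at a predictable endpoint. A small helper (illustrative; account and container names below are placeholders) that builds the URL for a dataset blob, assuming the default public-cloud blob.core.windows.net endpoint suffix:

```python
def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS URL for a blob in Azure Blob Storage
    (default public-cloud endpoint suffix assumed)."""
    return f'https://{account}.blob.core.windows.net/{container}/{blob_name}'

print(blob_url('trainingdata', 'datasets', 'mnist/train.csv'))
# https://trainingdata.blob.core.windows.net/datasets/mnist/train.csv
```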

    • Workspace: We set up an Azure Machine Learning Workspace. This will be our central place for managing and orchestrating machine learning activities. The Basic SKU is sufficient for training purposes.

    • Datastore: This is an abstraction over Blob Storage that makes it easier to manage and reference our datasets from within the Azure Machine Learning workspace.
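      Within Azure ML, data registered through a datastore is referenced with azureml:// URIs of the form azureml://datastores/&lt;name&gt;/paths/&lt;path&gt;. A small helper (hypothetical, for illustration) to build such a reference:

```python
def datastore_uri(datastore: str, path: str) -> str:
    """Build an azureml:// URI referencing a path inside a
    registered Azure ML datastore."""
    return f'azureml://datastores/{datastore}/paths/{path.lstrip("/")}'

print(datastore_uri('trainingdatastore', 'mnist/train.csv'))
# azureml://datastores/trainingdatastore/paths/mnist/train.csv
```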

    • pulumi.export: These statements expose the names and identifiers of our resources as stack outputs. These values can be used in your CI/CD system or accessed by other Pulumi stacks.

    Remember to replace 'eastus' with the Azure region you intend to use. The region should support Azure Machine Learning services.

    To run this program:

    1. Ensure you have Pulumi installed and configured with your Azure credentials.
    2. Save this code to a file, for instance, __main__.py.
    3. Run pulumi up from the directory containing the file to preview and deploy the resources.

    Upon successful deployment, the values passed to pulumi.export will be printed to your terminal and are also accessible from the Pulumi Console.
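    The outputs can also be consumed programmatically: pulumi stack output --json emits them as a single JSON object. A sketch of parsing that payload (the sample JSON below is illustrative, not real deployment output):

```python
import json

# Illustrative payload in the shape `pulumi stack output --json` produces;
# the suffixed names stand in for Pulumi's auto-generated physical names.
sample = '''{
  "resource_group_name": "training_datasets_rg1a2b3c4",
  "storage_account_name": "trainingdata9f8e7d",
  "blob_container_name": "trainingdatasetscontainer",
  "workspace_name": "mlworkspace5e6f"
}'''

outputs = json.loads(sample)
print(outputs['storage_account_name'])
# trainingdata9f8e7d
```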