Blob Storage for Datasets in Azure Machine Learning
To set up Blob Storage for datasets in an Azure Machine Learning workspace, we'll use the `azure-native.storage.Blob` class from the `azure-native` provider. This class lets us create a blob within an Azure Storage account, which can then hold dataset files for machine learning purposes.

Here's a high-level overview of the steps we'll take in our Pulumi program:
- Create an Azure resource group: A resource group is a container that holds related resources for an Azure solution.
- Create an Azure storage account: This is needed to host the blob container.
- Create an Azure storage container: This will be the container within the storage account where our blobs are stored.
- Create a blob within the storage container: This is where the actual dataset file(s) will be uploaded.
Now let's walk through the code that performs these steps:
```python
import pulumi
import pulumi_azure_native as azure_native

# Create an Azure Resource Group
resource_group = azure_native.resources.ResourceGroup("my-resource-group")

# Create an Azure Storage Account
storage_account = azure_native.storage.StorageAccount(
    "mystorageaccount",
    resource_group_name=resource_group.name,
    kind="StorageV2",
    sku=azure_native.storage.SkuArgs(
        name="Standard_LRS",
    ),
    location=resource_group.location,
)

# Create a Storage Container inside the Azure Storage Account
container = azure_native.storage.BlobContainer(
    "mycontainer",
    account_name=storage_account.name,
    resource_group_name=resource_group.name,
    public_access="None",
)

# Create a Blob within the Storage Container
blob = azure_native.storage.Blob(
    "mydatasetblob",
    account_name=storage_account.name,
    container_name=container.name,
    resource_group_name=resource_group.name,
    source=pulumi.FileAsset("path/to/dataset.csv"),  # Replace with the path to your dataset file
)

# Export the URL of the blob to access the data
blob_url = pulumi.Output.concat(
    "https://",
    storage_account.name,
    ".blob.core.windows.net/",
    container.name,
    "/",
    blob.name,
)

pulumi.export("blob_url", blob_url)
```
In this program, we first declare the necessary imports. We then create a resource group and a storage account with the desired configuration. After the storage account, we create a blob container and upload a dataset (which in this case we've assumed is a `.csv` file on your local machine) as a blob to Azure Blob Storage.
The `pulumi.FileAsset` class references a file on your local filesystem; when the `Blob` resource is created, that file is uploaded to Azure Blob Storage. Finally, we build and export the URL that can be used to access the uploaded dataset directly.
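If a dataset consists of several files, the same pattern extends naturally: create one `Blob` resource per local file. The following is a minimal sketch of that idea; the `datasets/` directory is an assumption for illustration, and the snippet reuses the `storage_account`, `container`, and `resource_group` resources defined above.

```python
import os

import pulumi
import pulumi_azure_native as azure_native

# Assumption: a local "datasets/" directory holds the CSV files to upload.
# storage_account, container, and resource_group are the resources created earlier.
dataset_dir = "datasets"

for file_name in os.listdir(dataset_dir):
    if not file_name.endswith(".csv"):
        continue
    azure_native.storage.Blob(
        f"dataset-{file_name}",
        account_name=storage_account.name,
        container_name=container.name,
        resource_group_name=resource_group.name,
        source=pulumi.FileAsset(os.path.join(dataset_dir, file_name)),
    )
```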
"path/to/dataset.csv"
with the actual file path to your dataset that you wish to upload. After running this program with Pulumi, the dataset file will be uploaded to Azure Blob Storage, and you'll get an output that shows the blob URL. This URL can be used in your Azure Machine Learning workspace or elsewhere as needed to access the dataset.To run this Pulumi program, save the code to a file with a
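Once the stack has been deployed, one common way to make the uploaded data visible inside Azure Machine Learning is to register the blob container as a datastore and define a dataset on top of it. The sketch below uses the `azureml-core` (SDK v1) package and assumes an existing Azure ML workspace; the datastore name and the resolved container, account, blob, and key values are placeholders you would take from your own environment or from the Pulumi stack outputs.

```python
from azureml.core import Workspace, Datastore, Dataset

# Assumption: an Azure ML workspace already exists and a config.json is available locally.
ws = Workspace.from_config()

# Register the blob container created by the Pulumi program as a datastore.
# Use the *resolved* names from the stack, since Pulumi adds a random suffix
# to resource names unless auto-naming is overridden.
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="ml_datasets",                   # placeholder name
    container_name="<container-name-from-stack>",
    account_name="<storage-account-name>",
    account_key="<storage-account-key>",
)

# Define and register a tabular dataset pointing at the uploaded blob.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "<blob-name-from-stack>"))
dataset.register(workspace=ws, name="my-dataset")
```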
To run this Pulumi program, save the code to a file with a `.py` extension, install the required Pulumi Azure Native package with `pip install pulumi-azure-native`, and then deploy it with `pulumi up`. Make sure you have the appropriate Azure credentials configured on your machine so that Pulumi can interact with your Azure account.
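As a quick sanity check after `pulumi up`, you can also download the blob directly from Python using the `azure-storage-blob` and `azure-identity` packages. This is a minimal sketch; the URL placeholder stands for the `blob_url` stack output, and it assumes your identity has data-plane access to the account (for example the Storage Blob Data Reader role), since the container was created with no public access.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient

# Paste the value of the "blob_url" stack output here
# (or read it with `pulumi stack output blob_url`).
blob_url = "https://<account>.blob.core.windows.net/<container>/<blob>"

# The container allows no public access, so authorize with an Azure AD credential.
blob_client = BlobClient.from_blob_url(blob_url, credential=DefaultAzureCredential())

# Download the dataset locally to confirm the upload succeeded.
with open("downloaded_dataset.csv", "wb") as f:
    f.write(blob_client.download_blob().readall())
```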