1. Maintaining an ML Model Catalog in AWS DynamoDB


    To maintain a machine learning (ML) model catalog in AWS DynamoDB using Pulumi, we'll perform the following steps:

    1. Create a DynamoDB table to store the ML model metadata. The table will need attributes that define the model's unique identifier and additional attributes that describe the model.
    2. Optionally enable point-in-time recovery, or set up another backup mechanism, if required. Point-in-time recovery can be enabled directly on the DynamoDB table resource.
    3. Define a secondary index if needed, for efficient querying on non-key attributes. The example program in this section does not define one.

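    For this example, we'll assume each catalog item has the following attributes: ModelId (partition key), ModelVersion (sort key), CreationDate, ModelType, and ModelDescription. Because DynamoDB is schemaless apart from its key attributes (table and index keys), only ModelId and ModelVersion have to be declared on the table; the rest are simply written with each item. This structure is minimal and can be expanded to suit specific needs.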
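
    To make the item shape concrete, here is a minimal sketch of one catalog entry in DynamoDB's low-level attribute-value format, where each value is wrapped in a type tag such as "S" for String. The helper name build_catalog_item, the sample values, and the choice to store CreationDate as an ISO 8601 string are illustrative assumptions, not part of the original design.

```python
from datetime import datetime, timezone


def build_catalog_item(model_id, model_version, model_type, description, created=None):
    """Return one catalog entry in DynamoDB's attribute-value format (hypothetical helper)."""
    created = created or datetime.now(timezone.utc)
    return {
        "ModelId": {"S": model_id},            # partition key
        "ModelVersion": {"S": model_version},  # sort key
        # Assumption: dates are stored as ISO 8601 strings so they sort chronologically.
        "CreationDate": {"S": created.isoformat()},
        "ModelType": {"S": model_type},
        "ModelDescription": {"S": description},
    }


# Sample values are made up for illustration.
item = build_catalog_item(
    "churn-predictor",
    "1.0.0",
    "xgboost",
    "Baseline churn model",
    created=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
print(item)
```

    A dictionary in this shape is what the low-level put_item call of an AWS SDK expects as its Item argument.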

    Here's how to create such a table using Pulumi's AWS provider (pulumi_aws) in Python:

    import pulumi
    import pulumi_aws as aws

    # Define the DynamoDB table for the ML model catalog.
    model_catalog_table = aws.dynamodb.Table(
        "modelCatalogTable",
        attributes=[
            # Define the primary key as a composite of ModelId and ModelVersion
            aws.dynamodb.TableAttributeArgs(
                name="ModelId",
                type="S",  # 'S' stands for String, which is suitable for a unique model identifier.
            ),
            aws.dynamodb.TableAttributeArgs(
                name="ModelVersion",
                type="S",  # 'S' stands for String, suitable for version identifiers.
            ),
        ],
        hash_key="ModelId",  # Partition key
        range_key="ModelVersion",  # Sort key
        billing_mode="PAY_PER_REQUEST",  # Use on-demand pricing (no need to specify read/write capacity units).
        stream_enabled=True,
        stream_view_type="NEW_AND_OLD_IMAGES",  # Stream view type to capture new and old images of items.
        ttl=aws.dynamodb.TableTtlArgs(
            # Attribute to define the TTL. You need to include this in your item definition to use TTL.
            attribute_name="TimeToLive",
            enabled=True,
        ),
        point_in_time_recovery=aws.dynamodb.TablePointInTimeRecoveryArgs(
            enabled=True,  # Enable point-in-time recovery to protect against accidental writes or deletes.
        ),
        tags={
            "Environment": "production",  # Tag your resources for organizational purposes.
            "Purpose": "ML Model Catalog",
        },
    )

    # Export the name of the table
    pulumi.export("model_catalog_table_name", model_catalog_table.name)

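    This Pulumi program sets up a DynamoDB table with the logical name modelCatalogTable (Pulumi appends a random suffix to the physical table name unless name is set explicitly), configured as follows: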

    • Two attributes are declared: ModelId and ModelVersion, which together uniquely identify each item (ML model) in the table. Use ModelId to store a unique name or identifier for each ML model and ModelVersion to distinguish versions of the same model. Only key attributes are declared on the table; other attributes such as CreationDate are written with each item.

    • The billing_mode is set to PAY_PER_REQUEST, which means you pay per read and write request instead of provisioning capacity in advance. This suits workloads that are difficult to predict and is often cost-effective for tables with sporadic traffic.

    • Streams (stream_enabled) are activated and configured to capture both new and old images of item updates. This feature can be used to trigger AWS Lambda functions for real-time processing of table data changes.

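    • Time-to-live (TTL) and point-in-time recovery are enabled. With TTL, DynamoDB deletes an item some time after the timestamp stored in its TimeToLive attribute has passed; the attribute must be a Number holding seconds since the Unix epoch, and items without it never expire. This reduces storage costs and helps keep the catalog fresh. Point-in-time recovery lets you restore the table to any point within the last 35 days, which guards against accidental writes or deletes.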

    • Tags are added for better resource organization and possibly for cost tracking.
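
    Two of the features listed above place requirements on application code: TTL needs an epoch-seconds Number on each item that should expire, and a stream consumer has to unpack records that carry both item images. The following is a minimal sketch of both, using only the standard library. The function names, the 90-day retention period, and the sample event are assumptions made for illustration; the record layout (Records, eventName, and dynamodb.Keys/OldImage/NewImage) follows the DynamoDB Streams event format delivered to Lambda.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch_seconds(created, retain_days):
    """Expiry time as whole seconds since the Unix epoch, the format TTL expects.

    Pass a timezone-aware datetime; a naive one is interpreted as local time.
    """
    return int((created + timedelta(days=retain_days)).timestamp())


def changed_attributes(record):
    """Names of attributes that differ between a stream record's old and new images."""
    old = record["dynamodb"].get("OldImage", {})
    new = record["dynamodb"].get("NewImage", {})
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


def handler(event, context=None):
    """Lambda-style entry point: summarize each change delivered by the table's stream."""
    summary = []
    for record in event.get("Records", []):
        keys = record["dynamodb"]["Keys"]
        summary.append({
            "event": record["eventName"],  # INSERT, MODIFY, or REMOVE
            "model_id": keys["ModelId"]["S"],
            "model_version": keys["ModelVersion"]["S"],
            "changed": changed_attributes(record),
        })
    return summary


# Assumed retention period of 90 days; the value is stored under the TTL attribute name.
expires = ttl_epoch_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc), retain_days=90)
ttl_attribute = {"TimeToLive": {"N": str(expires)}}

# Hand-written sample event in the shape of a DynamoDB stream batch.
sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"ModelId": {"S": "churn-predictor"}, "ModelVersion": {"S": "1.0.0"}},
                "OldImage": {
                    "ModelId": {"S": "churn-predictor"},
                    "ModelVersion": {"S": "1.0.0"},
                    "ModelDescription": {"S": "Baseline churn model"},
                },
                "NewImage": {
                    "ModelId": {"S": "churn-predictor"},
                    "ModelVersion": {"S": "1.0.0"},
                    "ModelDescription": {"S": "Baseline churn model, retrained"},
                },
            },
        }
    ]
}
print(handler(sample_event))
```

    Wiring the handler to the stream (an event source mapping and an IAM role) is a separate piece of infrastructure that this sketch leaves out.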

    After you run this program with pulumi up, the DynamoDB table is created and ready to receive your ML model metadata. Its physical name is available from the model_catalog_table_name stack output. You can then use the standard AWS SDKs or the AWS Management Console to manage your ML model catalog in this table.
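
    As one example of managing the catalog through an SDK, the sketch below lists the stored versions of a model with a single Query call on the partition key. It takes the client as a parameter so it runs without AWS access; in practice you would pass boto3.client("dynamodb"). The function name list_versions, the StubClient class, and the table name shown are hypothetical, and the sketch reads only the first page of results.

```python
def list_versions(client, table_name, model_id):
    """Return the ModelVersion values stored for one model, in descending sort-key order."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="ModelId = :model_id",
        ExpressionAttributeValues={":model_id": {"S": model_id}},
        ScanIndexForward=False,  # descending order of the sort key
    )
    # Only the first page is read here; large result sets need LastEvaluatedKey paging.
    return [item["ModelVersion"]["S"] for item in response["Items"]]


class StubClient:
    """Stand-in for boto3.client("dynamodb") so the sketch runs offline."""

    def query(self, **kwargs):
        self.last_request = kwargs
        return {
            "Items": [
                {"ModelId": {"S": "churn-predictor"}, "ModelVersion": {"S": "1.9.0"}},
                {"ModelId": {"S": "churn-predictor"}, "ModelVersion": {"S": "1.10.0"}},
            ]
        }


stub = StubClient()
# The table name is a placeholder; use the model_catalog_table_name stack output instead.
versions = list_versions(stub, "modelCatalogTable-example", "churn-predictor")
print(versions)
```

    Because ModelVersion is a String, DynamoDB orders versions lexicographically, so "1.10.0" sorts before "1.9.0". If numeric ordering matters, a zero-padded or numeric version scheme is worth considering.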