Global API Acceleration for ML Workloads using CloudFront

Question

Pulumi · Accepted Answer

To set up global API acceleration for ML workloads using AWS CloudFront, we will create a CloudFront distribution that routes requests to the backend where your ML models are hosted. CloudFront is a global content delivery network (CDN) that accelerates APIs in two ways: it caches cacheable responses at edge locations close to users, and it carries the remaining requests to the origin over an optimized network path.

In this program, we will:

1. Define an Amazon S3 bucket to store the ML models (assuming static data or inference code is stored here).
2. Define the CloudFront distribution, which will serve as the front-facing accelerator for your API.
3. Set up the origin, which is the actual location of your ML workloads (e.g., an Elastic Load Balancer in front of your compute resources).
4. Configure cache behavior to fine-tune the caching mechanisms.
5. Configure the viewer protocol policy and certificate so that clients are served over HTTPS.
6. Optionally, add a logging configuration to track requests.

Here is the Pulumi program in Python that accomplishes this:

```python
import pulumi
import pulumi_aws as aws

# Create an S3 bucket that will store your ML models or any static content
ml_models_bucket = aws.s3.Bucket("mlModelsBucket")

# Define an origin access identity to restrict direct S3 bucket access
s3_origin_identity = aws.cloudfront.OriginAccessIdentity("s3OriginAccessIdentity")

# Origin ID that links the cache behavior to the origin defined below
ml_origin_id = "mlWorkloadOrigin"

# Configure your ML workload's actual endpoint as the origin.
# Origins are not standalone resources; they are arguments of the distribution.
# Replace 'your-backend-endpoint.amazonaws.com' with your actual backend service.
ml_workload_origin = aws.cloudfront.DistributionOriginArgs(
    origin_id=ml_origin_id,
    domain_name="your-backend-endpoint.amazonaws.com",
    origin_path="/api",  # Prepended to every request path; adjust to your actual API path.
    custom_origin_config=aws.cloudfront.DistributionOriginCustomOriginConfigArgs(
        http_port=80,
        https_port=443,
        origin_protocol_policy="https-only",
        origin_ssl_protocols=["TLSv1.2"],
    ),
)

# CloudFront distribution configuration
# No default_root_object: that setting targets static sites, not API origins.
ml_api_distribution = aws.cloudfront.Distribution("mlApiDistribution",
    enabled=True,
    is_ipv6_enabled=True,
    origins=[ml_workload_origin],
    default_cache_behavior=aws.cloudfront.DistributionDefaultCacheBehaviorArgs(
        allowed_methods=["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
        cached_methods=["GET", "HEAD"],
        target_origin_id=ml_origin_id,  # Must match the origin_id of the origin above
        forwarded_values=aws.cloudfront.DistributionDefaultCacheBehaviorForwardedValuesArgs(
            query_string=True,
            cookies=aws.cloudfront.DistributionDefaultCacheBehaviorForwardedValuesCookiesArgs(forward="none"),
        ),
        viewer_protocol_policy="redirect-to-https",
        min_ttl=0,
        default_ttl=3600,
        max_ttl=86400,
    ),
    viewer_certificate=aws.cloudfront.DistributionViewerCertificateArgs(
        cloudfront_default_certificate=True,
    ),
    restrictions=aws.cloudfront.DistributionRestrictionsArgs(
        geo_restriction=aws.cloudfront.DistributionRestrictionsGeoRestrictionArgs(
            restriction_type="none",
        ),
    ),
    # Optional: Configure access logging for requests
    # CloudFront standard logging expects the bucket's global domain name
    # (e.g. my-bucket.s3.amazonaws.com), and the bucket must permit ACLs for log delivery.
    logging_config=aws.cloudfront.DistributionLoggingConfigArgs(
        bucket=ml_models_bucket.bucket_domain_name,  # Using the same bucket for simplicity
        include_cookies=False,
        prefix="cf-logs/",
    ),
)

# Output the CloudFront distribution domain name to access your ML workload globally
pulumi.export("cloudfront_distribution_domain", ml_api_distribution.domain_name)
```

This Pulumi program sets up a CloudFront distribution tailored for an API delivering ML workloads.

- We start by creating an Amazon S3 bucket that could host ML models; this is a common pattern if the models are static or inference code is serverless.
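- We create an Origin Access Identity that CloudFront can use to read from the S3 bucket securely. Note that the program does not attach it to any origin, so it has no effect until you add an S3 origin that references it through `s3_origin_config`.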
- The origins list should include your actual backend API endpoint where the ML workload runs, specifying the protocol policy (in this case, HTTPS only for security) and the path to your API.
- The `default_cache_behavior` block specifies what HTTP methods are allowed and cached. It also sets up cache behaviors such as how cookies are handled, query string parameters, and the protocol policy for viewers.
- The `viewer_certificate` block sets up the SSL/TLS certificate. For simplicity, we're using the default CloudFront certificate, but you could also use an ACM certificate.
- The `restrictions` block allows you to restrict access based on geographic locations, which we've left open (set to "none").
- Lastly, logging is optional and can be set up to track access logs. We're directing logs to the same bucket in this example under a specific prefix.
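
To make the TTL settings in `default_cache_behavior` more concrete, here is a minimal sketch of how `min_ttl`, `default_ttl`, and `max_ttl` combine with a `Cache-Control: max-age` value from the origin. The function name `effective_ttl` is hypothetical, and the model is deliberately simplified: it ignores directives such as `no-store`, `private`, and `s-maxage`.

```python
from typing import Optional


def effective_ttl(
    origin_max_age: Optional[int],
    min_ttl: int = 0,
    default_ttl: int = 3600,
    max_ttl: int = 86400,
) -> int:
    """Approximate how long (in seconds) CloudFront keeps a response cached.

    origin_max_age is the max-age the origin sent, or None if the origin
    sent no caching headers at all. Defaults mirror the program above.
    """
    if origin_max_age is None:
        # No caching headers from the origin: the default TTL applies,
        # but never less than the minimum TTL.
        return max(default_ttl, min_ttl)
    # Otherwise the origin's max-age is clamped into [min_ttl, max_ttl].
    return max(min_ttl, min(origin_max_age, max_ttl))


if __name__ == "__main__":
    for max_age in (None, 0, 60, 604800):
        print(max_age, "->", effective_ttl(max_age))
```

With the values used in the program, an inference endpoint that returns `max-age=0` is effectively not cached, while a response without any caching headers is kept for an hour. If that default is too long for your API, lowering `default_ttl` or sending explicit headers from the backend are the two levers.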

You should replace the placeholder 'your-backend-endpoint.amazonaws.com' with the actual domain name of your backend and tweak any settings to match your specific requirements for caching, headers, and cookies.
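
When you swap in your real backend, it helps to keep in mind how `origin_path` interacts with the path a client requests. The sketch below illustrates the mapping; `origin_request_path` is a hypothetical helper written for this explanation, not part of Pulumi or AWS.

```python
def origin_request_path(viewer_path: str, origin_path: str = "/api") -> str:
    """Return the path CloudFront would request from the origin.

    CloudFront prepends the origin path to the path the viewer requested.
    """
    if not viewer_path.startswith("/"):
        viewer_path = "/" + viewer_path
    return origin_path.rstrip("/") + viewer_path


if __name__ == "__main__":
    # The client calls /predict on the distribution; the backend sees /api/predict.
    print(origin_request_path("/predict"))
    # Pitfall: a client that already includes /api gets the prefix twice.
    print(origin_request_path("/api/predict"))
```

In other words, with `origin_path="/api"` clients should call `https://<distribution-domain>/predict` rather than `/api/predict`. If your backend serves the API at its root, an empty origin path is the simpler choice.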

Remember, this setup is for demonstration purposes. In a production environment, you'd need to consider additional aspects of security, error handling, and possibly more complex routing or cache invalidation strategies to ensure that your ML workloads are delivered efficiently and securely to end-users globally.
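
As one concrete step in that direction, the ACM option mentioned for `viewer_certificate` could be sketched as follows. This assumes that Pulumi's Python SDK accepts a plain dictionary with snake_case keys wherever an `...Args` object is expected; the helper name `acm_viewer_certificate` and the example ARN are hypothetical.

```python
def acm_viewer_certificate(certificate_arn: str) -> dict:
    """Build viewer_certificate settings for a custom-domain ACM certificate."""
    # CloudFront only accepts ACM certificates issued in us-east-1.
    if ":acm:us-east-1:" not in certificate_arn:
        raise ValueError("CloudFront requires an ACM certificate from us-east-1")
    return {
        "acm_certificate_arn": certificate_arn,
        "ssl_support_method": "sni-only",            # Serve HTTPS via SNI
        "minimum_protocol_version": "TLSv1.2_2021",  # Reject older TLS versions
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:acm:us-east-1:123456789012:certificate/example-id"
    print(acm_viewer_certificate(arn))
```

The returned dictionary would replace the `viewer_certificate` argument in the program, and the distribution would also need an `aliases` list containing the custom domain names covered by the certificate.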