1. Low-Latency AI Model Serving with Akamai


    To set up a low-latency Artificial Intelligence (AI) model serving environment, there are several components to consider. Akamai is best known for its content delivery network (CDN) services, which can be leveraged to serve AI model predictions at the edge, reducing latency by moving inference closer to users.

    Here's what you need to consider to achieve low-latency AI model serving with Akamai:

    1. Edge Computing: Use Akamai EdgeWorkers to run your AI model inference at the edge. EdgeWorkers allow you to execute JavaScript at edge locations, which can be used to load and run pre-trained AI models.

    2. Caching Strategy: Appropriate caching of the model's predictions can reduce the need for repeated inference if the same request is made multiple times.

    3. DataStream Configuration: Streaming real-time CDN logs with Akamai DataStream will help you understand the performance characteristics of your endpoint and optimize accordingly.

    4. Security: Use an Akamai application security policy (the AppSecSecurityPolicy resource in Pulumi) to protect your AI endpoint from potential threats.
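    The caching idea in point 2 can be sketched independently of Akamai. The snippet below is a minimal in-process TTL cache for inference results, keyed on a hash of the canonicalized request payload; the `PredictionCache` class, the TTL value, and the stand-in model function are all illustrative inventions, not Akamai APIs. In production the equivalent caching would typically happen in Akamai's edge cache (keyed on URL and headers) rather than in Python.

    ```python
    import hashlib
    import json
    import time

    class PredictionCache:
        """Minimal TTL cache for model predictions, keyed on the request payload."""

        def __init__(self, ttl_seconds=300):
            self.ttl = ttl_seconds
            self._store = {}  # key -> (expiry_timestamp, prediction)

        def _key(self, payload: dict) -> str:
            # Canonical JSON so logically identical requests hash identically.
            blob = json.dumps(payload, sort_keys=True).encode()
            return hashlib.sha256(blob).hexdigest()

        def get_or_compute(self, payload: dict, infer):
            key = self._key(payload)
            entry = self._store.get(key)
            now = time.time()
            if entry and entry[0] > now:
                return entry[1]  # cache hit: skip inference entirely
            prediction = infer(payload)
            self._store[key] = (now + self.ttl, prediction)
            return prediction

    # Usage with a stand-in "model" that records how often it is invoked:
    cache = PredictionCache(ttl_seconds=300)
    calls = []

    def fake_model(payload):
        calls.append(payload)
        return {"score": 0.9}

    first = cache.get_or_compute({"text": "hello"}, fake_model)
    second = cache.get_or_compute({"text": "hello"}, fake_model)  # served from cache
    ```

    After both calls, `calls` contains a single entry: the repeated request never reached the model, which is exactly the latency win a CDN-level cache provides.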

    Now, let's create a simplified Pulumi program in Python that demonstrates how you would configure these components. Given that EdgeWorkers would require JavaScript to execute the model inference, we'll focus on setting up the infrastructure and provide links to relevant Akamai documentation for further details on implementation.

```python
import pulumi
import pulumi_akamai as akamai

akamai_config = pulumi.Config("akamai")

# Define the security configuration for your AI serving endpoint.
app_sec_policy = akamai.AppSecSecurityPolicy(
    "aiModelSecPolicy",
    config_id=akamai_config.require_int("configId"),
    security_policy_name="AIModelSecurityPolicy",
    security_policy_prefix="AIMSP",
    # To create this from an existing policy ID, uncomment and set the ID:
    # create_from_security_policy_id="an-existing-policy-id",
)

# Configure Akamai DataStream to monitor the performance and usage of the
# AI model serving endpoint.
data_stream = akamai.Datastream(
    "aiModelDatastream",
    active=True,
    group_id="your-group-id",
    contract_id="your-contract-id",
    properties=["list-of-properties"],  # the properties you want to monitor
    stream_name="AIModelServingStream",
    # The S3 connector below is an example; replace it with your desired
    # connector configuration.
    s3_connector={
        "path": "/your-logging-path/",
        "bucket": "your-logging-bucket",
        "region": "your-bucket-region",
        "accessKey": "your-access-key",  # treat as a secret value
        "displayName": "AIModelServingLogging",
        "compressLogs": True,
        "secretAccessKey": "your-secret-access-key",  # treat as a secret value
    },
    delivery_configuration={
        "format": "JSON",
        "frequency": {
            "intervalInSecs": 300,
        },
    },
)

# Deploy an EdgeWorker to host your AI model for inference at the edge.
# This involves writing JavaScript to load and run a pre-trained model on
# EdgeWorkers. For details on scripting and deploying EdgeWorkers, refer to
# the Akamai documentation:
# https://www.pulumi.com/registry/packages/akamai/api-docs/edgeworker/
edge_worker = akamai.EdgeWorker(
    "aiModelEdgeWorker",
    name="AIModelInference",
    resource_tier_id=akamai_config.require_int("resourceTierId"),
    # The bundle with the model and inference code is uploaded via Akamai
    # Control Center. The local_bundle field points at the worker bundle
    # during development:
    # local_bundle="path/to/local/edge-worker-bundle.tar.gz",
)

# Once the security policy and DataStream are in place and the EdgeWorker is
# deployed, direct your application traffic through Akamai to benefit from
# its global network.

# Export relevant identifiers for your resources.
pulumi.export("security_policy_id", app_sec_policy.id)
pulumi.export("datastream_id", data_stream.id)
pulumi.export("edgeworker_id", edge_worker.id)
```

    To deploy this infrastructure as code with Pulumi:

    • Ensure Pulumi is installed and configured.
    • Replace placeholder values (like "your-group-id", "your-logging-bucket", etc.) with real values relevant to your Akamai setup.
    • Add the appropriate Akamai configuration values, API credentials, and other deployment details to a Pulumi.<stack-name>.yaml file, or set them with pulumi config commands.
    • The AI model and inference code must be written separately and deployed as an EdgeWorker bundle through Akamai's Control Center.
    • Due to the complexity of serving AI models, it is recommended to engage with Akamai's support or professional services for specific implementation advice and best practices.
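    As a hedged illustration of the second and third steps, a Pulumi.<stack-name>.yaml holding the values the program reads might look like the fragment below. The key names (akamai:configId, akamai:resourceTierId) match the require_int calls in the program; the numeric values are placeholders you would replace with IDs from your own Akamai account.

    ```yaml
    config:
      akamai:configId: "12345"        # placeholder security configuration ID
      akamai:resourceTierId: "200"    # placeholder EdgeWorker resource tier ID
    ```

    Secrets such as API credentials should be added with pulumi config set --secret rather than written into the file in plain text.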

    This code sets up the primary infrastructure needed on Akamai’s platform for low-latency serving of AI models. Remember to follow the Akamai documentation while implementing the actual EdgeWorker bundle that will run the AI models.