AWS Cloud Control v1.30.0, Jun 16 25

We recommend new projects start with resources from the AWS provider.

AWS Cloud Control v1.30.0 published on Monday, Jun 16, 2025 by Pulumi

pulumi/pulumi-aws-native

aws-native.sagemaker.InferenceComponent

Create InferenceComponent Resource

Resources are created with functions called constructors. To learn more about declaring and configuring resources, see Resources.

Constructor syntax

new InferenceComponent(name: string, args: InferenceComponentArgs, opts?: CustomResourceOptions);

@overload
def InferenceComponent(resource_name: str,
                       args: InferenceComponentArgs,
                       opts: Optional[ResourceOptions] = None)

@overload
def InferenceComponent(resource_name: str,
                       opts: Optional[ResourceOptions] = None,
                       endpoint_name: Optional[str] = None,
                       specification: Optional[InferenceComponentSpecificationArgs] = None,
                       deployment_config: Optional[InferenceComponentDeploymentConfigArgs] = None,
                       endpoint_arn: Optional[str] = None,
                       inference_component_name: Optional[str] = None,
                       runtime_config: Optional[InferenceComponentRuntimeConfigArgs] = None,
                       tags: Optional[Sequence[_root_inputs.TagArgs]] = None,
                       variant_name: Optional[str] = None)

func NewInferenceComponent(ctx *Context, name string, args InferenceComponentArgs, opts ...ResourceOption) (*InferenceComponent, error)

public InferenceComponent(string name, InferenceComponentArgs args, CustomResourceOptions? opts = null)

public InferenceComponent(String name, InferenceComponentArgs args)
public InferenceComponent(String name, InferenceComponentArgs args, CustomResourceOptions options)

type: aws-native:sagemaker:InferenceComponent
properties: # The arguments to resource properties.
options: # Bag of options to control resource's behavior.

Parameters

name string: The unique name of the resource.
args InferenceComponentArgs: The arguments to resource properties.
opts CustomResourceOptions: Bag of options to control resource's behavior.

resource_name str: The unique name of the resource.
args InferenceComponentArgs: The arguments to resource properties.
opts ResourceOptions: Bag of options to control resource's behavior.

ctx Context: Context object for the current deployment.
name string: The unique name of the resource.
args InferenceComponentArgs: The arguments to resource properties.
opts ResourceOption: Bag of options to control resource's behavior.

name string: The unique name of the resource.
args InferenceComponentArgs: The arguments to resource properties.
opts CustomResourceOptions: Bag of options to control resource's behavior.

name String: The unique name of the resource.
args InferenceComponentArgs: The arguments to resource properties.
options CustomResourceOptions: Bag of options to control resource's behavior.

InferenceComponent Resource Properties

To learn more about resource properties and how to use them, see Inputs and Outputs in the Architecture and Concepts docs.

Inputs

In Python, inputs that are objects can be passed either as argument classes or as dictionary literals.

The InferenceComponent resource accepts the following input properties:

EndpointName string: The name of the endpoint that hosts the inference component.
Specification Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentSpecification
DeploymentConfig Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentDeploymentConfig: The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.
EndpointArn string: The Amazon Resource Name (ARN) of the endpoint that hosts the inference component.
InferenceComponentName string: The name of the inference component.
RuntimeConfig Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentRuntimeConfig
Tags List<Pulumi.AwsNative.Inputs.Tag>
VariantName string: The name of the production variant that hosts the inference component.

EndpointName string: The name of the endpoint that hosts the inference component.
Specification InferenceComponentSpecificationArgs
DeploymentConfig InferenceComponentDeploymentConfigArgs: The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.
EndpointArn string: The Amazon Resource Name (ARN) of the endpoint that hosts the inference component.
InferenceComponentName string: The name of the inference component.
RuntimeConfig InferenceComponentRuntimeConfigArgs
Tags TagArgs
VariantName string: The name of the production variant that hosts the inference component.

endpointName String: The name of the endpoint that hosts the inference component.
specification InferenceComponentSpecification
deploymentConfig InferenceComponentDeploymentConfig: The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.
endpointArn String: The Amazon Resource Name (ARN) of the endpoint that hosts the inference component.
inferenceComponentName String: The name of the inference component.
runtimeConfig InferenceComponentRuntimeConfig
tags List<Tag>
variantName String: The name of the production variant that hosts the inference component.

endpointName string: The name of the endpoint that hosts the inference component.
specification InferenceComponentSpecification
deploymentConfig InferenceComponentDeploymentConfig: The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.
endpointArn string: The Amazon Resource Name (ARN) of the endpoint that hosts the inference component.
inferenceComponentName string: The name of the inference component.
runtimeConfig InferenceComponentRuntimeConfig
tags Tag[]
variantName string: The name of the production variant that hosts the inference component.

endpoint_name str: The name of the endpoint that hosts the inference component.
specification InferenceComponentSpecificationArgs
deployment_config InferenceComponentDeploymentConfigArgs: The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.
endpoint_arn str: The Amazon Resource Name (ARN) of the endpoint that hosts the inference component.
inference_component_name str: The name of the inference component.
runtime_config InferenceComponentRuntimeConfigArgs
tags Sequence[TagArgs]
variant_name str: The name of the production variant that hosts the inference component.

endpointName String: The name of the endpoint that hosts the inference component.
specification Property Map
deploymentConfig Property Map: The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.
endpointArn String: The Amazon Resource Name (ARN) of the endpoint that hosts the inference component.
inferenceComponentName String: The name of the inference component.
runtimeConfig Property Map
tags List<Property Map>
variantName String: The name of the production variant that hosts the inference component.
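As a rough illustration of the dictionary-literal form mentioned for Python, the sketch below assembles constructor keyword arguments as plain dictionaries. The helper name `build_inference_component_args` is hypothetical, and the nested keys `model_name` and `copy_count` assume the Python SDK's snake_case naming. The Pulumi call itself is left as a comment so the snippet runs without the SDK installed.

```python
from typing import Any, Dict, Optional


def build_inference_component_args(
    endpoint_name: str,
    model_name: str,
    copy_count: int,
    variant_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments using plain dictionary literals (hypothetical helper)."""
    args: Dict[str, Any] = {
        "endpoint_name": endpoint_name,
        # Nested inputs are written as dictionary literals instead of *Args classes.
        "specification": {"model_name": model_name},
        "runtime_config": {"copy_count": copy_count},
    }
    if variant_name is not None:
        args["variant_name"] = variant_name
    return args


args = build_inference_component_args("my-endpoint", "my-model", 2, variant_name="AllTraffic")
print(args)

# In a Pulumi program (assumes the pulumi_aws_native package is installed):
#   import pulumi_aws_native as aws_native
#   component = aws_native.sagemaker.InferenceComponent("example", **args)
```

Keeping the arguments in a plain dictionary makes them easy to inspect or unit test before they are handed to the resource constructor.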

Outputs

All input properties are implicitly available as output properties. Additionally, the InferenceComponent resource produces the following output properties:

CreationTime string: The time when the inference component was created.
FailureReason string
Id string: The provider-assigned unique ID for this managed resource.
InferenceComponentArn string: The Amazon Resource Name (ARN) of the inference component.
InferenceComponentStatus Pulumi.AwsNative.SageMaker.InferenceComponentStatus: The status of the inference component.
LastModifiedTime string: The time when the inference component was last updated.

CreationTime string: The time when the inference component was created.
FailureReason string
Id string: The provider-assigned unique ID for this managed resource.
InferenceComponentArn string: The Amazon Resource Name (ARN) of the inference component.
InferenceComponentStatus InferenceComponentStatus: The status of the inference component.
LastModifiedTime string: The time when the inference component was last updated.

creationTime String: The time when the inference component was created.
failureReason String
id String: The provider-assigned unique ID for this managed resource.
inferenceComponentArn String: The Amazon Resource Name (ARN) of the inference component.
inferenceComponentStatus InferenceComponentStatus: The status of the inference component.
lastModifiedTime String: The time when the inference component was last updated.

creationTime string: The time when the inference component was created.
failureReason string
id string: The provider-assigned unique ID for this managed resource.
inferenceComponentArn string: The Amazon Resource Name (ARN) of the inference component.
inferenceComponentStatus InferenceComponentStatus: The status of the inference component.
lastModifiedTime string: The time when the inference component was last updated.

creation_time str: The time when the inference component was created.
failure_reason str
id str: The provider-assigned unique ID for this managed resource.
inference_component_arn str: The Amazon Resource Name (ARN) of the inference component.
inference_component_status InferenceComponentStatus: The status of the inference component.
last_modified_time str: The time when the inference component was last updated.

creationTime String: The time when the inference component was created.
failureReason String
id String: The provider-assigned unique ID for this managed resource.
inferenceComponentArn String: The Amazon Resource Name (ARN) of the inference component.
inferenceComponentStatus "InService" | "Creating" | "Updating" | "Failed" | "Deleting": The status of the inference component.
lastModifiedTime String: The time when the inference component was last updated.

Supporting Types

InferenceComponentAlarm, InferenceComponentAlarmArgs

AlarmName string: The name of a CloudWatch alarm in your account.

AlarmName string: The name of a CloudWatch alarm in your account.

alarmName String: The name of a CloudWatch alarm in your account.

alarmName string: The name of a CloudWatch alarm in your account.

alarm_name str: The name of a CloudWatch alarm in your account.

alarmName String: The name of a CloudWatch alarm in your account.

InferenceComponentAutoRollbackConfiguration, InferenceComponentAutoRollbackConfigurationArgs

Alarms List<Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentAlarm>

Alarms []InferenceComponentAlarm

alarms List<InferenceComponentAlarm>

alarms InferenceComponentAlarm[]

alarms Sequence[InferenceComponentAlarm]

alarms List<Property Map>

InferenceComponentCapacitySize, InferenceComponentCapacitySizeArgs

Type Pulumi.AwsNative.SageMaker.InferenceComponentCapacitySizeType

Specifies the endpoint capacity type.

COPY_COUNT - The endpoint activates based on the number of inference component copies.
CAPACITY_PERCENT - The endpoint activates based on the specified percentage of capacity.

Value int

Defines the capacity size, either as a number of inference component copies or a capacity percentage.

Type InferenceComponentCapacitySizeType

Specifies the endpoint capacity type.

COPY_COUNT - The endpoint activates based on the number of inference component copies.
CAPACITY_PERCENT - The endpoint activates based on the specified percentage of capacity.

Value int

Defines the capacity size, either as a number of inference component copies or a capacity percentage.

type InferenceComponentCapacitySizeType

Specifies the endpoint capacity type.

COPY_COUNT - The endpoint activates based on the number of inference component copies.
CAPACITY_PERCENT - The endpoint activates based on the specified percentage of capacity.

value Integer

Defines the capacity size, either as a number of inference component copies or a capacity percentage.

type InferenceComponentCapacitySizeType

Specifies the endpoint capacity type.

COPY_COUNT - The endpoint activates based on the number of inference component copies.
CAPACITY_PERCENT - The endpoint activates based on the specified percentage of capacity.

value number

Defines the capacity size, either as a number of inference component copies or a capacity percentage.

type InferenceComponentCapacitySizeType

Specifies the endpoint capacity type.

COPY_COUNT - The endpoint activates based on the number of inference component copies.
CAPACITY_PERCENT - The endpoint activates based on the specified percentage of capacity.

value int

Defines the capacity size, either as a number of inference component copies or a capacity percentage.

type "COPY_COUNT" | "CAPACITY_PERCENT"

Specifies the endpoint capacity type.

COPY_COUNT - The endpoint activates based on the number of inference component copies.
CAPACITY_PERCENT - The endpoint activates based on the specified percentage of capacity.

value Number

Defines the capacity size, either as a number of inference component copies or a capacity percentage.
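The two capacity types above can be checked before a value reaches the provider. The following is a minimal sketch, not part of the SDK: `make_capacity_size` is a hypothetical helper, and the rules that the value must be positive and that a percentage cannot exceed 100 are assumptions not stated in this reference.

```python
from typing import Any, Dict

# The two values listed for InferenceComponentCapacitySizeType.
CAPACITY_SIZE_TYPES = ("COPY_COUNT", "CAPACITY_PERCENT")


def make_capacity_size(size_type: str, value: int) -> Dict[str, Any]:
    """Return a capacity-size dictionary after basic sanity checks (hypothetical helper)."""
    if size_type not in CAPACITY_SIZE_TYPES:
        raise ValueError(f"type must be one of {CAPACITY_SIZE_TYPES}, got {size_type!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("value must be an integer")
    # Assumption: a capacity size below 1 is not meaningful.
    if value < 1:
        raise ValueError("value must be at least 1")
    # Assumption: a percentage cannot exceed 100.
    if size_type == "CAPACITY_PERCENT" and value > 100:
        raise ValueError("a capacity percentage cannot exceed 100")
    return {"type": size_type, "value": value}


print(make_capacity_size("COPY_COUNT", 2))
print(make_capacity_size("CAPACITY_PERCENT", 25))
```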

InferenceComponentCapacitySizeType, InferenceComponentCapacitySizeTypeArgs

CopyCount: COPY_COUNT
CapacityPercent: CAPACITY_PERCENT

InferenceComponentCapacitySizeTypeCopyCount: COPY_COUNT
InferenceComponentCapacitySizeTypeCapacityPercent: CAPACITY_PERCENT

CopyCount: COPY_COUNT
CapacityPercent: CAPACITY_PERCENT

CopyCount: COPY_COUNT
CapacityPercent: CAPACITY_PERCENT

COPY_COUNT: COPY_COUNT
CAPACITY_PERCENT: CAPACITY_PERCENT

"COPY_COUNT": COPY_COUNT
"CAPACITY_PERCENT": CAPACITY_PERCENT

InferenceComponentComputeResourceRequirements, InferenceComponentComputeResourceRequirementsArgs

MaxMemoryRequiredInMb int: The maximum MB of memory to allocate to run a model that you assign to an inference component.
MinMemoryRequiredInMb int: The minimum MB of memory to allocate to run a model that you assign to an inference component.
NumberOfAcceleratorDevicesRequired double: The number of accelerators to allocate to run a model that you assign to an inference component. Accelerators include GPUs and AWS Inferentia.
NumberOfCpuCoresRequired double: The number of CPU cores to allocate to run a model that you assign to an inference component.

MaxMemoryRequiredInMb int: The maximum MB of memory to allocate to run a model that you assign to an inference component.
MinMemoryRequiredInMb int: The minimum MB of memory to allocate to run a model that you assign to an inference component.
NumberOfAcceleratorDevicesRequired float64: The number of accelerators to allocate to run a model that you assign to an inference component. Accelerators include GPUs and AWS Inferentia.
NumberOfCpuCoresRequired float64: The number of CPU cores to allocate to run a model that you assign to an inference component.

maxMemoryRequiredInMb Integer: The maximum MB of memory to allocate to run a model that you assign to an inference component.
minMemoryRequiredInMb Integer: The minimum MB of memory to allocate to run a model that you assign to an inference component.
numberOfAcceleratorDevicesRequired Double: The number of accelerators to allocate to run a model that you assign to an inference component. Accelerators include GPUs and AWS Inferentia.
numberOfCpuCoresRequired Double: The number of CPU cores to allocate to run a model that you assign to an inference component.

maxMemoryRequiredInMb number: The maximum MB of memory to allocate to run a model that you assign to an inference component.
minMemoryRequiredInMb number: The minimum MB of memory to allocate to run a model that you assign to an inference component.
numberOfAcceleratorDevicesRequired number: The number of accelerators to allocate to run a model that you assign to an inference component. Accelerators include GPUs and AWS Inferentia.
numberOfCpuCoresRequired number: The number of CPU cores to allocate to run a model that you assign to an inference component.

max_memory_required_in_mb int: The maximum MB of memory to allocate to run a model that you assign to an inference component.
min_memory_required_in_mb int: The minimum MB of memory to allocate to run a model that you assign to an inference component.
number_of_accelerator_devices_required float: The number of accelerators to allocate to run a model that you assign to an inference component. Accelerators include GPUs and AWS Inferentia.
number_of_cpu_cores_required float: The number of CPU cores to allocate to run a model that you assign to an inference component.

maxMemoryRequiredInMb Number: The maximum MB of memory to allocate to run a model that you assign to an inference component.
minMemoryRequiredInMb Number: The minimum MB of memory to allocate to run a model that you assign to an inference component.
numberOfAcceleratorDevicesRequired Number: The number of accelerators to allocate to run a model that you assign to an inference component. Accelerators include GPUs and AWS Inferentia.
numberOfCpuCoresRequired Number: The number of CPU cores to allocate to run a model that you assign to an inference component.

InferenceComponentContainerSpecification, InferenceComponentContainerSpecificationArgs

ArtifactUrl string: The Amazon S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
DeployedImage Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentDeployedImage
Environment Dictionary<string, string>: The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. We support up to 16 entries in the map.
Image string: The Amazon Elastic Container Registry (Amazon ECR) path where the Docker image for the model is stored.

ArtifactUrl string: The Amazon S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
DeployedImage InferenceComponentDeployedImage
Environment map[string]string: The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. We support up to 16 entries in the map.
Image string: The Amazon Elastic Container Registry (Amazon ECR) path where the Docker image for the model is stored.

artifactUrl String: The Amazon S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
deployedImage InferenceComponentDeployedImage
environment Map<String,String>: The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. We support up to 16 entries in the map.
image String: The Amazon Elastic Container Registry (Amazon ECR) path where the Docker image for the model is stored.

artifactUrl string: The Amazon S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
deployedImage InferenceComponentDeployedImage
environment {[key: string]: string}: The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. We support up to 16 entries in the map.
image string: The Amazon Elastic Container Registry (Amazon ECR) path where the Docker image for the model is stored.

artifact_url str: The Amazon S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
deployed_image InferenceComponentDeployedImage
environment Mapping[str, str]: The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. We support up to 16 entries in the map.
image str: The Amazon Elastic Container Registry (Amazon ECR) path where the Docker image for the model is stored.

artifactUrl String: The Amazon S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
deployedImage Property Map
environment Map<String>: The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. We support up to 16 entries in the map.
image String: The Amazon Elastic Container Registry (Amazon ECR) path where the Docker image for the model is stored.
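The limits described for the container specification (a `.tar.gz` artifact path, at most 16 environment entries, keys and values of up to 1024 in length) can be checked locally before deployment. This is a minimal sketch under stated assumptions: `check_container_spec` is a hypothetical helper, and length is measured in Python characters, which may differ from how the service counts.

```python
from typing import Dict, List, Optional

MAX_ENV_ENTRIES = 16    # "up to 16 entries in the map"
MAX_ENV_LENGTH = 1024   # "length of up to 1024" for each key and value


def check_container_spec(
    artifact_url: Optional[str] = None,
    environment: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return human-readable problems found in the inputs (an empty list means none)."""
    problems: List[str] = []
    if artifact_url is not None and not artifact_url.endswith(".tar.gz"):
        problems.append("artifact_url should point to a single .tar.gz archive")
    env = environment or {}
    if len(env) > MAX_ENV_ENTRIES:
        problems.append(f"environment has {len(env)} entries; the limit is {MAX_ENV_ENTRIES}")
    for key, value in env.items():
        # Assumption: length is counted in characters, not bytes.
        if len(key) > MAX_ENV_LENGTH:
            problems.append(f"environment key starting with {key[:20]!r} is too long")
        if len(value) > MAX_ENV_LENGTH:
            problems.append(f"environment value for {key[:20]!r} is too long")
    return problems


print(check_container_spec("s3://my-bucket/model.tar.gz", {"LOG_LEVEL": "info"}))
print(check_container_spec("s3://my-bucket/model.zip", {"K": "v" * 2000}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.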

InferenceComponentDeployedImage, InferenceComponentDeployedImageArgs

ResolutionTime string: The date and time when the image path for the model resolved to the ResolvedImage.
ResolvedImage string: The specific digest path of the image hosted in this ProductionVariant.
SpecifiedImage string: The image path you specified when you created the model.

ResolutionTime string: The date and time when the image path for the model resolved to the ResolvedImage.
ResolvedImage string: The specific digest path of the image hosted in this ProductionVariant.
SpecifiedImage string: The image path you specified when you created the model.

resolutionTime String: The date and time when the image path for the model resolved to the ResolvedImage.
resolvedImage String: The specific digest path of the image hosted in this ProductionVariant.
specifiedImage String: The image path you specified when you created the model.

resolutionTime string: The date and time when the image path for the model resolved to the ResolvedImage.
resolvedImage string: The specific digest path of the image hosted in this ProductionVariant.
specifiedImage string: The image path you specified when you created the model.

resolution_time str: The date and time when the image path for the model resolved to the ResolvedImage.
resolved_image str: The specific digest path of the image hosted in this ProductionVariant.
specified_image str: The image path you specified when you created the model.

resolutionTime String: The date and time when the image path for the model resolved to the ResolvedImage.
resolvedImage String: The specific digest path of the image hosted in this ProductionVariant.
specifiedImage String: The image path you specified when you created the model.

InferenceComponentDeploymentConfig, InferenceComponentDeploymentConfigArgs

AutoRollbackConfiguration Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentAutoRollbackConfiguration
RollingUpdatePolicy Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentRollingUpdatePolicy: Specifies a rolling deployment strategy for updating a SageMaker AI endpoint.

AutoRollbackConfiguration InferenceComponentAutoRollbackConfiguration
RollingUpdatePolicy InferenceComponentRollingUpdatePolicy: Specifies a rolling deployment strategy for updating a SageMaker AI endpoint.

autoRollbackConfiguration InferenceComponentAutoRollbackConfiguration
rollingUpdatePolicy InferenceComponentRollingUpdatePolicy: Specifies a rolling deployment strategy for updating a SageMaker AI endpoint.

autoRollbackConfiguration InferenceComponentAutoRollbackConfiguration
rollingUpdatePolicy InferenceComponentRollingUpdatePolicy: Specifies a rolling deployment strategy for updating a SageMaker AI endpoint.

auto_rollback_configuration InferenceComponentAutoRollbackConfiguration
rolling_update_policy InferenceComponentRollingUpdatePolicy: Specifies a rolling deployment strategy for updating a SageMaker AI endpoint.

autoRollbackConfiguration Property Map
rollingUpdatePolicy Property Map: Specifies a rolling deployment strategy for updating a SageMaker AI endpoint.

InferenceComponentRollingUpdatePolicy, InferenceComponentRollingUpdatePolicyArgs

MaximumBatchSize Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentCapacitySize: The batch size for each rolling step in the deployment process. For each step, SageMaker AI provisions capacity on the new endpoint fleet, routes traffic to that fleet, and terminates capacity on the old endpoint fleet. The value must be between 5% and 50% of the copy count of the inference component.
MaximumExecutionTimeoutInSeconds int: The time limit for the total deployment. Exceeding this limit causes a timeout.
RollbackMaximumBatchSize Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentCapacitySize: The batch size for a rollback to the old endpoint fleet. If this field is absent, the value is set to the default, which is 100% of the total capacity. When the default is used, SageMaker AI provisions the entire capacity of the old fleet at once during rollback.
WaitIntervalInSeconds int: The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.

MaximumBatchSize InferenceComponentCapacitySize: The batch size for each rolling step in the deployment process. For each step, SageMaker AI provisions capacity on the new endpoint fleet, routes traffic to that fleet, and terminates capacity on the old endpoint fleet. The value must be between 5% and 50% of the copy count of the inference component.
MaximumExecutionTimeoutInSeconds int: The time limit for the total deployment. Exceeding this limit causes a timeout.
RollbackMaximumBatchSize InferenceComponentCapacitySize: The batch size for a rollback to the old endpoint fleet. If this field is absent, the value is set to the default, which is 100% of the total capacity. When the default is used, SageMaker AI provisions the entire capacity of the old fleet at once during rollback.
WaitIntervalInSeconds int: The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.

maximumBatchSize InferenceComponentCapacitySize: The batch size for each rolling step in the deployment process. For each step, SageMaker AI provisions capacity on the new endpoint fleet, routes traffic to that fleet, and terminates capacity on the old endpoint fleet. The value must be between 5% and 50% of the copy count of the inference component.
maximumExecutionTimeoutInSeconds Integer: The time limit for the total deployment. Exceeding this limit causes a timeout.
rollbackMaximumBatchSize InferenceComponentCapacitySize: The batch size for a rollback to the old endpoint fleet. If this field is absent, the value is set to the default, which is 100% of the total capacity. When the default is used, SageMaker AI provisions the entire capacity of the old fleet at once during rollback.
waitIntervalInSeconds Integer: The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.

maximumBatchSize InferenceComponentCapacitySize: The batch size for each rolling step in the deployment process. For each step, SageMaker AI provisions capacity on the new endpoint fleet, routes traffic to that fleet, and terminates capacity on the old endpoint fleet. The value must be between 5% and 50% of the copy count of the inference component.
maximumExecutionTimeoutInSeconds number: The time limit for the total deployment. Exceeding this limit causes a timeout.
rollbackMaximumBatchSize InferenceComponentCapacitySize: The batch size for a rollback to the old endpoint fleet. If this field is absent, the value is set to the default, which is 100% of the total capacity. When the default is used, SageMaker AI provisions the entire capacity of the old fleet at once during rollback.
waitIntervalInSeconds number: The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.

maximum_batch_size InferenceComponentCapacitySize: The batch size for each rolling step in the deployment process. For each step, SageMaker AI provisions capacity on the new endpoint fleet, routes traffic to that fleet, and terminates capacity on the old endpoint fleet. The value must be between 5% and 50% of the copy count of the inference component.
maximum_execution_timeout_in_seconds int: The time limit for the total deployment. Exceeding this limit causes a timeout.
rollback_maximum_batch_size InferenceComponentCapacitySize: The batch size for a rollback to the old endpoint fleet. If this field is absent, the value is set to the default, which is 100% of the total capacity. When the default is used, SageMaker AI provisions the entire capacity of the old fleet at once during rollback.
wait_interval_in_seconds int: The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.

maximumBatchSize Property Map: The batch size for each rolling step in the deployment process. For each step, SageMaker AI provisions capacity on the new endpoint fleet, routes traffic to that fleet, and terminates capacity on the old endpoint fleet. The value must be between 5% and 50% of the copy count of the inference component.
maximumExecutionTimeoutInSeconds Number: The time limit for the total deployment. Exceeding this limit causes a timeout.
rollbackMaximumBatchSize Property Map: The batch size for a rollback to the old endpoint fleet. If this field is absent, the value is set to the default, which is 100% of the total capacity. When the default is used, SageMaker AI provisions the entire capacity of the old fleet at once during rollback.
waitIntervalInSeconds Number: The length of the baking period, during which SageMaker AI monitors alarms for each batch on the new fleet.
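To make the batch-size constraint concrete, the sketch below checks a proposed maximum batch size against the 5% to 50% window of an inference component's copy count. The function name `maximum_batch_size_ok` is hypothetical. Treating both bounds as inclusive, and reading a `CAPACITY_PERCENT` value directly as a percentage between 5 and 50, are assumptions, since this reference does not spell out rounding or the percent case.

```python
def maximum_batch_size_ok(size_type: str, value: int, copy_count: int) -> bool:
    """Check a rolling-step batch size against the 5%..50% window (inclusive bounds assumed)."""
    if size_type == "COPY_COUNT":
        # Integer arithmetic avoids floating-point rounding:
        # 5% of copy_count <= value <= 50% of copy_count.
        return 5 * copy_count <= 100 * value <= 50 * copy_count
    if size_type == "CAPACITY_PERCENT":
        # Assumption: the value is itself the percentage of capacity.
        return 5 <= value <= 50
    raise ValueError(f"unknown capacity size type: {size_type!r}")


# With 20 copies, the window for COPY_COUNT is 1 to 10 copies per step.
for batch in (0, 1, 10, 11):
    print(batch, maximum_batch_size_ok("COPY_COUNT", batch, copy_count=20))
```

With a small copy count the window can be narrow: for 2 copies only a batch of 1 falls inside it under these assumptions.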

InferenceComponentRuntimeConfig, InferenceComponentRuntimeConfigArgs

CopyCount int: The number of runtime copies of the model container to deploy with the inference component. Each copy can serve inference requests.
CurrentCopyCount int
DesiredCopyCount int

CopyCount int: The number of runtime copies of the model container to deploy with the inference component. Each copy can serve inference requests.
CurrentCopyCount int
DesiredCopyCount int

copyCount Integer: The number of runtime copies of the model container to deploy with the inference component. Each copy can serve inference requests.
currentCopyCount Integer
desiredCopyCount Integer

copyCount number: The number of runtime copies of the model container to deploy with the inference component. Each copy can serve inference requests.
currentCopyCount number
desiredCopyCount number

copy_count int: The number of runtime copies of the model container to deploy with the inference component. Each copy can serve inference requests.
current_copy_count int
desired_copy_count int

copyCount Number: The number of runtime copies of the model container to deploy with the inference component. Each copy can serve inference requests.
currentCopyCount Number
desiredCopyCount Number

InferenceComponentSpecification, InferenceComponentSpecificationArgs

BaseInferenceComponentName string

The name of an existing inference component that is to contain the inference component that you're creating with your request.

Specify this parameter only if your request is meant to create an adapter inference component. An adapter inference component contains the path to an adapter model. The purpose of the adapter model is to tailor the inference output of a base foundation model, which is hosted by the base inference component. The adapter inference component uses the compute resources that you assigned to the base inference component.

When you create an adapter inference component, use the Container parameter to specify the location of the adapter artifacts. In the parameter value, use the ArtifactUrl parameter of the InferenceComponentContainerSpecification data type.

Before you can create an adapter inference component, you must have an existing inference component that contains the foundation model that you want to adapt.

ComputeResourceRequirements Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentComputeResourceRequirements

The compute resources allocated to run the model, plus any adapter models, that you assign to the inference component.

Omit this parameter if your request is meant to create an adapter inference component. An adapter inference component is loaded by a base inference component, and it uses the compute resources of the base inference component.

Container Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentContainerSpecification

Defines a container that provides the runtime environment for a model that you deploy with an inference component.

ModelName string

The name of an existing SageMaker AI model object in your account that you want to deploy with the inference component.

StartupParameters Pulumi.AwsNative.SageMaker.Inputs.InferenceComponentStartupParameters

Settings that take effect while the model container starts up.

BaseInferenceComponentName string

The name of an existing inference component that is to contain the inference component that you're creating with your request.

Before you can create an adapter inference component, you must have an existing inference component that contains the foundation model that you want to adapt.

ComputeResourceRequirements InferenceComponentComputeResourceRequirements

The compute resources allocated to run the model, plus any adapter models, that you assign to the inference component.

Container InferenceComponentContainerSpecification

Defines a container that provides the runtime environment for a model that you deploy with an inference component.

ModelName string

The name of an existing SageMaker AI model object in your account that you want to deploy with the inference component.

StartupParameters InferenceComponentStartupParameters

Settings that take effect while the model container starts up.

baseInferenceComponentName String

The name of an existing inference component that is to contain the inference component that you're creating with your request.

Before you can create an adapter inference component, you must have an existing inference component that contains the foundation model that you want to adapt.

computeResourceRequirements InferenceComponentComputeResourceRequirements

The compute resources allocated to run the model, plus any adapter models, that you assign to the inference component.

container InferenceComponentContainerSpecification

Defines a container that provides the runtime environment for a model that you deploy with an inference component.

modelName String

The name of an existing SageMaker AI model object in your account that you want to deploy with the inference component.

startupParameters InferenceComponentStartupParameters

Settings that take effect while the model container starts up.

baseInferenceComponentName string

The name of an existing inference component that is to contain the inference component that you're creating with your request.

Before you can create an adapter inference component, you must have an existing inference component that contains the foundation model that you want to adapt.

computeResourceRequirements InferenceComponentComputeResourceRequirements

The compute resources allocated to run the model, plus any adapter models, that you assign to the inference component.

container InferenceComponentContainerSpecification

Defines a container that provides the runtime environment for a model that you deploy with an inference component.

modelName string

The name of an existing SageMaker AI model object in your account that you want to deploy with the inference component.

startupParameters InferenceComponentStartupParameters

Settings that take effect while the model container starts up.

base_inference_component_name str

The name of an existing inference component that is to contain the inference component that you're creating with your request.

Before you can create an adapter inference component, you must have an existing inference component that contains the foundation model that you want to adapt.

compute_resource_requirements InferenceComponentComputeResourceRequirements

The compute resources allocated to run the model, plus any adapter models, that you assign to the inference component.

container InferenceComponentContainerSpecification

Defines a container that provides the runtime environment for a model that you deploy with an inference component.

model_name str

The name of an existing SageMaker AI model object in your account that you want to deploy with the inference component.

startup_parameters InferenceComponentStartupParameters

Settings that take effect while the model container starts up.

baseInferenceComponentName String

The name of an existing inference component that is to contain the inference component that you're creating with your request.

Before you can create an adapter inference component, you must have an existing inference component that contains the foundation model that you want to adapt.

computeResourceRequirements Property Map

The compute resources allocated to run the model, plus any adapter models, that you assign to the inference component.

container Property Map

Defines a container that provides the runtime environment for a model that you deploy with an inference component.

modelName String

The name of an existing SageMaker AI model object in your account that you want to deploy with the inference component.

startupParameters Property Map

Settings that take effect while the model container starts up.
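
The adapter rule described for this type (set the base inference component name and the container, omit the compute resource requirements) can be sketched as a small check in Python. This is an illustrative sketch only: `build_specification` is a hypothetical helper and not part of the provider, the plain-dict shape simply mirrors the Python property names listed above, and `artifact_url` is assumed to be the Python spelling of ArtifactUrl. The component name and S3 path are placeholders.

```python
from typing import Any, Dict, Optional


def build_specification(
    model_name: Optional[str] = None,
    base_inference_component_name: Optional[str] = None,
    compute_resource_requirements: Optional[Dict[str, Any]] = None,
    container: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a specification dict and check the adapter rule."""
    is_adapter = base_inference_component_name is not None
    if is_adapter and compute_resource_requirements is not None:
        # An adapter uses the compute resources of its base inference component,
        # so the documentation says to omit this property for adapters.
        raise ValueError(
            "omit compute_resource_requirements when "
            "base_inference_component_name is set"
        )
    spec = {
        "model_name": model_name,
        "base_inference_component_name": base_inference_component_name,
        "compute_resource_requirements": compute_resource_requirements,
        "container": container,
    }
    # Drop unset properties so only explicitly provided values remain.
    return {key: value for key, value in spec.items() if value is not None}


# Adapter component: points at an existing base component and the adapter artifacts.
adapter_spec = build_specification(
    base_inference_component_name="base-component",  # placeholder name
    container={"artifact_url": "s3://example-bucket/adapter.tar.gz"},  # placeholder path
)
```

The helper only enforces the one constraint stated in the property descriptions. It does not check that the base inference component exists, which the service requires before an adapter can be created.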

InferenceComponentStartupParameters, InferenceComponentStartupParametersArgs

ContainerStartupHealthCheckTimeoutInSeconds int: The timeout value, in seconds, for your inference container to pass the health check performed by SageMaker AI Hosting. For more information about health checks, see How Your Container Should Respond to Health Check (Ping) Requests.
ModelDataDownloadTimeoutInSeconds int: The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this inference component.

ContainerStartupHealthCheckTimeoutInSeconds int: The timeout value, in seconds, for your inference container to pass the health check performed by SageMaker AI Hosting. For more information about health checks, see How Your Container Should Respond to Health Check (Ping) Requests.
ModelDataDownloadTimeoutInSeconds int: The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this inference component.

containerStartupHealthCheckTimeoutInSeconds Integer: The timeout value, in seconds, for your inference container to pass the health check performed by SageMaker AI Hosting. For more information about health checks, see How Your Container Should Respond to Health Check (Ping) Requests.
modelDataDownloadTimeoutInSeconds Integer: The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this inference component.

containerStartupHealthCheckTimeoutInSeconds number: The timeout value, in seconds, for your inference container to pass the health check performed by SageMaker AI Hosting. For more information about health checks, see How Your Container Should Respond to Health Check (Ping) Requests.
modelDataDownloadTimeoutInSeconds number: The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this inference component.

container_startup_health_check_timeout_in_seconds int: The timeout value, in seconds, for your inference container to pass the health check performed by SageMaker AI Hosting. For more information about health checks, see How Your Container Should Respond to Health Check (Ping) Requests.
model_data_download_timeout_in_seconds int: The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this inference component.

containerStartupHealthCheckTimeoutInSeconds Number: The timeout value, in seconds, for your inference container to pass the health check performed by SageMaker AI Hosting. For more information about health checks, see How Your Container Should Respond to Health Check (Ping) Requests.
modelDataDownloadTimeoutInSeconds Number: The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this inference component.
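
As a rough illustration of how the two startup timeouts fit together, the following Python sketch assembles them into a dictionary keyed by the Python property names shown above. `build_startup_parameters` is a hypothetical helper, and the positivity check is only a sanity check assumed for this example, not a limit taken from this page. The timeout values are placeholders.

```python
from typing import Dict, Optional


def build_startup_parameters(
    health_check_timeout: Optional[int] = None,
    model_download_timeout: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical helper: collect the startup timeouts that were provided."""
    candidates = (
        ("container_startup_health_check_timeout_in_seconds", health_check_timeout),
        ("model_data_download_timeout_in_seconds", model_download_timeout),
    )
    params: Dict[str, int] = {}
    for name, seconds in candidates:
        if seconds is None:
            continue  # leave unset properties out entirely
        if seconds <= 0:
            # Assumed sanity check: a timeout must be a positive number of seconds.
            raise ValueError(f"{name} must be positive, got {seconds}")
        params[name] = seconds
    return params


# Placeholder values: 10 minutes for the health check, 20 minutes for the download.
startup_parameters = build_startup_parameters(
    health_check_timeout=600,
    model_download_timeout=1200,
)
```

The resulting dictionary would sit under the `startup_parameters` property of the specification.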

InferenceComponentStatus, InferenceComponentStatusArgs

InService: InService
Creating: Creating
Updating: Updating
Failed: Failed
Deleting: Deleting

InferenceComponentStatusInService: InService
InferenceComponentStatusCreating: Creating
InferenceComponentStatusUpdating: Updating
InferenceComponentStatusFailed: Failed
InferenceComponentStatusDeleting: Deleting

InService: InService
Creating: Creating
Updating: Updating
Failed: Failed
Deleting: Deleting

InService: InService
Creating: Creating
Updating: Updating
Failed: Failed
Deleting: Deleting

IN_SERVICE: InService
CREATING: Creating
UPDATING: Updating
FAILED: Failed
DELETING: Deleting
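
To show how these status values might be consumed, here is a minimal Python sketch that mirrors the enum members above and classifies a status string. The class below is a local stand-in rather than the provider's own enum, and the split into "transitional" and "settled" statuses is an interpretation assumed for this example, not something this page defines.

```python
from enum import Enum


class InferenceComponentStatus(str, Enum):
    """Local stand-in mirroring the Python member names and values listed above."""

    IN_SERVICE = "InService"
    CREATING = "Creating"
    UPDATING = "Updating"
    FAILED = "Failed"
    DELETING = "Deleting"


# Assumed grouping: statuses that suggest an operation is still in progress.
TRANSITIONAL = frozenset(
    {
        InferenceComponentStatus.CREATING,
        InferenceComponentStatus.UPDATING,
        InferenceComponentStatus.DELETING,
    }
)


def is_transitional(status: str) -> bool:
    """Return True if the status string names an in-progress state.

    Raises ValueError for a string that is not one of the five known values.
    """
    return InferenceComponentStatus(status) in TRANSITIONAL
```

A caller polling the resource's status output could use `is_transitional` to decide whether to keep waiting, then distinguish `InService` from `Failed` once it returns False.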

"InService": InService
"Creating": Creating
"Updating": Updating
"Failed": Failed
"Deleting": Deleting

Tag, TagArgs

Key string: The key name of the tag
Value string: The value of the tag

Key string: The key name of the tag
Value string: The value of the tag

key String: The key name of the tag
value String: The value of the tag

key string: The key name of the tag
value string: The value of the tag

key str: The key name of the tag
value str: The value of the tag

key String: The key name of the tag
value String: The value of the tag
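
Tags are a list of key/value pairs, so a plain mapping of labels has to be expanded into that shape. The sketch below shows one way to do this in Python. `to_tags` is a hypothetical helper, the dictionaries mirror the `key` and `value` property names listed above, and the label names are placeholders.

```python
from typing import Dict, List


def to_tags(labels: Dict[str, str]) -> List[Dict[str, str]]:
    """Hypothetical helper: expand a label mapping into a list of key/value tag entries."""
    # Sort by key so the resulting list is stable from run to run.
    return [{"key": key, "value": value} for key, value in sorted(labels.items())]


tags = to_tags({"team": "ml-platform", "environment": "dev"})
```

Sorting is a design choice for this sketch: it keeps the list order deterministic, which avoids spurious differences when the same labels are supplied in a different order.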

Package Details

Repository: AWS Native pulumi/pulumi-aws-native
License: Apache-2.0
