AWS Lambda Response Streaming with Pulumi


Since its introduction in 2014, AWS Lambda has rapidly expanded from simple “functions in the cloud” to a comprehensive serverless platform with support for containerized functions and public per-function URL endpoints.

As serverless applications have grown in sophistication, developers have adopted functions-as-a-service as a first-class tool in their microservices strategy. But as organizations break up their monolithic applications into services, AWS Lambda has not been a viable option for services that return payloads larger than the 6 MiB Lambda service limit. Beyond the payload-size limit, Lambda has also returned responses only after they were fully buffered in memory within the Lambda service, which meant that end users saw nothing until the entire response was available. Together, these limitations made Lambda difficult or impossible to use for data-intensive or latency-sensitive use cases.

Introducing Response Streaming

Today, AWS announced Lambda Response Streaming: a new feature that enables response payloads beyond the 6 MiB limit, supports binary content in responses, and reduces response times for latency-sensitive applications. Response Streaming reduces time to first byte (TTFB) by letting your functions send partial responses to the client as soon as they are ready, instead of waiting for a fully generated and buffered response. This cuts a multi-second TTFB to milliseconds, and you can stream payloads that are gigabytes in size, subject to a soft limit of 20 MiB for streamed payloads. The maximum throughput for a streamed response is 2 MiB/s.

Enabling Response Streaming requires a new Lambda function handler signature that provides a stream object that the function can write to. When your function writes data to this stream, it is sent immediately to the client. To help clients understand the contents of the stream, you can optionally set the Content-Type header for a response.

This feature supports Node.js 14.x, 16.x, and 18.x at launch and comes with an updated billing model that charges for bytes processed and streamed from your Lambda function.

Authoring a Lambda Function for Response Streaming

For an example of how response streaming works in practice, we’ve built a Lambda function that streams the Gettysburg Address one sentence at a time, with a one-second pause between sentences. For brevity, each sentence is abbreviated below (but the code in GitHub, which demonstrates the same function with and without streaming, contains the full text of the speech):

const handleInternal = async (_event, responseStream, _context) => {

  const sentences = [
    "Fourscore and seven years ago...",
    "Now we are engaged in a great civil war...",
    "We have come to dedicate...",
    "It is altogether fitting...",
    "But, in a larger sense...",
    "The brave men...",
    "The world will little note...",
    "It is for us the living...",
  ];

  // Resolve after `ms` milliseconds so we can pause between sentences.
  const timer = ms => new Promise(res => setTimeout(res, ms));

  for (let i = 0; i < sentences.length; i++) {
    await timer(1000);
    // Each write is flushed to the client as soon as it happens.
    responseStream.write(`${sentences[i]}\n`);
  }

  // Signal that no more data will be written.
  responseStream.end();
};

exports.handler = awslambda.streamifyResponse(handleInternal);

The notable changes to the Lambda function code that support response streaming are:

  • The function must be wrapped in the awslambda.streamifyResponse() middleware. (The streamifyResponse() function is defined in the Node runtime supplied by the Lambda service.)
  • The handler takes an additional parameter: the usual (event, context) signature becomes (event, responseStream, context).
  • The responseStream parameter implements Node’s Writable interface, so its API will be familiar to Node developers who have worked with streams.
  • Responses are written to the client with responseStream.write().
  • The function must call responseStream.end() to signal that no more data will be written.
  • Note that functions authored for streaming can still be invoked seamlessly with buffered responses.

Streaming Performance Considerations

For this simple text-based example, there is little need to worry about overall performance. However, if the response is large or the client is expected to perform heavy processing on the response data, AWS suggests using Node’s pipeline() in place of write() in case data is produced faster than the client can consume it. pipeline() pipes between streams, forwards errors, and cleans up properly, invoking a callback when the pipeline is complete. You can learn more about this approach in Backpressuring in streams in the Node docs.

Packaging and deploying your Lambda Function for streaming with Pulumi

We can package our Lambda function for streaming with just a few dozen lines of code using Pulumi. First, we’ll need to import some libraries that will allow us to package our Lambda and create the necessary resources in AWS. For this exercise, we’ll be using both the AWS Native provider and the AWS Classic provider. The AWS Native provider works with the AWS Cloud Control API to enable same-day support for new AWS features like Lambda response streaming.

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as awsNative from "@pulumi/aws-native";

Next, we’ll create an IAM role for our Lambda function to assume that has the standard required permissions for a Lambda function:

const role = new aws.iam.Role("role", {
  assumeRolePolicy: JSON.stringify({
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com",
      },
      "Action": "sts:AssumeRole",
    }],
  }),
});

new aws.iam.RolePolicyAttachment("role-policy-attachment", {
  policyArn: "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
  role: role.name,
});
Streamed responses are not currently supported with API Gateway or Application Load Balancers, so they must be invoked via Lambda Function URLs, or invoked and streamed through the InvokeWithResponseStream API using the AWS SDK. Let’s create our function, add a function URL, and grant the permissions needed to allow unauthenticated access to the function URL. Note that we set the invokeMode input of the function URL to RESPONSE_STREAM. This is what allows us to invoke the function with streaming enabled:

const streamingFunc = new aws.lambda.Function("streaming-func", {
  code: new pulumi.asset.FileArchive("../../lambda"),
  role: role.arn,
  handler: "index.handler",
  runtime: "nodejs14.x",
  timeout: 30,
});

new aws.lambda.Permission("streaming-permission", {
  action: "lambda:InvokeFunctionUrl",
  "function": streamingFunc.arn,
  principal: "*",
  functionUrlAuthType: "NONE",
});

const streamingUrl = new awsNative.lambda.Url("streaming-url", {
  authType: "NONE",
  targetFunctionArn: streamingFunc.arn,
  invokeMode: "RESPONSE_STREAM",
});
Finally, we’ll add a stack export so we can access the generated function URL from outside of our Pulumi program:

exports.streamingUrl = streamingUrl.functionUrl;

To deploy our resources, we run pulumi up -y:

$ pulumi up -y
Previewing update (dev)

View Live:

     Type                               Name                      Plan     Info
 +   pulumi:pulumi:Stack                aws-lambda-streaming-dev  create
 +   ├─ aws:iam:Role                    role                      create
 +   ├─ aws:iam:RolePolicyAttachment    role-policy-attachment    create
 +   ├─ aws:lambda:Function             streaming-func            create
 +   ├─ aws:lambda:Permission           streaming-permission      create
 +   └─ aws-native:lambda:Url           streaming-url             create

	streamingUrl: output<string>

Updating (dev)

View Live:

     Type                               Name                      Status            Info
 +   pulumi:pulumi:Stack                aws-lambda-streaming-dev  creating (43s)...
 +   ├─ aws:iam:Role                    role                      created (1s)
 +   ├─ aws:iam:RolePolicyAttachment    role-policy-attachment    created (0.33s)
 +   ├─ aws:lambda:Function             streaming-func            created (22s)
 +   ├─ aws-native:lambda:Url           streaming-url             created (2s)
 +   └─ aws:lambda:Permission           streaming-permission      created (0.58s)

	streamingUrl: ""

	+ 6 created

Duration: 46s

Reading a Response Stream

Now that we’ve deployed our streaming function, we can read a response stream from the deployed Lambda URL. Per AWS, we can use any client that supports HTTP/1.1 chunked transfer coding. In cURL, streamed responses can be read via the -N flag. We can also add output that displays our time to connect, time to first byte (TTFB), and the total time to return the response:

curl -N -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} \n" $(pulumi stack output streamingUrl)

When successful, each sentence of the address appears in your terminal roughly one second apart, followed by the timing statistics from cURL.
This example visually shows the advantage of response streaming: the first message appears shortly (approximately one second) after invocation. If we disable streaming, the fully buffered response takes several seconds to arrive. You can see the buffered-response example in action in the GitHub repository. This is precisely why TTFB is important: it’s the difference between an app or service that appears to hang while awaiting a response and one that feels responsive to an end user.

Wrapping up

AWS Lambda Response Streaming continues the Lambda team’s steady drumbeat of innovation that adds new capabilities and use cases to serverless functions. Grab the sample code and give the feature a try. Then, let us know what you think in the Pulumi Community Slack.