Vercel Functions support streaming data over time, allowing you to render parts of the UI as they're ready. Doing so lets users interact with your app before the full page loads by populating the most important components first. Common use cases include:
- E-commerce: Render the most important product and account data early, letting customers shop sooner
- AI applications: Streaming responses from AI models lets you display response text as it arrives rather than waiting for the full result
HTTP responses typically send the entire payload to the client all at once. This approach can result in a slow user experience if the payload is large or expensive to compute.
The Web Streams API enables you to stream chunks of the payload as they become available, improving your users' perception of how fast data is loading. It is supported in most major web browsers and popular runtimes, such as Node.js and Deno.
The Web Streams API helps you:
- Break large data into chunks: Chunks are portions of data sent over time
- Handle backpressure: Backpressure occurs when chunks are streamed from the server faster than they can be processed in the client, causing a backup of data
- Build more responsive apps: Rendering your UI progressively as data chunks arrive can improve your users' perception of your app's performance
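As a minimal sketch of the first point, the following TypeScript splits a string into fixed-size chunks with a ReadableStream and then reads the chunks back one at a time. The helper names streamInChunks and readAll are hypothetical, and the code assumes a runtime that exposes ReadableStream as a global, such as a modern browser, Node.js 18 or later, or Deno.

```typescript
// Hypothetical helper: exposes `text` as a stream of chunks that are
// at most `chunkSize` characters long.
function streamInChunks(text: string, chunkSize: number): ReadableStream<string> {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be greater than 0");
  }
  let offset = 0;
  return new ReadableStream<string>({
    // pull() runs whenever the stream wants more data.
    pull(controller) {
      if (offset >= text.length) {
        controller.close();
        return;
      }
      controller.enqueue(text.slice(offset, offset + chunkSize));
      offset += chunkSize;
    },
  });
}

// Hypothetical helper: reads a stream to the end and collects its chunks.
async function readAll(stream: ReadableStream<string>): Promise<string[]> {
  const reader = stream.getReader();
  const chunks: string[] = [];
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
  }
  return chunks;
}
```

In a real app you would usually act on each chunk as it arrives, for example by appending it to the page, rather than collecting everything first.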
Chunks in web streams are fundamental data units that can be of many different types depending on the content, such as String for text or Uint8Array for binary files. Standard Function responses contain full payloads of data, while chunks are pieces of the payload that get streamed to the client as they're available.
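To illustrate the difference between text and binary chunks, here is a small sketch that turns Uint8Array chunks back into a string with the standard TextDecoder. The helper name decodeChunks and the sample text are made up for this example. The stream option matters because a multi-byte character can be split across two chunks.

```typescript
// Hypothetical helper: decodes binary chunks into text, even when a
// multi-byte character is split across two chunks.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    // `stream: true` makes the decoder hold back incomplete byte sequences
    // until the next chunk completes them.
    text += decoder.decode(chunk, { stream: true });
  }
  // A final call with no input flushes anything still buffered.
  text += decoder.decode();
  return text;
}

// The euro sign takes three bytes in UTF-8. Cutting after byte 9 leaves its
// first byte in one chunk and the remaining two bytes in the next.
const encoded = new TextEncoder().encode("price: 5€");
const decodedText = decodeChunks([encoded.slice(0, 9), encoded.slice(9)]);
```

Decoding each chunk independently, without the stream option, would produce replacement characters at the split point instead of the original character.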
For example, imagine you want to create an AI chat app that uses a Large Language Model to generate replies. Due to their large data sets, replies from language models can be slow to generate.
Standard Function responses require you to send the full reply to the client when it's done, but streaming enables you to show each word of the reply as the model generates it, improving users' perception of your chat app's speed.
Chunk sizes can be out of your control, so it's important that your code can handle chunks of any size. Chunk sizes are influenced by the following factors:
- Data source: Sometimes the original data is already broken up. For example, OpenAI's language models produce responses in tokens, or chunks of words
- Stream implementation: The server could be configured to stream small chunks quickly or large chunks at a lower pace
- Network: Network factors, such as geographical distance from the client, can cause chunk fragmentation and limit chunk size
- In local development, chunk sizes won't be impacted by network conditions, as no network transmission is happening
For an example Function that processes chunks, see Streaming Examples.
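One common way to cope with chunks of any size is to buffer incoming text until a complete message is available. The sketch below assumes messages are separated by newlines, which is an assumption made for this example and not something the streaming APIs require. LineBuffer is a hypothetical class name.

```typescript
// Hypothetical helper: reassembles newline-delimited messages from text
// chunks whose boundaries do not line up with message boundaries.
class LineBuffer {
  private pending = "";

  // Adds a chunk and returns every complete line it finished.
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is an unfinished line (or "" if the chunk ended
    // with a newline), so keep it for the next call.
    this.pending = parts.pop() ?? "";
    return parts;
  }

  // Returns any trailing text that was never terminated by a newline.
  flush(): string | null {
    const rest = this.pending;
    this.pending = "";
    return rest.length > 0 ? rest : null;
  }
}

const lineBuffer = new LineBuffer();
const firstLines = lineBuffer.push("hel");       // no complete line yet
const secondLines = lineBuffer.push("lo\nwor");  // completes "hello"
const thirdLines = lineBuffer.push("ld\n");      // completes "world"
```

The same messages come out whether they arrive in one large chunk or many small ones, which is the property the consuming code should rely on.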
Once you understand how to deal with chunks of different sizes, you must understand how to deal with chunks arriving faster than you can process them in the client.
When the server streams data faster than the client can process it, excess data will queue up in the client's memory. This issue is called backpressure, and it can lead to memory overflow errors, or data loss when the client's memory reaches capacity.
For example, popular social media users can receive hundreds of notifications streamed to their web client per second. If their web client can't render the notifications fast enough, some may be lost, or the client may crash if its memory overflows.
You can handle backpressure with a technique called flow control. This technique manages data transfer rates between two nodes to avoid overwhelming a slow receiver.
For an example of how to handle backpressure, see Streaming Examples.
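The Web Streams API has a form of flow control built in: a ReadableStream calls its pull method only while its internal queue is below the configured highWaterMark, so a slow reader naturally slows the producer. The following sketch shows this with a stream of numbers. The name createCountingStream and the specific limits are hypothetical and chosen only for illustration.

```typescript
// Hypothetical helper: a stream that produces the numbers 0..total-1 on
// demand and reports how many values it has produced so far.
function createCountingStream(total: number, highWaterMark: number) {
  let produced = 0;
  const stream = new ReadableStream<number>(
    {
      // pull() is only called while the internal queue has room or a
      // reader is waiting, so production follows consumption.
      pull(controller) {
        if (produced >= total) {
          controller.close();
          return;
        }
        controller.enqueue(produced);
        produced += 1;
      },
    },
    // Queue at most `highWaterMark` chunks ahead of the reader.
    { highWaterMark },
  );
  return { stream, producedCount: () => produced };
}

// Reads `count` values and returns them. A slow consumer would simply
// take longer between calls to read().
async function readSome(stream: ReadableStream<number>, count: number): Promise<number[]> {
  const reader = stream.getReader();
  const values: number[] = [];
  for (let i = 0; i < count; i++) {
    const result = await reader.read();
    if (result.done) break;
    values.push(result.value);
  }
  reader.releaseLock();
  return values;
}
```

Because the source uses pull rather than enqueuing everything up front, the number of produced values never runs ahead of the number of consumed values by more than highWaterMark.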
- To get started streaming on Vercel, see Streaming Quickstart
- For more detailed code, see Streaming Examples
Some frameworks, like Next.js and SvelteKit, have built-in functionality for streaming UI components. This lets you specify components that should be rendered with streamed data without needing a Function.
See your preferred framework's docs to learn more about streaming UI components.
Sometimes streaming a response causes your function to run for a long time before finishing execution. This can cause an error if you're streaming in Serverless Functions due to their maximum duration limits. In such cases, Vercel recommends streaming with Edge Functions, as there is no time limit for returning a response.
Remember that Edge Functions don't have access to all Node.js APIs, meaning some popular npm packages aren't available. Edge Functions also have a memory limit of 128 MB. If your streamed data exceeds this size, it will cause an error.
There are many considerations to make when choosing a runtime. See Runtimes for a high-level overview.
Vercel supports streaming responses with Edge Functions under the following limitations:
- You must begin sending a response within the allowed maximum initial response time
- After the initial response begins, you can continuously stream the response with no time limit
- Your streamed response size cannot exceed Vercel's memory allocation limit of 128 MB
Exceeding these limits will cause your Edge Function to fail. See Edge Function Limitations to learn more.
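As a rough sketch of what a streaming handler can look like, the function below returns a Response immediately and lets the body stream in afterwards, which is how a handler can start responding early and keep sending data. The handler name, the sample text, and the export lines shown in the comment are assumptions for illustration. Check the Edge Function docs for the exact configuration your project needs.

```typescript
// Assumption: on Vercel you would export this function as the default export
// and opt in to the Edge runtime, for example:
//   export const config = { runtime: "edge" };
//   export default handler;
function handler(_request: Request): Response {
  const encoder = new TextEncoder();
  // Stand-in data. A real handler would stream from a database, an AI
  // model, or another upstream source.
  const parts = ["Hello", ", ", "world", "!"];
  let index = 0;

  const body = new ReadableStream<Uint8Array>({
    // Each pull() sends the next part until none are left.
    pull(controller) {
      if (index >= parts.length) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(parts[index]));
      index += 1;
    },
  });

  // Returning the Response right away starts the response while the body
  // is still being produced.
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Producing data inside pull, rather than building the whole payload first, also keeps only a small part of the response in memory at any time.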
Vercel supports streaming in Serverless Functions when using:
Serverless streaming with the api directory in frameworks other than Next.js is not supported. Check your preferred framework's docs to learn if it supports streaming. Otherwise, Vercel recommends streaming with Edge Functions.
Serverless Functions cannot run longer than the maximum duration allowed on your account plan. See maxDuration to learn more.
To stream responses for a longer time than your allowed maximum duration, consider using Edge Functions, as they have no upper time limit.