Fluid compute
Learn how to enable fluid compute, an execution model for Vercel Functions that provides a more flexible and efficient way to run your functions.

Fluid compute offers a blend of serverless flexibility and server-like capabilities. Unlike traditional serverless architectures, which can suffer from cold starts and limited functionality, fluid compute is a hybrid solution: it overcomes the limitations of both serverless and server-based approaches, delivering the advantages of both worlds, including:
- Zero configuration out of the box: Fluid compute comes with preset defaults that automatically optimize your functions for both performance and cost efficiency.
- Optimized concurrency: Optimizes resource usage by handling multiple invocations within a single function instance. Available with the Node.js and Python runtimes.
- Dynamic scaling: Fluid compute automatically optimizes existing resources before scaling up to meet traffic demands. This ensures low latency during high-traffic events and cost efficiency during quieter periods.
- Background processing: After fulfilling user requests, you can continue executing background tasks using `waitUntil`. This allows for a responsive user experience while performing time-consuming operations like logging and analytics in the background (see the sketch after this list).
- Automatic cold start optimizations: Reduces the effects of cold starts through automatic bytecode optimization and function pre-warming on production deployments.
- Cross-region and availability zone failover: Ensure high availability by first failing over to another availability zone (AZ) within the same region if one goes down. If all zones in that region are unavailable, Vercel automatically redirects traffic to the next closest region. Zone-level failover also applies to non-fluid deployments.
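As a rough illustration of background processing, the sketch below uses `waitUntil` from the `@vercel/functions` package in a Node.js route handler. The `recordAnalytics` helper and its endpoint are hypothetical stand-ins:

```ts
import { waitUntil } from '@vercel/functions';

// Hypothetical helper: ship an analytics event to an example endpoint.
async function recordAnalytics(path: string): Promise<void> {
  await fetch('https://analytics.example.com/events', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ path, ts: Date.now() }),
  });
}

export async function GET(request: Request): Promise<Response> {
  const { pathname } = new URL(request.url);

  // The response is sent immediately; the promise passed to waitUntil
  // keeps running in the background instead of being cut off.
  waitUntil(recordAnalytics(pathname));

  return Response.json({ ok: true });
}
```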
See What is Compute? to learn more about fluid compute and how it compares to traditional serverless models.
To enable fluid compute:
- Navigate to your project in the Vercel dashboard.
- Click on the Settings tab and select the Functions section.
- Scroll to the Fluid Compute section and enable the toggle for Fluid Compute.
- Redeploy your project to apply the changes.
Fluid compute is available for the Node.js and Python runtimes.
The in-function concurrency beta ends on February 20, 2025. It is now available in fluid compute by default and can be enabled in your dashboard.
Fluid compute allows multiple invocations to share a single function instance. This is especially valuable for AI applications, where tasks like fetching embeddings, querying vector databases, or calling external APIs are often I/O-bound. By allowing concurrent execution within the same instance, you can reduce cold starts, minimize latency, and lower compute costs.

Vercel Functions prioritize existing idle resources before allocating new ones, reducing unnecessary compute usage. This in-function concurrency is especially effective when multiple requests target the same function, so fewer total resources are needed for the same workload.
Optimized concurrency in fluid compute is available when using Node.js or Python runtimes. See the efficient serverless Node.js with in-function concurrency blog post to learn more.
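Because concurrent invocations can run on the same instance, module-scope resources are initialized once and reused across them. The sketch below assumes a hypothetical vector-search service; the pattern, not the API, is the point:

```ts
// Hypothetical stand-in for an expensive client (database, vector store, etc.).
function createVectorClient(baseUrl: string) {
  return {
    async search(query: string, topK: number): Promise<string[]> {
      const res = await fetch(
        `${baseUrl}/search?q=${encodeURIComponent(query)}&k=${topK}`
      );
      return res.json();
    },
  };
}

// Module scope runs once per instance, so this client is shared by
// every invocation (including concurrent ones) served by that instance.
const client = createVectorClient(
  process.env.VECTOR_DB_URL ?? 'https://vector.example.com'
);

export async function POST(request: Request): Promise<Response> {
  const { query } = await request.json();

  // While this I/O-bound call awaits, fluid compute can route other
  // invocations to the same instance instead of leaving it idle.
  const matches = await client.search(query, 5);

  return Response.json({ matches });
}
```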
When using Node.js version 20+, Vercel Functions use bytecode caching to reduce cold start times. This stores the compiled bytecode of JavaScript files after their first execution, eliminating the need for recompilation during subsequent cold starts.
The cache is populated on the first invocation, so that request doesn't benefit yet; subsequent cold starts load the cached bytecode, enabling faster initialization. This optimization is especially beneficial for functions that are frequently invoked or have long execution times.
For frameworks that output ESM, all CommonJS dependencies (for example, `react`, `node-fetch`) will be opted into bytecode caching.
On traditional serverless compute, the isolation boundary refers to the separation of individual instances of a function to ensure they don't interfere with each other. This provides a secure execution environment for each function.
However, because each function invocation gets its own microVM, this isolation can lead to slower start-up times and increased resource usage during idle periods, when a microVM remains allocated but inactive.
Fluid compute uses a different approach to isolation. Instead of using a microVM for each function invocation, multiple invocations can concurrently share the same physical instance: a single process with shared global state. This allows functions to share resources and execute in the same environment, which can improve performance and reduce costs.
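A practical consequence, sketched below with a hypothetical in-memory cache: module-level state persists for the lifetime of the instance and is visible to concurrent invocations, which is handy for caching, while per-request data should stay inside the handler:

```ts
// Module-level state lives as long as the instance does and is shared
// by every invocation (including concurrent ones) that it serves.
const cache = new Map<string, { value: string; expires: number }>();

export async function GET(request: Request): Promise<Response> {
  const key = new URL(request.url).searchParams.get('key') ?? 'default';

  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) {
    return Response.json({ value: hit.value, cached: true });
  }

  // Hypothetical upstream lookup; any I/O-bound call behaves the same way.
  const res = await fetch(`https://config.example.com/${key}`);
  const value = await res.text();

  cache.set(key, { value, expires: Date.now() + 60_000 });
  return Response.json({ value, cached: false });
}
```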
Fluid compute includes default settings that vary by plan:

Settings | Hobby | Pro | Enterprise
---|---|---|---
CPU configuration | Managed | Standard / Performance | Standard / Performance
Default / Max duration | 60s / 60s | 90s / 800s | 90s / 800s
Multi-region failover | | |
Multi-region functions | | Up to 3 | All
When you enable fluid compute, the settings you configure in your function code, dashboard, or `vercel.json` file will override the default fluid compute settings.

The following order of precedence determines which settings take effect. Stages higher in the table always override those below them:
Precedence | Stage | Explanation | Can override
---|---|---|---
1 | Function code | Settings in your function code always take top priority. These include max duration defined directly in your code. | `maxDuration`
2 | `vercel.json` | Any settings in your `vercel.json` file, like max duration, region, and CPU, override dashboard settings and fluid defaults. | `maxDuration`, `region`, `memory`
3 | Dashboard | Changes made in the dashboard, such as max duration, region, or CPU, override fluid defaults. | `maxDuration`, `region`, `memory`
4 | Fluid defaults | The default settings applied automatically when fluid compute is enabled and no other settings are configured. | |
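To make the precedence concrete, here is a sketch under assumed file paths and values. In a Next.js route handler, an exported `maxDuration` (stage 1) wins over everything else:

```ts
// app/api/report/route.ts
// Route segment config: takes top priority over vercel.json,
// the dashboard, and fluid defaults.
export const maxDuration = 300;

export async function GET(): Promise<Response> {
  return Response.json({ status: 'ok' });
}
```

A `vercel.json` like the following (the glob and values are illustrative) would then apply at stage 2 to functions that don't set their own values in code:

```json
{
  "functions": {
    "app/api/**/*.ts": {
      "maxDuration": 120,
      "memory": 3009
    }
  }
}
```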
If you have enabled fluid compute and then configure your function with less than 1 GB of memory through `vercel.json`, concurrency optimizations will be disabled.
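For example, a memory setting like the one below (the glob and value are illustrative) would drop the function under 1 GB and therefore disable in-function concurrency:

```json
{
  "functions": {
    "api/**/*.ts": {
      "memory": 512
    }
  }
}
```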