Comparing Fluid compute to public cloud compute by list price alone is misleading, because Amazon EC2, AWS Fargate, and Amazon EKS bill for the capacity you provision, not for the CPU your application actually uses. Fluid compute with Active CPU pricing bills CPU only while it's actively running your code, so a fair comparison measures what you pay per unit of CPU your app actually consumes.
This guide compares all four options on that basis, using a like-for-like 1 vCPU / 2 GB shape, and shows how the result changes as real-world CPU utilization drops.
In this guide, you'll learn:
- Why provisioned cloud pricing understates the cost of the CPU you actually use
- How to compare compute options on cost per delivered active vCPU-hour
- How Fluid's all-in rate compares to EC2, Fargate, and EKS at realistic utilization levels
- What CPU utilization levels are realistic for API and AI workloads
EC2, Fargate, and EKS charge for the capacity you provision, whether or not your code is using it. When an instance runs at 40% CPU utilization, you still pay for 100% of the instance, and the remaining 60% is idle capacity you've already paid for. The published hourly rate, therefore, describes the cost of provisioned CPU, not the cost of the CPU that does your application's work.
Fluid compute prices differently. With Active CPU pricing, Vercel bills for CPU only while your code is running, pauses CPU billing during I/O waits, and charges nothing for CPU between requests. You still pay for provisioned memory while a request is in flight, at a rate less than 10% of the active CPU rate.
Because of this difference, the two models compare fairly only on the cost of the CPU your application actually consumes, not on the sticker rate.
A delivered active vCPU-hour is one hour of CPU time that actually ran your application's code. It's the unit of compute you're really buying, regardless of how a provider bills for it.
For a provisioned option, the cost per delivered active vCPU-hour is the provisioned hourly rate divided by CPU utilization:
cost per delivered active vCPU-hour = provisioned hourly rate / CPU utilization
At 50% utilization, you pay twice the sticker rate per active vCPU-hour because half of the capacity you bought did no work. At 25% utilization, you pay four times the sticker rate.
For Fluid, the cost per delivered active vCPU-hour remains constant because Active CPU pricing doesn't bill for idle CPU. Lower utilization doesn't raise Fluid's effective rate.
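
The provisioned-side arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the function name and the $0.05 hourly rate are assumptions chosen for the example, not published prices.

```python
def cost_per_active_vcpu_hour(provisioned_hourly_rate: float, utilization: float) -> float:
    """Effective cost of one hour of CPU that actually ran application code."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return provisioned_hourly_rate / utilization


# Hypothetical sticker rate for one provisioned vCPU, in dollars per hour.
HYPOTHETICAL_RATE = 0.05

for utilization in (1.0, 0.5, 0.25):
    effective = cost_per_active_vcpu_hour(HYPOTHETICAL_RATE, utilization)
    print(f"{utilization:.0%} utilization: ${effective:.3f} per active vCPU-hour")
```

With this hypothetical rate, the effective cost doubles at 50% utilization and quadruples at 25%, matching the multipliers described above.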
For a Standard machine size (1 vCPU / 2 GB), Fluid compute costs $0.149 per delivered active vCPU-hour, all-in. That figure is $0.128 for one hour of active CPU plus $0.021 for 2 GB of provisioned memory ($0.0106 per GB-hour times 2 GB). Because Active CPU pricing charges only for active CPU, this $0.149 rate is what you pay per active vCPU-hour at any utilization level.
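
As a quick check of that arithmetic, the sketch below recomputes the all-in figure from the two rates quoted in this guide. The function name is illustrative, and the calculation assumes, as the figure above does, that memory is provisioned for the same hour in which the CPU is active.

```python
# Rates quoted in this guide, in dollars. Check the pricing documentation for current values.
ACTIVE_CPU_RATE_PER_VCPU_HOUR = 0.128
MEMORY_RATE_PER_GB_HOUR = 0.0106


def fluid_all_in_rate(vcpus: float, memory_gb: float) -> float:
    """Cost of one hour with the CPU active throughout and memory provisioned throughout."""
    return vcpus * ACTIVE_CPU_RATE_PER_VCPU_HOUR + memory_gb * MEMORY_RATE_PER_GB_HOUR


# Standard machine size: 1 vCPU / 2 GB.
standard = fluid_all_in_rate(vcpus=1, memory_gb=2)
print(f"Standard (1 vCPU / 2 GB): ${standard:.3f} per active vCPU-hour")
```

The unrounded sum is $0.1492, which rounds to the $0.149 used throughout this guide.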
The table below shows the cost per delivered active vCPU-hour for each option at three utilization levels, using the 1 vCPU / 2 GB shape. Lower is cheaper. Fluid stays flat at $0.149 because idle CPU isn't billed, while the provisioned options get more expensive per active vCPU-hour as utilization falls.
| Cost per delivered active vCPU-hour | 40% utilization | 29% utilization | 8% utilization |
|---|---|---|---|
| Fluid (Active CPU pricing) | $0.149 | $0.149 | $0.149 |
| Amazon EC2 (c8i) | $0.117 | $0.162 | $0.586 |
| AWS Fargate (Linux/x86, 1:2) | $0.123 | $0.170 | $0.617 |
| Amazon EKS (Auto Mode, c8i) | $0.131 | $0.181 | $0.656 |
At 40% utilization, the provisioned options are cheaper per active vCPU-hour than Fluid: EC2 by 21.5%, Fargate by 17.3%, and EKS by 12.1%. This is the high-utilization end of the range, and few production fleets sustain it.
At 29% utilization, Fluid is the cheaper option: EC2 costs 8.3% more, Fargate 14.1% more, and EKS 21.3% more per active vCPU-hour.
At 8% utilization, the gap widens sharply. EC2 costs 293% more than Fluid, Fargate 314% more, and EKS 340% more, because most of the provisioned capacity sits idle while still being billed.
Fluid doesn't win every row. At high sustained utilization, provisioned compute is cheaper per active vCPU-hour. What the table shows is that across the utilization levels teams actually run at, Fluid lands in the same range or lower, because it removes idle CPU from the bill.
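
One way to read the table is to estimate the break-even utilization for each provisioned option, the point at which it costs the same as Fluid per active vCPU-hour. The sketch below is an approximation under stated assumptions: it backs out each option's hourly rate from the rounded 40% column rather than from published prices, and the function name is illustrative.

```python
FLUID_RATE = 0.149  # dollars per delivered active vCPU-hour, from this guide

# Cost per delivered active vCPU-hour at 40% utilization, copied from the table.
COST_AT_40_PERCENT = {
    "Amazon EC2 (c8i)": 0.117,
    "AWS Fargate (Linux/x86, 1:2)": 0.123,
    "Amazon EKS (Auto Mode, c8i)": 0.131,
}


def break_even_utilization(cost_at_utilization: float, utilization: float, fluid_rate: float) -> float:
    """Utilization at which a provisioned option matches Fluid per active vCPU-hour."""
    # Back out the provisioned hourly rate implied by the table entry.
    implied_hourly_rate = cost_at_utilization * utilization
    # rate / u == fluid_rate  =>  u == rate / fluid_rate
    return implied_hourly_rate / fluid_rate


for name, cost in COST_AT_40_PERCENT.items():
    u = break_even_utilization(cost, 0.40, FLUID_RATE)
    print(f"{name}: break-even near {u:.0%} utilization")
```

With these rounded inputs, the break-even points fall between the 29% and 40% columns, at roughly 31% to 35%. Above the break-even point the provisioned option is cheaper per active vCPU-hour, and below it Fluid is.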
Most production fleets run well below the utilization targets they're configured for.
AWS's target tracking documentation uses 50% CPU as a normal scaling target, and notes that actual capacity often sits below the target because autoscaling rounds capacity up and scales in cautiously. Google documents the same gap between target and actual utilization in its autoscaler. The 40% utilization column in the comparison table is already a generous operating point.
For Node.js API services, the realistic level is lower, because CPU isn't the signal users feel. Users experience event loop pressure and tail latency, which can diverge from CPU usage enough for CPU-based scaling to react to the wrong signal. AI workloads push utilization lower still, because requests spend variable time waiting on inference rather than burning local CPU. In practice, many API and AI services settle well below a 50% target, closer to the 29% shown in the middle column of the table above.
Production Kubernetes fleets often run lower still. CAST AI's 2026 analysis of measured clusters put average CPU utilization at 8%, which is the low end of the table above, where provisioned compute costs roughly 3.9 to 4.4 times Fluid's rate per active vCPU-hour.
The per-hour comparison also leaves out operational overhead, and including it favors Fluid further. Fluid compute is fully managed, so you don't tune autoscaling policies, manage load balancing, or debug the failure modes that come from running too hot (downtime) or too cold (wasted spend). Operating EC2, Fargate, or EKS at a chosen utilization target is itself ongoing engineering work, and the gap between the target and what you achieve is the idle cost captured in the table above.
- Read how Active CPU pricing works and why it reduces costs for long-running I/O-bound and agentic workloads.
- See the Fluid compute pricing documentation for current active CPU, provisioned memory, and invocation rates.
- Learn how Fluid compute uses concurrency and dynamic scaling to reduce the capacity you provision.