Whisper

Whisper is OpenAI's general-purpose speech recognition model, trained on 680,000 hours of multilingual audio. As a single multitask model, it can transcribe speech, translate it into English, and identify the spoken language. Your use is subject to OpenAI's Terms & Privacy Policies.

Translation, Transcription

Use with AI Gateway

import { experimental_transcribe as transcribe } from 'ai';
import { gateway } from '@ai-sdk/gateway';
import { readFile } from 'node:fs/promises';

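// Send a local audio file to Whisper through AI Gateway.
const result = await transcribe({
  model: gateway.transcriptionModel('openai/whisper-1'),
  audio: await readFile('audio.mp3'),
});

// The transcript is returned as plain text on the result.
console.log(result.text);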

Playground

Try out Whisper by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Speech to text

Record a short clip from your microphone and the model transcribes it to text.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Input	Release Date	Legal
OpenAI	$0.36/hr	09/21/2022	Terms, Privacy

More models by OpenAI

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Web Search	Release Date
openai/gpt-5.6-luna	1.1M	1.8s	157tps	$1/M, $0.20/M (+1 more)	$6/M, $1.20/M (+1 more)	$0.1/M, $0.02/M (+1 more)	$1.25/M, $0.25/M (+1 more)	$10/K + input costs	07/09/2026
openai/gpt-5.6-sol	1.1M	3.5s	88tps	$5/M (+1 more)	$30/M (+1 more)	$0.5/M (+1 more)	$6.25/M (+1 more)	$10/K + input costs	07/09/2026
openai/gpt-5.4-mini	400K	0.7s	162tps	$0.75/M	$4.50/M	$0.07/M	—	$10/K + input costs	03/17/2026
openai/gpt-5-nano	400K	3.9s	203tps	$0.05/M	$0.40/M	$0.01/M	—	$14/K + input costs	08/07/2025
openai/gpt-5-mini	400K	3.2s	133tps	$0.25/M	$2/M	$0.03/M	—	$14/K + input costs	08/07/2025
openai/gpt-oss-120b	131K	0.2s	482tps	$0.35/M	$0.75/M	$0.25/M	—	—	08/05/2025

About Whisper

OpenAI released Whisper as an open-source speech recognition system on September 21, 2022, trained on 680,000 hours of multilingual and multitask supervised data collected from the web. That scale and diversity made Whisper robust to accents, background noise, and technical language. The hosted API model, whisper-1, arrived on March 1, 2023 alongside the GPT-3.5 Turbo API, serving the large-v2 weights through a managed endpoint.

Whisper is a multitask model. Beyond multilingual transcription, Whisper performs speech translation into English and language identification, so one model covers the common speech workflows without separate systems.

Because the underlying model is open source, Whisper became the reference point for speech-to-text across the industry. Teams that run open-source Whisper locally can use Whisper through AI Gateway for the hosted path, keeping output behavior familiar while gaining unified authentication, observability, and spend controls.

What To Consider When Choosing a Provider

Configuration: Newer transcription models exist in the catalog: gpt-4o-transcribe and gpt-4o-mini-transcribe achieve lower word error rates than Whisper-generation models. Whisper remains the pick when you want the multitask capabilities, duration-based pricing, or consistency with self-hosted open-source Whisper deployments.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use Whisper

Best for

Multilingual transcription: One model handles many languages without per-language configuration
Speech translation: Non-English audio translated directly into English text
Language identification: Detecting the spoken language as a pipeline preprocessing step
Duration-based billing: Costs scale with audio length, which simplifies budget modeling
Open-source consistency: Hosted behavior that matches self-run Whisper deployments
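
Duration-based billing can be budgeted with simple arithmetic. The sketch below is a minimal illustration, not an official pricing calculator: it assumes the $0.36/hr rate listed for OpenAI on this page and strictly linear billing with no rounding or minimum charge. The function names are hypothetical and not part of any SDK.

```typescript
// Hourly rate as listed on this page; treat it as a snapshot that may change.
const WHISPER_USD_PER_HOUR = 0.36;

// Estimate the cost of one clip, assuming cost scales linearly with duration.
function estimateTranscriptionCostUsd(
  durationSeconds: number,
  usdPerHour: number = WHISPER_USD_PER_HOUR,
): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative finite number');
  }
  return (durationSeconds / 3600) * usdPerHour;
}

// Sum the estimate over a batch of clips, each given as a duration in seconds.
function estimateBatchCostUsd(durationsSeconds: readonly number[]): number {
  return durationsSeconds.reduce(
    (total, seconds) => total + estimateTranscriptionCostUsd(seconds),
    0,
  );
}
```

Under these assumptions, a 90-second clip comes to about $0.009, and two 30-minute recordings together cost the same as one hour of audio.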

Consider alternatives when

Accuracy-critical transcripts: gpt-4o-transcribe and gpt-4o-mini-transcribe deliver lower word error rates
Live voice experiences: The gpt-realtime family handles speech-to-speech conversation directly
High-volume token pricing: The newer transcribe models bill by tokens if that fits your cost model better
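
The guidance in the two lists above can be encoded as a small decision helper. This is one possible reading, offered as a sketch: the rule order, the default, and all type and function names are assumptions, and the returned strings are family labels rather than verified AI Gateway model slugs.

```typescript
// Requirements a caller might have; every field is optional and defaults to "not needed".
type SpeechNeeds = {
  liveConversation?: boolean;
  translateToEnglish?: boolean;
  identifyLanguage?: boolean;
  durationBasedBilling?: boolean;
  matchSelfHostedWhisper?: boolean;
  accuracyCritical?: boolean;
};

// Family labels only (hypothetical), not API identifiers.
type SpeechModelFamily = 'whisper' | 'gpt-4o-transcribe' | 'gpt-realtime';

function suggestSpeechModel(needs: SpeechNeeds): SpeechModelFamily {
  // Speech-to-speech conversation is outside what a transcription model covers.
  if (needs.liveConversation) return 'gpt-realtime';

  // Any of Whisper's listed strengths keeps the choice on Whisper.
  const wantsWhisperStrength = Boolean(
    needs.translateToEnglish ||
      needs.identifyLanguage ||
      needs.durationBasedBilling ||
      needs.matchSelfHostedWhisper,
  );
  if (wantsWhisperStrength) return 'whisper';

  // When accuracy is the only deciding factor, prefer the newer transcribe model.
  if (needs.accuracyCritical) return 'gpt-4o-transcribe';

  // Assumed default when nothing is specified.
  return 'whisper';
}
```

A pipeline that needs both language identification and high accuracy resolves to Whisper here, because the newer model is only suggested when accuracy alone decides; teams with different priorities may want to reorder the checks.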

Conclusion

Whisper remains the versatile workhorse of speech recognition: multilingual, multitask, and consistent with the open-source model the industry standardized on. Use Whisper through AI Gateway for translation and language identification, and reach for gpt-4o-transcribe when accuracy alone decides.
