GPT-4o Transcribe

GPT-4o Transcribe is a speech-to-text model built on the GPT-4o architecture. It delivers lower word error rates and stronger language recognition than the original Whisper models for production transcription workloads. Your use is subject to OpenAI's Terms and Privacy Policies.

Use with AI Gateway

import { experimental_transcribe as transcribe } from 'ai';
import { gateway } from '@ai-sdk/gateway';
import { readFile } from 'node:fs/promises';

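// Read the audio file into a buffer and send it to the model through AI Gateway.
const result = await transcribe({
  model: gateway.transcriptionModel('openai/gpt-4o-transcribe'),
  audio: await readFile('audio.mp3'),
});

// The transcript is returned as plain text.
console.log(result.text);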


Playground

Try out GPT-4o Transcribe by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Legal	Release Date
OpenAI	Terms • Privacy	03/13/2024

More models by OpenAI

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Web Search	Release Date
openai/gpt-5.6-luna	1.1M	1.6s	124tps	$1/M, $0.20/M (+1 more)	$6/M, $1.20/M (+1 more)	$0.1/M, $0.02/M (+1 more)	$1.25/M, $0.25/M (+1 more)	$10/K + input costs	07/09/2026
openai/gpt-5.6-sol	1.1M	3.1s	93tps	$5/M (+1 more)	$30/M (+1 more)	$0.5/M (+1 more)	$6.25/M (+1 more)	$10/K + input costs	07/09/2026
openai/gpt-5.4-mini	400K	0.8s	232tps	$0.75/M	$4.50/M	$0.07/M	—	$10/K + input costs	03/17/2026
openai/gpt-5-nano	400K	3.6s	198tps	$0.05/M	$0.40/M	$0.01/M	—	$14/K + input costs	08/07/2025
openai/gpt-5-mini	400K	3.3s	169tps	$0.25/M	$2/M	$0.03/M	—	$14/K + input costs	08/07/2025
openai/gpt-oss-120b	131K	0.1s	482tps	$0.35/M	$0.75/M	$0.25/M	—	—	08/05/2025

About GPT-4o Transcribe

GPT-4o Transcribe launched on March 13, 2024 as part of OpenAI's next-generation audio models for the API. The model builds on the GPT-4o architecture and was extensively pretrained on specialized audio-centric datasets.

GPT-4o Transcribe improves word error rate over the Whisper model family across established benchmarks. On the FLEURS multilingual benchmark, GPT-4o Transcribe outperforms Whisper v2 and Whisper v3 across language evaluations. OpenAI attributes the advances to targeted reinforcement learning and midtraining with diverse, high-quality audio.

The practical effect is fewer misrecognitions in the conditions that break transcription products: heavy accents, noisy rooms, and varying speech speeds. Through AI Gateway, GPT-4o Transcribe runs behind the same key, observability, and spend controls as the rest of your model traffic.
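Word error rate is the metric behind these comparisons. As a minimal sketch of how it is commonly computed, the function below counts word-level substitutions, deletions, and insertions (a Levenshtein distance over words) and divides by the number of reference words. The names `tokenize` and `wordErrorRate` and the simple punctuation stripping are illustrative assumptions, not the normalization used in OpenAI's or FLEURS evaluations.

```typescript
// Lowercase, strip basic punctuation, and split on whitespace.
// Real evaluations use more careful text normalization; this is a simplification.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"()]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as word-level edit distance with a rolling row.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error('Reference transcript must contain at least one word.');
  }

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion: reference word missing from hypothesis
        curr[j - 1] + 1, // insertion: extra word in hypothesis
        prev[j - 1] + cost, // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, a WER of about 0.33.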

What To Consider When Choosing a Provider

Configuration: Audio support on AI Gateway is in beta. GPT-4o Transcribe runs through the AI SDK's transcribe function, which accepts a file buffer, base64 string, or URL. For high-volume pipelines where cost per request dominates, benchmark gpt-4o-mini-transcribe first; the mini variant often gets close on clean audio.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
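The Configuration note above names three audio input shapes: a file buffer, a base64 string, or a URL. The sketch below shows one way to normalize caller input into one of those shapes before passing it as the `audio` option. `AudioInput` and `toAudioInput` are hypothetical names for illustration, not part of the AI SDK, and treating every non-URL string as base64 is an assumption you may need to tighten.

```typescript
// The three input shapes described above: raw bytes, a base64 string, or a URL.
type AudioInput = Uint8Array | string | URL;

// Hypothetical helper: choose the shape for a caller-supplied value.
function toAudioInput(source: Uint8Array | string): AudioInput {
  if (typeof source !== 'string') {
    return source; // already raw bytes (Node's Buffer is a Uint8Array)
  }
  if (/^https?:\/\//i.test(source)) {
    return new URL(source); // remote audio file
  }
  return source; // assumption: any other string is base64-encoded audio
}

// Intended use with the snippet at the top of this page (not executed here):
//   const result = await transcribe({
//     model: gateway.transcriptionModel('openai/gpt-4o-transcribe'),
//     audio: toAudioInput(input),
//   });
```

Keeping the normalization in one place makes it easier to run the same clips through both gpt-4o-transcribe and gpt-4o-mini-transcribe when benchmarking.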

When to Use GPT-4o Transcribe

Best for

Production transcription quality: Workloads where word error rate directly affects product quality and transcript cleanup costs
Challenging audio conditions: Heavy accents, background noise, and variable speech speeds
High-stakes transcripts: Support, sales, and compliance recordings where misrecognitions carry real cost
Multilingual workloads: Strong FLEURS benchmark results across language evaluations

Consider alternatives when

Cost-driven pipelines: gpt-4o-mini-transcribe handles clean audio well at a lower rate
Speech translation needs: whisper-1 covers translation to English and language identification
Live voice agents: The gpt-realtime family handles speech-to-speech conversation directly
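The guidance in the two lists above can be sketched as a small decision function. This is an illustrative reading of the lists, not an official routing rule: the `Workload` fields, the `recommendModel` name, and the order in which the checks are applied are all assumptions.

```typescript
// Hypothetical description of a transcription workload.
type Workload = {
  liveConversation?: boolean; // speech-to-speech voice agent
  needsTranslationToEnglish?: boolean; // speech translation rather than transcription
  costSensitive?: boolean; // cost per request dominates
  cleanAudio?: boolean; // little noise, clear speech
};

type Recommendation =
  | 'gpt-4o-transcribe'
  | 'gpt-4o-mini-transcribe'
  | 'whisper-1'
  | 'gpt-realtime family';

// Apply the "consider alternatives" cases first, then fall back to GPT-4o Transcribe.
function recommendModel(w: Workload): Recommendation {
  if (w.liveConversation) return 'gpt-realtime family';
  if (w.needsTranslationToEnglish) return 'whisper-1';
  if (w.costSensitive && w.cleanAudio) return 'gpt-4o-mini-transcribe';
  return 'gpt-4o-transcribe';
}
```

A noisy compliance recording falls through to gpt-4o-transcribe, while a high-volume pipeline of clean studio audio lands on the mini variant.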

Conclusion

GPT-4o Transcribe is the model to reach for when transcript quality is the product. Route requests through AI Gateway with the AI SDK's transcribe function, and keep gpt-4o-mini-transcribe in reserve for the high-volume, cleaner-audio side of the pipeline.
