GPT-4o
GPT-4o is OpenAI's first natively multimodal "omni" model, unifying text, audio, image, and video processing within a single end-to-end trained architecture and delivering audio response times averaging 320 milliseconds, comparable to human conversational latency.
```ts
import { streamText } from 'ai'

const result = streamText({ model: 'openai/gpt-4o', prompt: 'Why is the sky blue?' })

// Print tokens as they arrive
for await (const text of result.textStream) process.stdout.write(text)
```

Frequently Asked Questions
What does "omni" mean in GPT-4o's name?
It reflects the model's end-to-end native training across text, audio, image, and video modalities, rather than being a combination of separate specialist models connected by a pipeline.
How much faster is GPT-4o's audio response compared to earlier voice pipelines?
GPT-4o averages 320 milliseconds for audio responses, while the prior GPT-4 Turbo-based voice pipeline averaged 5.4 seconds. Since 5.4 s / 0.32 s ≈ 17, GPT-4o is roughly 17x faster for voice.
How does GPT-4o's API pricing compare to GPT-4 Turbo?
GPT-4o launched at half the API price of GPT-4 Turbo while matching its performance on English text and code, and significantly improving on non-English languages.
What input and output modalities does GPT-4o support?
Inputs: text, audio, image, video. Outputs: text, audio, and image. This breadth makes it flexible for diverse multimodal application architectures.
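For image inputs through the AI SDK, you can pass multimodal message content parts instead of a plain prompt. A minimal sketch, assuming the SDK's image content-part format; the image URL is a placeholder:

```ts
import { generateText } from 'ai'

// Send a text instruction plus an image to GPT-4o (placeholder URL)
const { text } = await generateText({
  model: 'openai/gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in one sentence.' },
        { type: 'image', image: new URL('https://example.com/photo.jpg') },
      ],
    },
  ],
})

console.log(text)
```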
Is the "gpt-4o" model alias the same as a specific dated snapshot?
No. The alias gpt-4o points to the latest stable version, which may be updated over time. Dated snapshots like gpt-4o-2024-05-13 or gpt-4o-2024-11-20 pin to specific releases.
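To pin a release, pass the dated snapshot ID in place of the alias. A minimal sketch mirroring the streamText example above:

```ts
import { streamText } from 'ai'

// 'openai/gpt-4o' tracks the latest stable release; a dated ID pins one snapshot
const result = streamText({
  model: 'openai/gpt-4o-2024-11-20',
  prompt: 'Why is the sky blue?',
})
```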
Does routing GPT-4o through AI Gateway add latency?
AI Gateway is designed as a lightweight routing layer. For most applications, the observability, caching, and authentication benefits outweigh any marginal overhead.
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.