GPT-3.5 Turbo

GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, delivering the same underlying capability at a price point that opened the door to production applications.

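index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-3.5-turbo',
  prompt: 'Why is the sky blue?',
})

// Print the reply to stdout as text chunks arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}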

Playground

Try out GPT-3.5 Turbo by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Latency | Throughput | Input | Output | Release Date |
|---|---|---|---|---|---|---|
| OpenAI | 16K | 0.6s | 74tps | $0.50/M | $1.50/M | 05/28/2023 |

Legal (OpenAI): Terms, Privacy
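
As a rough illustration of what the per-token rates listed for OpenAI above mean in practice, the sketch below estimates spend from token counts. The function name and the workload figures are hypothetical, and the rates are simply copied from this page, so treat the result as an estimate rather than a bill.

```typescript
// Per-million-token rates copied from the OpenAI listing on this page (USD).
const INPUT_USD_PER_MILLION = 0.5;
const OUTPUT_USD_PER_MILLION = 1.5;

// Estimate the cost of one request from its input and output token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  );
}

// Hypothetical workload: 10,000 requests, each with 1,000 input and 200 output tokens.
const perRequestUsd = estimateCostUsd(1_000, 200); // about $0.0008
const workloadCostUsd = perRequestUsd * 10_000; // about $8
```

Output tokens cost three times as much as input tokens at these rates, so capping reply length is usually the first lever to pull when the estimate is too high.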
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
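
The two metrics above can be combined into a back-of-the-envelope estimate of how long a streamed reply takes: time to first token plus output length divided by throughput. This is a minimal sketch that assumes throughput stays constant for the whole reply. The function name is hypothetical, and the inputs are the P50 figures shown for OpenAI on this page, so real requests will vary.

```typescript
// Rough end-to-end time for a streamed response:
// time to first token + output tokens / throughput.
function estimateResponseSeconds(
  ttftSeconds: number,
  tokensPerSecond: number,
  outputTokens: number,
): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError('tokensPerSecond must be positive');
  }
  return ttftSeconds + outputTokens / tokensPerSecond;
}

// P50 figures listed for OpenAI on this page: 0.6 s TTFT and 74 tokens/s.
// A 370-token reply would take roughly 0.6 + 370 / 74 = 5.6 seconds.
const estimatedSeconds = estimateResponseSeconds(0.6, 74, 370);
```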

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by OpenAI

| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Providers | Release Date |
|---|---|---|---|---|---|---|---|---|---|
|  | 1M | 2.3s | 62tps | $5.00/M | $30.00/M | Read: $0.50/M | $10.00/K + input costs | Azure, OpenAI | 04/24/2026 |
|  | 400K | 1.3s | 185tps | $0.75/M | $4.50/M | Read: $0.07/M | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
|  | 400K | 0.5s | 67tps | $0.20/M | $1.25/M | Read: $0.02/M | $10.00/K + input costs | Azure, OpenAI | 03/17/2026 |
|  | 1.1M | 0.8s | 69tps | $2.50/M | $15.00/M | Read: $0.25/M | $10.00/K + input costs | Azure, OpenAI | 03/05/2026 |
|  | 128K | 0.8s | 111tps | $1.25/M | $10.00/M | Read: $0.13/M | $10.00/K + input costs | Azure, OpenAI | 11/12/2025 |
|  | 131K | 0.2s | 2848tps | $0.35/M | $0.75/M | Read: $0.25/M |  | Baseten, Bedrock, Cerebras, +5 | 08/05/2025 |

About GPT-3.5 Turbo

GPT-3.5 Turbo launched on March 1, 2023, alongside the Whisper API, as OpenAI opened its ChatGPT capability to developers through a dedicated API endpoint. At introduction it cost substantially less than the existing GPT-3.5 models, making it the first model to put conversational AI within budget for production-scale products.

The model uses a conversational message format where developers supply a list of messages with roles (system, user, assistant) rather than a single prompt string. This chat-native interface made it straightforward to build multi-turn experiences, chatbots, and instruction-following assistants without extra prompt engineering to simulate turns.
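
The message format described above can be sketched as a small typed history. The type and helper names here are hypothetical and only illustrate the role-tagged shape. They are not part of any SDK.

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Start a conversation with a system instruction that steers every later turn.
function startConversation(systemPrompt: string): ChatMessage[] {
  return [{ role: 'system', content: systemPrompt }];
}

// Return a new history with one more turn appended (the input is not mutated).
function addTurn(
  conversation: ChatMessage[],
  role: Role,
  content: string,
): ChatMessage[] {
  return [...conversation, { role, content }];
}

let conversation = startConversation('You are a concise support assistant.');
conversation = addTurn(conversation, 'user', 'How do I reset my password?');
conversation = addTurn(conversation, 'assistant', 'Open Settings, then choose "Reset password".');
conversation = addTurn(conversation, 'user', 'What if I no longer have access to my email?');
```

Because the model is stateless between calls, the whole array is sent on every request. Appending each assistant reply before the next user turn is what lets the model follow a multi-turn exchange.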

Over subsequent months OpenAI expanded the model: a 16K context variant arrived to handle roughly 20 pages of text in one request, and the standard model received a price reduction together with improved function-calling capabilities and better steerability. It remains widely deployed for workloads where conversational throughput and cost efficiency outweigh the need for frontier reasoning depth.
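
Documents longer than the 16K window have to be split before they are sent. The sketch below shows one simple way to do that, under two loudly stated assumptions: it approximates tokens as about four characters each (a rough heuristic for English text, not the model's real tokenizer), and it takes 16,385 tokens as the exact size of the window this page rounds to 16K. All names are hypothetical.

```typescript
// Assumption: ~4 characters per token, a rough heuristic rather than a tokenizer.
const CHARS_PER_TOKEN = 4;
// Assumption: exact size of the window this page lists as 16K / 16.4K.
const CONTEXT_WINDOW_TOKENS = 16_385;

// Approximate the token count of a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split text into chunks whose estimated size leaves room for the model's reply.
function chunkForContext(
  text: string,
  reservedForOutput = 1_000,
  maxTokens = CONTEXT_WINDOW_TOKENS,
): string[] {
  const budgetTokens = maxTokens - reservedForOutput;
  if (budgetTokens <= 0) {
    throw new RangeError('reservedForOutput must be smaller than maxTokens');
  }
  const chunkChars = budgetTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkChars) {
    chunks.push(text.slice(start, start + chunkChars));
  }
  return chunks;
}
```

This splits on raw character offsets, so a chunk can end mid-sentence. A production pipeline would more likely split on paragraph boundaries, count tokens with a real tokenizer, and also reserve room for the system prompt.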

What To Consider When Choosing a Provider

  • Configuration: If you run high message volumes and per-token cost is your dominant concern, GPT-3.5 Turbo remains one of the most economical options for straightforward conversational workloads.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use GPT-3.5 Turbo

Best For

  • Customer support bots: FAQ systems where fast, on-topic replies matter more than complex reasoning
  • Summarization pipelines: Processing large numbers of documents at low cost
  • Draft generation: Emails, support tickets, or templated content at volume
  • Multi-turn chat: Consumer or internal tools where latency and price sensitivity are high
  • Lightweight classification: Intent detection tasks embedded in larger automation pipelines

Consider Alternatives When

  • Advanced reasoning needed: Multi-step logical reasoning, math, or code generation where GPT-4-class accuracy is necessary
  • Very long prompts: Your prompts regularly exceed the 16K context window and you need something closer to the 1M-token range of GPT-4.1
  • Multimodal input: You need native vision or audio input processing
  • Strict instruction adherence: Complex, structured tasks where errors are costly

Conclusion

GPT-3.5 Turbo combined ChatGPT-class quality with a pricing tier that made scaling practical. For chat, summarization, and instruction-following workloads where cost per call is the primary constraint, it remains a solid option through AI Gateway.

Frequently Asked Questions

  • What API format does GPT-3.5 Turbo use?

    It uses the Chat Completions API format. You send an array of messages with roles (system, user, assistant) rather than a raw completion prompt string.

  • How does GPT-3.5 Turbo differ from GPT-3.5 Turbo Instruct?

    GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.

  • What context window does GPT-3.5 Turbo support?

    The current model supports a context window of 16.4K tokens, suitable for multi-turn conversations and moderate-length document tasks.

  • Is GPT-3.5 Turbo suitable for function calling?

    Yes. OpenAI added function-calling support, enabling developers to define external functions the model can invoke, making it viable for agentic and tool-use workflows within its capability tier.

  • How does AI Gateway handle authentication for GPT-3.5 Turbo?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

  • Can I use GPT-3.5 Turbo for batch summarization pipelines?

    Yes. Its low per-token cost, fast response times, and 16.4K-token context window make it well-suited for pipelines that process many documents in parallel.

  • What are typical latency characteristics?

    This page shows live P50 throughput and time-to-first-token metrics measured on real AI Gateway traffic. The OpenAI listing above shows 0.6s time to first token and 74 tokens per second.
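
To make the function-calling answer above more concrete, here is a minimal sketch of the application side of a tool call. The object shapes follow the Chat Completions `tools` and `tool_calls` format. The `get_order_status` tool, its handler, and the helper names are hypothetical and exist only for illustration.

```typescript
interface ToolDefinition {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema for the arguments
  };
}

interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments arrive as a JSON string
}

// Hypothetical tool definition that would be sent with the request.
const getOrderStatusTool: ToolDefinition = {
  type: 'function',
  function: {
    name: 'get_order_status',
    description: 'Look up the shipping status of an order.',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
    },
  },
};

type ToolHandler = (args: Record<string, unknown>) => string;

// Local implementations, keyed by tool name.
const toolHandlers = new Map<string, ToolHandler>([
  ['get_order_status', (args) => `Order ${String(args.orderId)} has shipped.`],
]);

// Run the local function named by a tool call and build the `tool` message to send back.
function runToolCall(call: ToolCall): { role: 'tool'; tool_call_id: string; content: string } {
  const handler = toolHandlers.get(call.function.name);
  if (!handler) {
    throw new Error(`Unknown tool: ${call.function.name}`);
  }
  const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
  return { role: 'tool', tool_call_id: call.id, content: handler(args) };
}
```

The model never executes anything itself: it only returns the name and JSON arguments, and the application decides whether and how to run the function. The arguments are model-generated text, so validating them before use is a sensible precaution.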