GPT-3.5 Turbo
GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, delivering the same underlying capability at a price point that opened the door to production applications.
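```typescript
import { streamText } from 'ai'
const result = streamText({ model: 'openai/gpt-3.5-turbo', prompt: 'Why is the sky blue?' })
// Print tokens as they stream in from AI Gateway
for await (const textPart of result.textStream) process.stdout.write(textPart)
```
Playground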
Try out GPT-3.5 Turbo by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
For each provider, the page reports live AI Gateway metrics (see the docs for details):
- Throughput: P50 tokens per second (TPS)
- Time to first token (TTFT): P50, in milliseconds
- Success rate: direct request success rate, on AI Gateway overall and per provider
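The throughput and TTFT figures are P50 values, which means the median across requests. The sketch below computes those two medians from per-request samples. The `RequestSample` shape and the sample numbers are hypothetical. The aggregation shown is an assumption and is not AI Gateway's published methodology.

```typescript
// Hypothetical shape for one observed request; not an AI Gateway type.
interface RequestSample {
  ttftMs: number;       // time to first token, in milliseconds
  outputTokens: number; // tokens generated after the first token arrived
  streamMs: number;     // time spent streaming those tokens, in milliseconds
}

// Median (P50); for an even number of values, averages the two middle ones.
function p50(values: number[]): number {
  if (values.length === 0) throw new Error("p50 of empty array");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarizeSamples(samples: RequestSample[]): { p50TtftMs: number; p50Tps: number } {
  return {
    p50TtftMs: p50(samples.map((s) => s.ttftMs)),
    // Per-request throughput in tokens per second, then the median across requests.
    p50Tps: p50(samples.map((s) => s.outputTokens / (s.streamMs / 1000))),
  };
}

const samples: RequestSample[] = [
  { ttftMs: 300, outputTokens: 200, streamMs: 2000 },
  { ttftMs: 500, outputTokens: 150, streamMs: 1000 },
  { ttftMs: 400, outputTokens: 100, streamMs: 2000 },
];
const stats = summarizeSamples(samples);
```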
About GPT-3.5 Turbo
GPT-3.5 Turbo launched on March 1, 2023, alongside the Whisper API, when OpenAI opened its ChatGPT capability to developers through a dedicated API endpoint. At launch it cost $0.002 per 1K tokens, one-tenth the price of the existing GPT-3.5 models. That pricing brought conversational AI within budget for production-scale products.
The model uses a conversational message format where developers supply a list of messages with roles (system, user, assistant) rather than a single prompt string. This chat-native interface made it straightforward to build multi-turn experiences, chatbots, and instruction-following assistants without extra prompt engineering to simulate turns.
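Here is a minimal sketch of the message list for a multi-turn exchange. It assumes a plain TypeScript representation of the `system`, `user`, and `assistant` roles. The `addTurn` helper and the conversation content are hypothetical. With the AI SDK, you pass an array like this as the `messages` option instead of `prompt`.

```typescript
// Role/content pairs in the chat format described above.
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Returns a new history with one more message; the input array is not mutated.
function addTurn(history: readonly ChatMessage[], role: Role, content: string): ChatMessage[] {
  return [...history, { role, content }];
}

// A system instruction, one completed user/assistant exchange, and a follow-up.
// The assistant reply is written by hand here; in a real app it would be the
// model's previous response.
let conversation: ChatMessage[] = [
  { role: "system", content: "You are a concise support assistant for Acme Cloud." },
];
conversation = addTurn(conversation, "user", "How do I reset my password?");
conversation = addTurn(conversation, "assistant", "Open Settings, choose Security, then select Reset password.");
conversation = addTurn(conversation, "user", "And if I no longer have access to my email?");

// This array would be sent as the `messages` field of a request, e.g.
// streamText({ model: 'openai/gpt-3.5-turbo', messages: conversation }).
```

Because each call returns a fresh array, earlier snapshots of the history stay intact. That makes it easy to retry a turn or branch a conversation.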
OpenAI expanded the model over the following months. In June 2023 it introduced a 16K-context variant that handles roughly 20 pages of text in one request. The same update cut the input-token price of the standard model by 25% and added function calling, along with more reliable steerability through the system message. GPT-3.5 Turbo remains widely deployed for workloads where conversational throughput and cost efficiency outweigh the need for frontier reasoning depth.
What To Consider When Choosing a Provider
- Configuration: If you run high message volumes and per-token cost is your dominant concern, GPT-3.5 Turbo remains one of the most economical options for straightforward conversational workloads.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
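As a sketch of the authentication choice above, the function below picks a gateway credential from environment variables. The variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` and the API-key-first precedence are assumptions, so confirm them against the AI Gateway documentation.

```typescript
// Which credential a client would present to AI Gateway.
type GatewayAuth =
  | { kind: "api-key"; token: string }
  | { kind: "oidc"; token: string };

// Assumed variable names; check the AI Gateway docs for your setup.
const API_KEY_VAR = "AI_GATEWAY_API_KEY";
const OIDC_VAR = "VERCEL_OIDC_TOKEN";

// Prefers an explicit API key, then falls back to an OIDC token.
// No OpenAI key is read: provider credentials stay on the gateway side.
function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  const apiKey = env[API_KEY_VAR]?.trim();
  if (apiKey) return { kind: "api-key", token: apiKey };
  const oidc = env[OIDC_VAR]?.trim();
  if (oidc) return { kind: "oidc", token: oidc };
  throw new Error(`Set ${API_KEY_VAR} or ${OIDC_VAR} to authenticate with AI Gateway`);
}
```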
When to Use GPT-3.5 Turbo
Best For
- Customer support bots: FAQ systems where fast, on-topic replies matter more than complex reasoning
- Summarization pipelines: Processing large numbers of documents at low cost
- Draft generation: Emails, support tickets, or templated content at volume
- Multi-turn chat: Consumer or internal tools where latency and price sensitivity are high
- Lightweight classification: Intent detection tasks embedded in larger automation pipelines
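For lightweight classification, a common pattern is to constrain the reply to a fixed label set and then parse it defensively. This is a sketch: the label names and the `buildIntentMessages` and `parseIntent` helpers are hypothetical, and the model call itself is omitted.

```typescript
// Hypothetical label set for a support inbox.
const INTENTS = ["billing", "technical", "account", "other"] as const;
type Intent = (typeof INTENTS)[number];

// Builds chat messages that ask for exactly one label, keeping replies short and cheap.
function buildIntentMessages(ticket: string): { role: "system" | "user"; content: string }[] {
  return [
    {
      role: "system",
      content: `Classify the user's message. Reply with exactly one word from: ${INTENTS.join(", ")}.`,
    },
    { role: "user", content: ticket },
  ];
}

// Normalizes the model's reply by lowercasing it and stripping non-letters.
// Anything outside the label set falls back to "other".
function parseIntent(reply: string): Intent {
  const word = reply.trim().toLowerCase().replace(/[^a-z]/g, "");
  return (INTENTS as readonly string[]).includes(word) ? (word as Intent) : "other";
}
```

Falling back to a catch-all label keeps a stray verbose reply from breaking the surrounding automation pipeline.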
Consider Alternatives When
- Advanced reasoning needed: Multi-step logical reasoning, math, or code generation where GPT-4-class accuracy is necessary
- Very long prompts: Your prompt regularly exceeds the context window and you need the full 1M-token range of GPT-4.1
- Multimodal input: You need native vision or audio input processing
- Strict instruction adherence: Complex, structured tasks where errors are costly
Conclusion
GPT-3.5 Turbo combined ChatGPT-class quality with a pricing tier that made scaling practical. For chat, summarization, and instruction-following workloads where cost per call is the primary constraint, it remains a solid option through AI Gateway.
Frequently Asked Questions
What API format does GPT-3.5 Turbo use?
It uses the Chat Completions API format. You send an array of messages with roles (`system`, `user`, `assistant`) rather than a raw completion prompt string.
How does GPT-3.5 Turbo differ from GPT-3.5 Turbo Instruct?
GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.
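To make the difference concrete, here are trimmed request bodies for the two endpoints. They follow OpenAI's public API field names: `messages` for `/v1/chat/completions` and `prompt` for `/v1/completions`. The translation task is illustrative only, and only a few fields are shown.

```typescript
// Trimmed request bodies for OpenAI's two endpoints, showing the fields that differ.
// Through AI Gateway, the AI SDK normally builds these for you.

interface ChatCompletionsBody {
  model: "gpt-3.5-turbo";
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  max_tokens?: number;
}

interface CompletionsBody {
  model: "gpt-3.5-turbo-instruct";
  prompt: string;
  max_tokens?: number;
}

// POST /v1/chat/completions: a list of role-tagged messages.
const chatBody: ChatCompletionsBody = {
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "Translate the user's text to French." },
    { role: "user", content: "Good morning" },
  ],
  max_tokens: 50,
};

// POST /v1/completions: one prompt string, with any instructions inside it.
const completionBody: CompletionsBody = {
  model: "gpt-3.5-turbo-instruct",
  prompt: "Translate to French: Good morning\n\nFrench:",
  max_tokens: 50,
};
```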
What context window does GPT-3.5 Turbo support?
The current model supports a context window of 16,385 tokens (about 16.4K), shared between the prompt and the completion. That is enough for multi-turn conversations and moderate-length document tasks.
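Long conversations eventually outgrow the window. Here is a minimal sketch of trimming history to fit. It assumes the rough rule of thumb of about 4 characters per English token; exact counts require the model's tokenizer. The 1,000-token output reserve, the 4-token per-message overhead, and the `fitToWindow` helper are all hypothetical choices.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const CONTEXT_WINDOW = 16_385;     // total tokens shared by prompt and completion
const RESERVED_FOR_OUTPUT = 1_000; // hypothetical budget for the reply

// Heuristic estimate: ~4 characters per token plus a hypothetical 4-token
// per-message overhead. Use a real tokenizer when exact counts matter.
function estimateTokens(m: ChatMessage): number {
  return Math.ceil(m.content.length / 4) + 4;
}

// Keeps system messages (placed first) and drops the oldest other messages
// until the estimated prompt fits. The most recent message is always kept.
function fitToWindow(messages: ChatMessage[], budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const total = (ms: ChatMessage[]) => ms.reduce((sum, m) => sum + estimateTokens(m), 0);
  while (rest.length > 1 && total(system) + total(rest) > budget) {
    rest.shift(); // remove the oldest turn first
  }
  return [...system, ...rest];
}
```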
Is GPT-3.5 Turbo suitable for function calling?
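Yes. OpenAI added function calling in June 2023. Developers describe functions with JSON Schema parameters, and the model responds with the name and arguments of the function to call. The application then executes that function itself. This makes the model viable for agentic and tool-use workflows within its capability tier.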
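Here is a sketch of the tool-use loop in OpenAI's Chat Completions format. The application declares a function under `tools`. The model replies with a `tool_calls` entry whose `arguments` field is a JSON string. The application then runs the function and sends back a `tool` message. The `get_order_status` function and its data are hypothetical, and the model response is constructed by hand.

```typescript
// Tool definition in the Chat Completions `tools` format: a name, a description,
// and a JSON Schema describing the arguments.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_order_status", // hypothetical function for illustration
      description: "Look up the shipping status of an order by its ID.",
      parameters: {
        type: "object",
        properties: { orderId: { type: "string", description: "The order ID" } },
        required: ["orderId"],
      },
    },
  },
];

// One entry in a response's `tool_calls` array. `arguments` is a JSON string
// generated by the model, so it must be parsed and validated.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Hypothetical local implementation; the model never runs this itself.
function getOrderStatus(orderId: string): string {
  return orderId === "A100" ? "shipped" : "unknown";
}

// Executes a tool call in application code and returns the tool message to send back.
function runToolCall(call: ToolCall): { role: "tool"; tool_call_id: string; content: string } {
  if (call.function.name !== "get_order_status") {
    throw new Error(`Unknown tool: ${call.function.name}`);
  }
  const args = JSON.parse(call.function.arguments) as { orderId?: unknown };
  if (typeof args.orderId !== "string") throw new Error("orderId must be a string");
  return { role: "tool", tool_call_id: call.id, content: getOrderStatus(args.orderId) };
}

// A tool call as the model might return it (constructed by hand here).
const exampleCall: ToolCall = {
  id: "call_1",
  type: "function",
  function: { name: "get_order_status", arguments: '{"orderId":"A100"}' },
};
const toolMessage = runToolCall(exampleCall);
```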
How does AI Gateway handle authentication for GPT-3.5 Turbo?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
Can I use GPT-3.5 Turbo for batch summarization pipelines?
Yes. Its combination of low per-token cost, fast response times, and context window of 16.4K tokens makes it well-suited for pipelines that process many documents in parallel.
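Here is a sketch of the parallel pattern with a cap on in-flight requests, so a large batch does not hit rate limits all at once. The `Summarizer` type is a hypothetical stand-in for whatever model call you use (for example, the AI SDK's `generateText`). The default concurrency of 4 is arbitrary.

```typescript
// Summarizer signature; in a real pipeline this would wrap a model call.
type Summarizer = (doc: string) => Promise<string>;

// Runs `summarize` over all documents with at most `concurrency` requests in
// flight, returning summaries in the same order as the input.
async function summarizeAll(docs: string[], summarize: Summarizer, concurrency = 4): Promise<string[]> {
  const results: string[] = new Array(docs.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // JavaScript runs this loop on one thread, so `next++` needs no locking.
  async function worker(): Promise<void> {
    while (next < docs.length) {
      const i = next++;
      results[i] = await summarize(docs[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, docs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```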
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.