GPT-3.5 Turbo
GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, delivering the same underlying capability at a price point that opened the door to production applications.
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-3.5-turbo',
  prompt: 'Why is the sky blue?',
})

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
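As a minimal sketch, the key can be supplied via an environment variable so no provider credentials appear in application code (the variable name AI_GATEWAY_API_KEY is an assumption; check your deployment's configuration):

```shell
# Assumed variable name; the SDK reads the gateway key from the
# environment, so nothing provider-specific is embedded in the app.
export AI_GATEWAY_API_KEY="your-gateway-api-key"
```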
If you run high message volumes and per-token cost is your dominant concern, GPT-3.5 Turbo remains one of the most economical options for straightforward conversational workloads.
When to Use GPT-3.5 Turbo
Best For
- Customer support bots: FAQ systems where fast, on-topic replies matter more than complex reasoning
- Summarization pipelines: Processing large numbers of documents at low cost
- Draft generation: Emails, support tickets, or templated content at volume
- Multi-turn chat: Consumer or internal tools where latency and price sensitivity are high
- Lightweight classification: Intent detection tasks embedded in larger automation pipelines
Consider Alternatives When
- Advanced reasoning needed: Multi-step logical reasoning, math, or code generation where GPT-4-class accuracy is necessary
- Very long prompts: Your prompt regularly exceeds the context window and you need the full 1M-token range of GPT-4.1
- Multimodal input: You need native vision or audio input processing
- Strict instruction adherence: Complex, structured tasks where errors are costly
Conclusion
GPT-3.5 Turbo combined ChatGPT-class quality with a pricing tier that made scaling practical. For chat, summarization, and instruction-following workloads where cost per call is the primary constraint, it remains a solid option through AI Gateway.
FAQ
What prompt format does GPT-3.5 Turbo use?
It uses the Chat Completions API format. You send an array of messages with roles (system, user, assistant) rather than a raw completion prompt string.
What is the difference between GPT-3.5 Turbo and GPT-3.5 Turbo Instruct?
GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.
How large is the context window?
The current model supports a context window of 16.4K tokens, suitable for multi-turn conversations and moderate-length document tasks.
Does GPT-3.5 Turbo support function calling?
Yes. OpenAI added function-calling support, enabling developers to define external functions the model can invoke, making it viable for agentic and tool-use workflows within its capability tier.
How does authentication work through AI Gateway?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
Is GPT-3.5 Turbo a good fit for high-volume summarization?
Yes. Its combination of low per-token cost, fast response times, and context window of 16.4K tokens makes it well-suited for pipelines that process many documents in parallel.
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.