GPT-3.5 Turbo
GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, delivering the same underlying capability at a price point that opened the door to production applications.
```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'openai/gpt-3.5-turbo',
  prompt: 'Why is the sky blue?',
});
```

Frequently Asked Questions
What API format does GPT-3.5 Turbo use?
It uses the Chat Completions API format. You send an array of messages with roles (system, user, assistant) rather than a raw completion prompt string.
How does GPT-3.5 Turbo differ from GPT-3.5 Turbo Instruct?
GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.
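The difference shows up directly in the request shape. A minimal sketch of the two payload styles (the surrounding request objects here are illustrative; the role names are part of the Chat Completions format):

```ts
// Chat Completions style: an array of role-tagged messages (multi-turn).
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const chatRequest = {
  model: 'gpt-3.5-turbo',
  messages: [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user', content: 'Why is the sky blue?' },
    { role: 'assistant', content: 'Rayleigh scattering favors shorter wavelengths.' },
    { role: 'user', content: 'And why are sunsets red?' },
  ] as ChatMessage[],
};

// Legacy Completions style (GPT-3.5 Turbo Instruct): one prompt string.
const instructRequest = {
  model: 'gpt-3.5-turbo-instruct',
  prompt: 'Explain why the sky is blue in one sentence.',
};

console.log(chatRequest.messages.length); // 4
```

Because the chat format carries the whole conversation as structured turns, follow-up questions like the last user message above can rely on earlier context without re-stating it in a single prompt string.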
What context window does GPT-3.5 Turbo support?
The current model supports a context window of 16.4K tokens, suitable for multi-turn conversations and moderate-length document tasks.
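As a rough budgeting sketch for that window, assuming the commonly cited 16,385-token limit and the approximate 4-characters-per-token heuristic (not an exact tokenizer):

```ts
// Approximate token budgeting against a 16,385-token context window.
const CONTEXT_WINDOW = 16_385;

function fitsInContext(promptChars: number, maxOutputTokens: number): boolean {
  // Rough heuristic: ~4 characters per token for English text.
  const approxPromptTokens = Math.ceil(promptChars / 4);
  return approxPromptTokens + maxOutputTokens <= CONTEXT_WINDOW;
}

console.log(fitsInContext(40_000, 1_000)); // true: ~10,000 + 1,000 tokens
```

Input and output share the same window, so reserving your maximum output tokens up front, as above, avoids truncated completions on longer prompts.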
Is GPT-3.5 Turbo suitable for function calling?
Yes. OpenAI added function-calling support, enabling developers to define external functions the model can invoke, making it viable for agentic and tool-use workflows within its capability tier.
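A sketch of how a tool is declared and handled, using the Chat Completions `tools` array shape (the `get_weather` function, its parameters, and the stub handler are hypothetical):

```ts
// Hypothetical tool definition the model may choose to invoke.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name, e.g. "Paris"' },
        },
        required: ['city'],
      },
    },
  },
];

// When a response contains a tool call, run the named function and send
// the result back to the model in a follow-up message.
function handleToolCall(name: string, args: { city: string }): string {
  if (name === 'get_weather') return `Sunny in ${args.city}`; // stub result
  throw new Error(`Unknown tool: ${name}`);
}

console.log(handleToolCall('get_weather', { city: 'Paris' })); // Sunny in Paris
```

The model never executes functions itself; it emits a structured call with arguments, and your code runs it and returns the result, which is what makes the tool-use loop above agent-friendly.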
How does AI Gateway handle authentication for GPT-3.5 Turbo?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
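In practice every request carries the one gateway credential rather than a provider key. A sketch assuming the common Bearer-token header pattern (the function and key value here are illustrative, not a specific AI Gateway API):

```ts
// Illustrative: one gateway token for all requests; no OpenAI key in the app.
function gatewayHeaders(token: string): Record<string, string> {
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}

const headers = gatewayHeaders('my-gateway-key');
console.log(headers.Authorization); // Bearer my-gateway-key
```

Keeping the provider credential out of application code means rotating it is a gateway-side operation, with no redeploys of the calling services.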
Can I use GPT-3.5 Turbo for batch summarization pipelines?
Yes. Its combination of low per-token cost, fast response times, and context window of 16.4K tokens makes it well-suited for pipelines that process many documents in parallel.
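A sketch of the fan-out pattern such a pipeline uses, with a stub in place of the real model call (`summarize` here is a hypothetical placeholder for a Chat Completions request):

```ts
// Run an async task over many inputs with a fixed concurrency cap.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index, preserving input order
      results[i] = await task(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}

// Hypothetical stand-in for a real summarization call to the model.
async function summarize(doc: string): Promise<string> {
  return `summary: ${doc.slice(0, 12)}`;
}

const docs = ['first document text', 'second document text', 'third document text'];
mapWithConcurrency(docs, 2, summarize).then((s) => console.log(s.length)); // 3
```

Capping concurrency keeps the pipeline inside provider rate limits while still overlapping request latency across documents.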
What are typical latency characteristics?
Latency varies with prompt length, output length, and load. This page shows live throughput and time-to-first-token metrics for GPT-3.5 Turbo, measured across real AI Gateway traffic.