
GPT-3.5 Turbo

openai/gpt-3.5-turbo

GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, delivering the same underlying capability at a price point that opened the door to production applications.

index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-3.5-turbo',
  prompt: 'Why is the sky blue?'
})

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
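As an illustration of the key-based option, a request header can be assembled from an environment variable. The variable name `AI_GATEWAY_API_KEY` and the Bearer scheme are assumptions here; check the AI Gateway docs for the exact convention.

```typescript
// Minimal sketch of key-based auth. The env variable name and
// Bearer scheme are assumptions, not confirmed gateway conventions.
export function authHeader(
  key = process.env.AI_GATEWAY_API_KEY
): Record<string, string> {
  if (!key) throw new Error('missing API key')
  return { Authorization: `Bearer ${key}` }
}
```

In practice the AI SDK handles this for you; the helper only shows the shape of what is sent.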

If you run high message volumes and per-token cost is your dominant concern, GPT-3.5 Turbo remains one of the most economical options for straightforward conversational workloads.

When to Use GPT-3.5 Turbo

Best For

  • Customer support bots:

    FAQ systems where fast, on-topic replies matter more than complex reasoning

  • Summarization pipelines:

    Processing large numbers of documents at low cost

  • Draft generation:

    Emails, support tickets, or templated content at volume

  • Multi-turn chat:

    Consumer or internal tools where latency and price sensitivity are high

  • Lightweight classification:

    Intent detection tasks embedded in larger automation pipelines
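The classification bullet can be sketched as a thin wrapper: constrain the model to a fixed label set, then normalize whatever text comes back. The label set and prompt wording below are assumptions, and the `generate` function is injected as a stand-in for a call such as `generateText` with `openai/gpt-3.5-turbo`.

```typescript
// Hypothetical label set for a support-intent classifier.
const INTENTS = ['billing', 'technical', 'cancellation', 'other'] as const
type Intent = (typeof INTENTS)[number]

// Map raw model output onto the closest known label,
// falling back to 'other' for anything unrecognized.
export function normalizeIntent(raw: string): Intent {
  const cleaned = raw.trim().toLowerCase()
  return INTENTS.find((label) => cleaned.includes(label)) ?? 'other'
}

// The model call is injected so the pipeline stays testable;
// in practice this wraps a gpt-3.5-turbo completion call.
export async function classifyIntent(
  message: string,
  generate: (prompt: string) => Promise<string>
): Promise<Intent> {
  const raw = await generate(
    `Classify the intent as one of: ${INTENTS.join(', ')}. ` +
      `Reply with the label only.\n\nMessage: ${message}`
  )
  return normalizeIntent(raw)
}
```

Normalizing the output matters because the model may reply with extra words or capitalization even when asked for the label alone.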

Consider Alternatives When

  • Advanced reasoning needed:

    Multi-step logical reasoning, math, or code generation where GPT-4-class accuracy is necessary

  • Very long prompts:

    Your prompt regularly exceeds the context window and you need the full 1M-token range of GPT-4.1

  • Multimodal input:

    You need native vision or audio input processing

  • Strict instruction adherence:

    Complex, structured tasks where errors are costly

Conclusion

GPT-3.5 Turbo combined ChatGPT-class quality with a pricing tier that made scaling practical. For chat, summarization, and instruction-following workloads where cost per call is the primary constraint, it remains a solid option through AI Gateway.

FAQ

What API format does GPT-3.5 Turbo use?

It uses the Chat Completions API format. You send an array of messages with roles (system, user, assistant) rather than a raw completion prompt string.
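For example, a multi-turn exchange in that format is just an ordered array of role-tagged messages:

```typescript
// Chat Completions message shape: an ordered list of role-tagged turns.
type ChatMessage = {
  role: 'system' | 'user' | 'assistant'
  content: string
}

const messages: ChatMessage[] = [
  { role: 'system', content: 'You are a concise support assistant.' },
  { role: 'user', content: 'How do I reset my password?' },
  { role: 'assistant', content: 'Go to Settings > Security and choose Reset password.' },
  { role: 'user', content: 'What if I no longer have access to my email?' }
]
```

The system message sets behavior once; each subsequent turn is appended to the array, which is resent in full on every request.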

How does GPT-3.5 Turbo differ from GPT-3.5 Turbo Instruct?

GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.

What is GPT-3.5 Turbo's context window?

The current model supports a context window of 16.4K tokens, suitable for multi-turn conversations and moderate-length document tasks.

Does GPT-3.5 Turbo support function calling?

Yes. OpenAI added function-calling support, enabling developers to define external functions the model can invoke, making it viable for agentic and tool-use workflows within its capability tier.
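In the Chat Completions format, a function is declared to the model as a JSON Schema description; the model then returns a structured tool call instead of plain text when it decides to invoke it. The `get_weather` function below is a hypothetical example, not part of any API.

```typescript
// A function declaration in Chat Completions tool format.
// The model responds with a tool call whose arguments match
// this schema whenever it chooses to invoke the function.
const weatherTool = {
  type: 'function' as const,
  function: {
    name: 'get_weather', // hypothetical example function
    description: 'Get the current weather for a city.',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name, e.g. "Berlin"' }
      },
      required: ['city']
    }
  }
}

// Passed alongside the request, e.g. { model, messages, tools: [weatherTool] }.
```

Your application executes the function with the returned arguments and sends the result back as a follow-up message, letting the model compose a final answer.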

How does authentication work through AI Gateway?

AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

Is GPT-3.5 Turbo suitable for large-scale summarization?

Yes. Its combination of low per-token cost, fast response times, and context window of 16.4K tokens makes it well-suited for pipelines that process many documents in parallel.
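Such a pipeline can be sketched as a bounded fan-out: one model call per document, a fixed number at a time, so the pipeline stays under provider rate limits. The `summarize` function is injected here and would wrap a `gpt-3.5-turbo` call in practice; the concurrency limit of 4 is an arbitrary assumption.

```typescript
// Summarize many documents with bounded parallelism so the
// pipeline does not exceed provider rate limits.
export async function summarizeAll(
  docs: string[],
  summarize: (doc: string) => Promise<string>,
  concurrency = 4 // assumed limit; tune to your rate limits
): Promise<string[]> {
  const results: string[] = []
  for (let i = 0; i < docs.length; i += concurrency) {
    const batch = docs.slice(i, i + concurrency)
    // One model call per document, `concurrency` at a time.
    results.push(...(await Promise.all(batch.map(summarize))))
  }
  return results
}
```

Injecting the model call keeps the batching logic testable and makes it trivial to swap models later without touching the pipeline.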

How are the performance metrics measured?

This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.