
GPT-3.5 Turbo

GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, exposing the same underlying capability as ChatGPT at a price point that opened the door to production applications.

index.ts

import { streamText } from 'ai'

// Stream a completion from GPT-3.5 Turbo through AI Gateway.
const result = streamText({
  model: 'openai/gpt-3.5-turbo',
  prompt: 'Why is the sky blue?',
})

// Print tokens to stdout as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • What API format does GPT-3.5 Turbo use?

    It uses the Chat Completions API format. You send an array of messages with roles (system, user, assistant) rather than a raw completion prompt string.

  • How does GPT-3.5 Turbo differ from GPT-3.5 Turbo Instruct?

    GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.

  • What context window does GPT-3.5 Turbo support?

    The current model supports a context window of 16.4K tokens, suitable for multi-turn conversations and moderate-length document tasks.

  • Is GPT-3.5 Turbo suitable for function calling?

    Yes. OpenAI added function-calling support, enabling developers to define external functions the model can invoke, making it viable for agentic and tool-use workflows within its capability tier.

  • How does AI Gateway handle authentication for GPT-3.5 Turbo?

    AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

  • Can I use GPT-3.5 Turbo for batch summarization pipelines?

    Yes. Its combination of low per-token cost, fast response times, and context window of 16.4K tokens makes it well-suited for pipelines that process many documents in parallel.

  • What are typical latency characteristics?

    Latency varies with load and output length. This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.
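The Chat Completions message format described in the FAQ can be sketched as plain data. The roles below follow the OpenAI API; the conversation content and the `appendAssistantTurn` helper are illustrative:

```typescript
// Minimal sketch of the Chat Completions message format: an array of
// role-tagged messages rather than a raw completion prompt string.
type Role = 'system' | 'user' | 'assistant'

interface ChatMessage {
  role: Role
  content: string
}

const messages: ChatMessage[] = [
  { role: 'system', content: 'You are a concise assistant.' },
  { role: 'user', content: 'Why is the sky blue?' },
]

// After each model reply, append it so the next turn carries full history.
function appendAssistantTurn(history: ChatMessage[], reply: string): ChatMessage[] {
  return [...history, { role: 'assistant', content: reply }]
}
```

Keeping the full history in this array is what makes multi-turn conversation work: the model sees every prior turn on each request.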
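Staying inside the 16.4K-token window usually means trimming old turns. The sketch below uses a rough 4-characters-per-token estimate (an assumption, not a real tokenizer) and drops the oldest non-system messages first:

```typescript
// Heuristic history trimmer: drop the oldest non-system messages until the
// estimated token count fits the context window. The 4-characters-per-token
// ratio is a rough English-text estimate, not an exact tokenizer; use a real
// tokenizer for production budgets.
interface Msg { role: 'system' | 'user' | 'assistant'; content: string }

const CONTEXT_TOKENS = 16_400
const CHARS_PER_TOKEN = 4

function estimateTokens(msgs: Msg[]): number {
  return Math.ceil(msgs.reduce((n, m) => n + m.content.length, 0) / CHARS_PER_TOKEN)
}

function trimToFit(msgs: Msg[], budget = CONTEXT_TOKENS): Msg[] {
  const system = msgs.filter(m => m.role === 'system')
  const rest = msgs.filter(m => m.role !== 'system')
  while (rest.length > 1 && estimateTokens([...system, ...rest]) > budget) {
    rest.shift() // drop the oldest turn first; keep the latest user message
  }
  return [...system, ...rest]
}
```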
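Function calling works by having the model return a function name plus JSON-encoded arguments, which your code executes locally. In the sketch below, `get_weather` and the dispatch table are hypothetical; only the schema shape follows OpenAI's function-calling format:

```typescript
// Local functions the model may ask to invoke. `get_weather` is a
// hypothetical example, not a real API.
const functions = {
  get_weather: (args: { city: string }) => `Sunny in ${args.city}`,
} as const

// Schemas sent alongside the request so the model knows what it can call
// (OpenAI function-calling format: name, description, JSON Schema parameters).
const functionSchemas = [
  {
    name: 'get_weather',
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
]

// When the model responds with a function call, route it to local code.
function dispatch(call: { name: string; arguments: string }): string {
  const fn = functions[call.name as keyof typeof functions]
  if (!fn) throw new Error(`unknown function: ${call.name}`)
  return fn(JSON.parse(call.arguments))
}
```

The result of `dispatch` is then sent back to the model as a new message, letting it compose a final answer from the tool output.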
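A batch summarization pipeline typically caps parallelism rather than firing every request at once. In this sketch, `summarize` is a stub standing in for a model call through AI Gateway (e.g. a `generateText` call against `openai/gpt-3.5-turbo`), and the concurrency helper is illustrative:

```typescript
// Stub for a per-document model call; replace the body with a real
// request through AI Gateway in production.
async function summarize(doc: string): Promise<string> {
  return doc.slice(0, 40) // stand-in for a model-generated summary
}

// Run `fn` over `items` with at most `limit` calls in flight,
// preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++ // safe: index claim happens synchronously before the await
      results[i] = await fn(items[i])
    }
  })
  await Promise.all(workers)
  return results
}
```

Capping concurrency keeps the pipeline inside provider rate limits while still exploiting the model's low per-token cost and fast responses.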