GPT-3.5 Turbo
GPT-3.5 Turbo first brought ChatGPT-class conversational AI to the API at scale, delivering the same underlying capability at a price point that opened the door to production applications.
```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'openai/gpt-3.5-turbo',
  prompt: 'Why is the sky blue?',
});
```

Frequently Asked Questions
What API format does GPT-3.5 Turbo use?
It uses the Chat Completions API format. You send an array of messages with roles (system, user, assistant) rather than a raw completion prompt string.
How does GPT-3.5 Turbo differ from GPT-3.5 Turbo Instruct?
GPT-3.5 Turbo is built for the Chat Completions endpoint and conversational multi-turn use. GPT-3.5 Turbo Instruct targets the legacy Completions endpoint and is optimized for single-turn instruction tasks using a prompt-response format.
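The difference shows up directly in the request shape. A minimal sketch of the two payload styles (the surrounding request objects here are illustrative; the role names are part of the Chat Completions format):

```ts
// Chat Completions style: an array of role-tagged messages (multi-turn).
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const chatRequest = {
  model: 'gpt-3.5-turbo',
  messages: [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user', content: 'Why is the sky blue?' },
    { role: 'assistant', content: 'Rayleigh scattering favors shorter wavelengths.' },
    { role: 'user', content: 'And why are sunsets red?' },
  ] as ChatMessage[],
};

// Legacy Completions style (GPT-3.5 Turbo Instruct): one prompt string.
const instructRequest = {
  model: 'gpt-3.5-turbo-instruct',
  prompt: 'Explain why the sky is blue in one sentence.',
};

console.log(chatRequest.messages.length); // 4
```

Because the chat format carries the whole conversation as structured turns, follow-up questions like the last user message above can rely on earlier context without re-stating it in a single prompt string.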
What context window does GPT-3.5 Turbo support?
The current model supports a context window of 16.4K tokens, suitable for multi-turn conversations and moderate-length document tasks.
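As a rough budgeting sketch for that window, assuming the commonly cited 16,385-token limit and the approximate 4-characters-per-token heuristic (not an exact tokenizer):

```ts
// Approximate token budgeting against a 16,385-token context window.
const CONTEXT_WINDOW = 16_385;

function fitsInContext(promptChars: number, maxOutputTokens: number): boolean {
  // Rough heuristic: ~4 characters per token for English text.
  const approxPromptTokens = Math.ceil(promptChars / 4);
  return approxPromptTokens + maxOutputTokens <= CONTEXT_WINDOW;
}

console.log(fitsInContext(40_000, 1_000)); // true: ~10,000 + 1,000 tokens
```

Input and output share the same window, so reserving your maximum output tokens up front, as above, avoids truncated completions on longer prompts.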
Is GPT-3.5 Turbo suitable for function calling?
Yes. OpenAI added function-calling support, enabling developers to define external functions the model can invoke, making it viable for agentic and tool-use workflows within its capability tier.
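A sketch of how a tool is declared and handled, using the Chat Completions `tools` array shape (the `get_weather` function, its parameters, and the stub handler are hypothetical):

```ts
// Hypothetical tool definition the model may choose to invoke.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name, e.g. "Paris"' },
        },
        required: ['city'],
      },
    },
  },
];

// When a response contains a tool call, run the named function and send
// the result back to the model in a follow-up message.
function handleToolCall(name: string, args: { city: string }): string {
  if (name === 'get_weather') return `Sunny in ${args.city}`; // stub result
  throw new Error(`Unknown tool: ${name}`);
}

console.log(handleToolCall('get_weather', { city: 'Paris' })); // Sunny in Paris
```

The model never executes functions itself; it emits a structured call with arguments, and your code runs it and returns the result, which is what makes the tool-use loop above agent-friendly.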
How does AI Gateway handle authentication for GPT-3.5 Turbo?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
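In practice every request carries the one gateway credential rather than a provider key. A sketch assuming the common Bearer-token header pattern (the function and key value here are illustrative, not a specific AI Gateway API):

```ts
// Illustrative: one gateway token for all requests; no OpenAI key in the app.
function gatewayHeaders(token: string): Record<string, string> {
  return {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };
}

const headers = gatewayHeaders('my-gateway-key');
console.log(headers.Authorization); // Bearer my-gateway-key
```

Keeping the provider credential out of application code means rotating it is a gateway-side operation, with no redeploys of the calling services.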
Can I use GPT-3.5 Turbo for batch summarization pipelines?
Yes. Its combination of low per-token cost, fast response times, and context window of 16.4K tokens makes it well-suited for pipelines that process many documents in parallel.
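A sketch of the fan-out pattern such a pipeline uses, with a stub in place of the real model call (`summarize` here is a hypothetical placeholder for a Chat Completions request):

```ts
// Run an async task over many inputs with a fixed concurrency cap.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index, preserving input order
      results[i] = await task(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}

// Hypothetical stand-in for a real summarization call to the model.
async function summarize(doc: string): Promise<string> {
  return `summary: ${doc.slice(0, 12)}`;
}

const docs = ['first document text', 'second document text', 'third document text'];
mapWithConcurrency(docs, 2, summarize).then((s) => console.log(s.length)); // 3
```

Capping concurrency keeps the pipeline inside provider rate limits while still overlapping request latency across documents.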
What are typical latency characteristics?
Latency varies with prompt length, output length, and load. This page shows live throughput and time-to-first-token metrics for GPT-3.5 Turbo, measured across real AI Gateway traffic.