Chat Completions
Create chat completions using various AI models available through the AI Gateway.
Create a non-streaming chat completion.
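A minimal sketch of a non-streaming request body, following the OpenAI-compatible schema this endpoint uses. The gateway URL, API key, and model identifier below are placeholders, not real values:

```python
import json

# Hypothetical endpoint; substitute your AI Gateway URL and credentials.
API_URL = "https://your-gateway.example.com/v1/chat/completions"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build a non-streaming chat completion request body (OpenAI-compatible)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": False,  # non-streaming: the full response arrives as one JSON object
    }

body = build_chat_request("your-model-id", "Hello!")
print(json.dumps(body, indent=2))

# To actually send it (requires the `requests` package and a valid API key):
# import requests
# resp = requests.post(API_URL, json=body,
#                      headers={"Authorization": "Bearer YOUR_API_KEY"})
# print(resp.json()["choices"][0]["message"]["content"])
```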
Create a streaming chat completion that returns tokens incrementally as they are generated.
Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.
The response format follows the OpenAI streaming specification:
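An illustrative stream, with placeholder IDs and content, might look like this:

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```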
Key characteristics:
- Each line starts with `data: ` followed by JSON
- Content is delivered incrementally in the `choices[0].delta.content` field
- The stream ends with `data: [DONE]`
- Empty lines separate events
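The characteristics above can be turned into a minimal parser. This sketch works on a complete stream held in memory; a production parser must also handle partial lines arriving across HTTP chunks (which is what the libraries below handle for you):

```python
import json

def parse_sse_stream(raw: str) -> str:
    """Accumulate assistant text from an OpenAI-style SSE stream."""
    text = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip the empty lines that separate events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        delta = event["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])  # incremental content fragment
    return "".join(text)

sample = (
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}\n'
    "\n"
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}\n'
    "\n"
    "data: [DONE]\n"
)
print(parse_sse_stream(sample))  # → Hello world
```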
SSE Parsing Libraries:
If you're building custom SSE parsing (instead of using the OpenAI SDK), these libraries can help:
- JavaScript/TypeScript: `eventsource-parser` - Robust SSE parsing with support for partial events
- Python: `httpx-sse` - SSE support for HTTPX, or `sseclient-py` for requests
For more details about the SSE specification, see the WHATWG HTML Living Standard.
Send images as part of your chat completion request.
Send PDF documents as part of your chat completion request.
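A sketch of multimodal messages. The image part uses the OpenAI-compatible `image_url` content-part format; the PDF part's shape (`type: "file"` with a base64 data URL) is assumed from OpenAI's file content part, so check your gateway's documentation for the exact field names it accepts:

```python
import base64

# Image input: OpenAI-compatible content-part format.
image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}

# PDF input: field names assumed from OpenAI's `file` content part.
pdf_bytes = b"%PDF-1.4 example"  # placeholder; normally read from disk
pdf_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this document."},
        {
            "type": "file",
            "file": {
                "filename": "report.pdf",
                "file_data": "data:application/pdf;base64,"
                + base64.b64encode(pdf_bytes).decode("ascii"),
            },
        },
    ],
}
```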
The chat completions endpoint supports the following parameters:
- `model` (string): The model identifier to use for the completion
- `messages` (array): Array of message objects with `role` and `content` fields
- `stream` (boolean): Whether to stream the response. Defaults to `false`
- `temperature` (number): Controls randomness in the output. Range: 0-2
- `max_tokens` (integer): Maximum number of tokens to generate
- `top_p` (number): Nucleus sampling parameter. Range: 0-1
- `frequency_penalty` (number): Penalty for frequent tokens. Range: -2 to 2
- `presence_penalty` (number): Penalty for present tokens. Range: -2 to 2
- `stop` (string or array): Stop sequences for the generation
- `tools` (array): Array of tool definitions for function calling
- `tool_choice` (string or object): Controls which tools are called (`auto`, `none`, or a specific function)
- (object): Provider routing and configuration options
- `response_format` (object): Controls the format of the model's response
  - For OpenAI standard format: `{"type": "json_schema", "json_schema": {...}}`
  - For legacy format: `{"type": "json_object"}`
  - For plain text: `{"type": "text"}`
  - See Structured outputs for detailed examples
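A hedged sketch of a request body combining `tools`, `tool_choice`, and `response_format`, following the OpenAI function-calling schema; the model identifier and the `get_weather` function are hypothetical placeholders:

```python
request_body = {
    "model": "your-model-id",  # placeholder model identifier
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function, for illustration
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # let the model decide whether to call the tool
    # Plain-text output; use {"type": "json_object"} or a json_schema object
    # for structured output instead.
    "response_format": {"type": "text"},
}
```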
Messages support different content types: the `content` field can be a plain string, or an array of typed content parts (such as text, image, and file parts).