Menu

Streaming

Last updated December 1, 2025

The OpenResponses API supports streaming to receive tokens as they're generated instead of waiting for the complete response. Set stream: true in your request, then read the response body as a stream of server-sent events. Each event contains a chunk of the response that you can display incrementally.

stream.ts
const apiKey = process.env.AI_GATEWAY_API_KEY;
 
const response = await fetch('https://ai-gateway.vercel.sh/v1/responses', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}`,
  },
  body: JSON.stringify({
    model: 'google/gemini-3-flash',
    input: [
      {
        type: 'message',
        role: 'user',
        content: 'Write a haiku about debugging code.',
      },
    ],
    stream: true,
  }),
});
 
const reader = response.body.getReader();
const decoder = new TextDecoder();
 
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
 
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');
 
  for (const line of lines) {
    if (line.startsWith('data:')) {
      const data = line.substring(6).trim();
      if (data) {
        const event = JSON.parse(data);
        if (event.type === 'response.output_text.delta') {
          process.stdout.write(event.delta);
        }
      }
    }
  }
}
  • response.created - Response initialized
  • response.output_text.delta - Text chunk received
  • response.output_text.done - Text generation complete
  • response.completed - Full response complete with usage stats

Was this helpful?

supported.