GPT OSS 20B

GPT OSS 20B is OpenAI's smaller open-weight model with roughly 21 billion total parameters and 3.6 billion active per token, designed for low-latency, agentic, and on-device workloads.

Reasoning, Tool Use

index.ts

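import { streamText } from 'ai'

// Start a streaming generation against GPT OSS 20B
const result = streamText({
  model: 'openai/gpt-oss-20b',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}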


About GPT OSS 20B

GPT OSS 20B was released by OpenAI on August 5, 2025 under the Apache 2.0 license, alongside the larger gpt-oss-120b. Both are mixture-of-experts transformers that alternate dense and locally banded sparse attention layers, use grouped multi-query attention and rotary positional embeddings, and natively support a context window of 131,072 tokens (128K).

The 20B label refers to the total parameter count, about 21 billion. Only roughly 3.6 billion parameters activate per token, and that active count, not the total, is what drives per-token inference compute. OpenAI reports that GPT OSS 20B matches or exceeds o3-mini on common evaluations, outperforms it on competition math (AIME) and HealthBench, and runs on a single device with 16 GB of memory.
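To make the total-versus-active distinction concrete, here is a small arithmetic sketch using the approximate figures quoted above. The 2-FLOPs-per-active-parameter estimate is a common rule of thumb for a forward pass and is an assumption here, not a figure from OpenAI. The function names are illustrative.

```typescript
// Approximate figures quoted in the text above.
const TOTAL_PARAMS = 21e9;   // ~21 billion total parameters
const ACTIVE_PARAMS = 3.6e9; // ~3.6 billion active per token

// Share of the model's parameters that participate in each token.
function activeFraction(total: number, active: number): number {
  return active / total;
}

// Assumption: rough forward-pass cost of ~2 FLOPs per active parameter per token.
function approxFlopsPerToken(active: number): number {
  return 2 * active;
}

const fraction = activeFraction(TOTAL_PARAMS, ACTIVE_PARAMS);
const flops = approxFlopsPerToken(ACTIVE_PARAMS);

console.log(`active fraction: ${(fraction * 100).toFixed(1)}%`); // active fraction: 17.1%
console.log(`approx. FLOPs per token: ${flops.toExponential(1)}`); // approx. FLOPs per token: 7.2e+9
```

Roughly one sixth of the weights do work on any given token, which is why per-token compute is closer to that of a 3.6B dense model, even though all 21 billion parameters still have to be held in memory.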

GPT OSS 20B supports adjustable reasoning levels (low, medium, high), native function calling, and structured outputs. OpenAI positions it as the recommended starting point for most workloads, with gpt-oss-120b available as an escalation path for the hardest reasoning steps.
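One way to read the "start small, escalate when needed" guidance is as a simple selection policy. The sketch below is illustrative only: the `Difficulty` labels and `chooseModel` helper are hypothetical, and the `openai/gpt-oss-120b` model ID is assumed by analogy with `openai/gpt-oss-20b`. How the reasoning level is passed to the API depends on the provider and is deliberately left out.

```typescript
type ReasoningLevel = 'low' | 'medium' | 'high';

interface ModelChoice {
  model: string;
  reasoning: ReasoningLevel;
}

// Hypothetical difficulty labels, for illustration only.
type Difficulty = 'routine' | 'moderate' | 'hardest';

// Prefer the 20B model; raise the reasoning level first, and switch
// to the larger model only for the hardest steps.
function chooseModel(difficulty: Difficulty): ModelChoice {
  switch (difficulty) {
    case 'routine':
      return { model: 'openai/gpt-oss-20b', reasoning: 'low' };
    case 'moderate':
      return { model: 'openai/gpt-oss-20b', reasoning: 'high' };
    case 'hardest':
      // Assumed model ID, by analogy with 'openai/gpt-oss-20b'.
      return { model: 'openai/gpt-oss-120b', reasoning: 'high' };
  }
}

console.log(chooseModel('routine'));
console.log(chooseModel('hardest'));
```

Raising the reasoning level before switching models keeps most traffic on the cheaper model. Whether that trade-off holds for a given workload is something to measure, not assume.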

Through AI Gateway, you reach GPT OSS 20B with a single API key, route to bedrock, fireworks, groq, deepinfra, togetherai, novita, or parasail as needed, and read live throughput and latency from this page. You do not need to provision GPUs or create separate provider accounts.
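Provider routing can be expressed as a preference list. The sketch below builds such a list and checks it against the provider slugs named above. The `{ gateway: { order: [...] } }` shape is an assumption about how AI Gateway accepts routing preferences and should be checked against the current AI Gateway documentation. `gatewayOrder` is a hypothetical helper.

```typescript
// Provider slugs listed in the text above.
const PROVIDERS = [
  'bedrock', 'fireworks', 'groq', 'deepinfra', 'togetherai', 'novita', 'parasail',
] as const;
type Provider = (typeof PROVIDERS)[number];

// Assumed options shape: { gateway: { order: [...] } }.
// Rejects any slug that is not in the list above.
function gatewayOrder(preferred: string[]): { gateway: { order: Provider[] } } {
  const known = new Set<string>(PROVIDERS);
  const unknown = preferred.filter((p) => !known.has(p));
  if (unknown.length > 0) {
    throw new Error(`Unknown provider(s): ${unknown.join(', ')}`);
  }
  return { gateway: { order: preferred as Provider[] } };
}

// Try groq first, then fireworks.
const providerOptions = gatewayOrder(['groq', 'fireworks']);
console.log(JSON.stringify(providerOptions));
```

Under that assumption, the resulting object would be passed as `providerOptions` next to `model` and `prompt` in the `streamText` call.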
