
GPT OSS 20B

openai/gpt-oss-20b

GPT OSS 20B is OpenAI's open-weight 20-billion-parameter language model, a lightweight yet capable option suited to cost-efficient deployment.

Reasoning · Tool Use
index.ts
import { streamText } from 'ai'

// Stream a completion from gpt-oss-20b via AI Gateway
const result = streamText({
  model: 'openai/gpt-oss-20b',
  prompt: 'Why is the sky blue?',
})

// Consume the response as it streams in
for await (const part of result.textStream) {
  process.stdout.write(part)
}

What To Consider When Choosing a Provider

  • Zero Data Retention

AI Gateway supports Zero Data Retention for this model on direct gateway requests (BYOK requests are not covered). See the documentation to configure it.

    Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

At 20B parameters, GPT OSS 20B is far more practical to deploy on standard infrastructure than the 120B variant, offering a good balance of capability and resource requirements.

Through AI Gateway you can use it immediately without provisioning GPU infrastructure.
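A minimal setup sketch: the AI SDK's gateway provider reads your gateway credentials from the environment (the `AI_GATEWAY_API_KEY` variable name and the `tsx` runner are assumptions here; check your gateway's documentation for the exact setup).

```shell
# Supply the gateway credential via environment; no OpenAI
# provider credentials are needed in your application.
export AI_GATEWAY_API_KEY="your-gateway-key"

# Run the streaming example above (assumes the tsx TypeScript runner).
npx tsx index.ts
```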

When to Use GPT OSS 20B

Best For

  • Lightweight open-source deployment:

    Open-weight models with reasonable infrastructure requirements

  • Cost-efficient open-source:

    Applications that need open-weight transparency at lower compute cost

  • Edge deployment research:

    Exploring deployment of capable models in resource-constrained environments

  • General-purpose tasks:

    Chat, summarization, and content generation where 20B scale is sufficient

Consider Alternatives When

  • Higher capability needed:

gpt-oss-120b for stronger open-weight performance

  • Maximum quality:

    GPT-5 or GPT-5.2 for higher capability from closed-source models

  • Specialized tasks:

    Codex models for coding, o-series for reasoning

  • Smallest possible model:

    GPT-5 nano or GPT-4.1 nano for minimal-cost inference

Conclusion

GPT OSS 20B provides a practical entry point to OpenAI's open-weight language models, balancing capability with efficiency. Available through AI Gateway, it serves teams that need open weights without the infrastructure demands of larger models.

FAQ

How does GPT OSS 20B differ from gpt-oss-120b?

It's more compact (20B vs 120B parameters), making it cheaper to run and easier to self-host, with correspondingly lower capability on complex tasks.

What tasks is GPT OSS 20B suited for?

Chat, content generation, summarization, analysis, and other general-purpose language tasks where 20B-parameter scale provides sufficient quality.

What is the context window?

131.1K tokens.
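As a rough sketch of working within that limit, you can pre-check whether a prompt fits before sending it. This assumes the common ~4-characters-per-token heuristic (actual tokenization varies by content); the function names are illustrative, not part of any SDK.

```typescript
// Heuristic pre-check against the 131,072-token context window.
// ~4 characters per token is a rough estimate, not exact tokenization.
const CONTEXT_WINDOW = 131_072;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Leave headroom for the model's output when budgeting the prompt.
function fitsContext(prompt: string, reservedForOutput = 4_096): boolean {
  return estimateTokens(prompt) + reservedForOutput <= CONTEXT_WINDOW;
}

console.log(fitsContext('Why is the sky blue?')); // true
```

For precise counts, use the model's actual tokenizer rather than this character-based estimate.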

How does authentication work?

AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.

What do the performance metrics show?

This page shows live throughput and time-to-first-token metrics measured across real AI Gateway traffic.