GPT OSS Safeguard 20B
GPT OSS Safeguard 20B is a 20-billion-parameter open-source safety model from OpenAI, designed to classify and filter harmful or policy-violating content in AI application pipelines.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-oss-safeguard-20b',
  prompt: 'Why is the sky blue?',
})

// Print the model's response as it streams in.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
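As a minimal sketch of what this looks like with the AI SDK, assuming an `AI_GATEWAY_API_KEY` environment variable (or an OIDC token when deployed on Vercel):

```ts
// Sketch: assumes AI_GATEWAY_API_KEY is set in the environment,
// or an OIDC token is available when deployed on Vercel.
// No OpenAI credentials appear anywhere in application code.
import { generateText } from 'ai'

const { text } = await generateText({
  model: 'openai/gpt-oss-safeguard-20b',
  prompt: 'Classify: "You are terrible at this game!"',
})
console.log(text)
```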
GPT OSS Safeguard 20B is not a general-purpose language model. It's a specialized classifier designed to evaluate content for safety and policy compliance. Deploy it as a filter layer alongside generation models.
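For example, here is a minimal pre-screening sketch. The policy text and the ALLOW/DENY answer convention are illustrative assumptions rather than a documented output format, and the generation model in `answer` is a stand-in for whichever model you route to:

```ts
// Sketch of an input filter placed in front of a generation model.
// POLICY and the ALLOW/DENY convention are assumptions for this
// example, not a documented contract; adapt them to your own policy.
import { generateText } from 'ai'

const POLICY = `You are a content safety classifier. Decide whether the
user message violates this policy: no harassment, no instructions for
violence. Answer with exactly one word: ALLOW or DENY.`

async function screenInput(userMessage: string): Promise<boolean> {
  const { text } = await generateText({
    model: 'openai/gpt-oss-safeguard-20b',
    system: POLICY,
    prompt: userMessage,
  })
  return text.trim().toUpperCase().startsWith('ALLOW')
}

async function answer(userMessage: string): Promise<string> {
  if (!(await screenInput(userMessage))) {
    return 'Sorry, that request violates our content policy.'
  }
  // Stand-in generation model; swap in whichever model your app uses.
  const { text } = await generateText({
    model: 'openai/gpt-5',
    prompt: userMessage,
  })
  return text
}
```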
Open weights let you inspect the safety criteria and audit the model's behavior.
When to Use GPT OSS Safeguard 20B
Best For
Content moderation: Classifying user inputs and model outputs for harmful or policy-violating material
Safety guardrails: Adding a dedicated safety layer to AI application pipelines
Policy enforcement: Ensuring AI-generated content meets organizational or regulatory standards
Pre-screening pipelines: Filtering inputs before they reach generation models
Consider Alternatives When
General-purpose tasks: Any GPT model for chat, generation, or analysis tasks
Built-in safety: Many GPT models include safety measures natively; use this model when you need an additional dedicated layer
Proprietary moderation: OpenAI's moderation endpoint for a managed, non-open-source alternative
Different model scales: Consider whether the 20B-parameter scale is sufficient for your moderation needs
Conclusion
GPT OSS Safeguard 20B adds a transparent, customizable safety layer to AI application pipelines. As an open-source safety classifier available through AI Gateway, it enables teams to implement and audit content safety measures with full visibility into the model's behavior.
FAQ
Is GPT OSS Safeguard 20B a general-purpose language model?
No. It's a specialized safety classifier that evaluates content for harmful or policy-violating material. Use it alongside general-purpose models as a guardrail layer.
What kinds of content does it classify?
It evaluates text for categories of harmful content and policy violations. The specific categories are documented in the model's open-source materials.
Why do open weights matter for a safety model?
Open weights let teams inspect and adapt the classifier for organization-specific policies and safety requirements.
How do I integrate it into an AI application pipeline?
Deploy it as a filter layer that evaluates inputs before they reach your generation model and/or evaluates outputs before they reach users. Route requests through AI Gateway; a sketch of the output-side check follows this FAQ.
How does authentication work?
AI Gateway accepts a single API key or OIDC token for all requests. You don't embed OpenAI credentials in your application; AI Gateway routes and authenticates on your behalf.
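As referenced above, here is a minimal output-side sketch. The one-word SAFE/UNSAFE convention and the policy wording are illustrative assumptions, not a documented contract:

```ts
// Illustrative post-generation check: classify a candidate response
// before it is shown to the user. Labels and policy are assumptions.
import { generateText } from 'ai'

async function moderateOutput(candidate: string): Promise<string> {
  const { text: verdict } = await generateText({
    model: 'openai/gpt-oss-safeguard-20b',
    system:
      'Label the following text SAFE or UNSAFE under a no-harassment, ' +
      'no-violence policy. Reply with exactly one word.',
    prompt: candidate,
  })
  return verdict.trim().toUpperCase() === 'SAFE'
    ? candidate
    : '[response withheld by safety filter]'
}
```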