Sonar
Sonar is Perplexity's lightweight search model. It combines language generation with built-in web search to deliver citation-backed answers at low cost within a context window of 127K tokens.
import { streamText } from 'ai'
const result = streamText({ model: 'perplexity/sonar', prompt: 'Why is the sky blue?' })
Playground
Try out Sonar by Perplexity. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
About Sonar
Sonar is the base tier of Perplexity's Sonar API family. It's built for developers who need web-grounded answers without building a retrieval pipeline. Unlike conventional language models that generate from training data alone, Sonar models run a live web search on every inference call and synthesize results into a cited response.
This built-in search is Perplexity's core differentiator. You don't need to configure a separate search API, build a RAG pipeline, or manage document indexing. The model handles query formulation, web retrieval, source evaluation, and answer synthesis in one request. Responses include inline citations pointing to sources, giving your application verifiable provenance for each claim.
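To illustrate how an application might use inline citations for provenance, the sketch below pairs bracketed markers such as `[1]` in an answer with a list of source URLs. The `[n]` marker format, the 1-based indexing, and the `sources` array are assumptions made for illustration, and `resolveCitations` is a hypothetical helper; check the actual response shape your client returns before relying on it.

```typescript
interface CitedSource {
  marker: number;
  url: string;
}

// Find bracketed numeric markers like [1] and pair each with a source URL.
// Assumption: markers are 1-based indexes into `sources`.
function resolveCitations(answer: string, sources: string[]): CitedSource[] {
  const resolved: CitedSource[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const marker = Number(match[1]);
    const url = sources[marker - 1];
    const alreadySeen = resolved.some((entry) => entry.marker === marker);
    // Skip repeated markers and markers with no matching source.
    if (url !== undefined && !alreadySeen) {
      resolved.push({ marker, url });
    }
  }
  return resolved;
}
```

Markers that point past the end of the source list are dropped rather than guessed, so a caller can treat the returned list as the set of claims it is able to link.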
Sonar targets high-volume use cases where cited answers matter but deep reasoning isn't required. It has the lowest per-token cost in the Sonar family. That makes it practical for customer-facing search, chatbots that need factual accuracy, and automated research assistants processing many queries. The context window of 127K tokens supports multi-turn conversations that accumulate search context across exchanges.
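One way to keep an accumulating multi-turn conversation inside the 127K-token window is to drop the oldest turns first. The following is a minimal sketch, not a documented AI Gateway feature: it assumes a rough estimate of four characters per token (a heuristic, not the model's tokenizer), and the names `Message`, `estimateTokens`, and `trimHistory` are hypothetical.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Context window size taken from the model description above.
const CONTEXT_WINDOW_TOKENS = 127_000;

// Heuristic only: roughly 4 characters per token. Not the real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest message, keeping messages until the
// estimated total would exceed the budget. The newest message is always
// kept so the current question survives trimming.
function trimHistory(messages: Message[], maxTokens: number = CONTEXT_WINDOW_TOKENS): Message[] {
  const kept: Message[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (kept.length > 0 && total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```

In practice you would likely leave headroom below the full window for the model's response and any retrieved search content, so a smaller `maxTokens` than the default is a reasonable choice.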
What To Consider When Choosing a Provider
- Configuration: Sonar runs a web search on every request, so response latency includes the network round-trips needed to retrieve sources. Factor this into timeout budgets for latency-sensitive interfaces. Web search calls may be billed separately from tokens; no per-thousand-request rate is listed for this model.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
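The latency and authentication points above can be sketched as a small request builder that attaches a gateway API key and an abort signal tied to a timeout budget. This is a sketch under stated assumptions: the endpoint URL assumes AI Gateway's OpenAI-compatible chat completions API and should be verified against the docs, and the helper names (`createTimeoutSignal`, `buildSonarRequest`) and the 30-second default are illustrative, not values from this page.

```typescript
// Assumed endpoint (OpenAI-compatible API). Verify against the AI Gateway docs.
const GATEWAY_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions';

interface TimeoutHandle {
  signal: AbortSignal;
  cancel: () => void;
}

// Abort the request if search plus generation exceeds the budget.
function createTimeoutSignal(ms: number): TimeoutHandle {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return { signal: controller.signal, cancel: () => clearTimeout(timer) };
}

interface SonarRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
    signal: AbortSignal;
  };
  cancel: () => void;
}

// Build a request authenticated with the gateway key only; no Perplexity
// credentials are involved. The 30-second default is an illustrative budget.
function buildSonarRequest(apiKey: string, prompt: string, timeoutMs: number = 30_000): SonarRequest {
  const { signal, cancel } = createTimeoutSignal(timeoutMs);
  return {
    url: GATEWAY_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'perplexity/sonar',
        messages: [{ role: 'user', content: prompt }],
      }),
      signal,
    },
    cancel,
  };
}
```

A caller would pass `url` and `init` to `fetch`, then call `cancel()` once the response has been read so the timer does not outlive the request.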
When to Use Sonar
Best For
- Customer-facing search: Answers must cite current web sources
- High-volume factual Q&A: The lowest cost in the Sonar family, suitable for chatbots and support assistants
- Automated research pipelines: Gather and synthesize web information across many queries without separate search infrastructure
- Content verification workflows: Citation-backed responses provide auditable source trails
- Lightweight RAG replacement: No need to build and maintain a custom retrieval stack
Consider Alternatives When
- Deep multi-step reasoning: Sonar Reasoning Pro applies chain-of-thought before answering
- Thorough source analysis: Sonar Pro searches more sources and produces more detailed synthesis
- Purely generative work: Creative writing or code generation has no need for web grounding
- Offline environments: Air-gapped deployments can't make web search calls
Conclusion
Sonar delivers Perplexity's signature capability: live web search integrated directly into language model inference. It's the most accessible price point in the Sonar family. If you need cited answers at scale without building retrieval infrastructure, it turns any prompt into a web-informed response in a single call.
Frequently Asked Questions
How does Sonar's built-in web search work?
Every API call triggers a live web search. The model formulates search queries from your prompt, retrieves and evaluates web sources, then synthesizes the information into a response with inline citations. You don't need an external search API or RAG pipeline.
What is the difference between Sonar and Sonar Pro?
Sonar Pro searches more sources, produces longer and more detailed answers, and handles more complex multi-source queries. Sonar is optimized for speed and cost on straightforward factual queries.
Does Sonar include citations in its responses?
Yes. Responses include inline citations referencing the web sources used for each claim. This gives your application verifiable provenance.
What is the context window for Sonar?
127K tokens. This supports multi-turn conversations where search context accumulates across exchanges.
How much does Sonar cost?
Current pricing is shown on this page. AI Gateway routes across providers, and rates may vary by provider.
How do I authenticate with Sonar through AI Gateway?
Use your AI Gateway API key with the model identifier `perplexity/sonar`. AI Gateway handles provider routing and authentication. You don't need a separate Perplexity API key when using gateway-managed access.
Can I use Sonar without the web search feature?
No. Web search is integral to the Sonar architecture and runs on every request. If you need a model without search, use a general-purpose language model from another provider.