
Trinity Large Preview

Trinity Large Preview is a 400B-parameter sparse mixture-of-experts model from Arcee AI that activates 13B parameters per forward pass, targeting math, coding, and multi-step agent workloads across a context window of 131K tokens.

Tool Use
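index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'arcee-ai/trinity-large-preview',
  prompt: 'Why is the sky blue?'
})

// Print the response text as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}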

Playground

Try out Trinity Large Preview by Arcee AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.



Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

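| Provider | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | ZDR | No Training | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arcee AI | 131K | | | $0.25/M | $1.00/M | — | — | | | | | 01/01/2025 |
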
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Arcee AI

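| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 262K | 0.3s | | $0.25/M | $0.90/M | — | — | | | Arcee AI | | | 04/01/2026 |
| | 131K | 0.3s | 281 TPS | $0.04/M | $0.15/M | — | — | | | Arcee AI | | | 12/01/2025 |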

About Trinity Large Preview

Trinity Large Preview is a sparse mixture-of-experts model with 400B total parameters and 13B active per forward pass. The MoE design keeps per-token compute proportional to the active parameter count while the full parameter space retains broad knowledge.
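
As a rough illustration of what "13B active out of 400B" means for per-token compute, the sketch below works out the active fraction. The parameter counts come from the description above; the figure of about 2 FLOPs per active parameter per token is a common rule of thumb and an assumption here, not a number published by Arcee AI.

```typescript
// Parameter counts taken from the model description.
const TOTAL_PARAMS = 400e9;
const ACTIVE_PARAMS = 13e9;

// Fraction of the weights that participate in a single forward pass.
function activeFraction(total: number, active: number): number {
  return active / total;
}

// Assumption: roughly 2 FLOPs per active parameter per token.
// This is a back-of-the-envelope heuristic, not a vendor figure.
function approxFlopsPerToken(active: number): number {
  return 2 * active;
}

const fraction = activeFraction(TOTAL_PARAMS, ACTIVE_PARAMS);
const flopsPerToken = approxFlopsPerToken(ACTIVE_PARAMS);

console.log(`${(fraction * 100).toFixed(2)}% of parameters active per token`);
console.log(`~${(flopsPerToken / 1e9).toFixed(0)} GFLOPs per token (heuristic)`);
```

Under these assumptions about 3.25% of the weights run for any given token, which is why per-token cost tracks the 13B figure instead of the 400B total.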

Arcee AI targets math, coding, and multi-step agent workflows with this release. Extended multi-turn sessions stay efficient because only the active expert subset runs on each token.

This release is labeled preview, so treat behavior, pricing, and versioning as subject to change while you benchmark. Use https://docs.arcee.ai/language-models/trinity-large-400b for the latest specs and rates from Arcee AI.

What To Consider When Choosing a Provider

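  • Configuration: Stream long analytical outputs so the first tokens reach the user sooner instead of after the full completion. At $0.25 per million input tokens and $1.00 per million output tokens, estimate spend for your expected token volume and weigh it against your latency requirements before you scale traffic.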
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
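
To make the pricing in the Configuration item concrete, here is a minimal cost estimator using the listed rates of $0.25 per million input tokens and $1.00 per million output tokens. The `Usage` shape, the `estimateCostUsd` helper, and the example workload are hypothetical names and numbers chosen for illustration; preview pricing may change, so treat the constants as placeholders.

```typescript
// Token counts for one request (hypothetical shape for this sketch).
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Listed rates at the time of writing; preview pricing is subject to change.
const INPUT_USD_PER_MILLION = 0.25;
const OUTPUT_USD_PER_MILLION = 1.0;

// Estimated cost in USD for a single request.
function estimateCostUsd(usage: Usage): number {
  const inputCost = (usage.inputTokens / 1_000_000) * INPUT_USD_PER_MILLION;
  const outputCost = (usage.outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Hypothetical workload: 20K input tokens and 2K output tokens per request.
const perRequestUsd = estimateCostUsd({ inputTokens: 20_000, outputTokens: 2_000 });
const perTenThousandUsd = perRequestUsd * 10_000;

console.log(`per request: $${perRequestUsd.toFixed(4)}`);
console.log(`per 10,000 requests: $${perTenThousandUsd.toFixed(2)}`);
```

For this example workload the estimate is about $0.007 per request, or roughly $70 per 10,000 requests. Note that output tokens cost four times as much as input tokens, so long generations dominate the bill sooner than long prompts do.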

When to Use Trinity Large Preview

Best For

  • Multi-step agent pipelines: The model plans, calls tools, and synthesizes results over many turns
  • Sustained math and coding: Tasks that need continuous reasoning across a long context
  • Large-scale code work: Generation or debugging where the model must follow logic across large files or refactors
  • Long-context analysis: Ingesting a large corpus and producing structured conclusions
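
For the long-context cases above, a quick pre-flight check can tell you whether a corpus is likely to fit before you send it. This is only a sketch under stated assumptions: it reads "131K" as 131,072 tokens, uses a rough 4-characters-per-token heuristic instead of the model's real tokenizer, and assumes generated output shares the same window. The helper names are hypothetical.

```typescript
// Assumption: "131K" means 131,072 tokens. Confirm the exact limit with the provider.
const CONTEXT_WINDOW_TOKENS = 131_072;

// Assumption: about 4 characters per token for English text.
// A real tokenizer will give different counts, especially for code.
const CHARS_PER_TOKEN = 4;

// Rough token estimate for a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the documents plus a reserved output budget fit in the window.
// Assumption: output tokens count against the same context window.
function fitsInContext(docs: string[], reservedForOutput: number): boolean {
  const inputTokens = docs.reduce((sum, doc) => sum + estimateTokens(doc), 0);
  return inputTokens + reservedForOutput <= CONTEXT_WINDOW_TOKENS;
}

// Example: a 400,000-character corpus with 4,000 tokens reserved for the answer.
const corpus = ['x'.repeat(400_000)];
console.log(fitsInContext(corpus, 4_000));
```

If the check fails, split the corpus into batches or summarize in stages. Because the estimate is heuristic, leave a safety margin instead of filling the window to the last token.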

Consider Alternatives When

  • Short single-turn requests: A smaller or faster model may match quality at lower cost
  • Stable long-term contract: Preview terms don't fit teams that need a generally available API with fixed behavior, pricing, and versioning
  • Latency-sensitive tasks: Simpler models suffice when response time matters more than deep multi-step reasoning

Conclusion

Trinity Large Preview brings Arcee AI's large MoE stack to AI Gateway for agentic, math, and coding workloads. If you need long-context reasoning and can accept preview terms, run it through your own benchmarks on AI Gateway.