
Trinity Large Thinking

Trinity Large Thinking is a reasoning-focused variant in Arcee AI's Trinity Large family: a 398B-parameter sparse mixture-of-experts model with about 13B active parameters per token, built on Trinity Large Base and emphasizing extended chain-of-thought reasoning.

Reasoning, Tool Use
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'arcee-ai/trinity-large-thinking',
  prompt: 'Why is the sky blue?'
})

Playground

Try out Trinity Large Thinking by Arcee AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.



Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

  • Arcee AI: Context 262K, Latency 0.3s, Input $0.25/M, Output $0.90/M, Release Date 04/01/2026

The Throughput, Cache, Web Search, Per Query, Capabilities, ZDR, and No Training columns show no value for this provider.

Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Arcee AI

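  • Model name not shown: Context 131K, Latency 0.3s, Throughput 281 tps, Input $0.04/M, Output $0.15/M, Providers Arcee AI, Release Date 12/01/2025
  • Model name not shown: Context 131K, Input $0.25/M, Output $1.00/M, Providers Arcee AI, Release Date 01/01/2025

The Cache, Web Search, Per Query, Capabilities, ZDR, and No Training columns show no value for either model.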

About Trinity Large Thinking

Trinity Large Thinking is a 398B-parameter sparse mixture-of-experts model with about 13B active parameters per token. It is built on Trinity Large Base and emits intermediate reasoning in the output when the task calls for it.

Use it when you need audit-friendly, stepwise reasoning more than the shortest possible reply. Choose Trinity Large Preview when you do not need trace-heavy output.

Check https://docs.arcee.ai/language-models/trinity-large-thinking for the latest capabilities, limits, and rates. Deeper reasoning costs more time and more tokens, so validate latency and cost on your own prompts before you commit to an architecture.
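
One way to validate latency on your own prompts is to time the stream yourself. The sketch below is a minimal example, not part of the AI SDK or AI Gateway: `timeStream` and `StreamTiming` are hypothetical names, and the helper accepts any `AsyncIterable<string>`, such as the `textStream` property on the result of `streamText`. It measures time to the first text chunk as seen by your client, which may differ from the TTFT figure reported on this page, and it counts characters rather than tokens.

```typescript
// Hypothetical helper: timing data collected while draining a text stream.
interface StreamTiming {
  ttftMs: number | null // time to first text chunk; null if the stream was empty
  totalMs: number       // time until the stream finished
  chunks: number        // number of text chunks received
  chars: number         // total characters received (not tokens)
}

// Drains the stream and records when the first chunk and the end arrive.
// The clock is injectable so the function can be tested deterministically.
async function timeStream(
  stream: AsyncIterable<string>,
  now: () => number = () => Date.now()
): Promise<StreamTiming> {
  const start = now()
  let ttftMs: number | null = null
  let chunks = 0
  let chars = 0

  for await (const chunk of stream) {
    if (ttftMs === null) {
      ttftMs = now() - start
    }
    chunks += 1
    chars += chunk.length
  }

  return { ttftMs, totalMs: now() - start, chunks, chars }
}

// Usage with the snippet at the top of this page (assumes `result` from streamText):
//   const timing = await timeStream(result.textStream)
//   console.log(timing)
```

Run it over a representative sample of prompts rather than a single request, since reasoning length varies from prompt to prompt.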

What To Consider When Choosing a Provider

  • Configuration: Reasoning traces add output tokens. Budget for longer completions, stream responses, and check the rates of $0.25 per million input tokens and $0.90 per million output tokens against your cost model.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
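
To check the listed rates against a cost model, a small estimator is often enough. The sketch below hard-codes the $0.25 and $0.90 per-million-token rates quoted on this page; `estimateCostUsd` and `TokenUsage` are hypothetical names, and the token counts in the usage example are illustrative assumptions, not measurements. It ignores any caching or other pricing adjustments.

```typescript
// Rates quoted on this page, in USD per million tokens. Check the provider
// docs for current pricing before relying on these constants.
const INPUT_USD_PER_MILLION = 0.25
const OUTPUT_USD_PER_MILLION = 0.9

// Hypothetical shape for a request's token counts.
interface TokenUsage {
  inputTokens: number
  outputTokens: number // includes reasoning-trace tokens emitted in the output
}

// Estimates the cost of one request in USD from its token counts.
function estimateCostUsd(usage: TokenUsage): number {
  const inputCost = (usage.inputTokens / 1_000_000) * INPUT_USD_PER_MILLION
  const outputCost = (usage.outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  return inputCost + outputCost
}

// Illustrative comparison: the same prompt with a short reply versus a
// reply that carries a long reasoning trace (token counts are made up).
const shortReply = estimateCostUsd({ inputTokens: 2_000, outputTokens: 500 })
const longTrace = estimateCostUsd({ inputTokens: 2_000, outputTokens: 6_000 })
console.log({ shortReply, longTrace })
```

Because output tokens cost more than input tokens at these rates, the length of the reasoning trace tends to dominate the per-request cost.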

When to Use Trinity Large Thinking

Best For

  • Auditable enterprise workflows: Step-by-step reasoning you can log and review
  • Analytical error reduction: Tasks where visible intermediate steps lower error rates
  • Traceable code review: Debugging or refactoring where the model's steps run alongside your own review
  • Inspectable decision flows: Multi-step pipelines where each stage must be reviewable

Consider Alternatives When

  • Short single-turn replies: Trinity Large Preview answers faster with fewer tokens
  • Minimal output budget: Reasoning traces add tokens that may not fit your cost model
  • Cost-dominant workloads: Trinity Mini costs less and is the better fit when its quality is sufficient for the task

Conclusion

Trinity Large Thinking adds trace-oriented, post-trained reasoning on top of Arcee AI's Trinity Large Base stack in AI Gateway. Choose it when auditable steps matter; choose Trinity Large Preview when you do not need that overhead.