Google and Vertex Reasoning
The Gemini 2.5, 3, and 3.1 series models use an internal "thinking process" that improves their reasoning and multi-step planning abilities, making them effective for complex tasks like coding, advanced mathematics, and data analysis.
These models are available through both the Google AI and Google Vertex AI providers. The thinking configuration is the same — the only difference is using `providerOptions.vertex` instead of `providerOptions.google`. To route through Vertex, configure Vertex AI credentials and set the provider order to prefer `vertex`.
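As a sketch of the routing described above, the provider preference and the thinking configuration can live side by side in `providerOptions`. The `gateway.order` field below is an assumption based on the gateway's provider-ordering option — verify the exact field name against your gateway documentation:

```typescript
// Sketch: prefer Vertex AI for this request while passing the same
// thinking configuration. `gateway.order` is assumed here to be the
// gateway's provider-ordering option; confirm against your setup.
const providerOptions = {
  gateway: {
    order: ['vertex'], // try Vertex AI before other providers
  },
  vertex: {
    thinkingConfig: {
      thinkingLevel: 'high',
      includeThoughts: true,
    },
  },
};

export { providerOptions };
```

Because the configuration shape is identical, switching back to the Google AI provider only requires renaming the `vertex` key to `google`.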
- Gemini 3 and 3.1: Use `thinkingLevel` to control the depth of reasoning
- Gemini 2.5: Use `thinkingBudget` to set a token limit for thinking

Supported models:

- `google/gemini-3.1-pro-preview`
- `google/gemini-3.1-flash-lite-preview`
- `google/gemini-3-flash`
- `google/gemini-2.5-pro`
- `google/gemini-2.5-flash`
- `google/gemini-2.5-flash-lite`
The `thinkingLevel` parameter controls reasoning behavior. Not all levels are available on every model:
| Thinking level | Gemini 3.1 Pro | Gemini 3.1 Flash-Lite | Gemini 3 Flash | Description |
|---|---|---|---|---|
| `minimal` | Not supported | Default | Supported | Matches "no thinking" for most queries. The model may still think minimally for complex coding tasks. Best for latency-sensitive workloads. |
| `low` | Supported | Supported | Supported | Minimizes latency and cost. Best for simple instruction following and chat. |
| `medium` | Supported | Supported | Supported | Balanced thinking for most tasks. |
| `high` | Default | Supported | Default | Maximizes reasoning depth. The model may take significantly longer to reach a first output token. |
The `thinkingBudget` parameter sets a specific number of thinking tokens. Set `thinkingBudget` to `0` to disable thinking, or `-1` to enable dynamic thinking (the model adjusts the budget based on request complexity).
Use `thinkingLevel` with Gemini 3 and 3.1 models. While `thinkingBudget` is accepted for backwards compatibility, using it with Gemini 3 models may result in unexpected performance.
| Model | Default | Range | Disable thinking | Dynamic thinking |
|---|---|---|---|---|
| Gemini 2.5 Pro | Dynamic | 128–32,768 | Not supported | `thinkingBudget: -1` (default) |
| Gemini 2.5 Flash | Dynamic | 0–24,576 | `thinkingBudget: 0` | `thinkingBudget: -1` (default) |
| Gemini 2.5 Flash Lite | Off | 512–24,576 | `thinkingBudget: 0` | `thinkingBudget: -1` |
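The ranges in the table above can be encoded in a small helper. This is a hypothetical utility (not part of the AI SDK) that clamps a requested budget to each model's supported range:

```typescript
// Hypothetical helper: clamp a requested thinkingBudget to the
// per-model ranges documented in the table above.
type BudgetModel =
  | 'google/gemini-2.5-pro'
  | 'google/gemini-2.5-flash'
  | 'google/gemini-2.5-flash-lite';

const BUDGET_RANGES: Record<
  BudgetModel,
  { min: number; max: number; canDisable: boolean }
> = {
  'google/gemini-2.5-pro': { min: 128, max: 32_768, canDisable: false },
  'google/gemini-2.5-flash': { min: 0, max: 24_576, canDisable: true },
  'google/gemini-2.5-flash-lite': { min: 512, max: 24_576, canDisable: true },
};

function clampThinkingBudget(model: BudgetModel, budget: number): number {
  if (budget === -1) return -1; // dynamic thinking is supported everywhere
  const { min, max, canDisable } = BUDGET_RANGES[model];
  // Pro cannot disable thinking: fall back to its minimum budget.
  if (budget === 0) return canDisable ? 0 : min;
  return Math.min(Math.max(budget, min), max);
}

export { clampThinkingBudget };
```

For example, requesting `thinkingBudget: 0` on Gemini 2.5 Pro (which cannot disable thinking) falls back to the minimum of 128, while the same request on Flash passes through unchanged.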
Use the `thinkingLevel` parameter to control the depth of reasoning:

```ts
import { generateText } from 'ai';

const result = await generateText({
  model: 'google/gemini-3.1-pro-preview',
  prompt: 'What is the sum of the first 10 prime numbers?',
  providerOptions: {
    vertex: { // use vertex or google
      thinkingConfig: {
        thinkingLevel: 'high',
        includeThoughts: true,
      },
    },
  },
});

console.log(result.text);
console.log(result.reasoningText);
```

Use the `thinkingBudget` parameter to control the number of thinking tokens:
```ts
import { generateText } from 'ai';

const result = await generateText({
  model: 'google/gemini-2.5-flash',
  prompt: 'What is the sum of the first 10 prime numbers?',
  providerOptions: {
    vertex: { // use vertex or google
      thinkingConfig: {
        thinkingBudget: 8192,
        includeThoughts: true,
      },
    },
  },
});

console.log(result.text);
console.log(result.reasoningText);
```

When streaming, thinking tokens are emitted as `reasoning-delta` stream parts:
```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'google/gemini-2.5-flash',
  prompt: 'Explain quantum computing in simple terms.',
  providerOptions: {
    vertex: { // use vertex or google
      thinkingConfig: {
        thinkingBudget: 2048,
        includeThoughts: true,
      },
    },
  },
});

for await (const part of result.fullStream) {
  if (part.type === 'reasoning-delta') {
    process.stdout.write(part.text);
  } else if (part.type === 'text-delta') {
    process.stdout.write(part.text);
  }
}
```

`thinkingLevel` configuration options:

| Parameter | Type | Description |
|---|---|---|
| `thinkingLevel` | string | Depth of reasoning: `'minimal'`, `'low'`, `'medium'`, `'high'` |
| `includeThoughts` | boolean | Include thinking content in the response |
`thinkingBudget` configuration options:

| Parameter | Type | Description |
|---|---|---|
| `thinkingBudget` | number | Maximum number of tokens to allocate for thinking |
| `includeThoughts` | boolean | Include thinking content in the response |
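The two configuration shapes above can be summarized as illustrative TypeScript types. These are sketches for documentation purposes, not the SDK's official exports:

```typescript
// Illustrative types for the two thinking configurations described
// above (not the AI SDK's official type exports).
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high';

interface ThinkingLevelConfig {
  thinkingLevel: ThinkingLevel; // Gemini 3 / 3.1
  includeThoughts?: boolean;
}

interface ThinkingBudgetConfig {
  thinkingBudget: number; // Gemini 2.5: 0 disables, -1 enables dynamic thinking
  includeThoughts?: boolean;
}

// Runtime guard for validating a user-supplied level string.
function isThinkingLevel(value: string): value is ThinkingLevel {
  return ['minimal', 'low', 'medium', 'high'].includes(value);
}

export { isThinkingLevel };
export type { ThinkingLevelConfig, ThinkingBudgetConfig };
```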
For more details, see the Google AI thinking docs and Vertex AI thinking docs.