AI SDK 4.0

Introducing PDF support, computer use, and an xAI Grok provider

The AI SDK is an open-source toolkit for building AI applications with JavaScript and TypeScript. Its unified provider API allows you to use any language model and enables powerful UI integrations into leading web frameworks such as Next.js and Svelte.

Since our 3.4 release, we've seen the community build amazing products with the AI SDK:

Today, we're announcing the release of AI SDK 4.0. This version introduces several new capabilities including:

Let's explore these new features and improvements.

PDF support

Supporting PDF documents is essential for AI applications as this format is the standard way people share and store documents. Organizations and individuals have built up large collections of important documents in PDF format—from contracts and research papers to manuals and reports—which means AI systems need to handle PDFs well if they're going to help with analyzing documents, pulling out information, or automating workflows.

AI SDK 4.0 introduces PDF support across multiple providers, including Anthropic, Google Generative AI, and Google Vertex AI. With PDF support, you can:

  • Extract text and information from PDF documents

  • Analyze and summarize PDF content

  • Answer questions based on PDF content

To send a PDF to any compatible model, you can pass PDF files as part of the message content using the file type. Here’s an example with Anthropic’s Claude Sonnet 3.5 model:

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
const result = await generateText({
model: anthropic('claude-3-5-sonnet-20241022'),
messages: [
{
role: 'user',
content: [
{
type: 'text',
text: 'What is an embedding model according to this document?',
},
{
type: 'file',
data: fs.readFileSync('./data/ai.pdf'),
mimeType: 'application/pdf',
},
],
},
],
});

Thanks to the AI SDK's unified API, to use this functionality with Google or Vertex AI, all you need to change is the model string in the code above.

The code example demonstrates how PDFs integrate seamlessly into your LLM calls. They're treated as just another message content type, requiring no special handling beyond including them as part of the message.

Check out our quiz generator template to see PDF support in action. Using useObject and Google's Gemini Pro 1.5 model, it generates an interactive multiple choice quiz based on the contents of a PDF that you upload.

Computer use support (Anthropic)

Enabling AI to interact with apps and interfaces naturally, instead of using special tools, unlocks new automation and assistance opportunities.

AI SDK 4.0 introduces computer use support for the latest Claude Sonnet 3.5 model, allowing you to build applications that can:

  • Control mouse movements and clicks

  • Input keyboard commands

  • Capture and analyze screenshots

  • Execute terminal commands

  • Manipulate text files

Anthropic provides three predefined tools designed to work with the latest Claude 3.5 Sonnet model: the Computer Tool for basic system control (mouse, keyboard, and screenshots), the Text Editor Tool for file operations, and the Bash Tool for terminal commands. These tools are carefully designed so that Claude knows exactly how to use them.

While Anthropic defines the tool interfaces, you'll need to implement the underlying execute function for each tool, defining how your application should handle actions like moving the mouse, capturing screenshots, or running terminal commands on your specific system.

Here's an example using the computer tool with generateText:

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { executeComputerAction, getScreenshot } from '@/lib/ai'; // user-defined
const computerTool = anthropic.tools.computer_20241022({
displayWidthPx: 1920,
displayHeightPx: 1080,
execute: async ({ action, coordinate, text }) => {
switch (action) {
case 'screenshot': {
return {
type: 'image',
data: getScreenshot(),
};
}
default: {
return executeComputerAction(action, coordinate, text);
}
}
},
experimental_toToolResultContent: (result) => {
return typeof result === 'string'
? [{ type: 'text', text: result }]
: [{ type: 'image', data: result.data, mimeType: 'image/png' }];
},
});
const result = await generateText({
model: anthropic('claude-3-5-sonnet-20241022'),
prompt: 'Move the cursor to the center of the screen and take a screenshot',
tools: { computer: computerTool },
});

You can combine these tools with the AI SDK's maxSteps feature to enable more sophisticated workflows. By setting a maxSteps value, the model can make multiple consecutive tool calls without user intervention, continuing until it determines the task is complete. This is particularly powerful for complex automation tasks that require a sequence of different operations:

const result = await generateText({
model: anthropic('claude-3-5-sonnet-20241022'),
prompt: 'Summarize the AI news from this week.',
tools: { computer: computerTool, textEditor: textEditorTool, bash: bashTool },
maxSteps: 10,
});

Note that Anthropic computer use is currently in beta, and it's recommended to implement appropriate safety measures such as using virtual machines and limiting access to sensitive data when building applications with this functionality. To learn more, check out our computer use guide.

Continuation support

Many AI applications—from writing long-form content to generating code—require outputs that exceed the generation limits of language models. While these models can understand large amounts of content in their context window, they're typically limited in how much they can generate in a single response.

To address this common challenge, AI SDK 4.0 introduces continuation support, which detects when a generation is incomplete (i.e. when the finish reason is “length”) and continues the response across multiple steps, combining them into a single unified output. With continuation support, you can:

  • Generate text beyond standard output limits

  • Maintain coherence across multiple generations

  • Automatically handle word boundaries for clean output

  • Track combined token usage across steps

This feature works across all providers and can be used with both generateText and streamText by enabling the experimental_continueSteps setting. Here's an example of generating a long-form historical text:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
const result = await generateText({
model: openai('gpt-4o'),
maxSteps: 5,
experimental_continueSteps: true,
prompt:
'Write a book about Roman history, ' +
'from the founding of the city of Rome ' +
'to the fall of the Western Roman Empire. ' +
'Each chapter MUST HAVE at least 1000 words.',
});

When using continuation support with streamText, the SDK ensures clean word boundaries by only streaming complete words. Both generateText and streamText may trim trailing tokens from some calls to prevent whitespace issues.

New xAI Grok provider

The AI SDK now supports x.AI through a new official provider. To use the provider, install the package:

pnpm install ai @ai-sdk/xai

You can then use the provider with all AI SDK Core methods. For example, here's how you can use it with generateText:

import { xai } from '@ai-sdk/xai';
import { generateText } from 'ai';
const { text } = await generateText({
model: xai('grok-beta'),
prompt: 'Write a vegetarian lasagna recipe for 4 people.',
});

For more information, please see the AI SDK xAI provider documentation.

Additional provider updates

We've expanded our provider support to offer more options and improve performance across the board:

Updated chatbot template

The Next.js AI Chatbot template has been updated, incorporating everything we've learned from building v0 and the latest framework advances. Built with Next.js 15, React 19, and Auth.js 5, this new version represents the most comprehensive starting point for production-grade AI applications.

The template ships with production-ready features, including a redesigned UI with model switching, persistent PostgreSQL storage, and much more. It showcases powerful patterns for generative UI and newer variations of the chat interface with interactive workspaces (like v0 blocks) that allow you to integrate industry specific workflows and tools to design hybrid AI-user collaborative experiences.

Try the AI Chatbot demo to see these features in action or deploy your own instance to start building sophisticated AI applications with battle-tested patterns and best practices.

Migrating to AI SDK 4.0

AI SDK 4.0 includes breaking changes that remove deprecated APIs. We've made the migration process easier with automated migration tools. You can run our automated codemods to handle the bulk of the changes.

For a detailed overview of all changes and manual steps that might be needed, refer to our AI SDK 4.0 migration guide. The guide includes step-by-step instructions and examples to ensure a smooth update.

Getting started

With new features like PDF, computer use, and the new xAI Grok provider, there's never been a better time to start building AI applications with the AI SDK.

Contributors

AI SDK 4.0 is the result of the combined work of our core team at Vercel (Lars, Jeremy, Walter, and Nico) and many community contributors. Special thanks for contributing merged pull requests:

minpeter, hansemannn, HarshitChhipa, skull8888888, nalaso, bhavya3024, gastonfartek, michaeloliverx, mauhai, yoshinorisano, Saran33, K-Mistele, MrHertal, h4r5h4, tonyfarney.

Your feedback and contributions are invaluable as we continue to evolve the AI SDK.x