Extraction - Your First AI Script
Now that you've learned some theory and set up your project, it's time to ship some code. You will build and run a script that extracts information from text using the AI SDK's generateText function, and see firsthand how tweaking your prompt or swapping models instantly changes your results.
How Your Script Works
Analyzing the Starter Script
Open your project code and find app/(1-extraction)/extraction.ts and essay.txt.
Update the contents of extraction.ts with this code, which extracts names from the essay:
import dotenvFlow from 'dotenv-flow';
dotenvFlow.config(); // Load environment variables (API keys, etc.)

import fs from 'fs';
import { generateText } from 'ai'; // AI SDK's core text generation function

// Read the essay file that we'll extract names from
const essay = fs.readFileSync('app/(1-extraction)/essay.txt', 'utf-8');

async function main() {
  // Call the LLM with our extraction prompt
  const result = await generateText({
    model: 'openai/gpt-4.1', // The model to use (could swap for openai/gpt-5, anthropic/claude-sonnet-4, etc.)
    prompt: `Extract all the names mentioned in this essay. List them separated by commas.

Essay:
${essay}`, // Instruction + the actual essay content
  });

  // The AI's response is in result.text
  console.log('\n--- AI Response ---');
  console.log(result.text); // Something like: "John Smith, Jane Doe, ..."
  console.log('-------------------');
}

// Run the async function and catch any errors
main().catch(console.error);
Run Your First AI Script!
From your terminal, run:
pnpm extraction
You'll see the AI extracting names from the essay. Your first feature works. Nice!
--- AI Response ---
Here are all the names mentioned in the essay, separated by commas:
Brian Chesky, Ron Conway, Steve Jobs, John Sculley
-------------------
Verification Task
Check app/(1-extraction)/essay.txt and use search (Cmd+F/Ctrl+F) to verify the names. Did the AI nail them all, or miss some?
Understanding Token Usage
LLMs process text as 'tokens' (roughly 4 characters each in English). Understanding tokens helps you optimize for speed and cost:
- Visualize tokenization at tiktokenizer.vercel.app
- Count tokens programmatically with the tiktoken package: pnpm add tiktoken
- Monitor usage to estimate costs and stay within context limits
Try pasting different prompts into Tiktokenizer to see surprising patterns (spaces matter!).
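Before you reach for a tokenizer library, the 4-characters-per-token rule of thumb already lets you ballpark prompt size and cost. The sketch below is a toy estimator, not part of the lesson's code: the chars-per-token ratio is a heuristic and the per-token price is a made-up placeholder (check your model's real pricing; use tiktoken when you need exact counts).

```typescript
// Rough token/cost estimator. CHARS_PER_TOKEN is a heuristic for
// English text; PRICE_PER_1M_INPUT_TOKENS is an illustrative
// placeholder, not real pricing.
const CHARS_PER_TOKEN = 4;
const PRICE_PER_1M_INPUT_TOKENS = 2.0; // hypothetical $/1M tokens

function approxTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function approxInputCostUSD(text: string): number {
  return (approxTokens(text) / 1_000_000) * PRICE_PER_1M_INPUT_TOKENS;
}

// Example: estimate before sending a large file to the model
const sampleText = 'a'.repeat(8_000); // stand-in for essay.txt contents
console.log(approxTokens(sampleText)); // → 2000
console.log(approxInputCostUSD(sampleText).toFixed(4)); // → "0.0040"
```

A quick estimate like this is enough to notice when a prompt is about to blow past a context limit or an expected cost budget.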
Iteration is Everything
Running the script once is just the start. Working with LLMs is all about iteration. Play with the prompt and see for yourself:
Challenge 1: Prompt Engineering – Change the Task
- Task: Swap the prompt to the following:
// Inside the prompt backticks:
What is the key takeaway of this piece in 50 words?
Essay:
${essay}
- Action: Save and re-run pnpm extraction
- Observe: See how one prompt change completely transforms what your app does
Challenge 2: Model Swapping – Upgrade the Brain
- Task: Keep the summary prompt but change the model using the following code block:
// Change this line:
model: 'openai/gpt-5',
- Action: Save and run again
- Observe: Compare results. Better quality? Worth the extra cost/time?
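When you start comparing models seriously, a tiny harness saves a lot of manual re-running. This is my own sketch, not lesson code: the generate function is injected so the harness itself is plain plumbing, and the commented-out wrapper shows how you might plug in the AI SDK's generateText.

```typescript
// Run the same prompt against several model ids and record latency.
// `generate` is injected so this harness works with any backend.
type GenerateFn = (model: string, prompt: string) => Promise<string>;

async function compareModels(
  models: string[],
  prompt: string,
  generate: GenerateFn,
): Promise<{ model: string; ms: number; text: string }[]> {
  const results: { model: string; ms: number; text: string }[] = [];
  for (const model of models) {
    const start = Date.now();
    const text = await generate(model, prompt); // one call per model id
    results.push({ model, ms: Date.now() - start, text });
  }
  return results;
}

// Hypothetical wiring to the AI SDK (requires an API key to run):
// const aiGenerate: GenerateFn = async (model, prompt) =>
//   (await generateText({ model, prompt })).text;
// await compareModels(['openai/gpt-4.1', 'openai/gpt-5'], summaryPrompt, aiGenerate);
```

Comparing outputs side by side with timings makes the quality-versus-cost question in the challenge much easier to answer.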
Model Selection Guide
Available Models via Vercel AI Gateway:
OpenAI:
- openai/gpt-5 - Most capable for complex reasoning
- openai/gpt-4.1 - Fast & cost-effective for most tasks (non-reasoning)
- openai/gpt-5-nano - Fastest for simple tasks
- openai/gpt-4.1-mini - Previous generation, still capable (non-reasoning)

Anthropic:
- anthropic/claude-sonnet-4 - Strong reasoning & analysis

Google:
- google/gemini-2.5-pro - Advanced multimodal capabilities
- google/gemini-2.5-flash - Fast responses, good balance
- google/gemini-2.5-flash-lite - Lightweight & quick
- google/gemini-2.0-flash - Previous flash version
See the Vercel AI Gateway models for pricing & details, or the OpenAI models documentation for OpenAI-specific info.
Simply swap the model string to experiment - the AI SDK handles all the provider differences for you!
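One way to make the power/speed/cost tradeoff concrete in code is a small model-selection helper. The tier names and the mapping below are my own illustrative policy, not an official recommendation; only the model id strings come from the guide above.

```typescript
type TaskTier = 'simple' | 'standard' | 'complex';

// Illustrative policy mapping a task tier to a gateway model id.
// Tune this mapping for your own quality/cost targets.
function pickModel(tier: TaskTier): string {
  switch (tier) {
    case 'simple':
      return 'openai/gpt-5-nano'; // fastest for simple tasks
    case 'standard':
      return 'openai/gpt-4.1'; // cost-effective default
    case 'complex':
      return 'openai/gpt-5'; // strongest reasoning
  }
}

console.log(pickModel('standard')); // → "openai/gpt-4.1"
```

Centralizing the choice in one function means upgrading a tier later is a one-line change instead of a hunt through every generateText call.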
🧩 Side Quests
- Extraction Expert
- Advanced Prompt Engineering
- Streaming Extraction Pipeline
Real-World Applications
This simple extraction pattern powers serious production features like:
- Content Moderation: Finding problematic content
- Research Tools: Pulling key data from papers
- Data Pipelines: Converting messy text to clean data
- Compliance Systems: Identifying PII/sensitive info
It's the same pattern: send content + instructions, process the response.
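All four features share that one call shape, so it's natural to factor it out. A minimal sketch (the helper name and layout are my own, not from the lesson):

```typescript
// Generic "instructions + content" prompt builder: the shape every
// extraction-style feature in the list above shares.
function buildExtractionPrompt(instruction: string, content: string): string {
  return `${instruction}\n\nContent:\n${content}`;
}

// With the AI SDK, the full pattern would be (hypothetical wiring,
// needs an API key to run):
// const { text } = await generateText({
//   model: 'openai/gpt-4.1',
//   prompt: buildExtractionPrompt('List any email addresses found.', doc),
// });

console.log(buildExtractionPrompt('Extract all names.', 'Alice met Bob.'));
```

Moderation, research, pipelines, and compliance then differ only in the instruction string you pass in.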
Key things to remember
- generateText = your basic AI workhorse
- The prompt = what guides the AI
- The model = power/speed/cost tradeoff
- Iteration = the key to success
Further Reading (Optional)
- AI SDK Documentation: Official documentation for the core function we used in this lesson. Explore all parameters and options available.
- Tiktokenizer: Interactive tokenization visualizer built with Next.js. See exactly how your text breaks down into tokens across different models. (Open source on GitHub)
- Prompt Engineering Guide: Explore advanced prompting techniques to further improve your AI interactions beyond the basics covered in this lesson.
- Vercel AI Gateway Model Library: Understand the capabilities, strengths, cost, and trade-offs of different models to make informed choices for your applications.
What surprised you most when changing prompts vs models? How does this hands-on experience change how you think about working with AI?
What's Next: Model Types and Performance
You've built your first AI script and experienced the power of prompt engineering. In the next lesson, you'll learn about different model types and their performance characteristics. Understanding when to use fast models vs reasoning models is crucial for building AI features that deliver the right user experience.
After that, you'll be ready for "invisible AI" - behind-the-scenes features that enhance your product's UX using the patterns you've learned here.