Break It, Fix It
Your email plugin from Build an Email Plugin works when everything goes right. But networks fail. APIs go down. Rate limits hit. Invalid credentials slip through. What happens then?
Vercel Workflow gives you explicit control: throw RetryableError for transient failures that should retry automatically, or FatalError for permanent failures that need immediate attention. You decide what retries and what stops.
RetryableError retries automatically with exponential backoff. FatalError stops immediately. You control which errors get which treatment. See Errors & Retrying in the SDK docs.
Outcome
You'll break your email plugin on purpose, see the limitation of simple return values, then refactor to proper error handling with FatalError and RetryableError. The goal: understand why the SDK provides these error types.
Fast Track
- Break the email plugin with a bad API key - see the return-value limitation
- Refactor to FatalError for auth failures, RetryableError for transient ones
- Fix and verify the improved error handling
Hands-on Exercise
Predict the Output
Before you break it: If you set an invalid API key and run the workflow, how many times do you think the step will run? Will it retry forever, or stop after some limit? What will the logs show?
Part 1: Break It (See the Problem)
Your step from Build an Email Plugin returns { success: false, error: "..." } when things go wrong. Let's see what happens:
- Open .env.local and set RESEND_API_KEY=invalid_key_12345
- Run your Send Email workflow
- Check the Runs tab output:
{
  "error": "API key is invalid",
  "success": false
}
One attempt. Failed. Done. The workflow has no idea whether this was an auth error, a rate limit, or a network blip. It just sees "failed" and stops.
The problem: Your step returns failure, but doesn't tell the workflow how to handle it. Should it retry? Alert immediately? The workflow can't decide because you haven't told it.
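To make the limitation concrete, here is a minimal sketch. The result type mirrors the return shape used in this lesson; the "rate limit exceeded" and "fetch failed" messages and the summarize helper are illustrative assumptions, not real output from Resend or the SDK.

```typescript
// Mirrors the step's return shape from this lesson.
type SendEmailResult =
  | { success: true; id: string }
  | { success: false; error: string };

// Three very different failures, flattened into the same shape.
// The second and third messages are made up for illustration.
const authFailure: SendEmailResult = { success: false, error: "API key is invalid" };
const rateLimited: SendEmailResult = { success: false, error: "rate limit exceeded" };
const networkBlip: SendEmailResult = { success: false, error: "fetch failed" };

// Hypothetical caller that only looks at the structure of the result.
function summarize(result: SendEmailResult): "ok" | "failed" {
  return result.success ? "ok" : "failed";
}

// All three collapse to the same answer, so nothing signals "retry" vs "stop".
const summaries = [authFailure, rateLimited, networkBlip].map(summarize);
```

The only way to tell these apart is to string-match the error text after the fact, which is the decision the typed errors let the step make up front.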
Part 2: Refactor to Throw Errors
Let's upgrade your step to use proper Workflow SDK error types. Update plugins/resend/steps/send-email.ts:
import { Resend } from "resend";
import { FatalError } from "workflow";

async function stepHandler(
  input: SendEmailCoreInput,
  credentials: ResendCredentials
): Promise<SendEmailResult> {
  const apiKey = credentials.RESEND_API_KEY;
  // A missing key is a configuration problem - retrying cannot fix it
  if (!apiKey) {
    throw new FatalError("RESEND_API_KEY is not configured");
  }

  const resend = new Resend(apiKey);
  const result = await resend.emails.send({
    from: "onboarding@resend.dev",
    to: input.emailTo,
    subject: input.emailSubject,
    text: input.emailBody,
  });

  if (result.error) {
    // Auth errors are permanent - don't retry
    if (result.error.message.includes("API key")) {
      throw new FatalError(`Auth failed: ${result.error.message}`);
    }
    // Other errors might be transient - return failure for now
    return { success: false, error: result.error.message };
  }

  return { success: true, id: result.data?.id || "" };
}

Run with the invalid key again:
[Workflow Executor] Node execution completed: { nodeId: 'action-1', success: false }
Still one attempt, but now the error is a FatalError - the workflow knows this is permanent and won't waste time retrying.
Part 3: Add Retry for Transient Errors
Now let's handle the opposite case - errors that should retry. Rate limits (429) and service unavailable (503) are temporary. Add RetryableError:
import { FatalError, RetryableError } from "workflow";

async function stepHandler(
  input: SendEmailCoreInput,
  credentials: ResendCredentials
): Promise<SendEmailResult> {
  // ... apiKey check with FatalError ...
  const resend = new Resend(apiKey);
  const result = await resend.emails.send({ ... });

  if (result.error) {
    const msg = result.error.message;
    // Transient errors - retry with backoff
    if (msg.includes("rate limit") || msg.includes("503")) {
      throw new RetryableError(`Temporary failure: ${msg}`);
    }
    // Auth errors - don't retry
    if (msg.includes("API key")) {
      throw new FatalError(`Auth failed: ${msg}`);
    }
    return { success: false, error: msg };
  }

  return { success: true, id: result.data?.id || "" };
}

To see retries in action, you can temporarily force a RetryableError at the start of your step. The workflow will retry with exponential backoff until it succeeds or hits the retry limit.
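If you want to reason about that behavior before running it, the sketch below simulates it without the SDK. Everything here is an assumption for illustration: the two classes are local stand-ins for the ones imported from "workflow", and runWithRetries is a hypothetical mini-runner, not how the Workflow executor is implemented. It is synchronous and skips the backoff wait to keep the example short.

```typescript
// Local stand-ins for illustration only; the real classes come from "workflow".
class RetryableError extends Error {}
class FatalError extends Error {}

type RunOutcome<T> =
  | { ok: true; attempts: number; value: T }
  | { ok: false; attempts: number; error: unknown };

// Hypothetical mini-runner: retries RetryableError up to maxAttempts, stops on anything else.
function runWithRetries<T>(step: (attempt: number) => T, maxAttempts: number): RunOutcome<T> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return { ok: true, attempts: attempt, value: step(attempt) };
    } catch (error) {
      const canRetry = error instanceof RetryableError && attempt < maxAttempts;
      if (!canRetry) return { ok: false, attempts: attempt, error };
      // A real runner would wait (backoff) before the next attempt; skipped here.
    }
  }
}

// Force a failure on the first two attempts, then succeed.
const forced = runWithRetries((attempt) => {
  if (attempt <= 2) throw new RetryableError("Forced failure for testing");
  return "sent";
}, 5);

// A fatal error ends the run on the first attempt.
const fatal = runWithRetries(() => {
  throw new FatalError("Auth failed");
}, 5);
```

In this simulation the forced run finishes on its third attempt, while the fatal run stops after one. The limit of 5 is arbitrary; check the SDK docs for the actual retry limit.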
Part 4: Fix and Verify
- Restore your valid RESEND_API_KEY in .env.local
- Run the workflow
- Watch it succeed on first attempt
- Check the Runs tab - you should see { "success": true, "id": "..." }
When to Use Which
| Error Type | When to Use | Examples |
|---|---|---|
| RetryableError | Transient failures that might resolve | 429 rate limit, 503 service unavailable, network timeout |
| FatalError | Permanent failures that won't self-resolve | 401 unauthorized, 400 bad request, invalid input data |
A bad API key won't become valid after 3 retries. Make auth failures fatal immediately — you'll get alerted faster and won't waste resources.
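One way to apply the table is a small lookup keyed on HTTP status. This is a sketch under assumptions: classifyStatus and ErrorTreatment are hypothetical names, and the helper covers only the status codes listed in the table. Anything else is left unclassified for you to decide per API.

```typescript
type ErrorTreatment = "retry" | "fatal" | "unclassified";

// Hypothetical helper: maps an HTTP status to the treatment suggested by the table.
function classifyStatus(status: number): ErrorTreatment {
  switch (status) {
    case 429: // rate limit
    case 503: // service unavailable
      return "retry"; // would become a RetryableError
    case 400: // bad request
    case 401: // unauthorized
      return "fatal"; // would become a FatalError
    default:
      return "unclassified"; // not covered by the table; decide per API
  }
}
```

Keeping the mapping in one function means the step body only has to translate "retry" and "fatal" into the matching throw.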
In production, workflow errors show up in Vercel Runtime Logs. Set up Log Drains to pipe them to your observability stack, and configure Alerts to get notified when fatal errors spike.
Your step gets a 503 Service Unavailable from an external API. Which error type?
Your step gets a 401 Unauthorized. Which error type?
Design Your Error Strategy
Think about an API you use regularly (Stripe, Twilio, GitHub, your internal services). List 2-3 error responses that API returns. For each one, would you use RetryableError or FatalError? Why?
Try It
Check the Runs tab after each test:
1. Before refactor (return pattern) - invalid API key:
{
  "error": "API key is invalid",
  "success": false
}
One attempt. The workflow doesn't know if it should retry.
2. After adding FatalError - invalid API key:
{
  "error": "Auth failed: API key is invalid",
  "success": false
}
Still one attempt, but now it's explicit - the workflow knows not to retry auth failures.
3. After fixing - valid API key:
{
  "id": "1b588f42-6550-469b-b3af-2b422ac51993",
  "success": true
}
Success on first attempt. Email delivered.
RetryableError accepts a retryAfter option for precise control over when to retry. You can specify a duration string ("5m"), milliseconds (5000), or a specific Date. Combined with getStepMetadata() for attempt counts, you can implement exponential backoff or honor Retry-After headers from APIs. See the RetryableError docs for examples.
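As a rough sketch of how those values might be computed, the two helpers below produce something that fits the retryAfter shapes described above (milliseconds or a Date). Both parseRetryAfter and backoffMs are hypothetical names, not SDK functions, and the base delay and cap are arbitrary defaults. The backoff helper assumes the attempt count starts at 1.

```typescript
// Hypothetical helper: convert a Retry-After header into milliseconds or a Date.
// The header carries either a whole number of seconds or an HTTP date.
function parseRetryAfter(header: string | null): number | Date | undefined {
  if (!header) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // seconds -> milliseconds
  const when = new Date(trimmed);
  return isNaN(when.getTime()) ? undefined : when; // ignore unparseable values
}

// Hypothetical helper: exponential backoff from a 1-based attempt count, capped.
function backoffMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  const exponent = Math.max(0, attempt - 1);
  return Math.min(capMs, baseMs * 2 ** exponent);
}
```

In a step, the result could be passed along when throwing, for example new RetryableError(message, { retryAfter: parseRetryAfter(header) ?? backoffMs(attempt) }), where attempt would come from getStepMetadata(). Preferring the server's header over your own backoff respects the API's stated limits.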
Commit
git add -A
git commit -m "feat: add error handling with RetryableError and FatalError"
Done
- Broke email plugin with invalid API key
- Saw the limitation of return-value error pattern
- Refactored to throw FatalError for auth failures
- Added RetryableError for transient failures (rate limits, 503)
- Fixed everything, verified successful send
- Can explain when to use RetryableError vs FatalError