Break It, Fix It

Your email plugin from Build an Email Plugin works when everything goes right. But networks fail. APIs go down. Rate limits hit. Invalid credentials slip through. This is exactly why the stripped-down starter teaches the pattern first instead of pretending built-in integrations already solve it. What happens when your own plugin hits those failures?

Vercel Workflow gives you explicit control: throw RetryableError for transient failures that should retry automatically, or FatalError for permanent failures that need immediate attention. You decide what retries and what stops.

Mental Model: Retry and Error Recovery

RetryableError retries automatically with exponential backoff. FatalError stops immediately. You control which errors get which treatment. See Errors & Retrying in the SDK docs.
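To make the mental model concrete, here is a toy simulation. It is not how Vercel Workflow actually schedules retries. The classes `SimFatalError` and `SimRetryableError` are hypothetical local stand-ins named after the SDK's types and are not imported from `workflow`. The attempt limit (4 attempts total) and the doubling delay starting at 1000 ms are illustrative assumptions, not SDK defaults.

```typescript
// Hypothetical local stand-ins named after the SDK's error types.
// They are NOT imported from "workflow"; they only drive this toy simulation.
class SimFatalError extends Error {}
class SimRetryableError extends Error {}

type RunReport<T> =
  | { ok: true; value: T; attempts: number; delaysMs: number[] }
  | {
      ok: false;
      reason: "fatal" | "exhausted";
      message: string;
      attempts: number;
      delaysMs: number[];
    };

// Toy runner: SimFatalError stops immediately. SimRetryableError is retried
// with a doubling delay until maxAttempts is reached. Delays are recorded
// instead of slept so the example runs instantly. Any other thrown value is
// rethrown, because this model only covers the two error types in this lesson.
function simulateStep<T>(
  step: (attempt: number) => T,
  maxAttempts = 4,
  baseDelayMs = 1000,
): RunReport<T> {
  const delaysMs: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: step(attempt), attempts: attempt, delaysMs };
    } catch (err) {
      if (err instanceof SimFatalError) {
        return { ok: false, reason: "fatal", message: err.message, attempts: attempt, delaysMs };
      }
      if (!(err instanceof SimRetryableError)) throw err;
      if (attempt === maxAttempts) {
        return { ok: false, reason: "exhausted", message: err.message, attempts: attempt, delaysMs };
      }
      // Exponential backoff: 1000, 2000, 4000, ... (recorded, not slept)
      delaysMs.push(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw new Error("maxAttempts must be at least 1");
}
```

With these defaults, a step that throws `SimFatalError` reports a single attempt. A step that throws `SimRetryableError` on its first two attempts succeeds on attempt 3, after recorded delays of 1000 ms and 2000 ms.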

Outcome

You'll break your email plugin on purpose, see the limitation of simple return values, then refactor to proper error handling with FatalError and RetryableError. The goal: understand why the SDK provides these error types.

Fast Track

Break the email plugin with a bad API key - see the return-value limitation
Refactor to FatalError for auth failures, RetryableError for transient ones
Fix and verify the improved error handling

Hands-on Exercise

Reflection Prompt

Predict the Output

Before you break it: if you set an invalid API key and run the workflow, how many attempts do you think the step will make? Will it retry forever, or stop after some limit? What will the logs show?

Part 1: Break It (See the Problem)

Your step from Build an Email Plugin returns { success: false, error: "..." } when things go wrong. Let's see what happens:

Open .env.local and set RESEND_API_KEY=invalid_key_12345
Run your Send Email workflow
Check the Runs tab output:

{
  "error": "API key is invalid",
  "success": false
}

One attempt. Failed. Done. The workflow has no idea this was an auth error vs a rate limit vs a network blip. It just sees "failed" and stops.

The problem: Your step returns failure, but doesn't tell the workflow how to handle it. Should it retry? Alert immediately? The workflow can't decide because you haven't told it.

Part 2: Refactor to Throw Errors

Let's upgrade your step to use proper Workflow SDK error types. Update plugins/resend/steps/send-email.ts:

plugins/resend/steps/send-email.ts

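import { Resend } from "resend";
import { FatalError } from "workflow";
// SendEmailCoreInput, ResendCredentials, and SendEmailResult come from your
// Build an Email Plugin code; keep those definitions as they are.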
 
async function stepHandler(
  input: SendEmailCoreInput,
  credentials: ResendCredentials
): Promise<SendEmailResult> {
  const apiKey = credentials.RESEND_API_KEY;
 
  if (!apiKey) {
    throw new FatalError("RESEND_API_KEY is not configured");
  }
 
  const resend = new Resend(apiKey);
  const result = await resend.emails.send({
    from: "onboarding@resend.dev",
    to: input.emailTo,
    subject: input.emailSubject,
    text: input.emailBody,
  });
 
  if (result.error) {
    // Auth errors are permanent - don't retry
    if (result.error.message.includes("API key")) {
      throw new FatalError(`Auth failed: ${result.error.message}`);
    }
    // Other errors might be transient - return failure for now
    return { success: false, error: result.error.message };
  }
 
  return { success: true, id: result.data?.id || "" };
}

Run with the invalid key again:

[Workflow Executor] Node execution completed: { nodeId: 'action-1', success: false }

Still one attempt, but now the error is a FatalError - the workflow knows this is permanent and won't waste time retrying.

Part 3: Add Retry for Transient Errors

Now let's handle the opposite case - errors that should retry. Rate limits (429) and service unavailable (503) are temporary. Add RetryableError:

plugins/resend/steps/send-email.ts

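import { Resend } from "resend";
import { FatalError, RetryableError } from "workflow";

async function stepHandler(
  input: SendEmailCoreInput,
  credentials: ResendCredentials
): Promise<SendEmailResult> {
  const apiKey = credentials.RESEND_API_KEY;

  // Missing configuration is permanent - don't retry
  if (!apiKey) {
    throw new FatalError("RESEND_API_KEY is not configured");
  }

  const resend = new Resend(apiKey);
  const result = await resend.emails.send({
    from: "onboarding@resend.dev",
    to: input.emailTo,
    subject: input.emailSubject,
    text: input.emailBody,
  });

  if (result.error) {
    const msg = result.error.message;
    // Lowercase once so the checks don't depend on capitalization.
    // Substring matching is brittle: adjust it to the messages your API actually returns.
    const lower = msg.toLowerCase();

    // Transient errors - retry with backoff
    if (lower.includes("rate limit") || lower.includes("503")) {
      throw new RetryableError(`Temporary failure: ${msg}`);
    }

    // Auth errors - don't retry
    if (lower.includes("api key")) {
      throw new FatalError(`Auth failed: ${msg}`);
    }

    // Anything else is still returned as a failure value, as in Part 2
    return { success: false, error: msg };
  }

  return { success: true, id: result.data?.id || "" };
}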

Testing Retries

To see retries in action, you can temporarily force a RetryableError at the start of your step. The workflow will retry with exponential backoff until it succeeds or hits the retry limit.
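The sketch below shows one way to force a retry while testing. It keys the forced failure on the attempt number instead of an in-memory counter, because an in-memory counter is not guaranteed to survive between attempts. In a real step, the attempt number would presumably come from `getStepMetadata()`, which is mentioned under Advanced: Custom Retry Timing. The exact field name is an assumption, so check its docs. `SimRetryableError`, `forceRetryForTesting`, and `sendEmailStep` are hypothetical local names. In your step you would throw the SDK's `RetryableError` instead.

```typescript
// Hypothetical local stand-in for the SDK's RetryableError; NOT imported from "workflow".
class SimRetryableError extends Error {}

// Testing aid: throw a retryable error on attempts 1..forcedFailures, then let
// the step continue normally. Call it at the top of the step while you watch
// the Runs tab, and delete it once you've seen the retries.
function forceRetryForTesting(attempt: number, forcedFailures: number): void {
  if (attempt <= forcedFailures) {
    throw new SimRetryableError(
      `Forced failure on attempt ${attempt} of ${forcedFailures}`,
    );
  }
}

// Example placement at the start of a (simulated) step body.
function sendEmailStep(attempt: number): { success: true; id: string } {
  forceRetryForTesting(attempt, 2);
  return { success: true, id: "test-id" };
}
```

With `forcedFailures` set to 2, attempts 1 and 2 throw, and attempt 3 reaches the real logic. That gives you two retries to observe before the step succeeds.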

Part 4: Fix and Verify

Restore your valid RESEND_API_KEY in .env.local
Run the workflow
Watch it succeed on the first attempt
Check the Runs tab - you should see { "success": true, "id": "..." }

When to Use Which


Error Type	When to Use	Examples
`RetryableError`	Transient failures that might resolve	429 rate limit, 503 service unavailable, network timeout
`FatalError`	Permanent failures that won't self-resolve	401 unauthorized, 400 bad request, invalid input data
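If your API client exposes the HTTP status code, you can encode the table as a small lookup. This is a minimal sketch under that assumption. The lesson's Resend code only shows `error.message`. The 429 and 503 cases come from the table. Treating 408, 502, and 504 as transient is an added assumption based on common convention. Network timeouts produce no status at all, so they would need their own branch, which this sketch leaves out.

```typescript
type ErrorKind = "retryable" | "fatal";

// Maps an HTTP status to the error type suggested by the table.
// 429 and 503 come from the lesson; 408, 502, and 504 are assumed transient.
// Everything else defaults to fatal so unexpected failures surface immediately.
function classifyHttpStatus(status: number): ErrorKind {
  switch (status) {
    case 408: // request timeout
    case 429: // rate limited
    case 502: // bad gateway
    case 503: // service unavailable
    case 504: // gateway timeout
      return "retryable";
    default:
      return "fatal"; // e.g. 400 bad request, 401 unauthorized
  }
}
```

In a step, you would then throw `RetryableError` when this returns `"retryable"` and `FatalError` otherwise.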

Don't Retry Auth Failures

A bad API key won't become valid after 3 retries. Make auth failures fatal immediately: you'll be alerted faster and won't waste resources.

Production Observability

In production, workflow errors show up in Vercel Runtime Logs. Set up Log Drains to pipe them to your observability stack, and configure Alerts to get notified when fatal errors spike.

Question

Your step gets a 503 Service Unavailable from an external API. Which error type?

Question

Your step gets a 401 Unauthorized. Which error type?

Reflection Prompt

Design Your Error Strategy

Think about an API you use regularly (Stripe, Twilio, GitHub, your internal services). List 2-3 error responses that API returns. For each one, would you use RetryableError or FatalError? Why?

Try It

Check the Runs tab after each test:

1. Before refactor (return pattern) - invalid API key:

{
  "error": "API key is invalid",
  "success": false
}

One attempt. Workflow doesn't know if it should retry.

2. After adding FatalError - invalid API key:

{
  "error": "Auth failed: API key is invalid",
  "success": false
}

Still one attempt, but now it's explicit - workflow knows not to retry auth failures.

3. After fixing - valid API key:

{
  "id": "1b588f42-6550-469b-b3af-2b422ac51993",
  "success": true
}

Success on first attempt. Email delivered.

Advanced: Custom Retry Timing

RetryableError accepts a retryAfter option for precise control over when to retry. You can specify a duration string ("5m"), milliseconds (5000), or a specific Date. Combined with getStepMetadata() for attempt counts, you can implement your own backoff schedule or honor Retry-After headers from APIs. See the RetryableError docs for examples.
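Here is a minimal sketch of computing a `retryAfter` value in milliseconds, since the paragraph above says milliseconds are accepted. The function prefers the server's `Retry-After` header and falls back to exponential backoff. `computeRetryDelayMs` is a hypothetical helper. Its defaults (a 1000 ms base and a 60000 ms cap on the backoff) are illustrative assumptions. The server-provided header value is not capped, so that the server's request is honored.

```typescript
// Computes a delay in milliseconds to pass as RetryableError's retryAfter.
// Prefers the server's Retry-After header (delay-seconds or HTTP-date form);
// otherwise falls back to exponential backoff: baseMs * 2^(attempt - 1),
// capped at maxMs. The baseMs and maxMs defaults are illustrative assumptions.
function computeRetryDelayMs(
  attempt: number,
  retryAfterHeader: string | null,
  now: Date = new Date(),
  baseMs = 1000,
  maxMs = 60000,
): number {
  if (retryAfterHeader !== null) {
    const value = retryAfterHeader.trim();
    // Form 1: whole number of seconds, e.g. "30"
    if (/^\d+$/.test(value)) {
      return Number(value) * 1000;
    }
    // Form 2: HTTP-date, e.g. "Mon, 01 Jan 2024 00:00:10 GMT"
    const retryAt = Date.parse(value);
    if (!Number.isNaN(retryAt)) {
      return Math.max(retryAt - now.getTime(), 0);
    }
    // Unparseable header: fall through to backoff
  }
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}
```

In a step that calls an API with `fetch`, you might pass `computeRetryDelayMs(attempt, response.headers.get("retry-after"))` as `retryAfter`. This assumes `attempt` is read from `getStepMetadata()`; confirm the field name in its docs.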

Commit

git add -A
git commit -m "feat: add error handling with RetryableError and FatalError"

Done

Broke email plugin with invalid API key
Saw the limitation of return-value error pattern
Refactored to throw FatalError for auth failures
Added RetryableError for transient failures (rate limits, 503)
Fixed everything, verified successful send
Can explain when to use RetryableError vs FatalError