Securing your AI applications with Rate Limiting

When building an AI application, one of the biggest concerns is abuse: bad actors exploiting your API endpoints and running up excessive usage costs for your application.

This guide walks through how to set up rate limiting using the Vercel AI SDK and Vercel WAF, allowing you to build powerful AI experiences with peace of mind.

Why do you need rate limiting?

Rate limiting is a method used to regulate network traffic by defining a maximum number of requests that a client can send to a server within a given time frame. It helps you:

Maintain Service Availability: Rate limiting shields your services from being flooded with too many requests. Controlling request volume helps keep your service performant and available.
Manage Costs Effectively: Rate limiting keeps your bill in check by preventing unexpected surges in usage. This is particularly important for services that bill per request.
Safeguard Against Malicious Activities: Rate limiting is crucial when working with AI providers and Large Language Models (LLMs). It acts as a defense against malicious activity or misuse, such as DDoS attacks.
Implement Usage Tiers Based on Subscription Plans: Rate limiting enables the establishment of different usage levels. For instance, free users may be restricted to a specific number of requests each day, whereas premium users may be granted a more generous limit.
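
To make the idea concrete, here is a minimal sketch of a fixed-window rate limiter with per-tier limits. It only illustrates the concept and is not how Vercel WAF is implemented; the class name, the tier names, the limits, and the 60-second window are all assumptions chosen for the example.

```typescript
// Illustrative fixed-window rate limiter. NOT the Vercel WAF implementation.
type Tier = 'free' | 'premium'

// Hypothetical per-tier limits: maximum requests allowed per window.
const LIMITS: Record<Tier, number> = { free: 3, premium: 10 }

// Hypothetical window length in milliseconds.
const WINDOW_MS = 60_000

interface WindowState {
  windowStart: number // timestamp (ms) at which the current window began
  count: number // requests counted in the current window
}

class FixedWindowLimiter {
  private state = new Map<string, WindowState>()

  // Returns true if the request is allowed, false if it should be rejected.
  allow(clientId: string, tier: Tier, now: number = Date.now()): boolean {
    const current = this.state.get(clientId)

    // First request from this client, or the previous window has expired:
    // start a new window and count this request.
    if (!current || now - current.windowStart >= WINDOW_MS) {
      this.state.set(clientId, { windowStart: now, count: 1 })
      return true
    }

    // The client has used up its quota for the current window.
    if (current.count >= LIMITS[tier]) return false

    current.count += 1
    return true
  }
}
```

With these example limits, a free client is allowed three requests per window and the fourth is rejected until the window resets, while a premium client gets ten. In-memory state like this does not survive across serverless instances, which is one reason to enforce limits at the firewall level instead.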

What is Vercel?

Vercel's frontend cloud gives developers frameworks, workflows, and infrastructure to build a faster, more personalized web.

We are the creators of Next.js, the React framework, and have zero-configuration support for all major frontend frameworks.

Vercel WAF

Vercel WAF allows you to monitor and control the internet traffic to your site through IP blocking, custom rules and managed rulesets.

Vercel AI SDK

The Vercel AI SDK is an open-source library designed to help developers build conversational streaming user interfaces in JavaScript and TypeScript.

With the Vercel AI SDK, you can build beautiful streaming experiences similar to ChatGPT in just a few lines of code.

Implement Rate Limiting for your AI application

Step 1: Adding Vercel AI SDK

Inside your Next.js application, create a page.tsx file inside the App Router (app/) and add the following code:

app/page.tsx

'use client'

import { useChat } from 'ai/react'
import { toast } from 'sonner' // this can be any toast library of your choice

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    onError: err => {
      toast.error(err.message)
    }
  })

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      {messages.length > 0
        ? messages.map(m => (
            <div key={m.id} className="whitespace-pre-wrap">
              {m.role === 'user' ? 'User: ' : 'AI: '}
              {m.content}
            </div>
          ))
        : null}

      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  )
}

Then, create a Route Handler to stream in your chat response from OpenAI:

app/api/chat/route.ts

import { Configuration, OpenAIApi } from 'openai-edge'
import { OpenAIStream, StreamingTextResponse } from 'ai'

// Create an OpenAI API client (that's edge friendly!)
const config = new Configuration({
  apiKey: process.env.OPENAI_API_KEY
})
const openai = new OpenAIApi(config)

// Set the runtime to edge to allow for Edge Streaming (https://vercel.fyi/streaming)
export const runtime = 'edge'

export async function POST(req: Request) {

  // Extract the `messages` from the body of the request
  const { messages } = await req.json()

  // Ask OpenAI for a streaming chat completion given the messages
  const response = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages: messages.map((message: any) => ({
      content: message.content,
      role: message.role
    }))
  })

  // Convert the response into a friendly text-stream
  const stream = OpenAIStream(response)
  // Respond with the stream
  return new StreamingTextResponse(stream)
}

Step 2: Adding a Rate Limit Custom Rule

You can use the Rate Limit API Requests Firewall Rule template to get started or follow the get started steps to add a Rate Limit custom rule to your project.

Since your AI route handler is hosted on the path /api/chat, set an If condition in your custom rule to:

Request Path "Equals": /api/chat

Your AI chat app is now ready to use and protected with rate limiting from Vercel WAF.
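
One way to check that the rule is active is to send a burst of requests and see when they start being rejected. The sketch below assumes the rule responds with HTTP status 429 (Too Many Requests) once the limit is hit; confirm the action configured on your own rule. The function name and the `Sender` type are hypothetical helpers introduced for this example.

```typescript
// A sender performs one request and resolves to its HTTP status code.
type Sender = () => Promise<number>

interface ProbeResult {
  allowed: number // requests that completed before the first 429
  limited: boolean // whether a 429 was seen within maxAttempts
}

// Sends up to maxAttempts requests sequentially and stops at the first 429.
async function countUntilRateLimited(
  send: Sender,
  maxAttempts: number
): Promise<ProbeResult> {
  let allowed = 0
  for (let i = 0; i < maxAttempts; i++) {
    const status = await send()
    if (status === 429) return { allowed, limited: true }
    allowed += 1
  }
  return { allowed, limited: false }
}
```

In practice, `send` could wrap a `fetch` POST to the /api/chat path of your deployment and return `res.status`. Keep in mind that every request that gets through still calls OpenAI, so keep `maxAttempts` small when probing a live deployment.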

Simplify Security in AI Development with Rate Limiting

Securing AI applications doesn't have to be a daunting task. By implementing rate limiting with Vercel WAF, developers can keep their services running smoothly and their costs in check. This guide has shown how simple it can be to set up these safeguards, whether you're starting from scratch or adding to an existing project.

With tools like Vercel WAF custom rules and ready-made templates, you can build powerful AI experiences without losing sleep over potential abuse or unexpected bills. It's all about building smarter, not harder, and with these steps, you're well on your way.
