How to Use ML Models from Hugging Face in Vercel Functions

Hugging Face provides a wide range of machine learning models that can be easily integrated into your applications. In this guide, we will walk you through using ML models from Hugging Face with the Vercel AI SDK, which provides a set of utilities that make it easy to work with Hugging Face's APIs.

Prerequisites

Before you begin, make sure you have the following:

  • A Hugging Face API key
  • A Vercel account

Step 1: Create a Next.js app

Create a Next.js application and install ai and @huggingface/inference:

pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm install ai @huggingface/inference

Note: You can use npm or yarn instead if you prefer.
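For reference, the equivalent commands with npm look like this:

npx create-next-app my-ai-app
cd my-ai-app
npm install ai @huggingface/inference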

Step 2: Add your Hugging Face API Key to .env

HUGGINGFACE_API_KEY=xxxxxxxxx

Step 3: Accessing Hugging Face Models

  1. Go to the Hugging Face website at huggingface.co.
  2. Click on the "Models" tab in the navigation bar.

Step 4: Selecting a Model

  1. On the left-hand side of the models page, you will see a list of task types. Choose the task type that corresponds to your use case. For example, if you want to perform text generation, click on the "Text Generation" option.
  2. Browse through the available models and select the one that best suits your needs.
  3. Alternatively, skip the steps above and use https://sdk.vercel.ai/ to compare and select a Hugging Face model.
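If you would like to sanity-check a model and your API key before wiring up the app, you can call the model directly with the @huggingface/inference client from a small standalone script. This is an optional sketch, not part of the app itself; the file name and prompt are placeholders, and the model name is the one used later in this guide.

// test-model.ts — optional sanity check (run with a TypeScript runner such as tsx)
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

async function main() {
  // Non-streaming call: returns the whole generated text in one response
  const result = await hf.textGeneration({
    model: 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    inputs: '<|prompter|>Say hello<|endoftext|><|assistant|>',
    parameters: { max_new_tokens: 50 },
  });
  console.log(result.generated_text);
}

main().catch(console.error);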

Step 5: Creating the Next.js Route Handler

To use the selected ML model, you need to create a Route Handler.

  1. Create a new file named app/api/completion/route.ts in your project.
  2. Add your code to the route handler; it might look something like the example below. In this example, the route handler accepts a POST request with a prompt string, generates a text completion using the OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 model, and streams the response back to the client.
import { HfInference } from '@huggingface/inference';
import { HuggingFaceStream, StreamingTextResponse } from 'ai';

// Create a new Hugging Face Inference instance
const Hf = new HfInference(process.env.HUGGINGFACE_API_KEY);

export async function POST(req: Request) {
  // Extract the `prompt` from the body of the request
  const { prompt } = await req.json();

  const response = await Hf.textGenerationStream({
    model: 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    inputs: `<|prompter|>${prompt}<|endoftext|><|assistant|>`,
    parameters: {
      max_new_tokens: 200,
      // @ts-ignore (this is a valid parameter specifically in OpenAssistant models)
      typical_p: 0.2,
      repetition_penalty: 1,
      truncate: 1000,
      return_full_text: false,
    },
  });

  // Convert the response into a friendly text-stream
  const stream = HuggingFaceStream(response);

  // Respond with the stream
  return new StreamingTextResponse(stream);
}

Note: Replace OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 in the code with the model name you wish to use.
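The handler above runs as a standard Node.js Vercel Function. If you prefer to run it as an Edge Function, Next.js lets you opt in with a one-line route segment config near the top of the same route.ts file; this is optional, and the handler works either way:

export const runtime = 'edge';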

The Vercel AI SDK provides two utility helpers that streamline the integration process:

  1. HuggingFaceStream: This helper takes the streaming response received from Hf.textGenerationStream, decodes and extracts the text tokens, and re-encodes them for easy consumption.
  2. StreamingTextResponse: This helper extends the Web Response class and provides default headers, including the desired Content-Type: 'text/plain; charset=utf-8'.

By utilizing these helpers, you can pass the transformed stream directly to StreamingTextResponse, enabling the client to consume the response effortlessly.
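The useCompletion hook used in the next step handles all of this for you, but to make the response shape concrete, here is a rough sketch of how any client could consume the streamed text/plain response with plain fetch and the Web Streams API. The function name and prompt are only illustrative; /api/completion matches the route created above.

async function readCompletion(prompt: string): Promise<string> {
  const res = await fetch('/api/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let text = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true }); // append each streamed chunk as it arrives
  }

  return text;
}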

Step 6: Fetching Data from the API Route

Now that you have set up the API route, you can fetch data from it in your components by creating a form with an input for the prompt.

To make this process easier, use the useCompletion hook, which defaults to the POST Route Handler we created earlier. If you want to override this default behavior, pass a custom api option: useCompletion({ api: '...' }).

Open the file where you want to use the ML model and create a form with the necessary inputs, passing the handleSubmit function returned by the hook as the form's onSubmit handler. Your code could look like this:

'use client';

import { useCompletion } from 'ai/react';

export default function Completion() {
  const { completion, input, stop, isLoading, handleInputChange, handleSubmit } = useCompletion({
    api: '/api/completion',
  });

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <form onSubmit={handleSubmit}>
        <label>
          Say something...
          <input
            className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
            value={input}
            onChange={handleInputChange}
          />
        </label>
        <output>Completion result: {completion}</output>
        <button type="button" onClick={stop}>
          Stop
        </button>
        <button disabled={isLoading} type="submit">
          Send
        </button>
      </form>
    </div>
  );
}

Step 7: Deploying to Vercel

Finally, deploy the repository to Vercel.

  1. First, create a new GitHub repository and push your local changes.
  2. Deploy it to Vercel. Ensure you add the HUGGINGFACE_API_KEY environment variable you configured earlier during the import process.
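If you prefer the terminal to the dashboard for these steps, the Vercel CLI can handle both; this is a rough outline assuming the CLI is installed and you are logged in:

# Add the API key to the Production environment (you will be prompted for the value)
vercel env add HUGGINGFACE_API_KEY production

# Deploy the current directory to production
vercel --prod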

After a successful deployment, your Hugging Face model will be served through Vercel Functions (or at the edge, if you opted into the Edge runtime earlier), streaming responses to users with low latency.

Congratulations! You have successfully integrated an ML model from Hugging Face with Vercel Functions. Users can now interact with your application and get answers to their questions in real time.
