A vector database stores embeddings, which are numerical representations of data, and enables semantic search by finding vectors that are similar to a query vector. Vector databases are most often used when building Retrieval-Augmented Generation (RAG) applications with AI.
Embeddings are dense vector representations (arrays of numbers like `[0.12, -0.45, 0.89, 0.33, -0.22, 0.04, 0.76, -0.11]`) that capture semantic meaning. You can think of them as coordinates in a high-dimensional space, where meaning determines position. Similar content produces similar vectors regardless of exact wording. For example, "dog" and "puppy" will have vectors closer together than "dog" and "car" because they share semantic similarity. This means embeddings can recognize that questions like "What's the refund policy?" and "How do I get my money back?" are asking about the same thing, even though they use different words.
Vector databases store these embeddings alongside your data (in your database, you would store a copy of your original content, or a chunk of it, and the embedding vector) and provide fast similarity search across millions or even billions of vectors. When you query the database, it converts your query into a vector and finds the closest matches using similarity search.
Similarity search finds the most relevant items by calculating the distance between vectors. The two most common measures are cosine similarity and Euclidean distance: cosine similarity measures the angle between vectors, while Euclidean distance measures the straight-line distance between points. Smaller distances (or higher cosine similarity scores) mean greater similarity.
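To make these measures concrete, here's a minimal sketch of both calculations in TypeScript. This is purely illustrative; your vector database implements (and heavily optimizes) these internally:

```ts
// Cosine similarity: 1 means same direction, 0 unrelated, -1 opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Euclidean distance: straight-line distance; smaller means more similar.
function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}
```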
Vector search transforms your content into a format that AI can understand and search semantically. Instead of matching keywords, it understands meaning, so queries like "How do I get a refund?" will surface content about return policies, even if the word "refund" never appears.
First, you need to convert your text content into embeddings. This starts with chunking: breaking larger documents into smaller pieces (typically around 500-1000 tokens each). Chunking is necessary because embedding models have token limits, and smaller chunks yield more precise matches for a query. For example, if you are building a RAG application for your blog, you would split a long post into sections or even paragraphs.
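As a sketch, a naive chunker might split on paragraphs and pack them into a token budget, approximating tokens as characters divided by four (a rough heuristic; production code would use a real tokenizer):

```ts
// Split text on blank lines and pack paragraphs into ~maxTokens chunks.
function chunkText(text: string, maxTokens = 500): string[] {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current = '';

  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    // Approximate token count as characters / 4 (a crude heuristic).
    if (candidate.length / 4 > maxTokens && current) {
      chunks.push(current); // current chunk is full; start a new one
      current = paragraph;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```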
Next, you need to choose an embedding model. Each model produces embeddings with a fixed number of dimensions, for example, 1536 or 3072. The dimension count tells you how many numbers are in each embedding vector, so an embedding with 1536 dimensions is literally an array of 1536 decimal numbers.
Each position in the array represents a different aspect of meaning that the model has learned. Another way to think about it: each dimension measures something about the content. More dimensions mean a more nuanced representation, so higher-dimensional embeddings can capture more subtle distinctions in meaning.
Popular embedding models include OpenAI's `text-embedding-3-small` and `text-embedding-3-large`. `text-embedding-3-small` has 1536 dimensions, and `text-embedding-3-large` has 3072. Depending on how much content you are embedding, 1536 dimensions will be cheaper to generate and store, while 3072 will capture meaning more accurately.
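You can verify the dimension count yourself: the AI SDK's `embed` helper (the same one used in the full example below) returns the raw array, so its length equals the model's dimension count.

```ts
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'What is the refund policy?',
});

console.log(embedding.length); // 1536 — one decimal number per dimension
```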
Once you have your embeddings, you insert them into your vector database. When inserting data, you can include metadata such as title, URL, category, and published date. This metadata lets you filter results alongside your query, such as "Give me all my blog posts from 2024 about pets," where 2024 refers to the published date and "pets" is a category tag.
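As an illustration, a stored record might look like this (the fields inside `metadata` are hypothetical; the example below uses `content`, `embedding`, and `metadata` columns):

```ts
const record = {
  content: 'Dogs thrive on routine. When you bring a rescue dog home...',
  embedding, // the vector generated for this chunk
  metadata: {
    title: 'Adopting a Rescue Dog',
    url: '/blog/adopting-a-rescue-dog',
    category: 'pets',
    publishedAt: '2024-03-15',
  },
};
```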
When a user searches, you follow the same embedding process as above:
- Take the user's query "What is the refund policy?"
- Generate a query embedding by sending the question to the same embedding model you used for your documents. Note that you don't store the user's query embedding; it's only used to run the search.
- Optionally apply metadata filters, for example, "only search the privacy section" or "only show results from the past six months" (see the filtering sketch after this list)
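As a sketch of metadata filtering on its own, supabase-js can filter on jsonb fields using PostgREST's JSON operators (assuming the `documents` table from the example below). Combining a metadata filter with vector similarity in a single query typically means adding a filter parameter to your `match_documents` Postgres function:

```ts
// Filter on metadata fields via PostgREST's JSON operators.
const { data, error } = await supabase
  .from('documents')
  .select('content, metadata')
  .eq('metadata->>category', 'privacy')
  .gte('metadata->>publishedAt', '2024-07-01');
```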
When retrieving results, the database does the similarity calculation by comparing the query vector against all the stored vectors using cosine similarity or Euclidean distance. In your code, you set `k` to the number of results you want to retrieve. If you set `k=5`, you are asking the database to return the 5 most similar vectors to your query.
The following example shows how setting up and querying a Supabase vector database might work. Note that this code won't run until you've set up your Supabase vector database, including the `documents` table and the `match_documents` function. See the Supabase documentation to learn how.
```ts
import { embed, embedMany } from 'ai';
import { openai } from '@ai-sdk/openai';
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_KEY!,
);

// Generate embeddings for a batch of documents and store them
async function storeDocuments(
  documents: Array<{ text: string; metadata: any }>,
) {
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: documents.map((doc) => doc.text),
  });

  // Pair each document with its embedding for insertion
  const records = documents.map((doc, i) => ({
    content: doc.text,
    embedding: embeddings[i],
    metadata: doc.metadata,
  }));

  const { data, error } = await supabase.from('documents').insert(records);

  return { data, error };
}

async function search(query: string, topK: number = 5) {
  // Generate an embedding for the query with the same model
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  });

  // match_documents is a Postgres function you define in Supabase
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_threshold: 0.7,
    match_count: topK,
  });

  if (error) throw error;
  return data;
}
```
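Putting it together, usage might look like this (the sample documents are made up for illustration):

```ts
// Store a couple of chunks, then run a semantic search.
await storeDocuments([
  {
    text: 'Refunds are available within 30 days of purchase.',
    metadata: { category: 'policies', publishedAt: '2024-03-15' },
  },
  {
    text: 'Crate training helps a new puppy settle in.',
    metadata: { category: 'pets', publishedAt: '2024-05-02' },
  },
]);

const results = await search('How do I get my money back?');
console.log(results); // the topK most similar documents
```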