How to use TanStack Query for server state, SSR, and streaming

Fetching server data in a React app usually starts with a useState for the result, another for loading, a third for errors, and a useEffect to wire them together. That holds up until a second component needs the same record, or the data changes after it loads. At that point, you are hand-rolling a cache: deduping requests, tracking staleness, retrying failures, and refetching after writes.

TanStack Query standardizes that work. It is an async state library built for data that lives on a server and keeps changing after the first fetch. You write the promise-returning function that fetches or updates data, and TanStack Query handles everything around it: it gives that data a stable cache identity, tracks whether it is fresh or stale, retries failures, refetches in the background, supports optimistic updates, and reconciles the cache after mutations so the UI stays in sync.

Once a product surface spans multiple frameworks, you need a shared way to manage server state. TanStack Query supports React, Preact, Vue, Solid, Svelte, Lit and Angular through dedicated framework adapters, so the mental model can stay stable while the component syntax changes.

As applications evolve from simple CRUD to streaming AI agents, async state management gets harder, which is the problem TanStack Query is built to solve.

Server state needs a contract

Every query is identified by a queryKey, and that key is what makes invalidation predictable across views. The query declares which resource the UI wants, which variables identify it, how fresh the cached answer should be, and what should happen when a write makes that answer suspect.

Think of each query as a contract for a piece of server state, not a single request. A request ends when fetch resolves, but server state keeps changing after the response arrives. Other users edit records, background jobs finish, agents append messages, and webhooks update status. TanStack Query lets the client retain a useful answer while still treating freshness as a managed property.

TanStack Query manages that contract with a stale-while-revalidate approach, where it shows the last cached value right away, treats it as potentially out of date, and refetches in the background when something meaningful changes, like the user navigating, the UI regaining focus, or a mutation updating the underlying data.
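The stale-while-revalidate decision can be sketched without the library. This is a simplified model, not TanStack Query's implementation: `CacheEntry`, `readWithRevalidate`, and the timestamps are names and values invented for illustration.

```typescript
// Toy model of stale-while-revalidate; not TanStack Query's internals.
type CacheEntry<T> = { data: T; updatedAt: number }

type ReadResult<T> = {
  data: T | undefined // last cached value, served immediately if present
  shouldRefetch: boolean // whether a background refetch is warranted
}

function readWithRevalidate<T>(
  entry: CacheEntry<T> | undefined,
  staleTimeMs: number,
  now: number,
): ReadResult<T> {
  // Nothing cached yet: there is no value to show, so a fetch is required.
  if (!entry) return { data: undefined, shouldRefetch: true }
  // The cached value is returned either way; staleness only decides the refetch.
  const isStale = now - entry.updatedAt >= staleTimeMs
  return { data: entry.data, shouldRefetch: isStale }
}

const entry = { data: { title: 'Roadmap' }, updatedAt: 1_000 }
const fresh = readWithRevalidate(entry, 30_000, 10_000)
const stale = readWithRevalidate(entry, 30_000, 40_000)
console.log(fresh.shouldRefetch, stale.shouldRefetch) // false true
```

In both reads the caller gets the cached title right away. Only the second read, made after the 30-second window has passed, asks for a background refetch.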

What TanStack Query manages

When you adopt TanStack Query, start by drawing a clear line between server state and local UI state. TanStack Query is for remote, asynchronous data: data fetched from an API, database, or backend service that needs caching, refetching, synchronization, loading states, and error handling.

Local UI state should usually stay in React state or a client-state store. That includes text input, modal visibility, selected tabs, hover state, temporary form drafts, and other state that exists only within the current user session.

With that split in place, these are the pieces TanStack Query gives you for the server-state side:

Queries read async data and cache the result under a queryKey.
Mutations write data and give you success, pending, error, retry, and rollback hooks.
Query keys identify resources with serializable arrays, such as ['thread', threadId] or ['projects', { cursor }].
Invalidation marks cached data stale after a write, then refetches active queries in the background.
Freshness settings like staleTime prevent a hydrated page from refetching immediately when the server already rendered useful data.
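Keys work as cache identities because they are compared by value, not by reference. The sketch below shows one way to get that behavior: serialize the array with object properties sorted, so property order inside an object does not matter while array order still does. `stableKeyHash` is a hypothetical helper written for this example, not a function exported by the library.

```typescript
// Illustrative only: a deterministic hash for serializable query keys.
type KeyPart =
  | string
  | number
  | boolean
  | null
  | { [name: string]: KeyPart }
  | KeyPart[]

function stableKeyHash(queryKey: readonly KeyPart[]): string {
  return JSON.stringify(queryKey, (_name, value) => {
    // Rebuild plain objects with sorted property names so that
    // { a, b } and { b, a } serialize identically. Arrays keep their order.
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      const sorted: Record<string, unknown> = {}
      for (const name of Object.keys(value).sort()) sorted[name] = value[name]
      return sorted
    }
    return value
  })
}

const a = stableKeyHash(['projects', { cursor: 'abc', status: 'open' }])
const b = stableKeyHash(['projects', { status: 'open', cursor: 'abc' }])
const c = stableKeyHash(['thread', 'abc'])
const d = stableKeyHash(['abc', 'thread'])
console.log(a === b, c === d) // true false
```

The practical consequence is that anything placed in a key should be JSON-serializable, and that the position of each array element is part of the resource's identity.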

Stack	Adapter	Common primitive
React and Next.js	`@tanstack/react-query`	`useQuery`
Preact	`@tanstack/preact-query`	`useQuery`
Vue and Nuxt	`@tanstack/vue-query`	`useQuery`
Svelte and SvelteKit	`@tanstack/svelte-query`	`createQuery`
Solid	`@tanstack/solid-query`	`useQuery`
Lit (experimental)	`@tanstack/lit-query`	`createQueryController`
Angular	`@tanstack/angular-query-experimental`	`injectQuery`

The fetcher can call REST, GraphQL, tRPC, a server action endpoint, or any promise-returning function.

Core primitives to learn first

A reusable query setup starts with typed key factories. They keep invalidation precise because every list, detail view, and child resource shares a predictable namespace.

export const threadKeys = {
  all: ['threads'] as const,
  list: (workspaceId: string) =>
    [...threadKeys.all, { workspaceId }] as const,
  detail: (threadId: string) =>
    [...threadKeys.all, 'detail', threadId] as const,
  messages: (threadId: string) =>
    [...threadKeys.detail(threadId), 'messages'] as const,
  liveMessages: (threadId: string) =>
    [...threadKeys.detail(threadId), 'live'] as const,
}
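The namespace matters because invalidation matches keys by prefix: invalidating a thread's detail key also reaches the message keys built on top of it. The sketch below approximates that matching for illustration. `isKeyPrefix` is a hypothetical helper, and it compares elements by exact JSON value, whereas the library's own matcher also supports partial object matches.

```typescript
// Same factory shape as above, repeated so the sketch runs on its own.
const threadKeys = {
  all: ['threads'] as const,
  detail: (threadId: string) =>
    [...threadKeys.all, 'detail', threadId] as const,
  messages: (threadId: string) =>
    [...threadKeys.detail(threadId), 'messages'] as const,
}

// Simplified prefix check: every element of `prefix` must equal the element
// at the same position in `key`. Elements are compared by JSON value.
function isKeyPrefix(
  prefix: readonly unknown[],
  key: readonly unknown[],
): boolean {
  if (prefix.length > key.length) return false
  return prefix.every(
    (part, index) => JSON.stringify(part) === JSON.stringify(key[index]),
  )
}

const cachedKeys = [
  threadKeys.detail('t1'),
  threadKeys.messages('t1'),
  threadKeys.detail('t2'),
]
const hit = cachedKeys.filter((key) =>
  isKeyPrefix(threadKeys.detail('t1'), key),
)
console.log(hit.length) // 2: the t1 detail entry and its messages
```

Invalidating with `threadKeys.all` would reach all three entries, which is why the broadest key sits at the root of the factory.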

The basic read path then becomes small enough to repeat across adapters.

import { useQuery } from '@tanstack/react-query'
export function useThread(threadId: string) {
  return useQuery({
    queryKey: threadKeys.detail(threadId),
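    queryFn: async () => {
      const res = await fetch(`/api/threads/${threadId}`)
      // fetch resolves on HTTP errors, so throw to put the query in its error state
      if (!res.ok) throw new Error('Failed to load thread')
      return res.json()
    },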
    staleTime: 30_000,
  })
}

The write path decides how much confidence the UI should show before the server confirms the result. A low-risk rename can update the cache optimistically and roll back on error. A payment action should wait for the server.

import { useMutation, useQueryClient } from '@tanstack/react-query'
export function useRenameThread(threadId: string) {
  const queryClient = useQueryClient()
  return useMutation({
    mutationFn: async (title: string) => {
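      const res = await fetch(`/api/threads/${threadId}`, {
        method: 'PATCH',
        // Declare the body format so the server parses it as JSON
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ title }),
      })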
      if (!res.ok) throw new Error('Rename failed')
      return res.json()
    },
    onMutate: async (title) => {
      await queryClient.cancelQueries({
        queryKey: threadKeys.detail(threadId),
      })
      const previous = queryClient.getQueryData(threadKeys.detail(threadId))
      queryClient.setQueryData(threadKeys.detail(threadId), (thread: any) =>
        thread ? { ...thread, title } : thread,
      )
      return { previous }
    },
    onError: (_error, _title, context) => {
      queryClient.setQueryData(threadKeys.detail(threadId), context?.previous)
    },
    onSettled: () =>
      queryClient.invalidateQueries({
        queryKey: threadKeys.detail(threadId),
      }),
  })
}

SSR begins on the server

Server-Side Rendering (SSR) is the process of fetching data and generating fully populated HTML on the server, rather than sending a blank shell and forcing the user's browser to build the UI from scratch. Four steps move the cache across the network boundary: create the query client for the request, prefetch the data needed for the first paint, dehydrate the cache, and hydrate it on the client. TanStack's server rendering and hydration docs also call out the serialization boundary for custom SSR setups.

The reason to do this is practical. The server renders useful HTML, the browser receives the same query data, and the client avoids an immediate duplicate fetch. A nonzero staleTime usually belongs in SSR setups so the hydrated data remains fresh long enough for the page to become interactive.
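Those four steps can be modeled with a plain Map to show what actually crosses the network. This is a toy, not the library's `dehydrate` and `hydrate`: `ToyCache`, `toyDehydrate`, and `toyHydrate` are invented names, and the cached payload is assumed to be JSON-safe.

```typescript
// Toy model of prefetch -> dehydrate -> serialize -> hydrate.
type Entry = { data: unknown; updatedAt: number }
type ToyCache = Map<string, Entry>
type Dehydrated = { queries: Array<{ hash: string; entry: Entry }> }

function toyDehydrate(cache: ToyCache): Dehydrated {
  return {
    queries: Array.from(cache.entries()).map(([hash, entry]) => ({
      hash,
      entry,
    })),
  }
}

function toyHydrate(cache: ToyCache, state: Dehydrated): void {
  for (const { hash, entry } of state.queries) {
    const existing = cache.get(hash)
    // Keep whichever copy is newer so hydration never overwrites fresher data.
    if (!existing || existing.updatedAt < entry.updatedAt) cache.set(hash, entry)
  }
}

// Server: one cache per request, filled before rendering.
const serverCache: ToyCache = new Map()
serverCache.set('["threads","detail","t1"]', {
  data: { title: 'Roadmap' },
  updatedAt: 1_000,
})
const wire = JSON.stringify(toyDehydrate(serverCache)) // crosses the network

// Client: a separate cache, warmed from the serialized payload.
const clientCache: ToyCache = new Map()
toyHydrate(clientCache, JSON.parse(wire) as Dehydrated)

const hydrated = clientCache.get('["threads","detail","t1"]')
const staleTimeMs = 60_000
const nowOnClient = 5_000
// With a nonzero staleTime the hydrated entry still counts as fresh.
const isFresh =
  hydrated !== undefined && nowOnClient - hydrated.updatedAt < staleTimeMs
console.log(isFresh) // true
```

The timestamp travels with the data, which is what lets the client decide freshness without asking the server again. With a `staleTimeMs` of 0, the same entry would be stale the moment it arrived.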

import { QueryClient } from '@tanstack/react-query'
export function makeQueryClient() {
  return new QueryClient({
    defaultOptions: {
      queries: {
        staleTime: 60 * 1000,
      },
    },
  })
}

The data source does not need to know which framework rendered the page. The adapter decides how the prefetched cache crosses the server-client boundary.

Next.js App Router pattern

Server Components own the initial data fetch, and Client Components own interactivity. Prefetch in the Server Component and wrap the Client Component with HydrationBoundary to pass the cache between them; background refetching and user-triggered mutations then run on the client.

app/threads/[threadId]/page.tsx

import {
  dehydrate,
  HydrationBoundary,
  QueryClient,
} from '@tanstack/react-query'
import { ThreadView } from './thread-view'
export default async function Page({
  params,
}: {
  params: Promise<{ threadId: string }>
}) {
  const { threadId } = await params
  const queryClient = new QueryClient()
  await queryClient.prefetchQuery({
    queryKey: threadKeys.detail(threadId),
    queryFn: () => getThread(threadId),
  })
  return (
    <HydrationBoundary state={dehydrate(queryClient)}>
      <ThreadView threadId={threadId} />
    </HydrationBoundary>
  )
}

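The Client Component then reads the same query from the hydrated cache, so its first render uses the prefetched data instead of showing a loading state. Whether it also skips an immediate background refetch depends on staleTime: with the default of 0 the hydrated data is already stale on mount, which is why the client's QueryClient should set a nonzero staleTime as described above.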

app/threads/[threadId]/thread-view.tsx

'use client'
import { useQuery } from '@tanstack/react-query'
export function ThreadView({ threadId }: { threadId: string }) {
  const { data: thread, isPending } = useQuery({
    queryKey: threadKeys.detail(threadId),
    queryFn: () => fetch(`/api/threads/${threadId}`).then((res) => res.json()),
  })
  if (isPending) return <p>Loading...</p>
  if (!thread) return null
  return <h1>{thread.title}</h1>
}

This leaves React Server Components and TanStack Query with different jobs: RSC streams HTML, while TanStack Query keeps the hydrated client coherent after interaction starts.

Nuxt, SvelteKit, Astro, and Remix

The same model works across the frameworks teams already ship on Vercel, but each adapter crosses the SSR boundary in its own idiom.

Nuxt uses a plugin boundary

Nuxt 3 apps can create a Vue Query client in a plugin, dehydrate it on the server, and hydrate it from Nuxt state in the browser.

plugins/vue-query.ts

import {
  VueQueryPlugin,
  QueryClient,
  dehydrate,
  hydrate,
  type DehydratedState,
} from '@tanstack/vue-query'
export default defineNuxtPlugin((nuxt) => {
  const queryClient = new QueryClient()
  const state = useState<DehydratedState | null>('vue-query', () => null)
  nuxt.vueApp.use(VueQueryPlugin, { queryClient })
  if (import.meta.server) {
    nuxt.hooks.hook('app:rendered', () => {
      state.value = dehydrate(queryClient)
    })
  }
  if (import.meta.client) {
    hydrate(queryClient, state.value)
  }
})

SvelteKit prefetches in load

SvelteKit's pattern starts in a layout load function that creates the query client. Set enabled to the browser flag from $app/environment so ordinary queries do not run during SSR, while explicit prefetching still works.

src/routes/+layout.ts

import { browser } from '$app/environment'
import { QueryClient } from '@tanstack/svelte-query'
export async function load() {
  const queryClient = new QueryClient({
    defaultOptions: {
      queries: {
        enabled: browser,
      },
    },
  })
  return { queryClient }
}

The layout component then receives that client from load and wraps the route tree in QueryClientProvider so every child can use it.

src/routes/+layout.svelte

<script lang="ts">
  import { QueryClientProvider } from '@tanstack/svelte-query'
  import type { LayoutData } from './$types'
  export let data: LayoutData
</script>
<QueryClientProvider client={data.queryClient}>
  <slot />
</QueryClientProvider>

Then a page load can prefetch with the framework-provided fetch, and createQuery can read from the populated cache.

src/routes/+page.ts

export async function load({ parent, fetch }) {
  const { queryClient } = await parent()
  await queryClient.prefetchQuery({
    queryKey: ['posts'],
    queryFn: async () => (await fetch('/api/posts')).json(),
  })
}

The page component then reads that same query with createQuery, pulling straight from the cache the load function already filled.

src/routes/+page.svelte

<script lang="ts">
  import { createQuery } from '@tanstack/svelte-query'
  const posts = createQuery(() => ({
    queryKey: ['posts'],
    queryFn: async () => (await fetch('/api/posts')).json(),
  }))
</script>
{#if posts.data}
  {#each posts.data as post}
    <article>{post.title}</article>
  {/each}
{/if}

The result is a page that ships with server-prefetched data on first paint, then hands reactivity to createQuery once the user starts interacting.

Astro passes initial data to islands

Astro pages often render static or server-loaded HTML, then hydrate interactive islands. For a TanStack Query island, pass server-loaded data as initialData so the first client render starts warm.

---
import ThreadIsland from '../components/thread-island.tsx'
const thread = await fetch(`${Astro.url.origin}/api/thread`).then((res) =>
  res.json(),
)
---
<ThreadIsland client:load initialThread={thread} />

The island component then takes that prop as initialData, so its first client render starts with server-loaded data rather than an empty cache.

import { useQuery } from '@tanstack/react-query'
export default function ThreadIsland({ initialThread }: any) {
  const { data } = useQuery({
    queryKey: threadKeys.detail(initialThread.id),
    queryFn: () =>
      fetch(`/api/threads/${initialThread.id}`).then((res) => res.json()),
    initialData: initialThread,
  })
  return <h2>{data.title}</h2>
}

The result is a mostly static Astro page with one warm, interactive island, hydrated with data the server has already fetched.

Remix puts hydration in loaders

Remix loaders map cleanly to TanStack Query prefetching. The loader prefetches and returns the dehydrated cache; the route renders a HydrationBoundary. This pattern applies to Remix v2. In React Router v7, the Remix successor, loaders return plain objects, and imports come from react-router.

app/routes/threads.$threadId.tsx

import { json } from '@remix-run/node'
import { useLoaderData } from '@remix-run/react'
import {
  dehydrate,
  HydrationBoundary,
  QueryClient,
  useQuery,
} from '@tanstack/react-query'
export async function loader({ params }: any) {
  const queryClient = new QueryClient()
  await queryClient.prefetchQuery({
    queryKey: threadKeys.detail(params.threadId),
    queryFn: () => getThread(params.threadId),
  })
  return json({ dehydratedState: dehydrate(queryClient) })
}
export default function Route() {
  const { dehydratedState } = useLoaderData<typeof loader>()
  return (
    <HydrationBoundary state={dehydratedState}>
      <Thread />
    </HydrationBoundary>
  )
}

Solid and Angular follow the same division with different primitives. Solid Query uses Solid's reactive model, while Angular Query exposes injectQuery, so the cache contract stays recognizable even when the component syntax changes.

Optimistic updates need rollback paths

A chat message, a checkbox flip, a reorder, or a title edit can update the cache before the server confirms. A destructive admin action usually must wait for the server. The dividing line is how likely the action is to succeed, how costly a wrong guess would be, and how much the user notices the wait.

The safest optimistic mutation does four things:

Cancels in-flight reads for the resource being changed.
Saves the previous cached value.
Writes the optimistic value with a temporary ID or pending status.
Rolls back on error and invalidates on settle.

That final invalidation is necessary because optimistic data is a guess. The server may add fields, normalize content, reject a tool call, or reorder a list after persistence, so the cache needs to reconcile against whatever the server actually returned.
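The snapshot, guess, and rollback steps can be sketched over a plain Map. This is a toy under stated assumptions: `renameOptimistically` and the `save` callback are hypothetical, request cancellation is left out, and real code would use the useMutation callbacks shown earlier.

```typescript
// Toy optimistic update over a Map; not the library's mutation machinery.
type Thread = { id: string; title: string; pending?: boolean }

async function renameOptimistically(
  cache: Map<string, Thread>,
  threadId: string,
  title: string,
  save: (title: string) => Promise<Thread>, // hypothetical server call
): Promise<'committed' | 'rolled-back'> {
  // Snapshot the previous value (cancelling in-flight reads is omitted here).
  const previous = cache.get(threadId)
  // Write the guess immediately and mark it as unconfirmed.
  if (previous) cache.set(threadId, { ...previous, title, pending: true })
  try {
    // Reconcile with what the server actually stored, which may differ.
    cache.set(threadId, await save(title))
    return 'committed'
  } catch {
    // Restore the snapshot so the UI never keeps a rejected guess.
    if (previous) cache.set(threadId, previous)
    else cache.delete(threadId)
    return 'rolled-back'
  }
}

const cache = new Map<string, Thread>([['t1', { id: 't1', title: 'Old' }]])
const failing = renameOptimistically(cache, 't1', 'New', () =>
  Promise.reject(new Error('offline')),
)
// The guess is visible before the server answers.
console.log(cache.get('t1')?.title) // New
failing.then((outcome) => console.log(outcome, cache.get('t1')?.title)) // rolled-back Old
```

On the success path the cache ends up holding the server's version rather than the guess, which plays the same role as the final invalidation: the optimistic value is never treated as the source of truth.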

Infinite lists need stable cursors

Infinite queries are normal queries with a page shape. TanStack Query stores pages and pageParams, then gives the UI fetchNextPage, hasNextPage, and a separate isFetchingNextPage flag for loading another page.

import { useInfiniteQuery } from '@tanstack/react-query'
export function useThreadMessages(threadId: string) {
  return useInfiniteQuery({
    queryKey: threadKeys.messages(threadId),
    queryFn: ({ pageParam }) =>
      fetch(`/api/threads/${threadId}/messages?cursor=${pageParam}`).then(
        (res) => res.json(),
      ),
    initialPageParam: 'latest',
    getNextPageParam: (lastPage) => lastPage.nextCursor,
  })
}
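Because an infinite query caches `{ pages, pageParams }` rather than a flat array, the UI usually derives the list it renders. The sketch below shows that derivation on plain data. `MessagePage`, `InfiniteData`, and `flattenPages` are assumed shapes for this example, chosen to match the `nextCursor` field used above.

```typescript
type Message = { id: string; content: string }
type MessagePage = { items: Message[]; nextCursor?: string }
// Mirrors the cached shape of an infinite query: pages plus the params used to fetch them.
type InfiniteData = { pages: MessagePage[]; pageParams: string[] }

function flattenPages(data: InfiniteData): Message[] {
  const seen = new Set<string>()
  const flat: Message[] = []
  for (const page of data.pages) {
    for (const message of page.items) {
      // Guard against the same row arriving on two pages.
      if (seen.has(message.id)) continue
      seen.add(message.id)
      flat.push(message)
    }
  }
  return flat
}

const data: InfiniteData = {
  pageParams: ['latest', 'm3'],
  pages: [
    {
      items: [
        { id: 'm5', content: 'e' },
        { id: 'm4', content: 'd' },
        { id: 'm3', content: 'c' },
      ],
      nextCursor: 'm3',
    },
    // No nextCursor on the last page, so there is nothing more to fetch.
    { items: [{ id: 'm3', content: 'c' }, { id: 'm2', content: 'b' }] },
  ],
}
console.log(flattenPages(data).map((m) => m.id).join(',')) // m5,m4,m3,m2
```

The second page here deliberately repeats `m3` to show the dedupe guard. With a well-behaved exclusive cursor the guard never fires, but it is cheap insurance.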

When you paginate a long list, the backend can address pages by offset (“skip 40 rows”) or by cursor (a pointer like “start after message 123”). Offsets shift when new items are inserted mid-session, so the same row can appear twice or a row can be skipped. A stable cursor avoids both because it is anchored to a specific row rather than a position. Chat threads are a useful example: new messages keep arriving while a user pages back through older ones. Keep the live, newly arriving messages under a separate key, such as threadKeys.liveMessages(threadId), and invalidate the paginated query when persistence changes the underlying order.
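A small simulation makes the difference between the two paging styles concrete. The message ids and the helpers `pageByOffset` and `pageByCursor` are invented for this sketch, and the list is newest-first.

```typescript
// Newest-first message ids; a new message arrives between two page requests.
function pageByOffset(ids: string[], offset: number, size: number): string[] {
  return ids.slice(offset, offset + size)
}

function pageByCursor(
  ids: string[],
  afterId: string | null,
  size: number,
): string[] {
  // Start right after the last row the client already has.
  const start = afterId === null ? 0 : ids.indexOf(afterId) + 1
  return ids.slice(start, start + size)
}

const before = ['m4', 'm3', 'm2', 'm1']
const after = ['m5', ...before] // 'm5' arrives while the user is paging

// Offset paging: page 1 against the old list, page 2 against the new one.
const offsetPages = [
  ...pageByOffset(before, 0, 2),
  ...pageByOffset(after, 2, 2),
]

// Cursor paging: page 2 starts after the last id seen on page 1.
const firstCursorPage = pageByCursor(before, null, 2)
const lastSeen = firstCursorPage[firstCursorPage.length - 1] ?? null
const cursorPages = [...firstCursorPage, ...pageByCursor(after, lastSeen, 2)]

console.log(offsetPages.join(',')) // m4,m3,m3,m2
console.log(cursorPages.join(',')) // m4,m3,m2,m1
```

The offset run repeats `m3` and never reaches `m1`, while the cursor run returns each older message exactly once. The new message `m5` is absent from both, which is the reason to track live messages under their own key.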

Agent threads are server state

AI chat makes TanStack Query's server-state model more obvious. A thread possesses every hallmark of complex server state: it is hosted remotely, shared across multiple actors, inherently asynchronous, and highly mutable. The user can submit a message, the assistant can stream a response, tools can run, and another client can load the same thread history.

A thread really has two halves. One is the live interaction as tokens stream in, the other is the durable record that other clients load later. Treat the thread as two kinds of state, and use different tooling for each.

For the live streaming experience, use the AI SDK. Its useChat hook manages the in-flight interaction, including streaming tokens and input handling.

For the durable record of the thread, use TanStack Query. It owns the cached thread history that other views and clients load later.

When you need provider choice and fallbacks behind a single production endpoint, route model calls through the AI Gateway.

When the agent needs to operate inside Slack, Teams, Discord, Google Chat, Linear, or other work tools, Chat SDK can expose those operations as AI SDK tools.

Here is how those two tools come together within a single client component. useChat drives the live stream, while a TanStack Query mutation writes the user's message into the live-message cache optimistically and invalidates the durable thread queries once the response settles.

'use client'
import { useChat } from '@ai-sdk/react'
import { DefaultChatTransport } from 'ai'
import { useMutation, useQueryClient } from '@tanstack/react-query'
export function AgentThread({ threadId }: { threadId: string }) {
  const queryClient = useQueryClient()
  const chat = useChat({
    transport: new DefaultChatTransport({
      api: `/api/threads/${threadId}/chat`,
    }),
    onFinish: () => {
      queryClient.invalidateQueries({
        queryKey: threadKeys.detail(threadId),
      })
    },
    onError: () => {
      queryClient.invalidateQueries({
        queryKey: threadKeys.messages(threadId),
      })
    },
  })
  const sendMessage = useMutation({
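    mutationFn: ({ id, text }: { id: string; text: string }) =>
      // Put the id on the message itself: sendMessage's messageId option
      // targets an existing message to replace, not the id of a new one.
      chat.sendMessage({ id, role: 'user', parts: [{ type: 'text', text }] }),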
    onMutate: async ({ id, text }) => {
      await queryClient.cancelQueries({
        queryKey: threadKeys.liveMessages(threadId),
      })
      const previous = queryClient.getQueryData(
        threadKeys.liveMessages(threadId),
      )
      queryClient.setQueryData(
        threadKeys.liveMessages(threadId),
        (
          messages: Array<{
            id: string
            role: 'user'
            content: string
            status: 'pending'
          }> = [],
        ) => [
          ...messages,
          { id, role: 'user', content: text, status: 'pending' },
        ],
      )
      return { previous }
    },
    onError: (_error, _draft, context) => {
      queryClient.setQueryData(
        threadKeys.liveMessages(threadId),
        context?.previous,
      )
    },
    onSettled: () => {
      queryClient.invalidateQueries({
        queryKey: threadKeys.messages(threadId),
      })
    },
  })
  return (
    <form
      onSubmit={(event) => {
        event.preventDefault()
        const form = event.currentTarget
        const text = new FormData(form).get('text') as string
        sendMessage.mutate({
          id: crypto.randomUUID(),
          text,
        })
        form.reset()
      }}
    >
      {chat.messages.map((message) => (
        <Message key={message.id} message={message} status={chat.status} />
      ))}
      <input name="text" />
    </form>
  )
}

The client cache only needs to know which thread contract changed, regardless of which provider produced the response.

Streaming tokens belong in cache

If your chat UI has ever shown a duplicated message, a spinner attached to the wrong reply, or streaming text that disappears on refresh, you’re seeing the same underlying problem: thread state split across multiple stores. Keep three pieces of state together: persisted messages, the currently streaming assistant message, and tool-call status attached to the message that triggered it. The AI SDK message parts array gives the UI text parts and typed tool parts, while metadata can carry timestamps, model IDs, and token usage.

When streaming chunks arrive outside useChat, merge them into a live-message query key instead of creating a parallel store. Keep that key separate from an infinite-query key because infinite queries store { pages, pageParams }, not a flat message array.

Here is that merge as a reusable helper. appendToken finds the streaming message by id and appends each new token delta to its cached content.

import type { QueryClient } from '@tanstack/react-query'
type ToolStatus = 'pending' | 'running' | 'completed' | 'errored'
type ThreadMessage = {
  id: string
  role: 'user' | 'assistant'
  content: string
  toolCalls?: Record<string, { name: string; status: ToolStatus }>
}
export function appendToken(
  queryClient: QueryClient,
  threadId: string,
  messageId: string,
  delta: string,
) {
  queryClient.setQueryData(
    threadKeys.liveMessages(threadId),
    (messages: ThreadMessage[] = []) =>
      messages.map((message) =>
        message.id === messageId
          ? { ...message, content: message.content + delta }
          : message,
      ),
  )
}

Tool status uses the same approach. Store the tool call beside the assistant message, and update it from pending to running when execution starts, then to completed or errored when the result arrives.

export function setToolStatus(
  queryClient: QueryClient,
  threadId: string,
  messageId: string,
  toolCallId: string,
  status: ToolStatus,
) {
  queryClient.setQueryData(
    threadKeys.liveMessages(threadId),
    (messages: ThreadMessage[] = []) =>
      messages.map((message) =>
        message.id === messageId
          ? {
              ...message,
              toolCalls: {
                ...message.toolCalls,
                [toolCallId]: {
                  name: message.toolCalls?.[toolCallId]?.name ?? 'tool',
                  status,
                },
              },
            }
          : message,
      ),
  )
}
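Stream events can arrive late or out of order, so it can help to guard those transitions so that a finished tool call never moves back to running. The sketch below is one possible guard, not part of any SDK: `canTransition` and `nextStatus` are hypothetical helpers, and the allowed moves (including pending straight to errored) are an assumption based on the order described above.

```typescript
type ToolStatus = 'pending' | 'running' | 'completed' | 'errored'

// Allowed forward moves; terminal states have none.
const allowed: Record<ToolStatus, readonly ToolStatus[]> = {
  pending: ['running', 'errored'],
  running: ['completed', 'errored'],
  completed: [],
  errored: [],
}

function canTransition(from: ToolStatus, to: ToolStatus): boolean {
  return allowed[from].indexOf(to) !== -1
}

// Returns the status to store: the incoming one if legal, otherwise the current one.
function nextStatus(current: ToolStatus, incoming: ToolStatus): ToolStatus {
  return canTransition(current, incoming) ? incoming : current
}

console.log(nextStatus('pending', 'running')) // running
console.log(nextStatus('completed', 'running')) // completed
```

A helper like setToolStatus could read the current status from the cached message and pass it through `nextStatus` before writing, so a late event leaves the stored status unchanged.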

On the server, Chat SDK tools can be passed directly into AI SDK generation. Chat SDK requires a platform adapter and a state adapter. Use Redis, Postgres, or another production state adapter so subscriptions and locks survive across function instances. Use Vercel Workflows when a tool needs durable execution or should continue outside the chat response lifecycle.

Start with the shared Chat instance on the server. It registers a platform adapter (Slack) and a Redis state adapter, so locks and subscriptions survive across function instances.

lib/chat.ts

import { Chat } from 'chat'
import { createSlackAdapter } from '@chat-adapter/slack'
import { createRedisState } from '@chat-adapter/state-redis'
export const workspaceChat = new Chat({
  userName: 'agent',
  adapters: {
    slack: createSlackAdapter(),
  },
  state: createRedisState(),
})

Then hand that instance to the route that streams the response. createChatTools exposes the configured adapters as AI SDK tools, and maxDuration = 800 gives the model time to stream tokens and wait on tool calls.

app/api/threads/[threadId]/chat/route.ts

import { convertToModelMessages, streamText } from 'ai'
import { createChatTools } from 'chat/ai'
import { workspaceChat } from '@/lib/chat'
export const maxDuration = 800
export async function POST(req: Request) {
  const { messages } = await req.json()
  const result = streamText({
    model: process.env.AI_MODEL!,
    messages: await convertToModelMessages(messages),
    tools: createChatTools({
      chat: workspaceChat,
      preset: ['reader', 'messenger'],
    }),
  })
  return result.toUIMessageStreamResponse({
    originalMessages: messages,
  })
}

Optimistic user messages appear immediately; streaming assistant tokens update the assistant message in place; tool parts update their status in place; and the final persisted response invalidates the thread.

Fluid compute for long-lived, I/O-heavy AI requests

The backend half of this pattern is a long-lived function that spends much of its time waiting while the model streams tokens, tools call external APIs, and the client keeps the connection open. That is the workload Fluid compute is built for.

Fluid compute can run functions up to 800 seconds on Pro and Enterprise plans, and its pricing model separates active CPU from I/O wait time. Active CPU billing applies while code is executing and pauses when the function waits on external services, including model calls, with memory remaining provisioned for in-flight work.

On the client, the UI has to stay coherent while the thread moves from pending state to streamed output to final persisted data. TanStack Query owns that contract. On the server, Fluid compute is a good fit for the long-lived requests that power agent experiences, such as token streaming, tool calls, and external I/O.

If you are building this pattern on Vercel, start by pairing TanStack Query for durable thread state with Fluid compute for the chat route that needs to stay open.

TanStack Query FAQ

Does TanStack Query replace Redux or Zustand?

TanStack Query replaces the server-state part of many Redux or Zustand setups. Client-only state, such as drafts, selected UI controls, canvas state, and modal visibility, still belongs in local state or a client-state store.

Should every framework use the same query keys?

Shared query keys are useful when multiple frontends talk to the same API. A React app, a SvelteKit app, and a Nuxt app can all use the same resource naming convention, even though each adapter exposes different hooks or reactive primitives.

Should server-rendered pages always hydrate TanStack Query?

Hydration is worth it when the client will keep interacting with the same data after the first paint. A static documentation page can use server rendering alone. A dashboard, feed, editor, or agent thread usually benefits from cache hydration.

Should chat tokens live in TanStack Query?

If they represent durable state, yes. Streaming tokens should update the cached thread because the user will continue to see and interact with that content after the stream finishes. Ephemeral input state can stay inside the chat component, but the thread query should own persisted messages, pending assistant responses, and tool-call status.
