
OSS Data Analyst
An AI data analyst agent that explores a semantic layer in a sandbox environment to answer natural language questions with SQL.
Overview
OSS Data Analyst uses a sandboxed exploration approach: instead of hardcoding schema knowledge into prompts, the agent is given shell access to a sandbox containing your semantic layer files. It discovers the schema dynamically using cat, grep, and ls commands, then builds and executes SQL queries based on what it finds.
This architecture means the agent can:
- Adapt to any schema without prompt changes
- Explore relationships between entities naturally
- Handle schema updates without redeployment
- Reason about data the same way a human analyst would
How It Works
- Sandbox Creation - A Vercel Sandbox is spun up and populated with your semantic layer YAML files
- Schema Exploration - The agent uses shell commands to browse the catalog and entity definitions
- Query Building - Based on discovered schema, the agent constructs SQL queries
- Execution - Queries run against your SQLite database
- Reporting - Results are formatted with a narrative explanation
User Question
      ↓
┌─────────────────────────────────────┐
│  Vercel Sandbox                     │
│  ┌─────────────────────────────┐    │
│  │ semantic/                   │    │
│  │ ├── catalog.yml             │    │
│  │ └── entities/               │    │
│  │     ├── companies.yml       │    │
│  │     ├── people.yml          │    │
│  │     └── accounts.yml        │    │
│  └─────────────────────────────┘    │
│                                     │
│  Agent explores with:               │
│  • cat semantic/catalog.yml         │
│  • grep -r "keyword" semantic/      │
│  • cat semantic/entities/*.yml      │
└─────────────────────────────────────┘
      ↓
SQL Query → Database → Results → Narrative
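To make the exploration step concrete, the sketch below imitates `ls`, `grep`, and `cat` over an in-memory map of files instead of a real sandbox. The paths follow the diagram; the file contents and the helper names `listFiles`, `grepFiles`, and `catFile` are assumptions made for illustration and are not taken from the project's code.

```typescript
// In-memory stand-in for the sandbox file system. Paths follow the layout in
// the diagram; the file contents are made-up placeholders.
const sandboxFiles: Record<string, string> = {
  "semantic/catalog.yml": "entities:\n  - companies\n  - people\n  - accounts",
  "semantic/entities/companies.yml": "sql_table_name: companies",
  "semantic/entities/people.yml": "sql_table_name: people",
  "semantic/entities/accounts.yml": "sql_table_name: accounts",
};

// Rough equivalent of a recursive `ls`: every path under the given directory.
function listFiles(files: Record<string, string>, dir: string): string[] {
  return Object.keys(files)
    .filter((path) => path.startsWith(dir))
    .sort();
}

// Rough equivalent of `grep -rl keyword dir`: paths under `dir` whose
// content contains the keyword.
function grepFiles(
  files: Record<string, string>,
  keyword: string,
  dir: string,
): string[] {
  return listFiles(files, dir).filter((path) => files[path].includes(keyword));
}

// Rough equivalent of `cat path`; throws if the file does not exist.
function catFile(files: Record<string, string>, path: string): string {
  const content = files[path];
  if (content === undefined) {
    throw new Error(`No such file: ${path}`);
  }
  return content;
}
```

An agent answering a question about people might call `grepFiles(sandboxFiles, "people", "semantic/")` to find the relevant files, then `catFile` on each hit to read the entity definition before writing SQL.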
Quick Start
Prerequisites
- Node.js 20+
- pnpm
- Vercel AI Gateway API key
Installation
git clone https://github.com/vercel-labs/oss-data-analyst.git
cd oss-data-analyst
pnpm install
Configuration
cp env.local.example .env.local
Add your Vercel AI Gateway key to .env.local.
Initialize Database
pnpm initDatabase
Creates a SQLite database with sample data (Companies, People, Accounts).
Run
pnpm dev
Semantic Layer
The semantic layer lives in src/semantic/ and defines your data model:
src/semantic/
├── catalog.yml          # Entity index with descriptions
└── entities/
    ├── companies.yml    # Company entity definition
    ├── people.yml       # People entity definition
    └── accounts.yml     # Accounts entity definition
Each entity YAML includes:
- sql_table_name - The underlying table
- fields - Available columns with SQL expressions
- joins - Relationships to other entities
- Example questions the entity can answer
The agent reads these files at runtime to understand your schema.
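As a rough illustration of how a parsed entity file could drive query building, the sketch below models one entity as a TypeScript type and derives a simple SELECT from it. Only the top-level keys `sql_table_name`, `fields`, and `joins` come from the description above; the nested property names, the `example_questions` key, the sample values, and the `selectAllFields` helper are all assumptions.

```typescript
// Assumed shape of one parsed entity file. Only sql_table_name, fields, and
// joins are named in the README; every other property name is a guess.
interface EntityField {
  name: string; // column alias exposed to the agent (assumed)
  sql: string; // SQL expression behind the column (assumed)
}

interface EntityJoin {
  entity: string; // name of the related entity (assumed)
  on: string; // join condition as SQL (assumed)
}

interface EntityDefinition {
  sql_table_name: string;
  fields: EntityField[];
  joins: EntityJoin[];
  example_questions: string[]; // assumed key name
}

// Hypothetical sample entity; column names are placeholders.
const companies: EntityDefinition = {
  sql_table_name: "companies",
  fields: [
    { name: "name", sql: "companies.name" },
    { name: "industry", sql: "companies.industry" },
  ],
  joins: [{ entity: "people", on: "people.company_id = companies.id" }],
  example_questions: ["How many companies are in the Technology industry?"],
};

// Build a SELECT that exposes every declared field under its alias.
function selectAllFields(entity: EntityDefinition): string {
  const columns = entity.fields
    .map((field) => `${field.sql} AS ${field.name}`)
    .join(", ");
  return `SELECT ${columns} FROM ${entity.sql_table_name}`;
}
```

Keeping the SQL expression next to each field name means a query can be assembled from the definition alone, which is the property the sandbox approach relies on.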
Example Questions
- "How many companies are in the Technology industry?"
- "What is the average salary by department?"
- "Show me the top 5 accounts by monthly value"
- "Which companies have the most employees?"
Architecture
Stack: Next.js, Vercel AI SDK, Vercel Sandbox, SQLite
Key Files:
- src/lib/agent.ts - Agent definition and system prompt
- src/lib/tools/sandbox.ts - Sandbox creation with semantic files
- src/lib/tools/shell.ts - Shell command tool for exploration
- src/lib/tools/execute-sqlite.ts - SQL execution tool
Adding Your Own Schema
- Add entity YAML files to src/semantic/entities/
- Update src/semantic/catalog.yml with the new entity
- The agent will automatically discover and use the new schema
No code changes are required: because the agent reads the semantic layer from the sandbox, schema changes are picked up at runtime.
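Since a new entity has to appear both as a file and as a catalog entry, a small consistency check can catch one being added without the other. The sketch below is a hypothetical helper, not part of the project: it assumes each entity file is named after its catalog entry with a `.yml` or `.yaml` extension, and it takes plain string arrays so that parsing the YAML and listing the directory stay out of scope.

```typescript
// Hypothetical helper: compare the entity names listed in catalog.yml with
// the file names found in src/semantic/entities/.
interface CatalogCheck {
  missingFromCatalog: string[]; // entity files with no catalog entry
  missingFiles: string[]; // catalog entries with no entity file
}

function checkCatalog(
  catalogEntities: string[],
  entityFiles: string[],
): CatalogCheck {
  // Assumption: a file "orders.yml" defines the entity "orders".
  const fromFiles = new Set(
    entityFiles.map((file) => file.replace(/\.ya?ml$/, "")),
  );
  const fromCatalog = new Set(catalogEntities);
  return {
    missingFromCatalog: Array.from(fromFiles)
      .filter((name) => !fromCatalog.has(name))
      .sort(),
    missingFiles: Array.from(fromCatalog)
      .filter((name) => !fromFiles.has(name))
      .sort(),
  };
}
```

For example, `checkCatalog(["companies", "people"], ["companies.yml", "people.yml", "orders.yml"])` reports `orders` under `missingFromCatalog`, where `orders` is a made-up entity used only for this example.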
Troubleshooting
Database Not Found
pnpm initDatabase
Build Errors
pnpm type-check