Generate realistic datasets for demos, learning, and dashboards. Instantly preview data, export as CSV or SQL, and explore with Metabase.
Features:
- Conversational prompt builder: choose business type, schema, row count, and more
- Real-time data preview in the browser
- Export as CSV (single file or multi-table ZIP) or as SQL inserts
- One-click Metabase launch for data exploration (see Using Metabase for details)

Typical workflow:
- Select your business type, schema, and other parameters.
- Click "Preview Data" to generate a 10-row sample (incurs a small LLM cost, depending on provider).
- Download CSV or SQL for as many rows as you want at no extra cost; downloads always use the same schema and columns as the preview.

Prerequisites:
- Docker (with Docker Compose)
- At least one API key for a supported LLM provider (OpenAI, Anthropic, or Google GenAI)

Setup:

- Clone the repo:

  ```bash
  git clone <your-repo-url>
  cd dataset-generator
  ```

- Create your .env file:

  Copy the example file and fill in your LLM provider API keys (OpenAI, Anthropic, Google, etc.):

  ```bash
  cp .env.example .env
  ```

  Then edit .env and add your API keys as needed:

  ```
  # For local development, you can use any value for these keys:
  LITELLM_MASTER_KEY=sk-1234
  LITELLM_SALT_KEY=sk-1234

  # Add at least one provider key below:
  OPENAI_API_KEY=sk-...
  ANTHROPIC_API_KEY=...
  GOOGLE_GENAI_API_KEY=...

  # Set LLM_MODEL to match your provider:
  LLM_MODEL=gpt-4o
  # Example values:
  # For OpenAI:    LLM_MODEL=gpt-4o
  # For Anthropic: LLM_MODEL=claude-4-sonnet
  # For Google:    LLM_MODEL=gemini-2.5-flash
  ```

- Start the Next.js app:

  ```bash
  npm install
  npm run dev
  ```

  The app runs at http://localhost:3000.

- Start LiteLLM (required for LLM features):

  This app uses LiteLLM as a gateway for all LLM requests (OpenAI, Anthropic, Google, etc.). You must start LiteLLM for the dataset generation and preview features to work. From your project root, run:

  ```bash
  docker compose up litellm db_litellm
  ```

  This starts the LiteLLM gateway and its dedicated Postgres database. LiteLLM listens on http://localhost:4000 by default. (A minimal client sketch showing how an app can talk to the gateway appears after these setup steps.)

- Generate a dataset:
  - Use the prompt builder to define your dataset.
  - Click "Preview Data" to see a sample.

- Export or Explore:
  - Download your dataset as CSV or SQL inserts.
  - Click "Start Metabase" to spin up Metabase in Docker.
  - Once Metabase is ready, click "Open Metabase" to explore your data.
  - When done, click "Stop Metabase" to shut down and clean up the Docker containers.
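
All LLM traffic goes through the LiteLLM gateway started above. As a rough illustration only (assuming the `openai` npm package, LiteLLM's OpenAI-compatible endpoint on http://localhost:4000, and the LITELLM_MASTER_KEY and LLM_MODEL values from .env), a call through the gateway looks something like the sketch below; the project's actual request code may differ.

```typescript
// Illustrative sketch, not the app's actual code. Assumptions: `openai` npm package,
// LiteLLM's OpenAI-compatible proxy on http://localhost:4000, and .env values.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",       // point the SDK at the LiteLLM gateway
  apiKey: process.env.LITELLM_MASTER_KEY, // e.g. sk-1234 for local development
});

// requestSpec is a hypothetical helper name used only for this sketch.
async function requestSpec(prompt: string): Promise<string | null> {
  const completion = await client.chat.completions.create({
    model: process.env.LLM_MODEL ?? "gpt-4o", // LiteLLM routes to the provider by model name
    messages: [{ role: "user", content: prompt }],
  });
  return completion.choices[0].message.content;
}

requestSpec("Describe a schema for a coffee shop dataset.").then(console.log);
```

Because LiteLLM routes by model name, switching providers only requires changing LLM_MODEL and the corresponding API key in .env.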

LLM usage and cost:

- When you preview a dataset, the app uses LiteLLM (which can route to OpenAI, Anthropic, Google, etc.) to generate a detailed data spec (schema, business rules, event logic) for your chosen business type and parameters.
- All actual data rows are generated locally with Faker, based on the LLM-generated spec (as sketched below).
- Downloading or exporting data never calls an LLM again; it's instant and free.
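
As a simplified illustration of that split (the ColumnSpec shape and generateRows helper are assumptions for this sketch, using @faker-js/faker v8+ calls, not the project's actual generator in /app/api/generate/route.ts), local row generation from a fixed spec might look like this:

```typescript
// Simplified sketch: the LLM produces a column spec once; every row is produced locally.
import { faker } from "@faker-js/faker";

// Hypothetical column spec that an LLM-generated data spec might boil down to.
interface ColumnSpec {
  name: string;
  kind: "uuid" | "name" | "email" | "date" | "amount";
}

function generateRows(spec: ColumnSpec[], rowCount: number): Record<string, string | number>[] {
  return Array.from({ length: rowCount }, () => {
    const row: Record<string, string | number> = {};
    for (const col of spec) {
      switch (col.kind) {
        case "uuid":   row[col.name] = faker.string.uuid(); break;
        case "name":   row[col.name] = faker.person.fullName(); break;
        case "email":  row[col.name] = faker.internet.email(); break;
        case "date":   row[col.name] = faker.date.past().toISOString(); break;
        case "amount": row[col.name] = Number(faker.finance.amount()); break;
      }
    }
    return row;
  });
}

// Preview generates 10 rows; downloads reuse the same spec with a larger row count.
const spec: ColumnSpec[] = [
  { name: "order_id", kind: "uuid" },
  { name: "customer", kind: "name" },
  { name: "total", kind: "amount" },
];
console.table(generateRows(spec, 10));
```

Because the spec is reused, a larger export is just a bigger row count; no further LLM calls are made.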

| Action       | Uses LLM? | Uses Faker? | Cost   | Row Count |
|--------------|-----------|-------------|--------|-----------|
| Preview      | Yes       | Yes         | ~$0.05 | 10        |
| Download CSV | No        | Yes         | $0     | 100+      |
| Download SQL | No        | Yes         | $0     | 100+      |

The above costs and behavior are based on testing with the OpenAI GPT-4o model. Costs and token usage may vary with other providers/models.
- You only pay for the preview/spec generation (e.g., ~$0.05 per preview with OpenAI GPT-4o)
- All downloads use the same columns/spec, just with more rows, and are free

Using Metabase:

When you click "Start Metabase", the app launches Metabase in a Docker container. Once it's ready:
- Click "Open Metabase" to access the Metabase interface
- Follow Metabase's setup process
- To analyze your generated data:
- Use the CSV export feature to download your dataset
- In Metabase, use the "Upload Data" feature to analyze your CSV files
- Or connect to your own database where you've loaded the data

Project structure:
- /app/page.tsx – Main UI and prompt builder
- /app/api/generate/route.ts – Synthetic data generator (via LiteLLM: OpenAI, Anthropic, Google, etc.)
- /app/api/metabase/start|stop|status/route.ts – Docker orchestration for Metabase (a hypothetical sketch of one such route follows this list)
- /lib/export/ – CSV/SQL export logic
- /docker-compose.yml – Used for the Metabase and LiteLLM services
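
For example, a start route can shell out to Docker Compose roughly as sketched below. This is a hypothetical illustration, not the project's actual handler; the route body and the "metabase" service name are assumptions.

```typescript
// Hypothetical sketch of a handler like /app/api/metabase/start/route.ts.
import { exec } from "node:child_process";
import { promisify } from "node:util";
import { NextResponse } from "next/server";

const execAsync = promisify(exec);

export async function POST() {
  try {
    // Bring up only the Metabase service defined in docker-compose.yml (assumed name).
    await execAsync("docker compose up -d metabase");
    return NextResponse.json({ status: "starting" });
  } catch (error) {
    return NextResponse.json({ error: String(error) }, { status: 500 });
  }
}
```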

Tech stack:
- Next.js (App Router, TypeScript)
- Tailwind CSS + ShadCN UI (modern, dark-themed UI)
- LiteLLM (multi-provider LLM gateway: OpenAI, Anthropic, Google, etc.)
- Metabase (Dockerized, launched on demand)

Customization:
- To add new business types, edit lib/spec-prompts.ts and add entries to the businessTypeInstructions object (see the sketch below).
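
A new entry might look roughly like the following sketch. The value format (a plain prompt string) and the example entry are assumptions made for illustration; check the existing entries in lib/spec-prompts.ts for the actual structure.

```typescript
// Hypothetical sketch for lib/spec-prompts.ts; the real businessTypeInstructions
// may map business types to richer objects rather than plain prompt strings.
export const businessTypeInstructions: Record<string, string> = {
  // ...existing business types...
  "Fitness Studio":
    "Tables: members, classes, bookings, payments. Members book classes; " +
    "payments reference members and follow a monthly membership pattern.",
};
```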