LLM Capacity Management
Managing AI capacity ensures your organization gets reliable, cost-effective AI performance. This guide covers data processing settings, rate limits, and strategies for high-volume usage.
Data Processing Modes
Vantage offers two modes for how AI processes data, controlled by the Process Large Datasets setting in Settings → AI Features → Query Settings.
Sampling Mode (Default)
| Property | Value |
|---|---|
| Setting | Process Large Datasets: OFF |
| Behavior | AI processes a representative sample of rows |
| Speed | Fast |
| Token cost | Lower |
| Accuracy | High for trend analysis; may miss edge cases |
| Best for | Dashboards, quick insights, most daily use |
Full Processing Mode
| Property | Value |
|---|---|
| Setting | Process Large Datasets: ON |
| Behavior | AI processes all rows in the dataset |
| Speed | Slower (proportional to data size) |
| Token cost | Higher |
| Accuracy | Complete — no data is skipped |
| Best for | Compliance audits, financial reporting, regulatory tasks |
Recommendation: Start with sampling mode. Switch to full processing only for specific use cases that require 100% data coverage.
Rate Limits
Provider-Side Rate Limits
Each AI provider enforces its own rate limits (requests per minute, tokens per minute). When Vantage hits a rate limit, it:
- Detects the `429 Rate Limited` response
- Waits with exponential backoff
- Retries the request automatically
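The retry behavior above can be sketched as follows. This is an illustrative pattern, not Vantage's actual implementation; `request_fn` and `RateLimitError` are hypothetical stand-ins for the provider call and its 429 response.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider 429 Rate Limited response."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a provider call on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final retry
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

The jitter term spreads out retries from concurrent workflows so they don't all hammer the provider at the same instant.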
Common provider limits:
| Provider | Typical RPM (Requests/min) | Typical TPM (Tokens/min) |
|---|---|---|
| OpenAI | 500–10,000 (varies by tier) | 30K–300K |
| Claude | 50–4,000 | 20K–100K |
| Gemini | 60–1,000 | Varies |
| Mistral | 120–1,000 | Varies |
Note: Limits depend on your provider plan. Enterprise plans have significantly higher limits.
Impact on Workflows
High-volume workflows (processing thousands of rows through AI nodes) are the most likely to hit rate limits. Strategies:
- Add delays between workflow executions
- Filter data before AI nodes to reduce row count
- Use batch processing where available
- Upgrade your provider plan for higher rate limits
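Two of these strategies, batching and adding delays, can be combined in one loop. A minimal sketch, assuming a hypothetical `ai_fn` that represents whatever AI call the workflow makes per batch:

```python
import time

def process_in_batches(rows, ai_fn, batch_size=100, delay_s=2.0):
    """Send rows to an AI step in fixed-size batches, pausing between
    batches to stay under the provider's requests-per-minute limit."""
    results = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        results.extend(ai_fn(batch))
        if i + batch_size < len(rows):
            time.sleep(delay_s)  # simple throttle between batches
    return results
```

Tune `batch_size` and `delay_s` against your provider's RPM tier; e.g., at 500 RPM, one request every 2 seconds leaves ample headroom.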
Capacity Planning for Workflows
Estimating Token Usage
For a workflow processing N rows through an AI node:
Estimated tokens = N × (avg input tokens per row + avg output tokens per row)
Example: An AI Enrichment node adding sentiment analysis to 5,000 customer feedback rows:
- Average input: ~200 tokens/row (feedback text + prompt)
- Average output: ~50 tokens/row (sentiment + topic)
- Total: 5,000 × 250 = 1,250,000 tokens
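The estimate above is straightforward to compute directly; the helper below just encodes the formula:

```python
def estimate_tokens(n_rows, avg_input_tokens, avg_output_tokens):
    """Estimated tokens = N × (avg input tokens + avg output tokens) per row."""
    return n_rows * (avg_input_tokens + avg_output_tokens)

# Worked example from above: 5,000 feedback rows, ~200 in / ~50 out per row
total = estimate_tokens(5000, 200, 50)
print(total)  # 1250000
```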
Reducing Workflow Cost
| Strategy | How | Impact |
|---|---|---|
| Pre-filter | Add a filter node before the AI node to exclude irrelevant rows | Reduces row count |
| Conditional AI | Use an AI Conditional node to only process rows that need it | Skips unnecessary AI calls |
| Batch scheduling | Run large workflows during off-peak hours | Avoids rate limits |
| Model selection | Use a budget model (GPT-4o-mini) for simple tasks | Reduces cost per token |
| Prompt optimization | Shorten prompts and remove redundant instructions | Fewer input tokens |
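The pre-filter and conditional-AI strategies both boil down to a predicate that decides which rows are worth an AI call. A sketch with a hypothetical rule (skip empty or very short feedback, since sentiment analysis on two-word entries adds little value):

```python
rows = [
    {"id": 1, "feedback": "The new dashboard is great but exports are slow on big tables."},
    {"id": 2, "feedback": "ok"},
    {"id": 3, "feedback": ""},
]

def needs_ai(row):
    # Hypothetical rule: skip empty or very short feedback entries
    return len(row.get("feedback", "").strip()) >= 20

to_process = [r for r in rows if needs_ai(r)]
print(len(to_process))  # 1 of 3 rows reaches the AI node
```

Even a crude length threshold like this can cut row counts substantially on real feedback data, which reduces both token cost and rate-limit pressure.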
Monitoring Capacity
Usage Dashboard
Track real-time and historical token consumption at Settings → Account → Usage & Tokens.
Key metrics to watch:
- Daily token consumption — Is usage consistent or spiking?
- Top operations — Which features consume the most tokens?
- Per-user breakdown — Are specific users driving high consumption?
Warning Signs
| Signal | Possible Cause | Action |
|---|---|---|
| Sudden usage spike | Runaway workflow, large dataset processing | Check recent workflow runs |
| Consistently hitting rate limits | Too many concurrent AI operations | Stagger workflows, upgrade provider plan |
| High cost with low value | Over-processing simple tasks with premium models | Switch to budget models for routine tasks |
| Slow AI responses | Provider under load or dataset too large | Enable sampling, try a different provider |
Best Practices
- Start with sampling mode and only enable full processing when needed
- Use budget models (GPT-4o-mini, Mistral Small) for workflow enrichment and formatting
- Pre-filter workflows to minimize rows sent to AI nodes
- Monitor weekly using the Usage Dashboard
- Set team guidelines for when to use AI assistant vs. manual analysis
- Upgrade provider plans as usage grows rather than hitting rate limits