4 min read · Updated Mar 2, 2026

LLM Capacity Management

Managing AI capacity ensures your organization gets reliable, cost-effective AI performance. This guide covers data processing settings, rate limits, and strategies for high-volume usage.


Data Processing Modes

Vantage offers two modes for how AI processes data, controlled by the Process Large Datasets setting in Settings → AI Features → Query Settings.

Sampling Mode (Default)

| Property | Value |
| --- | --- |
| Setting | Process Large Datasets: OFF |
| Behavior | AI processes a representative sample of rows |
| Speed | Fast |
| Token cost | Lower |
| Accuracy | High for trend analysis; may miss edge cases |
| Best for | Dashboards, quick insights, most daily use |

Full Processing Mode

| Property | Value |
| --- | --- |
| Setting | Process Large Datasets: ON |
| Behavior | AI processes all rows in the dataset |
| Speed | Slower (proportional to data size) |
| Token cost | Higher |
| Accuracy | Complete — no data is skipped |
| Best for | Compliance audits, financial reporting, regulatory tasks |

Recommendation: Start with sampling mode. Switch to full processing only for specific use cases that require 100% data coverage.
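Vantage doesn't document its exact sampling strategy, but conceptually the two modes trade row coverage for speed and token cost along these lines (the function name, cap, and seed below are illustrative assumptions, not the product's implementation):

```python
import random

def select_rows(rows, full_processing=False, sample_size=1000, seed=42):
    """Full mode returns every row for AI processing; sampling mode
    returns a representative random subset capped at sample_size."""
    if full_processing or len(rows) <= sample_size:
        return rows
    rng = random.Random(seed)  # fixed seed so repeated runs sample the same rows
    return rng.sample(rows, sample_size)
```

With 100,000 rows, sampling mode would send only 1,000 to the AI (a ~99% token reduction), which is why it suits dashboards and quick insights but not audits that must cover every row.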


Rate Limits

Provider-Side Rate Limits

Each AI provider enforces its own rate limits (requests per minute, tokens per minute). When Vantage hits a rate limit, it:

  1. Detects the 429 Rate Limited response
  2. Waits with exponential backoff
  3. Retries the request automatically

Common provider limits:

| Provider | Typical RPM (requests/min) | Typical TPM (tokens/min) |
| --- | --- | --- |
| OpenAI | 500–10,000 (varies by tier) | 30K–300K |
| Claude | 50–4,000 | 20K–100K |
| Gemini | 60–1,000 | Varies |
| Mistral | 120–1,000 | Varies |

Note: Limits depend on your provider plan. Enterprise plans have significantly higher limits.

Impact on Workflows

High-volume workflows (processing thousands of rows through AI nodes) are the most likely to hit rate limits. The strategies in the Reducing Workflow Cost table below also reduce rate-limit pressure.


Capacity Planning for Workflows

Estimating Token Usage

For a workflow processing N rows through an AI node:

Estimated tokens = N × (avg input tokens per row + avg output tokens per row)

Example: An AI Enrichment node adding sentiment analysis to 5,000 customer feedback rows.
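Plugging assumed per-row averages into the formula above (the 150/20 token figures are hypothetical, not measured values) gives a rough budget for the run:

```python
rows = 5_000
avg_input_tokens = 150   # assumed: feedback text plus prompt instructions
avg_output_tokens = 20   # assumed: a short sentiment label and score

estimated_tokens = rows * (avg_input_tokens + avg_output_tokens)
print(estimated_tokens)  # 850000 tokens for the full workflow run
```

At that scale, even small per-row prompt savings compound: trimming 30 input tokens per row would save 150,000 tokens across the run.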

Reducing Workflow Cost

| Strategy | How | Impact |
| --- | --- | --- |
| Pre-filter | Add a filter node before the AI node to exclude irrelevant rows | Reduces row count |
| Conditional AI | Use an AI Conditional node to only process rows that need it | Skips unnecessary AI calls |
| Batch scheduling | Run large workflows during off-peak hours | Avoids rate limits |
| Model selection | Use a budget model (GPT-4o-mini) for simple tasks | Reduces cost per token |
| Prompt optimization | Shorten prompts and remove redundant instructions | Fewer input tokens |
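The pre-filter and conditional-AI strategies amount to the same idea in code: make a cheap local decision about each row before spending an AI call on it. A minimal sketch, assuming a hypothetical `analyze_sentiment` function standing in for the AI node (the 20-character threshold is an arbitrary example):

```python
def needs_ai(row):
    """Cheap local check: skip empty or trivially short feedback."""
    text = row.get("feedback", "").strip()
    return len(text) >= 20  # threshold chosen for illustration only

def enrich(rows, analyze_sentiment):
    """Send only qualifying rows to the (expensive) AI call."""
    for row in rows:
        if needs_ai(row):
            row["sentiment"] = analyze_sentiment(row["feedback"])
        else:
            row["sentiment"] = None  # skipped: no AI tokens spent
    return rows
```

In a dataset where, say, a third of feedback rows are blank or one-word entries, this kind of gate cuts both token cost and rate-limit pressure by the same fraction.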

Monitoring Capacity

Usage Dashboard

Track real-time and historical token consumption at Settings → Account → Usage & Tokens.

Key metrics to watch:

Warning Signs

| Signal | Possible Cause | Action |
| --- | --- | --- |
| Sudden usage spike | Runaway workflow, large dataset processing | Check recent workflow runs |
| Consistently hitting rate limits | Too many concurrent AI operations | Stagger workflows, upgrade provider plan |
| High cost with low value | Over-processing simple tasks with premium models | Switch to budget models for routine tasks |
| Slow AI responses | Provider under load or dataset too large | Enable sampling, try a different provider |

Best Practices

  1. Start with sampling mode and only enable full processing when needed
  2. Use budget models (GPT-4o-mini, Mistral Small) for workflow enrichment and formatting
  3. Pre-filter workflows to minimize rows sent to AI nodes
  4. Monitor weekly using the Usage Dashboard
  5. Set team guidelines for when to use AI assistant vs. manual analysis
  6. Upgrade provider plans as usage grows rather than hitting rate limits