4 min read · Updated Mar 2, 2026

LLM Capacity Management

Managing AI capacity ensures your organization gets reliable, cost-effective AI performance. This guide covers data processing settings, rate limits, and strategies for high-volume usage.


Data Processing Modes

Vantage offers two modes for how AI processes data, controlled by the Process Large Datasets setting in Settings → AI Features → Query Settings.

Sampling Mode (Default)

| Property | Value |
| --- | --- |
| Setting | Process Large Datasets: OFF |
| Behavior | AI processes a representative sample of rows |
| Speed | Fast |
| Token cost | Lower |
| Accuracy | High for trend analysis; may miss edge cases |
| Best for | Dashboards, quick insights, most daily use |

Full Processing Mode

| Property | Value |
| --- | --- |
| Setting | Process Large Datasets: ON |
| Behavior | AI processes all rows in the dataset |
| Speed | Slower (proportional to data size) |
| Token cost | Higher |
| Accuracy | Complete — no data is skipped |
| Best for | Compliance audits, financial reporting, regulatory tasks |

Recommendation: Start with sampling mode. Switch to full processing only for specific use cases that require 100% data coverage.
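Vantage doesn't document its exact sampling strategy, but conceptually the two modes trade row coverage for speed and token cost along these lines (the function name, cap, and seed below are illustrative assumptions, not the product's implementation):

```python
import random

def select_rows(rows, full_processing=False, sample_size=1000, seed=42):
    """Full mode returns every row for AI processing; sampling mode
    returns a representative random subset capped at sample_size."""
    if full_processing or len(rows) <= sample_size:
        return rows
    rng = random.Random(seed)  # fixed seed so repeated runs sample the same rows
    return rng.sample(rows, sample_size)
```

With 100,000 rows, sampling mode would send only 1,000 to the AI (a ~99% token reduction), which is why it suits dashboards and quick insights but not audits that must cover every row.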


Rate Limits

Provider-Side Rate Limits

Each AI provider enforces its own rate limits (requests per minute, tokens per minute). When Vantage hits a rate limit, it:

  1. Detects the 429 Rate Limited response
  2. Waits with exponential backoff
  3. Retries the request automatically

Common provider limits:

| Provider | Typical RPM (requests/min) | Typical TPM (tokens/min) |
| --- | --- | --- |
| OpenAI | 500–10,000 (varies by tier) | 30K–300K |
| Claude | 50–4,000 | 20K–100K |
| Gemini | 60–1,000 | Varies |
| Mistral | 120–1,000 | Varies |

Note: Limits depend on your provider plan. Enterprise plans have significantly higher limits.

Impact on Workflows

High-volume workflows (processing thousands of rows through AI nodes) are the most likely to hit rate limits. The strategies in the Reducing Workflow Cost table below also reduce rate-limit pressure.


Capacity Planning for Workflows

Estimating Token Usage

For a workflow processing N rows through an AI node:

Estimated tokens = N × (avg input tokens per row + avg output tokens per row)

Example: An AI Enrichment node adding sentiment analysis to 5,000 customer feedback rows.
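Plugging assumed per-row averages into the formula above (the 150/20 token figures are hypothetical, not measured values) gives a rough budget for the run:

```python
rows = 5_000
avg_input_tokens = 150   # assumed: feedback text plus prompt instructions
avg_output_tokens = 20   # assumed: a short sentiment label and score

estimated_tokens = rows * (avg_input_tokens + avg_output_tokens)
print(estimated_tokens)  # 850000 tokens for the full workflow run
```

At that scale, even small per-row prompt savings compound: trimming 30 input tokens per row would save 150,000 tokens across the run.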

Reducing Workflow Cost

| Strategy | How | Impact |
| --- | --- | --- |
| Pre-filter | Add a filter node before the AI node to exclude irrelevant rows | Reduces row count |
| Conditional AI | Use an AI Conditional node to only process rows that need it | Skips unnecessary AI calls |
| Batch scheduling | Run large workflows during off-peak hours | Avoids rate limits |
| Model selection | Use a budget model (GPT-4o-mini) for simple tasks | Reduces cost per token |
| Prompt optimization | Shorten prompts and remove redundant instructions | Fewer input tokens |
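The pre-filter and conditional-AI strategies amount to the same idea in code: make a cheap local decision about each row before spending an AI call on it. A minimal sketch, assuming a hypothetical `analyze_sentiment` function standing in for the AI node (the 20-character threshold is an arbitrary example):

```python
def needs_ai(row):
    """Cheap local check: skip empty or trivially short feedback."""
    text = row.get("feedback", "").strip()
    return len(text) >= 20  # threshold chosen for illustration only

def enrich(rows, analyze_sentiment):
    """Send only qualifying rows to the (expensive) AI call."""
    for row in rows:
        if needs_ai(row):
            row["sentiment"] = analyze_sentiment(row["feedback"])
        else:
            row["sentiment"] = None  # skipped: no AI tokens spent
    return rows
```

In a dataset where, say, a third of feedback rows are blank or one-word entries, this kind of gate cuts both token cost and rate-limit pressure by the same fraction.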

Monitoring Capacity

Usage Dashboard

Track real-time and historical token consumption at Settings → Account → Usage & Tokens.

Key metrics to watch:

Warning Signs

| Signal | Possible Cause | Action |
| --- | --- | --- |
| Sudden usage spike | Runaway workflow, large dataset processing | Check recent workflow runs |
| Consistently hitting rate limits | Too many concurrent AI operations | Stagger workflows, upgrade provider plan |
| High cost with low value | Over-processing simple tasks with premium models | Switch to budget models for routine tasks |
| Slow AI responses | Provider under load or dataset too large | Enable sampling, try a different provider |

Best Practices

  1. Start with sampling mode and only enable full processing when needed
  2. Use budget models (GPT-4o-mini, Mistral Small) for workflow enrichment and formatting
  3. Pre-filter workflows to minimize rows sent to AI nodes
  4. Monitor weekly using the Usage Dashboard
  5. Set team guidelines for when to use AI assistant vs. manual analysis
  6. Upgrade provider plans as usage grows rather than hitting rate limits