8 min read · Updated Mar 2, 2026

ETL & Data Pipeline Automation

Vantage replaces fragile cron jobs and hand-coded ETL scripts with visual, auditable workflows that anyone on the team can build and maintain. Connect to any supported database, pull data on a schedule, transform it with built-in nodes, and push results to dashboards, files, or downstream systems — all without writing code.


Automate a Scheduled Database-to-Dashboard Pipeline

Pull data from a production database every morning, aggregate it, and display results on a live dashboard.

Scenario: A regional sales team needs daily revenue figures broken out by territory, refreshed before their 9 AM standup.

Workflow Steps:

  1. Schedule Trigger — Run daily at 7:30 AM (30 minutes before standup)
  2. Database Query (PostgreSQL) — Execute a parameterized query against the orders table for the prior full day: SELECT region, product_line, SUM(amount) AS revenue, COUNT(*) AS order_count FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '1 day' AND order_date < CURRENT_DATE GROUP BY region, product_line
  3. Aggregation — Roll up by region: total revenue, average order value, order count
  4. Sort — Descending by total revenue
  5. Computed Column — Calculate day-over-day change: ((today_revenue - yesterday_revenue) / yesterday_revenue) * 100
  6. Filter — Only regions with > $5,000 in daily revenue (suppress noise from low-volume territories)
  7. Dashboard Output — Push to three tiles:
    • Bar Tile — Revenue by region
    • Metric Tile — Total company revenue with sparkline
    • Table Tile — Detailed region × product breakdown

Key Nodes: Schedule Trigger, Database Query (PostgreSQL), Aggregation, Sort, Computed Column, Filter, Dashboard Output
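The transform steps above (aggregate, sort, compute day-over-day change, filter) can be sketched in plain Python. This is a hedged illustration of the logic, not Vantage internals: the row shape mirrors the query's output columns, but the sample figures and prior-day totals are made up.

```python
# Sketch of steps 3-6: roll up by region, compute day-over-day change,
# sort by revenue, and filter low-volume territories. Data is illustrative.
from collections import defaultdict

today_rows = [  # (region, product_line, revenue, order_count) from the query
    ("West", "Hardware", 9000.0, 40),
    ("West", "Software", 3000.0, 25),
    ("East", "Hardware", 4200.0, 18),
]
yesterday_totals = {"West": 10000.0, "East": 4000.0}  # prior day's rollup

# Step 3: aggregate by region
totals = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
for region, _line, revenue, orders in today_rows:
    totals[region]["revenue"] += revenue
    totals[region]["orders"] += orders

report = []
for region, agg in totals.items():
    prev = yesterday_totals.get(region)
    # Step 5: day-over-day change, guarding against a missing prior day
    dod = ((agg["revenue"] - prev) / prev) * 100 if prev else None
    report.append({
        "region": region,
        "revenue": agg["revenue"],
        "avg_order_value": agg["revenue"] / agg["orders"],
        "dod_change_pct": dod,
    })

# Step 4: sort descending by revenue; step 6: suppress regions under $5,000
report.sort(key=lambda r: r["revenue"], reverse=True)
report = [r for r in report if r["revenue"] > 5000]
```

In the visual workflow each of these loops is a single node; the sketch only shows what flows between them.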


Consolidate Data from Multiple Sources into One View

Merge data from multiple databases into a single unified view.

Scenario: A company runs its CRM on MSSQL, its e-commerce platform on MySQL, and its support tickets on MongoDB. Leadership wants a single customer health dashboard.

Workflow Steps:

  1. Schedule Trigger — Run every 4 hours
  2. Database Query (MSSQL) — Pull customer records with lifetime value, last contact date, and assigned rep
  3. Database Query (MySQL) — Pull order history with most recent order date and total spend
  4. Database Query (MongoDB) — Pull support ticket counts, average resolution time, and open ticket count per customer
  5. Join — Inner join all three datasets on customer email (the key shared across all three systems)
  6. Computed Column — Calculate a customer health score: (recency_score * 0.3) + (frequency_score * 0.3) + (support_score * 0.4)
  7. Data Validation — Flag rows with null emails, negative spend, or impossible dates
  8. Multi-Conditional — Route by health score:
    • Score ≥ 80 → Dashboard Output (healthy customers)
    • Score 50–79 → Dashboard Output (at-risk list) + Send Email to account manager
    • Score < 50 → Send Email (urgent) to VP of Customer Success + Dashboard Output (critical alerts)
  9. Dashboard Output — Populate:
    • Pie Tile — Customer distribution by health tier
    • Table Tile — Full customer detail with sortable columns
    • Metric Tile — Average health score with trend

Key Nodes: Schedule Trigger, Database Query (MSSQL, MySQL, MongoDB), Join, Computed Column, Data Validation, Multi-Conditional, Dashboard Output, Send Email
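The health score formula (step 6) and the threshold routing (step 8) can be sketched as two small functions. The weights and thresholds come from the workflow above; the component scores and customer records are illustrative assumptions.

```python
# Sketch of steps 6 and 8: weighted health score plus threshold routing.
def health_score(recency: float, frequency: float, support: float) -> float:
    """Weighted blend from step 6: recency 30%, frequency 30%, support 40%."""
    return (recency * 0.3) + (frequency * 0.3) + (support * 0.4)

def route(score: float) -> str:
    """Threshold routing from step 8 of the workflow."""
    if score >= 80:
        return "dashboard:healthy"
    if score >= 50:
        return "dashboard:at_risk+email:account_manager"
    return "email:vp_urgent+dashboard:critical"

# Hypothetical customers with component scores already normalized to 0-100
customers = [
    {"email": "a@example.com", "recency": 90, "frequency": 85, "support": 95},
    {"email": "b@example.com", "recency": 60, "frequency": 55, "support": 50},
    {"email": "c@example.com", "recency": 30, "frequency": 20, "support": 40},
]
routed = {c["email"]: route(health_score(c["recency"], c["frequency"], c["support"]))
          for c in customers}
```

The Multi-Conditional node evaluates these same boundaries top-down, so a score of exactly 80 lands in the healthy branch.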


Generate and Distribute Reports on Autopilot

Generate formatted reports from database data and distribute them via email and file storage.

Scenario: Finance needs a weekly P&L summary emailed to the executive team as both a PDF attachment and an Excel workbook, with a copy archived to cloud storage.

Workflow Steps:

  1. Schedule Trigger — Run every Monday at 6 AM
  2. Database Query (PostgreSQL) — Pull revenue, COGS, and expense line items for the prior week
  3. Aggregation — Sum by department and expense category
  4. Computed Column — Calculate gross margin, operating margin, and net income
  5. Sort — By department, then by expense category
  6. Projection — Select only the columns needed for the report (remove internal IDs, timestamps)
  7. Write PDF — Generate a formatted P&L report with headers, totals row, and date range
  8. Write Excel — Generate a workbook with two sheets: Summary and Detail
  9. Send Email (Gmail) — Email the PDF and Excel to the distribution list with a formatted subject line: Weekly P&L Report — Week of {date}
  10. Write File (Google Drive) — Archive both files to the shared Finance folder

Key Nodes: Schedule Trigger, Database Query, Aggregation, Computed Column, Sort, Projection, Write PDF, Write Excel, Send Email, Write File
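The margin math in step 4 is simple but worth making explicit. A minimal sketch, assuming the usual P&L definitions (the dollar figures are invented, and the expense split into operating vs. other is a hypothetical categorization):

```python
# Sketch of step 4: gross margin, operating margin, and net income
# from weekly line items. All figures are illustrative, not real financials.
revenue = 250_000.0
cogs = 100_000.0
operating_expenses = 90_000.0
other_expenses = 10_000.0  # hypothetical bucket: interest, taxes, etc.

gross_profit = revenue - cogs
operating_income = gross_profit - operating_expenses
net_income = operating_income - other_expenses

gross_margin_pct = gross_profit / revenue * 100
operating_margin_pct = operating_income / revenue * 100
```

Each of these would be one Computed Column in the workflow, applied per department row rather than to company totals.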


Monitor Data Quality in Real Time

Continuously monitor incoming data for anomalies and quality issues.

Scenario: A data team wants to catch data quality problems within minutes — not days — before they corrupt downstream reports.

Workflow Steps:

  1. Schedule Trigger — Run every 15 minutes
  2. Database Query (PostgreSQL) — Pull the most recent batch of records from the staging table
  3. Data Validation — Apply quality rules:
    • Required fields: customer_id, email, order_date (no nulls)
    • Format checks: email matches regex, dates are valid ISO-8601
    • Range checks: amount > 0, quantity BETWEEN 1 AND 10000
  4. Filter — Separate passing and failing records
  5. Multi-Conditional — Route by failure type:
    • Missing required fields → DB Write (quarantine table) + Send Message (Slack to #data-quality channel)
    • Format violations → DB Write (quarantine table) + Dashboard Output (Event Feed Tile with violation detail)
    • Range anomalies → AI Enrichment (classify probable cause: data entry error, system bug, fraud) → Send Email to data engineering lead
    • All passed → DB Write (production table)
  6. Dashboard Output — Populate:
    • Metric Tile — Pass rate % with trend line
    • Event Feed Tile — Real-time log of validation failures
    • Bar Tile — Failures by category over time
    • Stat Tile — Records processed in last 24 hours

Key Nodes: Schedule Trigger, Database Query, Data Validation, Filter, Multi-Conditional, DB Write, AI Enrichment, Send Message (Slack), Send Email, Dashboard Output
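The validation rules in step 3 can be sketched as a single function that returns the failure types used for routing. The required fields and ranges come from the workflow above; the email regex and the exact failure labels are assumptions for illustration.

```python
# Sketch of step 3's quality rules: required fields, an email regex,
# ISO-8601 date parsing, and range checks. Labels are illustrative.
import re
from datetime import date

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simple, not RFC-complete

def validate(record: dict) -> list:
    """Return the list of failure types for one staging-table record."""
    failures = []
    for field in ("customer_id", "email", "order_date"):  # required fields
        if record.get(field) in (None, ""):
            failures.append(f"missing:{field}")
    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        failures.append("format:email")
    raw_date = record.get("order_date")
    if raw_date:
        try:
            date.fromisoformat(raw_date)  # valid ISO-8601 calendar date
        except ValueError:
            failures.append("format:order_date")
    amount = record.get("amount")
    if amount is not None and amount <= 0:
        failures.append("range:amount")
    qty = record.get("quantity")
    if qty is not None and not (1 <= qty <= 10_000):
        failures.append("range:quantity")
    return failures

good = {"customer_id": 1, "email": "a@b.co", "order_date": "2026-03-02",
        "amount": 19.99, "quantity": 2}
bad = {"customer_id": None, "email": "not-an-email", "order_date": "03/02/2026",
       "amount": -5.0, "quantity": 0}
```

An empty failure list maps to the "all passed" branch of the Multi-Conditional; anything else routes to quarantine with the label as the failure type.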


Example Dashboard: Data Operations Command Center

Build this dashboard to give your data team complete visibility into pipeline health, data quality, and delivery status.

Row 1 — Key Metrics

  • Metric Tile — Pipelines Active: Count of enabled workflows, with a sparkline showing the 30-day trend
  • Metric Tile — Records Processed Today: Total rows ingested, transformed, and loaded, with a comparison to yesterday
  • Metric Tile — Data Quality Score: Weighted average pass rate across all validation checks (target: ≥ 99%)
  • Stat Tile — Failed Runs (24h): Count of workflow failures with a red/green status indicator

Row 2 — Pipeline Health

  • Event Feed Tile — Pipeline Activity Log: Real-time feed of workflow executions showing status (success, warning, failure), duration, and records processed. Color-coded: green = success, yellow = slow, red = failure
  • Line Tile — Processing Volume Trend: Records processed per hour over the last 7 days with a day-of-week overlay. Helps identify volume patterns and anomalies

Row 3 — Data Quality & Delivery

  • Bar Tile — Validation Results by Source: Stacked bars showing pass/fail/warning counts per data source (PostgreSQL, MySQL, MongoDB, APIs). Quickly reveals which sources have the most quality issues
  • Table Tile — Data Quality Exception Queue: All records that failed validation, with columns for source, timestamp, failure type, severity, assignee, and status. Sortable and filterable for analyst triage

Row 4 — Scheduling & History

  • Gantt Tile — Pipeline Schedule: Visual timeline showing when each pipeline runs, its duration, and any overlap. Helps prevent resource contention during peak processing windows
  • Comparison Tile — Week-over-Week Performance: Side-by-side comparison of this week vs. last week: total records, processing time, error rate, data quality score
Tip — Data Sources: Connect via Database Query nodes to your PostgreSQL, MySQL, MSSQL, or MongoDB databases. Use Schedule Trigger to refresh tiles on cadence (every 15 minutes for metrics, hourly for trends).


Getting Started

To build your first ETL pipeline:

  1. Connect a database — Go to Integrations and add your PostgreSQL, MySQL, MSSQL, or MongoDB connection
  2. Create a workflow — Start with a Schedule Trigger and add a Database Query node
  3. Transform your data — Add Aggregation, Filter, Computed Column, or Join nodes as needed
  4. Output results — Use Dashboard Output to push to tiles, or Write CSV / Write Excel / Write PDF to generate files
  5. Activate — Enable the schedule and monitor execution history in the workflow detail view
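The query → transform → output flow these steps describe can be sketched end to end in a few lines of Python, with SQLite standing in for a production database. The table, data, and payload shape are all illustrative assumptions; in Vantage each stage is a node, not code.

```python
# Minimal end-to-end sketch of the steps above: query, transform, output.
# SQLite stands in for PostgreSQL/MySQL; the data is illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("West", 120.0), ("West", 80.0), ("East", 50.0)])

# Steps 2-3: query and aggregate (here SQL does the rollup and sort)
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue FROM orders "
    "GROUP BY region ORDER BY revenue DESC"
).fetchall()

# Step 4: shape a tile-ready payload (the Dashboard Output stage)
payload = [{"region": region, "revenue": revenue} for region, revenue in rows]
conn.close()
```

Step 5 has no code analogue: scheduling and execution history are handled by the platform once the workflow is enabled.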