4 min read · Updated Mar 2, 2026

dataValidation Documentation

Overview

The dataValidation logic validates incoming data against a predefined set of rules. It runs checks on specified columns of the data and generates two outputs: the original data, unchanged, and an array of error or warning messages describing any validation issues found. This component is essential for ensuring data integrity and quality within the Vantage analytics platform.

Purpose

The main purpose of the dataValidation logic is to validate data to ensure it meets specific criteria before any further processing occurs. This helps identify issues early in the data pipeline, reducing the risk of errors in downstream analytics.

Expected Data

The dataValidation node expects its input as an array of row objects, where each row maps column names to values.
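For illustration, an input matching the rules in the Example Configuration below might look like this (the field names are taken from that example; the exact wrapper shape depends on the platform):

```json
[
  { "name": "Ada Lovelace", "email": "ada@example.com" },
  { "name": "", "email": "not-an-email" }
]
```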

How It Works

  1. Data Unwrapping: The function begins by extracting the input data, ensuring it is in array format. If the input data is empty or invalid, it defaults to an empty array.

  2. Rule Resolution: The logic determines the source of validation rules based on the rulesSource setting. It can derive rules from manual input, a Google Drive file, or a context snippet.

  3. Validation Checks: For each row in the input data, the validation checks are applied based on the specified rules. Each rule defines a column to check, a validation type (e.g., whether the value is not empty, matches a regex, falls within a specified range, etc.), and an optional severity level (error or warning).

  4. Output Generation: The component generates two outputs:

    • output1: The original unmodified data.
    • output2: An array of validation issues found, including the row number, column name, the rule that failed, an error message, and the severity of the issue.
  5. Early Exit: If the stopOnFirstError setting is enabled, validation halts at the first error encountered, returning the original data and only that first error.
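The steps above can be sketched in Python as follows. This is a minimal illustration, not the platform's actual implementation: the function name, the dict-based row shape, and the issue field names are assumptions modeled on the rule format shown in the Example Configuration and the output fields described in step 4.

```python
import re

def validate(rows, rules, stop_on_first_error=False):
    """Apply each rule to each row; return (original rows, issues)."""
    # Step 1: unwrap the input, defaulting to an empty array if invalid.
    rows = rows if isinstance(rows, list) else []
    issues = []
    for i, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule["column"], "")
            ok = True
            # Step 3: apply the check named by the rule (a few examples only).
            if rule["check"] == "notEmpty":
                ok = str(value).strip() != ""
            elif rule["check"] == "email":
                ok = re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", str(value)) is not None
            elif rule["check"] == "regex":
                ok = re.fullmatch(rule["value"], str(value)) is not None
            if not ok:
                # Step 4: record row number, column, failed rule, message, severity.
                issue = {
                    "row": i + 1,
                    "column": rule["column"],
                    "rule": rule["check"],
                    "message": f"Row {i + 1}: '{rule['column']}' failed check '{rule['check']}'",
                    "severity": rule.get("severity", "error"),
                }
                issues.append(issue)
                # Step 5: optionally halt at the first error.
                if stop_on_first_error and issue["severity"] == "error":
                    return rows, issues
    return rows, issues
```

The two return values correspond to output1 (the untouched input) and output2 (the list of issues).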

Settings

The dataValidation logic includes several settings, each with specific functions and implications:

1. rulesSource

Determines where the validation rules come from: manual input, a Google Drive file, or a context snippet (see Rule Resolution above).

2. rules

The array of rule objects used when rulesSource is manual. Each rule specifies a column to check, a check type (e.g., notEmpty, email, regex, range), an optional value, and a severity (error or warning).

3. googleDriveRules

The rules loaded from a Google Drive file when rulesSource points to Google Drive.

4. contextSnippetId

The identifier of the context snippet to load rules from when rulesSource is set to a context snippet.

5. contextSnippetContent

The content of the context snippet supplying the rules.

6. stopOnFirstError

When enabled, validation halts at the first error encountered and returns the original data along with only that first error (see step 5 of How It Works).

Use Cases & Examples

Use Cases

  1. Data Quality Control: Ensuring all critical fields in a dataset contain valid and correctly formatted data before analysis.

  2. Regulatory Compliance: Validating datasets to ensure they adhere to specific business rules and compliance regulations, such as proper email formatting.

  3. Duplicate Value Prevention: Checking for duplicate entries in a dataset to maintain uniqueness across records, such as customer IDs or product SKUs.
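For the duplicate-prevention use case, the underlying check can be illustrated with a short helper. This is a hypothetical sketch, not a platform API; the function name and row shape are assumptions:

```python
from collections import Counter

def find_duplicates(rows, column):
    """Return the values in `column` that appear in more than one row."""
    counts = Counter(row.get(column) for row in rows)
    return [value for value, n in counts.items() if n > 1]
```

Running this over a customer-ID or SKU column surfaces any value that would break uniqueness.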

Example Configuration

Use Case: Data Quality Control for User Registration

Consider a scenario where a business registers users and must ensure that certain fields are populated and formatted correctly, such as names and email addresses.

Sample Configuration:

```json
{
  "rulesSource": "manual",
  "rules": [
    {
      "column": "name",
      "check": "notEmpty",
      "value": "",
      "severity": "error"
    },
    {
      "column": "email",
      "check": "email",
      "value": "",
      "severity": "error"
    }
  ],
  "googleDriveRules": [],
  "contextSnippetId": null,
  "contextSnippetContent": "",
  "stopOnFirstError": false
}
```

In this configuration:

  • The name column must not be empty (notEmpty, severity error).
  • The email column must contain a properly formatted email address (email, severity error).
  • stopOnFirstError is false, so every row is checked and all issues are collected in output2.
  • Rules are supplied manually, so the Google Drive and context snippet settings are left empty.
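If a row failed the name check, output2 would contain an issue carrying the fields described in step 4 of How It Works (row number, column name, failed rule, message, severity). The exact field names and message text below are illustrative, not taken from the platform:

```json
[
  {
    "row": 2,
    "column": "name",
    "rule": "notEmpty",
    "message": "Row 2: 'name' failed check 'notEmpty'",
    "severity": "error"
  }
]
```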