DataValidationNodeEditor Documentation
Overview
The DataValidationNodeEditor is a custom editor component within the Vantage analytics and data platform. It lets users define validation rules that check data integrity and quality before further processing. Rules can be created manually, imported from a Google Drive spreadsheet, or loaded from predefined snippets retrieved from a context-based API. The component provides two outputs: the validated data and a list of any errors found during validation.
Purpose
The primary purpose of the DataValidationNodeEditor is to facilitate the configuration of data validation rules that will be applied to incoming datasets. By validating data, users can prevent invalid records from propagating through their workflows, which is crucial for maintaining high data quality.
Settings
1. Rules Source
- Name: rulesSource
- Input Type: Dropdown
- Description: This setting allows the user to select the source of validation rules. The options include:
- Manual (stored as manual): Users define validation rules directly within the interface.
- Google Drive (stored as googleDrive): Users select rules from a spreadsheet.
- Snippet (stored as contextSnippet): Users apply predefined snippets of rules.
- Default Value: manual
2. Google Drive Rules
- Name: googleDriveRules
- Input Type: Array of Objects
- Description: This setting stores the rules retrieved from a Google Drive spreadsheet. The expected format for each rule in the spreadsheet includes columns: column name, check type, value (if applicable), and severity level.
- Default Value: [] (empty array)
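The mapping from spreadsheet rows to rule objects might look like the sketch below. It is illustrative only: the function name rowsToRules, the assumption that rows arrive as arrays of cell strings with a header row first, and the fallback to "error" for an unrecognized severity are not taken from the component itself.

```typescript
// Hypothetical rule shape, based on the fields described in this document.
type Severity = "error" | "warning";

interface Rule {
  column: string;
  check: string;
  value?: string;
  severity: Severity;
}

// Assumption: each row is [column name, check type, value, severity],
// and the first row is a header that should be skipped.
function rowsToRules(rows: string[][]): Rule[] {
  const rules: Rule[] = [];
  for (const row of rows.slice(1)) {
    const [column = "", check = "", value = "", severity = ""] = row.map((cell) => cell.trim());
    // Ignore blank or incomplete rows.
    if (column === "" || check === "") continue;
    const rule: Rule = {
      column,
      check,
      // Assumption: anything other than "warning" falls back to the default, "error".
      severity: severity.toLowerCase() === "warning" ? "warning" : "error",
    };
    // The value is optional for checks such as notEmpty.
    if (value !== "") rule.value = value;
    rules.push(rule);
  }
  return rules;
}
```

Keeping the conversion in one pure function makes it easy to test against sample sheets before the result is stored in googleDriveRules.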
3. Context Snippet ID
- Name: contextSnippetId
- Input Type: Numeric (ID)
- Description: This setting holds the ID of the selected context snippet from which rules can be parsed.
- Default Value: null
4. Context Snippet Content
- Name: contextSnippetContent
- Input Type: String (JSON)
- Description: This setting contains the content of the selected context snippet, parsed as JSON, which may define validation rules. It provides flexibility for users to utilize or modify the rules as needed.
- Default Value: null
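Parsing snippet content into rules can be sketched as follows. This is a minimal illustration that assumes the snippet content is a JSON array of rule objects; the actual snippet format is not specified in this document, and the names isRule and parseSnippetRules are hypothetical.

```typescript
type Severity = "error" | "warning";

interface Rule {
  column: string;
  check: string;
  value?: string | number;
  severity: Severity;
}

// Type guard: true only when the candidate has the fields a rule requires.
function isRule(candidate: unknown): candidate is Rule {
  if (typeof candidate !== "object" || candidate === null) return false;
  const r = candidate as Record<string, unknown>;
  const value = r["value"];
  const valueOk = value === undefined || typeof value === "string" || typeof value === "number";
  const severityOk = r["severity"] === "error" || r["severity"] === "warning";
  return typeof r["column"] === "string" && typeof r["check"] === "string" && valueOk && severityOk;
}

// Assumption: the content is a JSON array; anything else yields no rules.
function parseSnippetRules(content: string | null): Rule[] {
  if (content === null) return [];
  try {
    const parsed: unknown = JSON.parse(content);
    return Array.isArray(parsed) ? parsed.filter(isRule) : [];
  } catch {
    // Malformed JSON is treated as "no rules" rather than a hard failure.
    return [];
  }
}
```

Returning an empty list on failure keeps the editor usable when a snippet does not contain rules; a real implementation might instead surface a message to the user.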
5. Validation Rules
- Name: rules
- Input Type: Array of Rule Objects
- Description: This setting contains the list of validation rules defined by the user. Each rule must include:
- column: The name of the column to validate.
- check: The type of validation check to apply (e.g., notEmpty, regex, minLength, unique).
- value: The value associated with the check (if required).
- severity: The severity of the validation outcome ('error' or 'warning').
- Default Value: [] (empty array)
6. Severity
- Name: severity
- Input Type: Dropdown
- Description: This setting indicates the severity level of the validation check. The options include:
- Error: Indicates a critical validation failure.
- Warning: Indicates a less severe issue that should be noted but does not prevent processing.
- Default Value: error
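The settings above can be summarized as types with their documented defaults. The sketch below is an assumption about how the configuration could be modeled, not the component's actual source: the names DataValidationConfig and defaultConfig are hypothetical, the rulesSource identifiers follow the configuration structure shown in this document, and severity is modeled per rule.

```typescript
type RulesSource = "manual" | "googleDrive" | "contextSnippet";
type Severity = "error" | "warning";

interface Rule {
  column: string;
  check: string;
  value?: string | number; // only required by some checks
  severity: Severity;
}

interface DataValidationConfig {
  rulesSource: RulesSource;
  googleDriveRules: Rule[];
  contextSnippetId: number | null;
  contextSnippetContent: string | null;
  rules: Rule[];
}

// Returns a fresh configuration populated with the documented default values.
function defaultConfig(): DataValidationConfig {
  return {
    rulesSource: "manual",
    googleDriveRules: [],
    contextSnippetId: null,
    contextSnippetContent: null,
    rules: [],
  };
}
```

Using a factory function rather than a shared constant avoids two editor instances accidentally mutating the same default arrays.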
How It Works
On initialization, the DataValidationNodeEditor loads the upstream columns from connected nodes, so the columns available for validation are known. The user then chooses how to supply validation rules (manually, via Google Drive, or via snippets). The component manages changes to the rule list through several internal functions:
- addRule: Adds a new validation rule to the list.
- updateRule: Updates a specific rule's property based on user input.
- removeRule: Deletes a specified rule from the list.
- moveRule: Allows reordering of the rules list.
- handleSnippetSelect: Fetches the selected snippet's content and attempts to parse it into validation rules.
The component dynamically updates the validation rules shown in the UI based on the user's selection and actions.
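The rule-list operations listed above can be sketched as pure functions over an array. This is a minimal illustration under the assumption that rules are held in an immutable list; the signatures are hypothetical and the real component may manage state differently. handleSnippetSelect is omitted because it depends on the snippet API.

```typescript
type Severity = "error" | "warning";

interface Rule {
  column: string;
  check: string;
  value?: string | number;
  severity: Severity;
}

// Appends a rule and returns a new list.
function addRule(rules: readonly Rule[], rule: Rule): Rule[] {
  return [...rules, rule];
}

// Replaces one property of the rule at the given index.
function updateRule<K extends keyof Rule>(
  rules: readonly Rule[],
  index: number,
  key: K,
  value: Rule[K],
): Rule[] {
  return rules.map((rule, i) => {
    if (i !== index) return rule;
    const copy: Rule = { ...rule };
    copy[key] = value;
    return copy;
  });
}

// Drops the rule at the given index.
function removeRule(rules: readonly Rule[], index: number): Rule[] {
  return rules.filter((_, i) => i !== index);
}

// Moves a rule from one position to another; out-of-range indices leave the order unchanged.
function moveRule(rules: readonly Rule[], from: number, to: number): Rule[] {
  const next = [...rules];
  if (from < 0 || from >= next.length || to < 0 || to >= next.length) return next;
  const [moved] = next.splice(from, 1);
  if (moved !== undefined) next.splice(to, 0, moved);
  return next;
}
```

Because each function returns a new array instead of mutating its input, the UI can detect changes by reference comparison.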
Expected Data
The DataValidationNodeEditor expects the following data structure as input for configuration:
{
  "rulesSource": "manual | googleDrive | contextSnippet",
  "googleDriveRules": [],
  "contextSnippetId": null,
  "contextSnippetContent": null,
  "rules": [
    {
      "column": "string",
      "check": "string",
      "value": "string | number",
      "severity": "error | warning"
    }
  ]
}
Use Cases & Examples
Use Case 1: Quality Control in Data Ingestion
A company ingests customer data from various sources and needs to ensure that all email addresses provided are valid and correctly formatted. By utilizing the DataValidationNodeEditor, a validation rule can be created where the 'check' type is set to 'regex' to match a standard email pattern.
Use Case 2: Database Uniqueness Enforcement
When importing transaction records, a financial organization must ensure that transaction IDs are unique to prevent duplicate processing. A rule using the unique check on the transaction ID column ensures that no two records share the same ID.
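A uniqueness check of this kind could be sketched as below. The function name checkUnique, the ValidationError shape, and the choice to let the first occurrence pass while flagging later duplicates are all assumptions made for illustration; the document does not describe how the unique check is implemented.

```typescript
type Row = Record<string, unknown>;

interface ValidationError {
  rowIndex: number;
  column: string;
  message: string;
}

// Flags every row whose value in `column` has already appeared in an earlier row.
// Assumption: values are compared by their string form.
function checkUnique(rows: readonly Row[], column: string): ValidationError[] {
  const seen = new Set<string>();
  const errors: ValidationError[] = [];
  rows.forEach((row, rowIndex) => {
    const key = String(row[column]);
    if (seen.has(key)) {
      errors.push({ rowIndex, column, message: `duplicate value "${key}"` });
    } else {
      seen.add(key);
    }
  });
  return errors;
}
```

For example, calling checkUnique on three transactions where the first and third share an ID reports a single error for the third row.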
Example Configuration for Use Case 1
To validate email addresses using a regex pattern, the component would be configured as follows:
{
  "rulesSource": "manual",
  "googleDriveRules": [],
  "contextSnippetId": null,
  "contextSnippetContent": null,
  "rules": [
    {
      "column": "email",
      "check": "regex",
      "value": "^[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}$",
      "severity": "error"
    }
  ]
}
With this configuration, any value in the email column that does not match the pattern is flagged as an error during data processing.
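To make the effect of such a rule concrete, the sketch below applies rules to sample rows and returns the two outputs described in the Overview. It is not the platform's validation engine: the function names, the Issue shape, the handling of only the notEmpty and regex checks, and the assumption that rows failing an error-severity rule are excluded from the validated output are all illustrative.

```typescript
type Severity = "error" | "warning";

interface Rule {
  column: string;
  check: string;
  value?: string | number;
  severity: Severity;
}

type Row = Record<string, unknown>;

interface Issue {
  rowIndex: number;
  column: string;
  check: string;
  severity: Severity;
}

// Only two checks are implemented here; unknown checks are skipped.
function passes(rule: Rule, cell: unknown): boolean {
  const text = cell === null || cell === undefined ? "" : String(cell);
  switch (rule.check) {
    case "notEmpty":
      return text.trim() !== "";
    case "regex":
      return new RegExp(String(rule.value ?? "")).test(text);
    default:
      return true;
  }
}

// Assumption: a failed "error" rule excludes the row; a failed "warning" rule only records an issue.
function validate(rows: readonly Row[], rules: readonly Rule[]): { valid: Row[]; issues: Issue[] } {
  const valid: Row[] = [];
  const issues: Issue[] = [];
  rows.forEach((row, rowIndex) => {
    let blocked = false;
    for (const rule of rules) {
      if (passes(rule, row[rule.column])) continue;
      issues.push({ rowIndex, column: rule.column, check: rule.check, severity: rule.severity });
      if (rule.severity === "error") blocked = true;
    }
    if (!blocked) valid.push(row);
  });
  return { valid, issues };
}

// The email rule from the example configuration.
const emailRule: Rule = {
  column: "email",
  check: "regex",
  value: "^[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}$",
  severity: "error",
};
```

Calling validate with the rows { email: "ada@example.com" } and { email: "not-an-email" } and the rule above keeps the first row and records one issue for the second.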
AI Integrations and Billing Impact
The DataValidationNodeEditor includes functionality to fetch validation rule snippets from an AI API endpoint. Integrating this functionality may incur usage-based API costs depending on the number of requests made or data processed through the endpoint.
By maintaining data integrity through effective validation, organizations using Vantage can potentially reduce costs associated with data errors and enhance trust in their analytics outputs.