UnionNodeEditor Documentation
Overview
The UnionNodeEditor is a specialized component within the Vantage analytics platform used to merge two datasets, allowing users to define how data from the second input should be combined with the first. This editor facilitates the mapping of columns from Input 2 to the corresponding columns in Input 1, ensuring that data correctly aligns when combined. Users can also choose various settings related to how the data is processed, such as keeping duplicates or applying sorting.
Purpose
The primary function of the UnionNodeEditor is to provide users the flexibility to combine datasets efficiently while maintaining control over which columns are merged, whether duplicates are allowed, and how the resulting data should be sorted. This helps in data preparation tasks, particularly in ETL (Extract, Transform, Load) processes.
Settings
1. Union Mode
- Name: unionMode
- Input Type: Dropdown (String)
- Description: Determines how rows from the two datasets are handled.
- Options: 'all' (keep all rows, including duplicates) or 'distinct' (remove duplicate rows).
- Default Value: 'all'
- Effect: Changes the behavior of the union operation. Selecting 'distinct' requires specifying deduplication keys.
2. Deduplicate Keys
- Name: deduplicateKeys
- Input Type: Array of Strings
- Description: Defines which columns determine uniqueness in distinct mode. Rows with identical values in all of these columns are treated as duplicates.
- Default Value: [] (empty array)
- Effect: Only applicable when unionMode is set to 'distinct'. It influences which rows are included in the resulting dataset.
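The interaction between unionMode and deduplicateKeys can be sketched as follows. This is a minimal illustration, not the editor's actual implementation: the Row type and the unionRows and rowKey helpers are hypothetical names, and two behaviors are assumptions made for the sketch (the first row seen for a key is the one kept, and an empty key list compares every column).

```typescript
// Hypothetical row shape: one object per row, keyed by column name.
type Row = Record<string, unknown>;

// Builds a comparison key from the chosen columns of a row.
// Assumption: with no keys given, every column of the row is compared.
function rowKey(row: Row, keys: string[]): string {
  const columns = keys.length > 0 ? keys : Object.keys(row).sort();
  return JSON.stringify(columns.map((column) => [column, row[column] ?? null]));
}

// Combines two inputs. In 'distinct' mode, keeps the first row seen per key
// (an assumption; the real editor may keep a different row).
function unionRows(
  input1: Row[],
  input2: Row[],
  unionMode: "all" | "distinct",
  deduplicateKeys: string[] = [],
): Row[] {
  const combined = [...input1, ...input2];
  if (unionMode === "all") return combined;

  const seen = new Set<string>();
  return combined.filter((row) => {
    const key = rowKey(row, deduplicateKeys);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const north: Row[] = [{ product_id: "A1", amount: 10 }];
const south: Row[] = [
  { product_id: "A1", amount: 25 },
  { product_id: "B2", amount: 5 },
];

const all = unionRows(north, south, "all");
const distinct = unionRows(north, south, "distinct", ["product_id"]);
console.log(all.length, distinct.length); // 3 2
```

With product_id as the only key, the second A1 row is dropped even though its amount differs, which is why the choice of keys matters.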
3. Sort Column
- Name: sortColumn
- Input Type: String
- Description: Specifies the column by which the combined dataset should be sorted.
- Default Value: '' (empty string)
- Effect: Defines the order of the output data. An empty value means that no specific sorting is applied.
4. Sort Direction
- Name: sortDirection
- Input Type: Dropdown (String)
- Description: Indicates the direction of the sort applied to the output.
- Options: 'asc' (ascending), 'desc' (descending).
- Default Value: 'asc'
- Effect: Changing this modifies the order in which the output rows appear based on the specified sort column.
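As a rough illustration of how sortColumn and sortDirection could act together, the sketch below sorts a copy of the rows and leaves the order untouched when sortColumn is empty. The sortRows and compareValues helpers are hypothetical, and the comparison rule (numbers numerically, everything else as plain strings) is an assumption; the editor may treat dates or missing values differently.

```typescript
type Row = Record<string, unknown>;
type SortDirection = "asc" | "desc";

// Compares two cell values: numbers numerically, everything else as strings.
// Assumption: the real editor may compare dates or nulls differently.
function compareValues(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const left = String(a);
  const right = String(b);
  return left < right ? -1 : left > right ? 1 : 0;
}

// Returns a sorted copy; an empty sortColumn means no sorting is applied.
function sortRows(
  rows: Row[],
  sortColumn: string,
  sortDirection: SortDirection = "asc",
): Row[] {
  if (sortColumn === "") return rows;
  const sign = sortDirection === "asc" ? 1 : -1;
  return [...rows].sort(
    (a, b) => sign * compareValues(a[sortColumn], b[sortColumn]),
  );
}

const sales: Row[] = [
  { saleDate: "2024-03-01", amount: 40 },
  { saleDate: "2024-01-15", amount: 10 },
  { saleDate: "2024-02-10", amount: 25 },
];

const byDateAsc = sortRows(sales, "saleDate", "asc").map((row) => row.saleDate);
const byAmountDesc = sortRows(sales, "amount", "desc").map((row) => row.amount);
console.log(JSON.stringify(byDateAsc)); // ["2024-01-15","2024-02-10","2024-03-01"]
console.log(JSON.stringify(byAmountDesc)); // [40,25,10]
```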
5. Column Mappings
- Name: columnMappings
- Input Type: Object (Mapping of Strings)
- Description: Maps columns from Input 2 to Input 1. This ensures that data from Input 2 is merged into the correct destination columns in Input 1.
- Default Value: {} (empty object)
- Effect: Allows customization of how columns are aligned between the two datasets. Changes affect the final structure of the output data.
6. Selected Columns
- Name: selectedColumns
- Input Type: Array of Strings
- Description: Specifies which columns from Input 2 should be included in the final dataset. If not defined, all columns from Input 2 are included.
- Default Value: undefined (implicitly includes all columns from Input 2)
- Effect: Affects which data is merged from Input 2 into the output.
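Taken together, the six settings above could be modeled with a type like the one below. The property names, options, and defaults come from the settings list; the UnionNodeSettings type name, the defaultSettings constant, and the withDefaults helper are hypothetical and only illustrate how a partial configuration might be completed.

```typescript
// Hypothetical type name; property names and defaults follow the settings list.
interface UnionNodeSettings {
  unionMode: "all" | "distinct";
  deduplicateKeys: string[];
  sortColumn: string;
  sortDirection: "asc" | "desc";
  columnMappings: Record<string, string>; // Input 2 column -> Input 1 column
  selectedColumns?: string[]; // undefined means all Input 2 columns
}

const defaultSettings: UnionNodeSettings = {
  unionMode: "all",
  deduplicateKeys: [],
  sortColumn: "",
  sortDirection: "asc",
  columnMappings: {},
  // selectedColumns is intentionally left undefined
};

// Hypothetical helper: overlays a partial configuration on the defaults.
function withDefaults(partial: Partial<UnionNodeSettings>): UnionNodeSettings {
  return { ...defaultSettings, ...partial };
}

const settings = withDefaults({
  unionMode: "distinct",
  deduplicateKeys: ["product_id"],
});
console.log(settings.sortDirection); // asc
```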
How It Works
The UnionNodeEditor uses memoization to recompute derived data reactively. When a setting changes, the component re-evaluates the previews of both input datasets and of the output dataset, reusing earlier results where their inputs are unchanged. This lets users see how their configuration will affect the combined output before the actual operation is executed.
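The source does not say which memoization mechanism the component uses, so the following is only a generic sketch of the idea: cache the last result and recompute only when an argument changes by reference. The memoizeLast and previewOutput names are hypothetical, and the real component may rely on a framework-provided hook instead.

```typescript
// Caches the most recent result and reuses it while every argument is
// identical (by reference) to the previous call.
function memoizeLast<Args extends unknown[], Result>(
  compute: (...args: Args) => Result,
): (...args: Args) => Result {
  let lastArgs: Args | undefined;
  let lastResult: Result | undefined;
  return (...args: Args): Result => {
    const previous = lastArgs;
    const unchanged =
      previous !== undefined &&
      previous.length === args.length &&
      args.every((arg, index) => Object.is(arg, previous[index]));
    if (!unchanged) {
      lastResult = compute(...args);
      lastArgs = args;
    }
    return lastResult as Result;
  };
}

type Row = Record<string, unknown>;

let computeCount = 0;
// Hypothetical preview function: here it simply concatenates both inputs.
const previewOutput = memoizeLast((input1: Row[], input2: Row[]): Row[] => {
  computeCount += 1;
  return [...input1, ...input2];
});

const input1: Row[] = [{ id: 1 }];
const input2: Row[] = [{ id: 2 }];

const first = previewOutput(input1, input2);
const second = previewOutput(input1, input2); // same references: cached
const third = previewOutput(input1, [...input2]); // new array: recomputed
console.log(computeCount); // 2
```

Because the cache compares by reference, a setting or input must be replaced with a new object or array, not mutated in place, for the preview to refresh.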
The component fetches the most recent preview results from a shared context, analyzes the upstream input datasets, and exposes functionality for mapping columns, selecting which columns to include in the output, toggling deduplication options, and applying sorting.
Data Expectations
The UnionNodeEditor expects two input datasets, referred to as Input 1 and Input 2. Each dataset should consist of rows represented as objects with consistent column names, so that columns can be merged and mapped accurately. Users need to know which columns in Input 2 correspond to which columns in Input 1.
Use Cases & Examples
Use Case 1: Merging Sales Data from Two Regions
A company operates in two regions and collects sales data separately. The UnionNodeEditor can be used to merge these datasets while maintaining the option to remove duplicate entries for the same product.
Use Case 2: Combining Customer Feedback from Different Sources
A business collects customer feedback from multiple platforms. By using the UnionNodeEditor, they can combine this data into a single dataset to conduct a comprehensive analysis.
Example Configuration
Use Case: A company merges sales data from two regions while keeping all entries to analyze total sales.
Sample Configuration Data
{
  "unionMode": "all",
  "deduplicateKeys": [],
  "sortColumn": "saleDate",
  "sortDirection": "asc",
  "columnMappings": {
    "salesAmount": "amount",
    "productID": "product_id"
  },
  "selectedColumns": ["salesAmount", "productID"]
}
In this example:
- unionMode is set to "all" to retain all rows.
- No deduplication keys are provided, as duplicates are acceptable.
- Sorting is applied on the saleDate column in ascending order.
- Specific columns from Input 2 (salesAmount and productID) are mapped to their counterparts in Input 1 (amount and product_id) to ensure correct alignment in the result.
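To make the column alignment concrete, here is a small sketch that applies the columnMappings and selectedColumns values from the sample configuration to a single Input 2 row. The mapInput2Row helper and the sample row (including its internalNote column) are hypothetical, and the sketch assumes that a selected column without a mapping keeps its own name.

```typescript
type Row = Record<string, unknown>;

// Renames Input 2 columns to their Input 1 counterparts.
// Assumption: selected columns without a mapping keep their own name.
function mapInput2Row(
  row: Row,
  columnMappings: Record<string, string>,
  selectedColumns?: string[],
): Row {
  // undefined selectedColumns means every Input 2 column is included.
  const columns = selectedColumns ?? Object.keys(row);
  const mapped: Row = {};
  for (const column of columns) {
    mapped[columnMappings[column] ?? column] = row[column];
  }
  return mapped;
}

// Values taken from the sample configuration.
const columnMappings: Record<string, string> = {
  salesAmount: "amount",
  productID: "product_id",
};
const selectedColumns = ["salesAmount", "productID"];

// Hypothetical Input 2 row with one column that is not selected.
const input2Row: Row = { salesAmount: 120, productID: "P-7", internalNote: "n/a" };

const aligned = mapInput2Row(input2Row, columnMappings, selectedColumns);
console.log(JSON.stringify(aligned)); // {"amount":120,"product_id":"P-7"}
```

The unselected internalNote column is dropped, and the remaining values land under the Input 1 column names, ready to be appended to Input 1's rows.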