Union Logic Documentation
Overview
The "Union" logic component in the Vantage analytics platform is designed to combine two datasets into a single dataset. It can be utilized in various scenarios, whether to append data from two sources directly or to ensure that the final dataset contains only unique entries based on specified criteria. This powerful component allows users to manipulate and refine their data processing effortlessly, making it an essential tool for analytics tasks.
Settings
The following settings configure the behavior of the union operation:
1. unionMode
- Input Type: Dropdown (
'all' | 'distinct') - Description: This setting determines how the rows from both datasets are combined.
'all'(default): Every row from both datasets is included in the output.'distinct': Only unique rows are included in the output, removing duplicates.
- Default Value:
'all'
2. deduplicateKeys
- Input Type: Array of strings (
string[]) - Description: Specifies which columns should be used to determine uniqueness when
unionModeis set to'distinct'.- If this array is populated, only the values in these columns are compared for duplicate entries. If it's empty or not provided, all columns are analyzed for uniqueness.
- Default Value:
[](empty array)
3. sortColumn
- Input Type: String
- Description: This defines the column by which the output dataset should be sorted.
- If set, the merged dataset will be organized based on the values in this column.
- Default Value:
''(empty string, which means no sorting is applied)
4. sortDirection
- Input Type: Dropdown (
'asc' | 'desc') - Description: Specifies the direction of the sorting applied to the dataset.
'asc': Ascending order (default).'desc': Descending order.
- Default Value:
'asc'
5. columnMappings
- Input Type: Object (
{ [input2Col]: input1Col | null }) - Description: This allows the user to map column names from Input 2 to Input 1.
- If a specific column from Input 2 is mapped to an Input 1 column, it will be renamed in the final output.
- If the mapping value is
nullor not provided, the original name of the column from Input 2 is retained.
- Default Value:
{}(empty object)
6. selectedColumns
- Input Type: Array of strings (
string[] | undefined) - Description: If specified, only the columns listed will be included from Input 2 in the final output.
- This allows for selective merging based on the user's needs. If not provided, all columns from Input 2 are included, maintaining backward compatibility.
- Default Value:
undefined
How It Works
-
Input Handling: The component first unwraps the provided datasets (
input1andinput2). If the inputs are invalid (not arrays), they are initialized as empty arrays. -
Column Mapping: If column mappings are defined, Input 2’s columns are modified according to the mappings before any other operations.
-
Column Selection: If
selectedColumnsare provided, Input 2 is filtered to keep only those columns in the final output. -
Data Combining: Input 1 and the processed Input 2 are combined into one dataset.
-
Duplicate Handling: If
unionModeis set to'distinct', the component identifies and removes duplicate rows based ondeduplicateKeysor all columns, depending on the setting. -
Sorting: If
sortColumnis defined, the resultant data is sorted according to its values, using the specifiedsortDirection. -
Output Generation: The final combined dataset is prepared as output and, if preview mode is active, includes metadata about the initial row counts from both datasets.
Use Cases & Examples
Use Case 1: Customer Data Integration
A business may have customer data split between two databases. Using the Union logic, they can consolidate data from both sources into a unified dataset for analytics or reporting purposes.
Use Case 2: Event Log Simplification
An organization collecting logs from two different systems could use this component to merge events into a single dataset for troubleshooting and analysis, ensuring only unique events are retained to simplify reporting.
Use Case 3: Financial Records Aggregation
A finance department could combine monthly financial records stored in two separate files to produce a comprehensive view, facilitating reconciliations and summaries.
Example Configuration
Assuming a tech company wants to merge employee records from HR and Payroll systems while ensuring unique entries based on email addresses and sorted by hire dates:
{
"unionMode": "distinct",
"deduplicateKeys": ["email"],
"sortColumn": "hireDate",
"sortDirection": "asc",
"columnMappings": {
"name": "fullName",
"employeeId": null
},
"selectedColumns": ["email", "fullName", "hireDate"]
}In this configuration:
- All duplicates based on the email address will be removed.
- The resulting dataset will be sorted in ascending order based on the hire date.
- The
namecolumn from Input 2 will be renamed tofullName, whileemployeeIdwill retain its name. - Only the specified columns will be included in the final output.