4 min read

Union

Overview

The "Union" node in the Vantage analytics platform is designed to combine two datasets into a single dataset. It can be utilized in various scenarios, whether to append data from two sources directly or to ensure that the final dataset contains only unique entries based on specified criteria. This powerful component allows users to manipulate and refine their data processing effortlessly, making it an essential tool for analytics tasks.

Settings

The following settings configure the behavior of the union operation:

1. `unionMode`

Input Type: Dropdown ('all' | 'distinct')
Description: This setting determines how the rows from both datasets are combined.
- 'all' (default): Every row from both datasets is included in the output.
- 'distinct': Only unique rows are included in the output, removing duplicates.
Default Value: 'all'

2. `deduplicateKeys`

Input Type: Array of strings (string[])
Description: Specifies which columns should be used to determine uniqueness when unionMode is set to 'distinct'.
- If this array is populated, only the values in these columns are compared for duplicate entries. If it's empty or not provided, all columns are analyzed for uniqueness.
Default Value: [] (empty array)

3. `sortColumn`

Input Type: String
Description: This defines the column by which the output dataset should be sorted.
- If set, the merged dataset will be organized based on the values in this column.
Default Value: '' (empty string, which means no sorting is applied)

4. `sortDirection`

Input Type: Dropdown ('asc' | 'desc')
Description: Specifies the direction of the sorting applied to the dataset.
- 'asc': Ascending order (default).
- 'desc': Descending order.
Default Value: 'asc'

5. `columnMappings`

Input Type: Object ({ [input2Col]: input1Col | null })
Description: This allows the user to map column names from Input 2 to Input 1.
- If a specific column from Input 2 is mapped to an Input 1 column, it will be renamed in the final output.
- If the mapping value is null or not provided, the original name of the column from Input 2 is retained.
Default Value: {} (empty object)

6. `selectedColumns`

Input Type: Array of strings (string[] | undefined)
Description: If specified, only the columns listed will be included from Input 2 in the final output.
- This allows for selective merging based on the user's needs. If not provided, all columns from Input 2 are included, maintaining backward compatibility.
Default Value: undefined

How It Works

Input Handling: The component first unwraps the provided datasets (input1 and input2). If the inputs are invalid (not arrays), they are initialized as empty arrays.
Column Mapping: If column mappings are defined, Input 2’s columns are modified according to the mappings before any other operations.
Column Selection: If selectedColumns are provided, Input 2 is filtered to keep only those columns in the final output.
Data Combining: Input 1 and the processed Input 2 are combined into one dataset.
Duplicate Handling: If unionMode is set to 'distinct', the component identifies and removes duplicate rows based on deduplicateKeys or all columns, depending on the setting.
Sorting: If sortColumn is defined, the resultant data is sorted according to its values, using the specified sortDirection.
Output Generation: The final combined dataset is prepared as output and, if preview mode is active, includes metadata about the initial row counts from both datasets.

Use Cases & Examples

Use Case 1: Customer Data Integration

A business may have customer data split between two databases. Using the Union logic, they can consolidate data from both sources into a unified dataset for analytics or reporting purposes.

Use Case 2: Event Log Simplification

An organization collecting logs from two different systems could use this component to merge events into a single dataset for troubleshooting and analysis, ensuring only unique events are retained to simplify reporting.

Use Case 3: Financial Records Aggregation

A finance department could combine monthly financial records stored in two separate files to produce a comprehensive view, facilitating reconciliations and summaries.

Example Configuration

Assuming a tech company wants to merge employee records from HR and Payroll systems while ensuring unique entries based on email addresses and sorted by hire dates:

json

{
  "unionMode": "distinct",
  "deduplicateKeys": ["email"],
  "sortColumn": "hireDate",
  "sortDirection": "asc",
  "columnMappings": {
    "name": "fullName",
    "employeeId": null
  },
  "selectedColumns": ["email", "fullName", "hireDate"]
}

In this configuration:

All duplicates based on the email address will be removed.
The resulting dataset will be sorted in ascending order based on the hire date.
The name column from Input 2 will be renamed to fullName, while employeeId will retain its name.
Only the specified columns will be included in the final output.

← PreviousJoin Next →Limit

Union

Overview

Settings

1. unionMode

2. deduplicateKeys

3. sortColumn

4. sortDirection

5. columnMappings

6. selectedColumns