Documentation for Join Logic
Purpose
The join logic in the Vantage analytics & data platform is designed to combine datasets based on common attributes, allowing users to merge data streams from various sources for integrated analysis. Its primary function is to match and merge records from a left dataset with those from a right dataset, using specified keys to find corresponding rows.
How It Works
The join function accepts parameters that define the right dataset and the keys used for the join operation; the left dataset is supplied at execution time. It primarily performs inner joins: an output record is created only when a matching key value is found in both datasets.
- Input Parameters:
  - rightDataset: an array of records that represents the right dataset.
  - leftKey: a string that indicates the key in the left dataset used for matching.
  - rightKey: a string that indicates the key in the right dataset used for matching.
- Execution:
  - The execute method takes in the left dataset, iterates through each record of both datasets, and creates a new record combining matching pairs based on the keys provided.
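The execution step described above can be sketched as a nested loop. This is a minimal illustration rather than the Vantage implementation: the function name `innerJoinSketch` and the sample data are hypothetical, and the merge order (right fields spread over left fields, so the right value wins on a field-name clash) is an assumption.

```javascript
// Hypothetical sketch of an inner join; not the Vantage source code.
function innerJoinSketch(leftDataset, rightDataset, leftKey, rightKey) {
  const joined = [];
  for (const leftRecord of leftDataset) {
    for (const rightRecord of rightDataset) {
      // Emit a combined record only when the key values match.
      if (leftRecord[leftKey] === rightRecord[rightKey]) {
        // Assumption: on a field-name clash the right value overwrites the left.
        joined.push({ ...leftRecord, ...rightRecord });
      }
    }
  }
  return joined;
}

const ordersA = [
  { orderId: 1, sku: 'A' },
  { orderId: 2, sku: 'Z' } // no matching product, so it is dropped
];
const productsA = [{ sku: 'A', title: 'Widget' }];

console.log(innerJoinSketch(ordersA, productsA, 'sku', 'sku'));
```

Because only matching pairs are emitted, the order with sku 'Z' does not appear in the result.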
Settings
1. rightDataset
- Input Type: Array
- Description: This setting holds the dataset that will be joined against the left dataset. It is expected to be an array of objects where each object represents a row of data.
- Behavior: Failing to provide a valid array will result in an error being thrown, preventing the join operation from proceeding.
- Default Value: None (required).
2. leftKey
- Input Type: String
- Description: This setting specifies the key from the left dataset which will be used to match with the corresponding key from the right dataset.
- Behavior: If the key is invalid or left blank, an error is thrown during execution. The key determines which field of each left record is compared against the right dataset.
- Default Value: None (required).
3. rightKey
- Input Type: String
- Description: This setting denotes the key from the right dataset used for matching against the key from the left dataset.
- Behavior: As with leftKey, a blank or invalid key causes an error to be thrown; the join cannot succeed without valid keys.
- Default Value: None (required).
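The validation behavior described for the three settings could look roughly like the sketch below. The helper name `validateJoinSettings`, the error type, and the error messages are all assumptions made for illustration; the source only states that an error is thrown.

```javascript
// Hypothetical validation helper; error type and messages are illustrative only.
function validateJoinSettings({ rightDataset, leftKey, rightKey }) {
  if (!Array.isArray(rightDataset)) {
    throw new TypeError('rightDataset must be an array of records');
  }
  for (const [name, value] of Object.entries({ leftKey, rightKey })) {
    // Reject missing, non-string, and blank keys alike.
    if (typeof value !== 'string' || value.trim() === '') {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
}

// A blank rightKey is rejected before any join work starts.
try {
  validateJoinSettings({ rightDataset: [], leftKey: 'customerId', rightKey: '' });
} catch (err) {
  console.log(err.message);
}
```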
Expected Data
The join logic requires two datasets:
- Left Dataset: An array of objects representing records to be joined, where each object should contain the leftKey.
- Right Dataset: An array of objects representing the records to join with the left dataset, containing the rightKey.
Both datasets must be arrays of objects, and each record should contain its respective key field, for the join to proceed without errors.
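To check the expected structure before running a join, one option is a small pre-flight helper that lists records lacking the key field. The helper `findRecordsMissingKey` and the sample rows are hypothetical and not part of the documented join API.

```javascript
// Hypothetical pre-flight check; not part of the documented join API.
function findRecordsMissingKey(dataset, key) {
  // A record is flagged if it is not an object or does not carry the key field.
  return dataset.filter(
    (record) => record === null || typeof record !== 'object' || !(key in record)
  );
}

const surveyRowsC = [{ respondentId: 'R1', score: 4 }, { score: 2 }];

// Logs the records that lack a respondentId field.
console.log(findRecordsMissingKey(surveyRowsC, 'respondentId'));
```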
Use Cases & Examples
Use Case 1: Customer Data Integration
A company may wish to analyze its sales data (left dataset) alongside customer information (right dataset) to assess the performance of sales across different customer demographics. By using a join on customer ID, the company can enrich sales data with relevant customer attributes.
Use Case 2: Historical Data Analysis
Organization A needs to merge its historical sales records (left dataset) with the corresponding product information (right dataset) to generate comprehensive reports that analyze product sales performance over time.
Use Case 3: Survey and Response Matching
An analytics team may receive survey responses (the left dataset) and wish to combine these with demographic data (the right dataset) to analyze feedback in contextual categories, utilizing respondent IDs as keys.
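As an illustration of this use case when the key field is named differently on each side, the sketch below matches responses to demographic records. The field names (`respondentId`, `id`, `ageGroup`, `rating`) and the helper `matchResponses` are hypothetical; the helper stands in for the join node rather than reproducing it, and keeping both key fields in the output is an assumption.

```javascript
// Hypothetical stand-in for an inner join, written with flatMap and filter.
function matchResponses(left, right, leftKey, rightKey) {
  return left.flatMap((l) =>
    right
      .filter((r) => r[rightKey] === l[leftKey])
      .map((r) => ({ ...l, ...r }))
  );
}

const responsesD = [
  { respondentId: 'R1', rating: 5 },
  { respondentId: 'R9', rating: 2 } // no demographic record, so it is dropped
];
const demographicsD = [{ id: 'R1', ageGroup: '25-34' }];

console.log(matchResponses(responsesD, demographicsD, 'respondentId', 'id'));
```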
Configuration Example
Business Use Case: Customer Sales Analysis
To analyze customer sales data effectively, we want to join the sales dataset with the customers dataset.
Sample Datasets:
// Sales Data (Left Dataset)
const salesData = [
  { saleId: 1, customerId: 'C001', amount: 250 },
  { saleId: 2, customerId: 'C002', amount: 150 },
  { saleId: 3, customerId: 'C001', amount: 200 }
];
// Customers Data (Right Dataset)
const customersData = [
  { customerId: 'C001', name: 'Alice' },
  { customerId: 'C002', name: 'Bob' },
  { customerId: 'C003', name: 'Charlie' }
];
Configuration:
// The node is configured with the three documented settings;
// the left dataset is passed to execute() rather than to the configuration.
const joinNode = createJoinNode({
  rightDataset: customersData,
  leftKey: 'customerId',
  rightKey: 'customerId'
});
// Executing join
const mergedData = joinNode.execute(salesData);
console.log(mergedData);
Result:
The output will be:
[
  { saleId: 1, customerId: 'C001', amount: 250, name: 'Alice' },
  { saleId: 2, customerId: 'C002', amount: 150, name: 'Bob' },
  { saleId: 3, customerId: 'C001', amount: 200, name: 'Alice' }
]
Integration and Billing Impact
The join function may integrate with other AI features in Vantage, particularly in data preprocessing or analytics pipelines where combined datasets require analysis with machine learning models.
Billing for the join function may vary with the volume of data processed and the frequency of operations, because merging large datasets can incur higher computational costs. Monitor usage metrics and adjust dataset sizes accordingly to optimize performance and manage costs.
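One general way to limit the work of a join on large inputs is to index the right dataset by key once, instead of scanning it for every left record. The sketch below shows that technique (often called a hash join) under stated assumptions. It does not describe how Vantage executes or bills joins, and `indexedJoinSketch` and the sample data are hypothetical.

```javascript
// Hypothetical hash-join sketch; not a description of Vantage internals.
function indexedJoinSketch(leftDataset, rightDataset, leftKey, rightKey) {
  // Build the index: key value -> list of right records with that value.
  const index = new Map();
  for (const rightRecord of rightDataset) {
    const bucket = index.get(rightRecord[rightKey]);
    if (bucket) {
      bucket.push(rightRecord);
    } else {
      index.set(rightRecord[rightKey], [rightRecord]);
    }
  }
  // Probe the index once per left record instead of rescanning the right dataset.
  const joined = [];
  for (const leftRecord of leftDataset) {
    for (const rightRecord of index.get(leftRecord[leftKey]) || []) {
      joined.push({ ...leftRecord, ...rightRecord });
    }
  }
  return joined;
}

const salesE = [
  { saleId: 1, customerId: 'C001' },
  { saleId: 2, customerId: 'C002' },
  { saleId: 3, customerId: 'C001' }
];
const customersE = [
  { customerId: 'C001', name: 'Alice' },
  { customerId: 'C002', name: 'Bob' }
];

console.log(indexedJoinSketch(salesE, customersE, 'customerId', 'customerId'));
```

With this approach the number of lookups grows with the sum of the two dataset sizes rather than their product, which is why it is a common choice for larger inputs.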