Updated Mar 2, 2026

DeduplicateNodeEditor Documentation

Overview

The DeduplicateNodeEditor is a custom editor component in the Vantage analytics and data platform. It lets users deduplicate data based on selected key columns and a keep strategy. The component automatically detects the available columns from shared preview results, so users can choose the columns that define a duplicate according to their business logic.
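As a rough illustration of column detection, the sketch below collects column names from preview rows in the order they first appear. The shape of the preview results (a list of row dictionaries) and the function name `detect_columns` are assumptions made for this example; the source does not describe the actual preview format or implementation.

```python
from typing import Any


def detect_columns(preview_rows: list[dict[str, Any]]) -> list[str]:
    """Collect column names from preview rows, in first-seen order.

    Hypothetical helper: assumes each preview row is a dict keyed by column name.
    """
    columns: dict[str, None] = {}  # dict preserves insertion order
    for row in preview_rows:
        for name in row:
            columns.setdefault(name, None)
    return list(columns)


# Illustrative preview data, not taken from a real dataset.
preview = [
    {"SKU": "A-100", "ProductName": "Desk Lamp"},
    {"SKU": "B-200", "Price": 49.0},
]
detected = detect_columns(preview)
```

Scanning every preview row, rather than only the first, means a column that is missing from some rows would still be offered as a key column candidate.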

Purpose

The DeduplicateNodeEditor removes duplicate entries from datasets by letting users specify which columns define uniqueness and which entry to keep when several entries match. This reduces redundancy and helps preserve data integrity in downstream processing workflows.

Expected Data

The DeduplicateNodeEditor expects the following data:

Settings

Keep Strategy

  1. Setting Name: keepStrategy
  2. Input Type: Dropdown (string)
  3. Description: This setting defines which entry is retained when duplicates are found. The choice changes which rows survive deduplication; for example, "last" retains the newest entry when rows are ordered from oldest to newest. Users can choose between:
    • "Keep First Occurrence" (value "first"): Retains the first instance of the duplicate entries.
    • "Keep Last Occurrence" (value "last"): Retains the last instance of the duplicate entries.
  4. Default Value: "first"
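The difference between the two strategies can be sketched in a few lines. This is a minimal illustration, not the component's actual implementation; the function `deduplicate_by_key` and the sample rows are hypothetical, and only the values "first" and "last" come from this document.

```python
from typing import Any

Row = dict[str, Any]


def deduplicate_by_key(rows: list[Row], key: str, keep_strategy: str = "first") -> list[Row]:
    """Keep one row per distinct value of `key` (hypothetical helper)."""
    if keep_strategy not in ("first", "last"):
        raise ValueError(f"unknown keepStrategy: {keep_strategy!r}")

    # For "last", walk the rows backwards so the final occurrence is seen first.
    ordered = rows if keep_strategy == "first" else rows[::-1]
    seen: set[Any] = set()
    kept: list[Row] = []
    for row in ordered:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    # Restore the original relative order of the retained rows.
    return kept if keep_strategy == "first" else kept[::-1]


# Illustrative rows: the same email appears twice.
rows = [
    {"email": "ana@example.com", "plan": "free"},
    {"email": "bo@example.com", "plan": "free"},
    {"email": "ana@example.com", "plan": "pro"},
]
first_kept = deduplicate_by_key(rows, "email", "first")  # keeps ana's "free" row
last_kept = deduplicate_by_key(rows, "email", "last")    # keeps ana's "pro" row
```

In this sketch the retained rows stay at their original positions, so "last" returns bo's row before ana's "pro" row. Whether the real component orders its output this way is not stated in the source.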

Key Columns

  1. Setting Name: keyColumns
  2. Input Type: Array of strings (checkboxes)
  3. Description: This setting lets users select the columns that determine uniqueness. Two rows count as duplicates only when they match on every selected column. If no columns are selected, all columns are compared.
  4. Default Value: [] (an empty array, meaning all columns will be used if none are manually selected).
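The empty-array fallback can be illustrated with a small sketch. The helper names `resolve_key_columns` and `row_key` are hypothetical and the sample rows are made up; only the rule "empty selection means all columns" comes from this document.

```python
from typing import Any


def resolve_key_columns(key_columns: list[str], available_columns: list[str]) -> list[str]:
    """Return the columns used to compare rows; an empty selection means all columns."""
    return list(key_columns) if key_columns else list(available_columns)


def row_key(row: dict[str, Any], columns: list[str]) -> tuple:
    """Build a hashable comparison key from the chosen columns."""
    return tuple(row.get(column) for column in columns)


available = ["SKU", "Price"]
a = {"SKU": "A-100", "Price": 19.99}
b = {"SKU": "A-100", "Price": 17.49}

# With keyColumns = ["SKU"], the two rows are duplicates.
same_by_sku = row_key(a, resolve_key_columns(["SKU"], available)) == row_key(
    b, resolve_key_columns(["SKU"], available)
)

# With keyColumns = [], every column is compared, so the differing price keeps them distinct.
same_by_all = row_key(a, resolve_key_columns([], available)) == row_key(
    b, resolve_key_columns([], available)
)
```

The practical consequence is that leaving keyColumns empty removes only rows that are identical in every column, which is the strictest definition of a duplicate.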

Use Cases & Examples

Use Cases

  1. Data Cleaning for Marketing Campaigns: A marketing team may have a customer list with duplicate entries caused by multiple sign-ups. With the DeduplicateNodeEditor, they can ensure each customer appears only once, keyed on email address, and keep either the first sign-up or the most recent one.
  2. Customer Relationship Management (CRM) Systems: In a CRM system, duplicates can lead to erroneous reporting and communication. The DeduplicateNodeEditor can help sales teams maintain an accurate dataset by defining key identifiers like phone numbers or customer IDs to deduplicate records efficiently.
  3. Inventory Management: Retail businesses often face duplicate inventory entries. With this component, they can identify and consolidate duplicate product listings based on unique product SKUs.

Example Configuration

Scenario

A retail company is cleaning up its inventory data to ensure no product is listed twice, since duplicate listings could distort sales reports and inventory counts.

Configuration Steps

  1. Select Key Columns: They choose to set keyColumns to [ "SKU", "ProductName" ] since both are crucial for identifying unique product entries.
  2. Set Keep Strategy: They opt for keepStrategy set to "last" to ensure the most recent data entry is retained.
```json
{
  "keyColumns": ["SKU", "ProductName"],
  "keepStrategy": "last"
}
```

In this configuration, the DeduplicateNodeEditor treats rows as duplicates when both SKU and ProductName match, and it keeps the last occurrence of each combination. When rows are ordered by entry time, that is the most recently added product entry, so the retained inventory data reflects the latest state.
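To make the effect of this configuration concrete, the sketch below applies it to a few made-up inventory rows. The `deduplicate` function is a hypothetical stand-in for whatever the platform runs internally, and the sample rows and the Quantity column are illustrative assumptions; only the configuration keys and values come from the example above.

```python
import json
from typing import Any

Row = dict[str, Any]

# The configuration from the example above.
CONFIG = json.loads('{"keyColumns": ["SKU", "ProductName"], "keepStrategy": "last"}')


def deduplicate(rows: list[Row], key_columns: list[str], keep_strategy: str = "first") -> list[Row]:
    """Hypothetical sketch: keep one row per distinct combination of key column values."""
    if keep_strategy not in ("first", "last"):
        raise ValueError(f"unknown keepStrategy: {keep_strategy!r}")
    if not rows:
        return []
    # An empty keyColumns list means every column is compared.
    columns = key_columns or list(rows[0].keys())
    ordered = rows if keep_strategy == "first" else rows[::-1]
    seen: set[tuple] = set()
    kept: list[Row] = []
    for row in ordered:
        key = tuple(row.get(column) for column in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept if keep_strategy == "first" else kept[::-1]


# Illustrative inventory rows, ordered from oldest to newest entry.
inventory = [
    {"SKU": "A-100", "ProductName": "Desk Lamp", "Quantity": 12},
    {"SKU": "B-200", "ProductName": "Office Chair", "Quantity": 4},
    {"SKU": "A-100", "ProductName": "Desk Lamp", "Quantity": 9},
]

result = deduplicate(inventory, CONFIG["keyColumns"], CONFIG["keepStrategy"])
```

Here the two "Desk Lamp" rows share the same SKU and ProductName, so only the later one (Quantity 9) survives alongside the "Office Chair" row. With keepStrategy set to "first", the Quantity 12 row would be kept instead.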