DataClassifyNodeEditor
Overview
The DataClassifyNodeEditor is a custom editor component in the Vantage analytics and data platform. It lets users define classification rules that assign a category label to each data row based on specified conditions. Rules are evaluated from top to bottom, and the first rule that matches a row determines that row's label. This makes the node useful for grouping data into categories for analysis and reporting.
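The first-match behavior can be illustrated with a minimal TypeScript sketch. It assumes, for illustration only, that a row is a plain object of column values and that each rule has already been reduced to a predicate. The names `Row`, `PredicateRule`, and `classifyRow` are hypothetical and are not part of the Vantage API.

```typescript
// Hypothetical types for illustration; not the Vantage API.
type Row = Record<string, unknown>;

interface PredicateRule {
  label: string;
  matches: (row: Row) => boolean;
}

// Scan the rules top to bottom and return the label of the first match.
// Once a rule matches, later rules are never consulted.
function classifyRow(row: Row, rules: PredicateRule[], defaultLabel: string): string {
  for (const rule of rules) {
    if (rule.matches(row)) {
      return rule.label;
    }
  }
  // No rule matched: fall back to the default label.
  return defaultLabel;
}

const rules: PredicateRule[] = [
  { label: "Large", matches: (r) => Number(r.size) > 100 },
  { label: "Medium", matches: (r) => Number(r.size) > 10 },
];

console.log(classifyRow({ size: 500 }, rules, "Other")); // prints: Large (both rules match; the first wins)
console.log(classifyRow({ size: 50 }, rules, "Other"));  // prints: Medium
console.log(classifyRow({ size: 5 }, rules, "Other"));   // prints: Other
```

Note that a row with size 500 also satisfies the "Medium" rule, but it never reaches it. This is why more specific rules generally belong above broader ones.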
Purpose
The DataClassifyNodeEditor lets users categorize data using their own logic and conditions. Users build rules from a set of comparison operators and specify a default label to apply when no rule matches.
Settings
1. Output Column Name
- Name: Output Column
- Input Type: String
- Description: This field allows users to specify the name of the output column that will contain the category labels after classification. Changing this value directly affects where the data categories will be stored in the resulting dataset.
- Default Value: 'category'
2. Default Label
- Name: Default Label
- Input Type: String
- Description: This field allows users to define a label that will be assigned to rows that do not match any of the classification rules. If no rules apply to a given row, this label is used, ensuring that every entry in the output column has a defined value.
- Default Value: 'Other'
3. Classification Rules
- Name: Classification Rules
- Input Type: Array of Rules
- Description: An ordered list of classification rules. Because rules are evaluated top to bottom, their order determines which label a row receives. Each rule consists of the following components:
- Column: The column from the input data that the rule will evaluate.
- Operator: The comparison operation to apply (e.g., "equals", "contains").
- Value: The value against which the column data is compared.
- Label: The category label assigned if the rule matches a row.
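The settings above can be summarized as TypeScript types. This is a sketch under stated assumptions: the type names `Operator`, `ClassifyRule`, and `ClassifySettings` and the helper `withDefaults` are hypothetical. Only the field names (matching the JSON examples in this page) and the default values 'category' and 'Other' come from this documentation.

```typescript
// Hypothetical types mirroring the documented settings; not the Vantage source.
type Operator =
  | "equals" | "notEquals" | "contains" | "notContains"
  | "startsWith" | "endsWith"
  | "greaterThan" | "lessThan" | "greaterThanOrEqual" | "lessThanOrEqual"
  | "isEmpty" | "isNotEmpty" | "regex" | "inList";

interface ClassifyRule {
  column: string;     // input column the rule evaluates
  operator: Operator; // comparison to apply
  value: string;      // value compared against the column data
  label: string;      // label assigned when the rule matches
}

interface ClassifySettings {
  outputColumn: string;  // defaults to "category"
  defaultLabel: string;  // defaults to "Other"
  rules: ClassifyRule[]; // evaluated top to bottom
}

// Fill in the documented defaults for any field left unset (undefined).
function withDefaults(partial: Partial<ClassifySettings>): ClassifySettings {
  return {
    outputColumn: partial.outputColumn ?? "category",
    defaultLabel: partial.defaultLabel ?? "Other",
    rules: partial.rules ?? [],
  };
}

const settings = withDefaults({
  rules: [{ column: "age", operator: "greaterThanOrEqual", value: "18", label: "Adult" }],
});
console.log(settings.outputColumn, settings.defaultLabel); // prints: category Other
```

Typing `operator` as a string-literal union lets the compiler reject misspelled operator names in hand-written configurations.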
Operators
Users can select from the following operators:
- equals: Checks if the value matches exactly.
- notEquals: Checks if the value does not match exactly.
- contains: Checks if the string contains the specified text.
- notContains: Checks if the string does not contain the specified text.
- startsWith: Checks if the string starts with the specified prefix.
- endsWith: Checks if the string ends with the specified suffix.
- greaterThan: Checks if the numeric value is greater than the specified value.
- lessThan: Checks if the numeric value is less than the specified value.
- greaterThanOrEqual: Checks if the numeric value is greater than or equal to the specified value.
- lessThanOrEqual: Checks if the numeric value is less than or equal to the specified value.
- isEmpty: Checks if the column is empty (has no value).
- isNotEmpty: Checks if the column has a value.
- regex: Checks if the value matches a regular expression pattern.
- inList: Checks if the value is one of the items in the specified list.
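One plausible reading of these operators is sketched below in TypeScript. The semantics are inferred from the one-line descriptions above and are assumptions, not the node's actual implementation. In particular, the following choices are hypothetical: case sensitivity, treating whitespace-only values as empty, non-numeric values never matching numeric comparisons, invalid regex patterns never matching, and `inList` taking a comma-separated list. The function name `evaluateOperator` is also hypothetical.

```typescript
// Hypothetical operator semantics inferred from the descriptions above.
type Operator =
  | "equals" | "notEquals" | "contains" | "notContains"
  | "startsWith" | "endsWith"
  | "greaterThan" | "lessThan" | "greaterThanOrEqual" | "lessThanOrEqual"
  | "isEmpty" | "isNotEmpty" | "regex" | "inList";

// In this sketch, null, undefined, and whitespace-only strings count as empty.
function isEmptyValue(v: unknown): boolean {
  return v === null || v === undefined || String(v).trim() === "";
}

function evaluateOperator(cell: unknown, operator: Operator, value: string): boolean {
  const text = isEmptyValue(cell) ? "" : String(cell);
  switch (operator) {
    case "equals": return text === value;
    case "notEquals": return text !== value;
    case "contains": return text.includes(value);       // case-sensitive substring check
    case "notContains": return !text.includes(value);
    case "startsWith": return text.startsWith(value);
    case "endsWith": return text.endsWith(value);
    case "greaterThan":
    case "lessThan":
    case "greaterThanOrEqual":
    case "lessThanOrEqual": {
      const a = Number(cell);
      const b = Number(value);
      // Empty or non-numeric input never matches a numeric comparison here.
      if (isEmptyValue(cell) || Number.isNaN(a) || Number.isNaN(b)) return false;
      if (operator === "greaterThan") return a > b;
      if (operator === "lessThan") return a < b;
      if (operator === "greaterThanOrEqual") return a >= b;
      return a <= b;
    }
    case "isEmpty": return isEmptyValue(cell);
    case "isNotEmpty": return !isEmptyValue(cell);
    case "regex":
      try {
        return new RegExp(value).test(text);
      } catch {
        return false; // an invalid pattern is treated as "no match"
      }
    case "inList":
      // Assumes a comma-separated list such as "red, green, blue".
      return value.split(",").map((s) => s.trim()).includes(text);
    default:
      return false;
  }
}

console.log(evaluateOperator("18", "greaterThan", "18"));        // prints: false
console.log(evaluateOperator("18", "greaterThanOrEqual", "18")); // prints: true
console.log(evaluateOperator("inactive", "contains", "active")); // prints: true
console.log(evaluateOperator("gold", "inList", "silver, gold")); // prints: true
```

The first two calls show why boundary values need an inclusive operator on one side of a split. The third shows a pitfall of a plain substring check: if the node's contains behaves this way, contains "active" also matches "inactive". When a column holds whole status values, equals is the safer choice.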
Example Settings Configuration:
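{
  "outputColumn": "category",
  "defaultLabel": "Other",
  "rules": [
    {
      "column": "age",
      "operator": "greaterThanOrEqual",
      "value": "18",
      "label": "Adult"
    },
    {
      "column": "age",
      "operator": "lessThan",
      "value": "18",
      "label": "Minor"
    },
    {
      "column": "status",
      "operator": "equals",
      "value": "active",
      "label": "Active User"
    }
  ]
}
Because rules are evaluated top to bottom and the first match wins, any row whose age satisfies one of the two age rules is labeled "Adult" or "Minor". The status rule only applies to rows that match neither age rule, for example rows with no age value.
How It Works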
The DataClassifyNodeEditor component uses hooks to manage state and dependencies, specifically:
- useUpstreamColumns: Retrieves the column names available from upstream nodes, so users know which columns they can reference in their classification rules.
- useTooltips: Manages the state of the tooltips shown beside configuration fields to provide further information.
Rules can be added, updated, removed, and reordered. Because the first matching rule determines a row's label, moving a rule up or down changes its priority, which makes reordering the main tool for managing complex classification scenarios.
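The add, update, remove, and move interactions can be modeled as pure functions over the rule list. The TypeScript sketch below is hypothetical: the helper names (`addRule`, `updateRule`, `removeRule`, `moveRule`) are illustrative and are not taken from the component's source. Each helper returns a new array instead of mutating the old one, a pattern that suits hook-based state updates.

```typescript
// Hypothetical rule shape and list helpers; not the component's actual code.
interface ClassifyRule {
  column: string;
  operator: string;
  value: string;
  label: string;
}

// Append a rule to the end of the list (lowest priority).
function addRule(rules: ClassifyRule[], rule: ClassifyRule): ClassifyRule[] {
  return [...rules, rule];
}

// Merge a partial change into the rule at `index`.
function updateRule(rules: ClassifyRule[], index: number, patch: Partial<ClassifyRule>): ClassifyRule[] {
  return rules.map((r, i) => (i === index ? { ...r, ...patch } : r));
}

// Drop the rule at `index`.
function removeRule(rules: ClassifyRule[], index: number): ClassifyRule[] {
  return rules.filter((_, i) => i !== index);
}

// Move a rule one position: -1 moves it up (higher priority), +1 moves it down.
function moveRule(rules: ClassifyRule[], index: number, direction: -1 | 1): ClassifyRule[] {
  const target = index + direction;
  if (target < 0 || target >= rules.length) return rules; // already at the edge
  const next = [...rules];
  const [moved] = next.splice(index, 1); // take the rule out
  next.splice(target, 0, moved);         // reinsert it at its new position
  return next;
}

let rules: ClassifyRule[] = [];
rules = addRule(rules, { column: "age", operator: "greaterThanOrEqual", value: "18", label: "Adult" });
rules = addRule(rules, { column: "status", operator: "equals", value: "vip", label: "VIP" });
rules = moveRule(rules, 1, -1); // the VIP rule is now evaluated first
console.log(rules.map((r) => r.label).join(", ")); // prints: VIP, Adult
```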
AI Integrations
The component itself does not reference any AI integration. It could, however, be used alongside AI-powered features in Vantage, for example machine learning models that suggest rules based on existing data patterns or propose default labels.
Billing Impacts
Using the DataClassifyNodeEditor has no direct billing impact. However, processing large datasets with many classification rules may incur costs when cloud-based processing resources are used, so customers should account for data-processing charges when deploying complex classification workflows.
Use Cases & Examples
Use Case 1: Customer Segmentation
Businesses can use the DataClassifyNodeEditor to segment their customer base into categories such as "Active", "Inactive", "Potential", and "High Value". This can help tailor marketing efforts and improve conversion rates.
Use Case 2: Product Categorization
E-commerce platforms can apply classification rules to categorize products based on their attributes, such as "Electronics", "Clothing", or "Home Goods". This enhances user navigation and improves search effectiveness.
Use Case 3: Compliance Monitoring
Organizations can use the DataClassifyNodeEditor to monitor compliance-related data by tagging records that meet specific regulatory conditions, which supports risk assessment and reporting.
Detailed Example
For the customer segmentation use case, the DataClassifyNodeEditor might be configured as follows to classify customers:
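{
  "outputColumn": "customer_segment",
  "defaultLabel": "Unspecified",
  "rules": [
    {
      "column": "purchase_amount",
      "operator": "greaterThan",
      "value": "1000",
      "label": "High Value"
    },
    {
      "column": "last_purchase_date",
      "operator": "greaterThan",
      "value": "2022-01-01",
      "label": "Active"
    },
    {
      "column": "last_purchase_date",
      "operator": "lessThanOrEqual",
      "value": "2022-01-01",
      "label": "Inactive"
    }
  ]
}
This configuration labels customers whose purchase amount exceeds 1000 as "High Value" and classifies the remaining customers as "Active" or "Inactive" by their last purchase date, so each segment can receive its own marketing strategy. The "High Value" rule is listed first because rules are evaluated top to bottom: placed after the two date rules, it would never be reached for any customer with a purchase date. Note that greaterThan and lessThanOrEqual are documented as numeric comparisons; confirm that the node compares date values such as 2022-01-01 chronologically before relying on this configuration.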
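The effect of rule order on this example can be checked with a small TypeScript sketch. It does not reproduce the node's operators. Instead, it assumes that dates are ISO "YYYY-MM-DD" strings, which sort chronologically when compared as plain strings, and that purchase amounts are numbers. The names `Row`, `Rule`, and `classify` are hypothetical.

```typescript
// Assumptions (hypothetical): ISO "YYYY-MM-DD" date strings, numeric amounts.
type Row = { last_purchase_date: string; purchase_amount: number };
type Rule = { label: string; matches: (r: Row) => boolean };

const isHighValue: Rule = { label: "High Value", matches: (r) => r.purchase_amount > 1000 };
const isActive: Rule = { label: "Active", matches: (r) => r.last_purchase_date > "2022-01-01" };
// Uses <= so that, together with isActive, every dated row matches a date rule.
const isInactive: Rule = { label: "Inactive", matches: (r) => r.last_purchase_date <= "2022-01-01" };

// First matching rule wins; otherwise the default label applies.
function classify(row: Row, rules: Rule[], defaultLabel: string): string {
  return rules.find((rule) => rule.matches(row))?.label ?? defaultLabel;
}

const bigSpender: Row = { last_purchase_date: "2023-06-15", purchase_amount: 5000 };

// "High Value" last: a date rule always matches first, so it never fires.
console.log(classify(bigSpender, [isActive, isInactive, isHighValue], "Unspecified")); // prints: Active
// "High Value" first: big spenders are labeled before the date rules run.
console.log(classify(bigSpender, [isHighValue, isActive, isInactive], "Unspecified")); // prints: High Value
```

Comparing ISO dates as strings works only because the format is fixed-width and ordered from year to day. Other date formats would need to be parsed before comparison.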