
dataClassify Documentation

Purpose

This node assigns a category label to each row of data based on conditional rules. Each rule tests one column of the input data. Rules are evaluated in order, and the first rule that matches assigns its label to the output column. If no rule matches a row, the row receives a default label.

Expected Data

This node expects structured input: an array of objects, where each object represents one data row and may contain any number of fields. The column names referenced by the rules in the configuration should match field names in the rows.
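As a minimal sketch of the expected input shape, the Python snippet below builds a small array of row objects and checks which rule columns are absent from the data. The helper missing_rule_columns and the sample field names are hypothetical and only illustrate the compatibility requirement; they are not part of the node.

```python
from typing import Any


def missing_rule_columns(rows: list, rules: list) -> dict:
    """Count, for each column named in a rule, how many rows lack that column."""
    missing: dict = {}
    for rule in rules:
        column = rule["column"]
        absent = sum(1 for row in rows if column not in row)
        if absent:
            missing[column] = absent
    return missing


# Input data: an array of objects, one object per row (sample values are made up).
rows: list = [
    {"name": "Acme", "revenue": 25000},
    {"name": "Birch", "revenue": 400},
    {"name": "Cedar"},  # this row has no "revenue" field
]
rules: list = [
    {"column": "revenue", "operator": "greaterThan", "value": "1000", "label": "Medium Value"},
]

print(missing_rule_columns(rows, rules))  # prints {'revenue': 1}
```

A check like this can be run before configuring the node to catch misspelled column names early.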

Settings

This node accepts the following configuration settings:

1. outputColumn

Input Type: String
Description: Specifies the name of the output column where the category labels will be stored. This setting determines where in the resulting data structure the new labels will be added.
Effect of Changing: If you change this value, the name of the output column in the returned dataset will reflect this change. For instance, setting it to "categoryLabel" will output data with a column named "categoryLabel" instead of the default "category".
Default Value: "category"

2. defaultLabel

Input Type: String
Description: The label assigned to rows that do not match any of the predefined rules.
Effect of Changing: Modifying this value changes the fallback label for any rows that do not meet any conditions specified in the rules. For example, changing it to "Unclassified" will result in unmatched rows being labeled as "Unclassified".
Default Value: "Other"

3. rules

Input Type: Array of Objects
Description: This is an array where each object defines a rule for classifying the data. Each rule must contain specific attributes including:
- column: The name of the column that the rule applies to.
- operator: The condition that will be evaluated (e.g., greater than, equals).
- value: The comparison value against which the column’s value will be evaluated.
- label: The category label that will be assigned to the row if it matches the condition.
Effect of Changing: Adjusting the rules alters how the input data is categorized. Adding a rule allows finer classification, while removing rules makes the classification coarser and can send more rows to the default label. Because the first matching rule wins, reordering rules can also change which label a row receives.
Default Value: Empty array [].

4. Conditions within Rules

The individual conditions that can be applied via the operator property in each rule include:

equals, notEquals, contains, notContains, startsWith, endsWith
greaterThan, lessThan, greaterThanOrEqual, lessThanOrEqual
isEmpty, isNotEmpty, regex, inList

These operators provide a range of comparisons for categorizing data, and their effects depend on the data type and value of the column being evaluated.
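The Python sketch below shows one way the listed operators could be evaluated. It is an illustration under stated assumptions, not the node's actual implementation: it assumes string operators are case-sensitive, that the four ordering operators compare values numerically and never match non-numeric input, and that inList accepts either a list or a comma-separated string.

```python
import re
from typing import Any

# Assumption: ordering operators compare both sides as numbers.
NUMERIC_OPS = {
    "greaterThan": lambda a, b: a > b,
    "lessThan": lambda a, b: a < b,
    "greaterThanOrEqual": lambda a, b: a >= b,
    "lessThanOrEqual": lambda a, b: a <= b,
}


def _to_number(value: Any):
    """Return value as a float, or None when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def evaluate_condition(cell: Any, operator: str, value: Any) -> bool:
    """Hypothetical sketch of a single rule check."""
    text = "" if cell is None else str(cell)
    target = "" if value is None else str(value)

    if operator in NUMERIC_OPS:
        left, right = _to_number(cell), _to_number(value)
        if left is None or right is None:
            return False  # assumption: non-numeric input never matches
        return NUMERIC_OPS[operator](left, right)
    if operator == "isEmpty":
        return text.strip() == ""
    if operator == "isNotEmpty":
        return text.strip() != ""
    if operator == "equals":
        return text == target
    if operator == "notEquals":
        return text != target
    if operator == "contains":
        return target in text
    if operator == "notContains":
        return target not in text
    if operator == "startsWith":
        return text.startswith(target)
    if operator == "endsWith":
        return text.endswith(target)
    if operator == "regex":
        return re.search(target, text) is not None
    if operator == "inList":
        # Assumption: the rule value is a list or a comma-separated string.
        items = value if isinstance(value, list) else target.split(",")
        return text in [str(item).strip() for item in items]
    raise ValueError(f"Unknown operator: {operator}")
```

Under these assumptions, evaluate_condition(1500, "greaterThan", "1000") matches, while evaluate_condition("100", "greaterThan", "100") does not because the comparison is strict.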

How It Works

The node first unwraps the incoming data to ensure it is in the correct format.
It then iterates over each data row to build the results.
For each row, it evaluates rules in order:
- If a rule matches, it assigns the corresponding label to the outputColumn and stops evaluating further rules for that row.
- If a row does not match any rule, the defaultLabel is assigned.
Results are compiled into a new data structure and returned.

The evaluation of conditions uses a helper function, evaluateCondition, which checks the value of the specified column against the operator and rule value, executing the appropriate comparison logic.
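The steps above can be sketched in Python as follows. This is a simplified illustration of first-match-wins evaluation, not the node's source: the classify function, the two-operator table, and the sample columns plan and seats are all hypothetical, and the sketch assumes that a rule never matches a row whose column is missing.

```python
from typing import Any

# Only two operators are sketched here; the node supports more.
OPERATORS = {
    "equals": lambda cell, value: str(cell) == str(value),
    "greaterThan": lambda cell, value: float(cell) > float(value),
}


def classify(rows: list, rules: list,
             output_column: str = "category",
             default_label: str = "Other") -> list:
    """Label each row with the first matching rule, else the default label."""
    results = []
    for row in rows:
        label = default_label
        for rule in rules:
            cell: Any = row.get(rule["column"])
            if cell is None:
                continue  # assumption: a missing column never matches
            if OPERATORS[rule["operator"]](cell, rule["value"]):
                label = rule["label"]
                break  # first match wins; later rules are skipped
        # Build a new row so the input data is left unchanged.
        results.append({**row, output_column: label})
    return results


rows = [
    {"plan": "enterprise", "seats": 40},
    {"plan": "team", "seats": 250},
    {"plan": "free", "seats": 3},
]
rules = [
    {"column": "plan", "operator": "equals", "value": "enterprise", "label": "Enterprise"},
    {"column": "seats", "operator": "greaterThan", "value": "100", "label": "Mid-Market"},
]

labelled = classify(rows, rules)
print([row["category"] for row in labelled])  # prints ['Enterprise', 'Mid-Market', 'Other']
```

A row that satisfies both rules, such as an enterprise plan with 500 seats, would receive "Enterprise" because that rule comes first.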

AI Integrations

Currently, dataClassify does not include any direct AI integrations. It relies solely on the specified rules and conditions provided by the user to classify data based on set logic.

Billing Impacts

Using this node may incur costs that depend on the volume of data processed and the complexity of the configured rules. Many or complex rules applied to large datasets can increase processing time and resource use, which may lead to higher charges if the platform has a usage-based pricing model.

Use Cases & Examples

Use Case 1: Sales Categorization

A company wants to categorize its customer sales data into segments: "Enterprise", "Mid-Market", and "Other". By applying dataClassify, sales representatives can better target their marketing efforts.

Use Case 2: Customer Segmentation

A marketing department uses the node to classify customers based on their engagement levels with various products, allowing tailored advertising for different segments.

Example Configuration

Use Case: A company wants to segment its customers based on their revenue.

Configuration Data (JSON):

{
  "outputColumn": "customerSegment",
  "defaultLabel": "Low Engagement",
  "rules": [
    { "column": "revenue", "operator": "greaterThan", "value": "10000", "label": "High Value" },
    { "column": "revenue", "operator": "greaterThan", "value": "1000", "label": "Medium Value" },
    { "column": "revenue", "operator": "greaterThan", "value": "100", "label": "Low Value" }
  ]
}

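This configuration segments customers by revenue into three value tiers, assuming the quoted rule values are compared numerically. Because rules are evaluated in order and the first match wins, a revenue above 10000 is labeled "High Value", a revenue above 1000 and up to 10000 is labeled "Medium Value", and a revenue above 100 and up to 1000 is labeled "Low Value". Any revenue of 100 or less matches no rule and receives the default label, "Low Engagement".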
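To make the boundaries concrete, the short Python sketch below applies the three thresholds from the configuration above in rule order. It assumes that greaterThan is a strict numeric comparison; the function customer_segment is hypothetical and exists only for this illustration.

```python
# The three thresholds from the example configuration, in rule order.
THRESHOLDS = [(10000, "High Value"), (1000, "Medium Value"), (100, "Low Value")]
DEFAULT_LABEL = "Low Engagement"


def customer_segment(revenue: float) -> str:
    """Return the label of the first threshold the revenue strictly exceeds."""
    for threshold, label in THRESHOLDS:
        if revenue > threshold:  # strict comparison, as "greaterThan" suggests
            return label
    return DEFAULT_LABEL


for revenue in (25000, 10000, 1001, 100, 0):
    print(revenue, customer_segment(revenue))
# prints:
# 25000 High Value
# 10000 Medium Value
# 1001 Medium Value
# 100 Low Engagement
# 0 Low Engagement
```

The rule order matters here: if the "Low Value" rule were listed first, every revenue above 100 would receive that label and the other two rules would never apply.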
