FillCleanNodeEditor Documentation
Overview
The FillCleanNodeEditor is an integral component of the Vantage analytics platform, designed for data manipulation and cleaning workflows. This custom editor allows users to define specific rules to modify their datasets by handling null values, cleaning whitespace, and converting data types effectively.
Purpose
The primary purpose of the FillCleanNodeEditor is to streamline the data cleaning process by enabling users to construct a sequence of rules. Each rule specifies a column to modify, an action to perform on that column, and (optionally) a default value to fill when applicable. This editor is focused on enhancing the usability and efficiency of data preparation tasks within the platform.
Settings
The FillCleanNodeEditor comprises several configurable settings that dictate its behavior and output. Below is a detailed explanation of each setting:
1. Rules Configuration
- Name: Rules
- Input Type: Array of Objects
- Description: The
rulessetting is an array where each rule consists of the following properties:column: Specifies which column to apply the action to (can be set to__all__to apply to all columns).action: Defines the operation to be performed on the designated column (e.g., fill, trim, convert type).value: An optional property that provides a specific value used with certain actions (like filling a default value).
- Default Value:
[](an empty array, meaning no cleaning rules are present initially).
Rule Properties
Each rule can be defined with specific properties:
1.1 Column
- Input Type: String (or a special string
__all__) - Description: The column to which the action will be applied. Users can either select a specific column or apply the action to all columns if
__all__is chosen. - Default Value:
__all__
1.2 Action
- Input Type: Dropdown (select)
- Description: Users can choose among various actions. Available actions include:
fillDefault(Fill with a default value)fillForward(Forward fill)fillBackward(Backward fill)dropNulls(Drop null rows)trimWhitespace(Trim whitespace)toLowerCase(Convert to lowercase)toUpperCase(Convert to UPPERCASE)toNumber(Convert to number)toDate(Parse as date)
- Default Value:
trimWhitespace
1.3 Value
- Input Type: String
- Description: An optional input field that provides a specific value which is applicable only when the selected action is
fillDefault. Changing this value alters the default behavior of the FillCleanNodeEditor to replace null or empty entries with the provided input. - Default Value:
''(an empty string)
Quick Actions
These are predefined actions that can be quickly executed, especially when no rules are currently set. Options include trimming whitespace, dropping null rows, or filling blanks with a default value.
How It Works
The FillCleanNodeEditor operates in the following manner:
- Initialization: Upon launching, the editor fetches upstream columns based on the selected node and edges (data relationships).
- Rule Definition: Users can define multiple rules, and each rule is composed of a column, an action, and (when relevant) a value. This customization allows users to create a tailored data-preparation pipeline.
- Updating Rules: Users can dynamically add, update, remove, or reorder rules as needed. This real-time flexibility ensures that users can experiment with different data cleaning strategies.
- Execution Order: The processing of rules occurs top-to-bottom; therefore, the sequence in which rules are applied can significantly influence the outcome.
Data Expectations
The FillCleanNodeEditor expects data formatted in a way that includes columns that the rules will manipulate. It works with upstream columns detected dynamically from the data preview result. The actions applied via the rules should be compatible with the data types present in these columns.
Use Cases & Examples
Use Cases
-
Cleaning a Customer Data Set: A business might receive customer data that contains null or incorrectly formatted entries. They can use the FillCleanNodeEditor to fill missing values, trim whitespace in customer names, and standardize the format of phone numbers.
-
Preparing Sales Data for Analysis: Sales data may have empty entries that need to be filled for accurate reporting. The FillCleanNodeEditor can be leveraged to fill defaults for missing sales figures, apply consistent formatting, and drop unnecessary null rows.
-
Transforming Employee Records: Employee records may require rigorous cleaning to ensure consistency across data fields such as names and identification numbers based on formatting standards.
Configuration Example for Cleaning a Customer Data Set
To clean customer data effectively, a user might configure the FillCleanNodeEditor with the following rules:
{
"rules": [
{
"column": "customer_name",
"action": "trimWhitespace",
"value": ""
},
{
"column": "customer_email",
"action": "toLowerCase",
"value": ""
},
{
"column": "purchase_amount",
"action": "fillDefault",
"value": "0"
}
]
}In this configuration:
- The
customer_namecolumn will have all leading/trailing whitespace removed. - The
customer_emailentries will be converted to lowercase to ensure consistency in how emails are handled. - Any
purchase_amountentries that are null will be filled with a default value of0, allowing for accurate financial reporting.
This setup ensures that the dataset is clean, uniform, and ready for further analysis or reporting tasks.