AggregationNodeEditor Documentation
Overview
The AggregationNodeEditor is the dedicated editor for configuring aggregation steps in a workflow on the Vantage analytics platform. It presents the columns detected in the upstream data and lets users choose grouping criteria and aggregation functions to transform their datasets.
Purpose
The primary purpose of the AggregationNodeEditor is to let users configure aggregation operations directly within the data workflow. With this component, users can group data, choose aggregation functions (such as sum or average), and set a pivot configuration.
Settings
1. Group By
- Name: groupBy
- Input Type: Array (String or Object)
- Description: This setting holds the columns by which the data will be grouped. Each entry can be either a string (column name) or an object that includes a column name and a date period for time-based grouping. Changing the groupBy setting alters the granularity of the data analysis, leading to potentially different aggregated results.
- Default Value: [] (empty array)
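
Because each groupBy entry may be either a plain column name or an object, code that consumes the setting will usually normalize the entries first. The TypeScript sketch below assumes the object form uses the keys column and datePeriod, as in the example configuration at the end of this document. The type names and the normalizeGroupBy helper are hypothetical and are not part of the editor's API.

```typescript
// Hypothetical types: only the `column` and `datePeriod` keys come from the
// example configuration; everything else is an illustrative assumption.
type DatePeriod = "" | "day" | "week" | "month" | "year";

interface NormalizedGroupBy {
  column: string;
  datePeriod: DatePeriod;
}

type GroupByEntry = string | { column: string; datePeriod?: DatePeriod };

// Convert every entry to the object form. The empty string stands for
// "Exact", meaning no date bucketing is applied.
function normalizeGroupBy(entries: GroupByEntry[]): NormalizedGroupBy[] {
  return entries.map((entry): NormalizedGroupBy =>
    typeof entry === "string"
      ? { column: entry, datePeriod: "" }
      : { column: entry.column, datePeriod: entry.datePeriod ?? "" }
  );
}

const normalized = normalizeGroupBy([
  "region",
  { column: "order_date", datePeriod: "month" },
]);
console.log(normalized);
```

Normalizing up front means later steps only need to handle one shape instead of checking the entry type everywhere.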
2. Aggregations
- Name: aggregations
- Input Type: Array of Objects
- Description: This array contains objects that define how to aggregate the data, specifying which aggregation function to apply to which column, as well as an optional alias for the result. Each aggregation can be configured separately.
- Default Value: [] (empty array)
3. Pivot By
- Name: pivotBy
- Input Type: Object
- Description: This setting allows users to pivot the aggregated results by a specified column and an associated time period. The pivot changes the layout of the resulting dataset by turning the unique values of the pivot column into separate columns.
- Default Value: null
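
Taken together, the three settings and their defaults can be sketched as a single configuration type. This is a minimal illustration, not the editor's actual type definitions: the key names inside pivotBy (column and datePeriod) are assumed from the description above, and emptyConfig and withDefaults are hypothetical helpers.

```typescript
// Assumed shapes; the real editor may define these differently.
interface AggregationSpec {
  column?: string;   // optional because some functions (e.g. Count) need no column
  function: string;
  alias?: string;
}

interface PivotBy {
  column: string;
  datePeriod?: string;
}

interface AggregationConfig {
  groupBy: Array<string | { column: string; datePeriod?: string }>;
  aggregations: AggregationSpec[];
  pivotBy: PivotBy | null;
}

// Return a fresh object each time so callers never share the default arrays.
function emptyConfig(): AggregationConfig {
  return { groupBy: [], aggregations: [], pivotBy: null };
}

// Fill in the documented defaults for any setting the caller leaves out.
function withDefaults(partial: Partial<AggregationConfig>): AggregationConfig {
  return { ...emptyConfig(), ...partial };
}

const pivotOnly = withDefaults({ pivotBy: { column: "region" } });
console.log(pivotOnly);
```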
Aggregation Functions
The editor supports various aggregation functions defined in the AGG_FUNCTIONS array:
| Function | Needs Column | Numeric | Description |
|---|---|---|---|
| Count | No | No | Returns the number of rows |
| Count Distinct | Yes | No | Returns the count of unique values |
| Sum | Yes | Yes | Returns the total of numeric values |
| Average | Yes | Yes | Returns the mean of numeric values |
| Min | Yes | Yes | Returns the smallest value |
| Max | Yes | Yes | Returns the largest value |
| Median | Yes | Yes | Returns the median value |
| First | Yes | No | Returns the first value |
| Last | Yes | No | Returns the last value |
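
The "Needs Column" and "Numeric" flags lend themselves to a simple validation step. The sketch below mirrors the table as data and checks one aggregation against it. The value keys are assumptions (only sum and count appear in the example configuration in this document), and validateAggregation is a hypothetical helper, not a documented function.

```typescript
interface AggFunction {
  value: string;       // assumed identifier stored in the configuration
  label: string;       // label from the table above
  needsColumn: boolean;
  numeric: boolean;
}

// Mirrors the table above; `value` keys other than "sum" and "count" are guesses.
const AGG_FUNCTIONS: AggFunction[] = [
  { value: "count", label: "Count", needsColumn: false, numeric: false },
  { value: "count_distinct", label: "Count Distinct", needsColumn: true, numeric: false },
  { value: "sum", label: "Sum", needsColumn: true, numeric: true },
  { value: "avg", label: "Average", needsColumn: true, numeric: true },
  { value: "min", label: "Min", needsColumn: true, numeric: true },
  { value: "max", label: "Max", needsColumn: true, numeric: true },
  { value: "median", label: "Median", needsColumn: true, numeric: true },
  { value: "first", label: "First", needsColumn: true, numeric: false },
  { value: "last", label: "Last", needsColumn: true, numeric: false },
];

type ColumnKind = "numeric" | "categorical" | "date";

// Returns an error message, or null when the aggregation is acceptable.
function validateAggregation(
  fn: string,
  column: string | undefined,
  kinds: Record<string, ColumnKind>
): string | null {
  const def = AGG_FUNCTIONS.find((f) => f.value === fn);
  if (!def) return `Unknown function: ${fn}`;
  if (def.needsColumn && !column) return `${def.label} requires a column`;
  if (def.numeric && column && kinds[column] !== "numeric") {
    return `${def.label} requires a numeric column`;
  }
  return null;
}

console.log(validateAggregation("sum", undefined, {})); // "Sum requires a column"
```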
Date Periods
The DATE_PERIODS array allows users to specify how dates should be grouped:
| Value | Label |
|---|---|
| '' | Exact |
| 'day' | Day |
| 'week' | Week |
| 'month' | Month |
| 'year' | Year |
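
To illustrate what grouping by a date period means, the sketch below maps a date to a bucket key for each period in the table. It is an assumption-laden illustration: it works in UTC and treats Monday as the first day of the week, neither of which this document specifies, and bucketDate is a hypothetical helper.

```typescript
const DATE_PERIODS = [
  { value: "", label: "Exact" },
  { value: "day", label: "Day" },
  { value: "week", label: "Week" },
  { value: "month", label: "Month" },
  { value: "year", label: "Year" },
] as const;

type DatePeriod = (typeof DATE_PERIODS)[number]["value"];

// Map a date to the key of the bucket it falls into (UTC, Monday-based weeks).
function bucketDate(date: Date, period: DatePeriod): string {
  const iso = date.toISOString(); // e.g. "2024-03-15T10:30:00.000Z"
  switch (period) {
    case "year":
      return iso.slice(0, 4);
    case "month":
      return iso.slice(0, 7);
    case "day":
      return iso.slice(0, 10);
    case "week": {
      // getUTCDay() is 0 for Sunday, so shift to make Monday the week start.
      const daysSinceMonday = (date.getUTCDay() + 6) % 7;
      const monday = new Date(
        Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate() - daysSinceMonday)
      );
      return monday.toISOString().slice(0, 10);
    }
    default:
      return iso; // "Exact": keep the full timestamp
  }
}

const sampleDate = new Date("2024-03-15T10:30:00Z"); // a Friday
console.log(bucketDate(sampleDate, "month")); // "2024-03"
console.log(bucketDate(sampleDate, "week"));  // "2024-03-11"
```

Rows whose dates produce the same key end up in the same group, which is why a coarser period yields fewer, larger groups.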
How It Works
The AggregationNodeEditor leverages state management hooks to keep track of the selected node's configuration, using upstream column detection to suggest available columns for grouping and aggregation. When a user modifies the settings, the changes are handled through updateField, which updates the main workflow state with new configurations.
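
The signature of updateField is not documented here, so the following is only a sketch of the kind of update it might perform, written as a pure function. The WorkflowState shape and the name applyFieldUpdate are assumptions made for illustration.

```typescript
// Assumed state shapes; the real workflow state is not described in this document.
interface NodeConfig {
  groupBy: unknown[];
  aggregations: unknown[];
  pivotBy: object | null;
}

interface WorkflowNode {
  id: string;
  config: NodeConfig;
}

interface WorkflowState {
  selectedNodeId: string | null;
  nodes: WorkflowNode[];
}

// Return a new state in which one field of the selected node's config is replaced.
// Nothing is mutated, so the previous state stays valid for comparison or undo.
function applyFieldUpdate<K extends keyof NodeConfig>(
  state: WorkflowState,
  field: K,
  value: NodeConfig[K]
): WorkflowState {
  return {
    ...state,
    nodes: state.nodes.map((node) =>
      node.id === state.selectedNodeId
        ? { ...node, config: { ...node.config, [field]: value } }
        : node
    ),
  };
}

const before: WorkflowState = {
  selectedNodeId: "agg-1",
  nodes: [
    { id: "agg-1", config: { groupBy: [], aggregations: [], pivotBy: null } },
    { id: "agg-2", config: { groupBy: [], aggregations: [], pivotBy: null } },
  ],
};
const after = applyFieldUpdate(before, "groupBy", ["region"]);
console.log(after.nodes[0].config.groupBy); // ["region"]
```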
The editor continuously recalculates the summary text to provide contextual information about the current configuration, combining grouped columns, selected aggregations, and pivot configurations into a cohesive summary.
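
The exact wording of the summary is not specified in this document. As a rough illustration of combining the three settings into one line of text, a summary builder might look like the sketch below; the output format and the buildSummary name are invented for this example.

```typescript
type GroupByEntry = string | { column: string; datePeriod?: string };

interface AggregationSpec {
  column?: string;
  function: string;
  alias?: string;
}

interface AggregationConfig {
  groupBy: GroupByEntry[];
  aggregations: AggregationSpec[];
  pivotBy: { column: string; datePeriod?: string } | null;
}

// "order_date (month)" for entries with a period, plain column name otherwise.
function describeColumn(entry: GroupByEntry): string {
  if (typeof entry === "string") return entry;
  return entry.datePeriod ? `${entry.column} (${entry.datePeriod})` : entry.column;
}

// Combine grouping, aggregations, and pivot into one line of text.
function buildSummary(config: AggregationConfig): string {
  const parts: string[] = [];
  if (config.groupBy.length > 0) {
    parts.push(`Group by ${config.groupBy.map(describeColumn).join(", ")}`);
  }
  if (config.aggregations.length > 0) {
    const aggs = config.aggregations.map((a) => {
      const call = `${a.function}(${a.column ?? "*"})`;
      return a.alias ? `${call} as ${a.alias}` : call;
    });
    parts.push(`Compute ${aggs.join(", ")}`);
  }
  if (config.pivotBy) {
    parts.push(`Pivot by ${describeColumn(config.pivotBy)}`);
  }
  return parts.length > 0 ? parts.join("; ") : "No aggregation configured";
}

const summary = buildSummary({
  groupBy: ["region"],
  aggregations: [{ column: "sales_amount", function: "sum", alias: "total_sales" }],
  pivotBy: null,
});
console.log(summary); // "Group by region; Compute sum(sales_amount) as total_sales"
```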
Expected Data
The AggregationNodeEditor expects an object encapsulating:
- groupBy: an array of grouping column names or objects
- aggregations: detailed specifications of which aggregations are applied to which columns
- pivotBy: the pivot configuration for generating pivot tables in the results
- Additionally, it expects corresponding upstream data, which is analyzed to detect numeric, categorical, or date columns.
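
How the editor classifies upstream columns is not described in detail. The sketch below shows one plausible heuristic over sample rows, assuming that numbers mark a numeric column and that Date objects or ISO-style date strings mark a date column. The detectColumns helper and the regular expression are illustrative assumptions, not the platform's actual detection logic.

```typescript
type ColumnKind = "numeric" | "categorical" | "date";
type DataRow = Record<string, unknown>;

// Matches strings such as "2024-01-05" or "2024-01-05T12:00:00Z" (format only).
const ISO_DATE = /^\d{4}-\d{2}-\d{2}(T.*)?$/;

// Classify one column from its sample values, ignoring empty cells.
function detectColumnKind(values: unknown[]): ColumnKind {
  const present = values.filter((v) => v !== null && v !== undefined && v !== "");
  if (present.length === 0) return "categorical";
  if (present.every((v) => typeof v === "number" && Number.isFinite(v))) return "numeric";
  if (present.every((v) => v instanceof Date || (typeof v === "string" && ISO_DATE.test(v)))) {
    return "date";
  }
  return "categorical";
}

// Collect the values of every column, then classify each column.
function detectColumns(rows: DataRow[]): Record<string, ColumnKind> {
  const valuesByColumn: Record<string, unknown[]> = {};
  rows.forEach((row) => {
    Object.keys(row).forEach((column) => {
      (valuesByColumn[column] = valuesByColumn[column] || []).push(row[column]);
    });
  });
  const kinds: Record<string, ColumnKind> = {};
  Object.keys(valuesByColumn).forEach((column) => {
    kinds[column] = detectColumnKind(valuesByColumn[column]);
  });
  return kinds;
}

const detected = detectColumns([
  { product_category: "Books", sales_amount: 20, order_date: "2024-01-05" },
  { product_category: "Toys", sales_amount: 12.5, order_date: "2024-02-01" },
]);
console.log(detected);
```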
Integrations & Billing Impact
- AI Integration: The AggregationNodeEditor does not include specific AI features but could be used in conjunction with machine learning workflows to analyze data patterns post-aggregation. Future updates may incorporate predictive analytics based on aggregated results.
- Billing Impact: The use of aggregation may incur data processing costs depending on the amount of data processed and the complexity of the aggregations defined. Users should monitor their data sizes and types to anticipate potential impacts on billing.
Use Cases & Examples
Use Cases
- Sales Performance Analysis: A retail company wants to understand the monthly sales performance grouped by different product categories.
- Customer Segmentation: A marketing team needs to segment customer data based on demographics while aggregating purchase frequencies to refine targeting strategies.
- Financial Reporting: A finance department requires quarterly revenue and expenditure reports aggregated by departments within the organization.
Example Configuration
Use Case: Sales Performance Analysis
To track monthly sales across various product categories, the user may configure the AggregationNodeEditor as follows:
{
  "groupBy": [
    { "column": "product_category", "datePeriod": "month" }
  ],
  "aggregations": [
    { "column": "sales_amount", "function": "sum", "alias": "total_sales" },
    { "column": "transactions", "function": "count", "alias": "total_transactions" }
  ],
  "pivotBy": null
}
With this configuration:
- Data is grouped by product_category, with a monthly date period.
- Total sales for each category are calculated with the sum function, and the number of transactions is tracked with the count function.
- This lets business analysts evaluate performance on a monthly basis and supports clearer strategic planning.
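
To make the effect of the sum and count aggregations concrete, the sketch below applies them to a few invented rows. It is a simplified illustration rather than the platform's aggregation engine: it groups by plain column names only and ignores datePeriod and pivotBy. The sample data and the aggregate function are hypothetical.

```typescript
type SalesRow = Record<string, string | number>;

interface SimpleAggregation {
  column: string;
  function: "sum" | "count";
  alias: string;
}

// Group rows by the given columns, then compute each aggregation per group.
function aggregate(
  rows: SalesRow[],
  groupBy: string[],
  aggregations: SimpleAggregation[]
): SalesRow[] {
  const groups = new Map<string, SalesRow[]>();
  rows.forEach((row) => {
    // Rows with identical grouping values share the same key.
    const key = JSON.stringify(groupBy.map((column) => row[column]));
    const bucket = groups.get(key);
    if (bucket) bucket.push(row);
    else groups.set(key, [row]);
  });

  const output: SalesRow[] = [];
  groups.forEach((bucket) => {
    const out: SalesRow = {};
    groupBy.forEach((column) => {
      out[column] = bucket[0][column];
    });
    aggregations.forEach((agg) => {
      out[agg.alias] =
        agg.function === "count"
          ? bucket.length
          : bucket.reduce((total, row) => total + Number(row[agg.column]), 0);
    });
    output.push(out);
  });
  return output;
}

// Invented sample data for illustration only.
const salesRows: SalesRow[] = [
  { product_category: "Books", sales_amount: 20, transactions: 1 },
  { product_category: "Books", sales_amount: 15, transactions: 1 },
  { product_category: "Toys", sales_amount: 30, transactions: 1 },
];

const totals = aggregate(salesRows, ["product_category"], [
  { column: "sales_amount", function: "sum", alias: "total_sales" },
  { column: "transactions", function: "count", alias: "total_transactions" },
]);
console.log(totals);
// Books: total_sales 35, total_transactions 2; Toys: total_sales 30, total_transactions 1
```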