Projection Logic Documentation
Overview
The "projection" logic in the Vantage analytics and data platform transforms datasets by selecting specific fields from the input data and, optionally, renaming them. This functionality is critical for data shaping: it lets users focus on relevant data points and customize output formats as needed.
Settings
The projection logic is driven by a configuration object with two settings, each of which customizes part of the projection operation. Below is a detailed breakdown of each setting:
1. fields
- Input Type: Array of strings
- Description: The fields setting specifies which fields from the incoming data should be included in the output. If this array is empty, all fields will be projected. Changes to this setting directly affect which fields are retained in the output dataset.
- Default Value: [] (empty array)
2. rename
- Input Type: Object (key-value pairs)
- Description: The rename setting allows users to define a mapping of original field names to new field names. If a field name specified in the fields array has a corresponding entry in this object, the output will use the new name instead of the original. This makes the output data easier to read and use.
- Default Value: {} (empty object)
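The two settings and their defaults can be summarized as a small JavaScript sketch. The typedef name ProjectionSettings and the helper resolveProjectionSettings are hypothetical and used only for illustration; they are not part of the documented Vantage API.

```javascript
/**
 * Shape of the projection settings as described above.
 * The typedef name is an assumption made for this example.
 * @typedef {Object} ProjectionSettings
 * @property {string[]} [fields] Fields to keep; an empty array keeps all fields.
 * @property {Object<string, string>} [rename] Map of original name to new name.
 */

// Hypothetical helper: fills in the documented defaults
// ([] for fields, {} for rename) when a setting is missing.
function resolveProjectionSettings(settings = {}) {
  const hasRename = settings.rename !== null && typeof settings.rename === 'object';
  return {
    fields: Array.isArray(settings.fields) ? settings.fields : [],
    rename: hasRename ? settings.rename : {},
  };
}
```

Calling resolveProjectionSettings({ fields: ['email'] }) in this sketch would keep the given fields array and fall back to an empty rename object.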
How It Works
The createProjectionNode function creates a projection node that is structured to process incoming data arrays. Here’s an overview of its operation:
- Validation: The function first checks if the input data is an array. If it is not, it returns an empty array.
- Field Mapping: If the fields array is empty, it returns the original data unaltered. If there are fields specified, it constructs a new array by mapping over each row of the input data.
- Field Extraction & Renaming: For each specified field in fields, it checks if there is a corresponding entry in the rename object to determine if the field should be renamed. It builds and returns an output object with the selected fields and their values according to the specified rules.
The execute method encapsulates the core logic for transforming the input data according to the settings defined during node creation.
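The steps above can be sketched in JavaScript as follows. This is a minimal illustration of the described behavior, not the actual Vantage source. It assumes the node is a plain object exposing an execute(data) method, and that a field missing from a row is copied as undefined, which the description does not specify.

```javascript
// Sketch only: mirrors the documented steps, not the real implementation.
function createProjectionNode({ fields = [], rename = {} } = {}) {
  return {
    execute(data) {
      // Validation: anything that is not an array yields an empty array.
      if (!Array.isArray(data)) return [];

      // Field mapping: an empty fields list returns the input unaltered.
      if (fields.length === 0) return data;

      // Field extraction and renaming: build one output object per row.
      return data.map((row) => {
        const output = {};
        for (const field of fields) {
          // Use the new name only when rename has its own entry for this field.
          const hasNewName = Object.prototype.hasOwnProperty.call(rename, field);
          const outputName = hasNewName ? rename[field] : field;
          // Assumption: a key missing from the row is copied as undefined.
          output[outputName] = row[field];
        }
        return output;
      });
    },
  };
}

const projectionNode = createProjectionNode({
  fields: ['email', 'purchaseAmount'],
  rename: { purchaseAmount: 'totalSpent' },
});

console.log(
  projectionNode.execute([
    { email: 'user1@example.com', purchaseAmount: 100, otherField: 'abc' },
  ])
);
// [ { email: 'user1@example.com', totalSpent: 100 } ]
```

In this sketch, rename entries have no effect when fields is empty, because the input is returned as-is before any renaming happens. That follows from the description above, but it should be confirmed against the real node before relying on it.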
Expected Data
- Input Data Type: An array of objects, where each object represents a row of data containing key-value pairs.
- Field Names: The keys within the objects must match those specified in the fields array to properly extract values.
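Because extraction depends on the row keys matching the configured field names, it can help to check the input before running a projection. The helper below, findMissingFields, is hypothetical and not part of the Vantage platform; it is a small sketch that reports which configured fields are absent from at least one row.

```javascript
// Hypothetical pre-check: returns the configured fields that are
// missing from at least one row of the input data.
function findMissingFields(rows, fields) {
  const missing = new Set();
  for (const row of rows) {
    const isObject = row !== null && typeof row === 'object';
    for (const field of fields) {
      // A row that is not an object cannot contain any field.
      if (!isObject || !(field in row)) {
        missing.add(field);
      }
    }
  }
  return [...missing];
}

console.log(
  findMissingFields(
    [{ email: 'user1@example.com' }, { email: 'user2@example.com', purchaseAmount: 150 }],
    ['email', 'purchaseAmount']
  )
);
// [ 'purchaseAmount' ]
```

An empty result means every row carries every configured field. A non-empty result usually points to a misspelled field name or to inconsistent source data.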
AI Integrations
While the current implementation of the projection logic does not directly integrate with AI features, it can serve as a preprocessing step for AI models by curating datasets that are explicitly relevant for machine learning training or inference tasks. By streamlining datasets through projection and renaming, it can help enhance the input quality for AI algorithms.
Billing Impacts
The projection logic itself does not incur additional costs directly; however, the overall processing of data, including projections, contributes to the level of resource utilization on the Vantage platform. Users should be aware that extensive data transformations may result in increased usage metrics, potentially leading to higher billing based on their subscription plan.
Use Cases & Examples
Use Cases
- Data Standardization: A marketing analytics team needs to prepare a clean dataset from various sources for analysis. They can use the projection logic to extract relevant fields and rename them for consistency across reports.
- Dashboard Creation: A data visualization team needs to create an executive dashboard that only showcases specific KPIs. The projection component can help by allowing them to select only these KPIs and provide clear labels.
- Data Pipeline Preparation: A data engineering team is collecting data from diverse sources and needs to preprocess the data for further transformations. The projection can streamline necessary fields and improve the efficacy of processing steps downstream.
Example Configuration
Use Case: Data Standardization for Marketing Reports
Suppose a marketing analytics team needs to prepare a dataset that includes only the email and purchase amount fields from an input dataset, while renaming "purchaseAmount" to "totalSpent" for clarity.
Sample Configuration:
const projectionNode = createProjectionNode({
  fields: ['email', 'purchaseAmount'],
  rename: {
    purchaseAmount: 'totalSpent',
  },
});
When executed with the following input data:
const inputData = [
  { email: 'user1@example.com', purchaseAmount: 100, otherField: 'abc' },
  { email: 'user2@example.com', purchaseAmount: 150, otherField: 'def' },
];
The output will be:
[
  { email: 'user1@example.com', totalSpent: 100 },
  { email: 'user2@example.com', totalSpent: 150 },
]
This output format is immediately ready for analysis or reporting, showcasing only the relevant fields with clear naming.