# executeWorkflow Documentation

## Overview
The `executeWorkflow` logic is a core component of the Vantage analytics and data platform. It executes workflows made up of nodes that can depend on one another, orchestrating node execution in topological order so that the results of one node feed into subsequent nodes while managing the flow and storage of data.
## Purpose
- To manage and execute complex workflows involving multiple nodes with defined dependencies.
- To control data flow between nodes using unique data reference IDs (`dataRefs`) stored in Redis with a Time To Live (TTL) for automatic cleanup.
- To support resumable execution and preview modes.
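The `dataRef` mechanism above can be sketched as follows. This is a minimal, illustrative model only: an in-memory `Map` stands in for Redis, and the function names (`putDataRef`, `resolveDataRef`) are hypothetical, not the platform's actual API.

```typescript
// Node results are stored under a generated reference ID with a TTL, and
// downstream nodes resolve that ID back into data. An in-memory Map stands
// in for Redis here; expiry is checked lazily on read.
type StoredValue = { data: unknown; expiresAt: number };

const store = new Map<string, StoredValue>();

function putDataRef(data: unknown, ttlMs: number): string {
  const dataRef = `dataRef:${Math.random().toString(36).slice(2)}`;
  store.set(dataRef, { data, expiresAt: Date.now() + ttlMs });
  return dataRef;
}

function resolveDataRef(dataRef: string): unknown {
  const entry = store.get(dataRef);
  if (!entry || Date.now() > entry.expiresAt) {
    store.delete(dataRef); // expired or unknown refs are cleaned up lazily
    return undefined;
  }
  return entry.data;
}
```

In a real deployment, Redis's built-in key expiry would perform the cleanup automatically rather than the lazy check shown here.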
## How It Works
The function first builds a dependency graph from the provided nodes and edges, accounting for previously computed node results when resuming from a prior state. Nodes that are ready to execute (i.e., all of their required inputs are available) are processed iteratively. Results are passed through Redis where necessary and are truncated in preview mode to keep payloads within size limits. The function also handles special cases, such as offloading execution to a worker when resource requirements exceed defined thresholds.
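The dependency-driven loop described above can be sketched as follows. This is a simplified model, not the platform's implementation: a node is "ready" when every incoming edge's source has already produced a result, and the `run` callback stands in for real node execution.

```typescript
// Execute nodes in topological order: on each pass, run every node whose
// upstream dependencies (incoming edges) have all completed. If no node is
// ready but work remains, the graph has a cycle or a missing dependency.
interface Edge { source: string; target: string }

function topologicalRun(
  nodeIds: string[],
  edges: Edge[],
  run: (id: string, inputs: string[]) => void,
): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  while (done.size < nodeIds.length) {
    const ready = nodeIds.filter(
      (id) =>
        !done.has(id) &&
        edges.filter((e) => e.target === id).every((e) => done.has(e.source)),
    );
    if (ready.length === 0) throw new Error("cycle or unsatisfiable dependency");
    for (const id of ready) {
      // Pass the IDs of upstream nodes so results can be wired as inputs.
      run(id, edges.filter((e) => e.target === id).map((e) => e.source));
      done.add(id);
      order.push(id);
    }
  }
  return order;
}
```

Resuming from a prior state corresponds to seeding `done` with the IDs of nodes whose results were already computed.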
## Settings
The `executeWorkflow` function accepts the following settings:
| Setting Name | Input Type | Description | Default Value |
|---|---|---|---|
| `nodes` | Array | An array of node objects containing the individual configurations and properties of each node in the workflow to be executed. | None |
| `edges` | Array | An array of connections between nodes, defining dependencies and data flow between them. | None |
| `context` | Object | A context object containing execution environment variables, state management flags, and settings such as preview mode. | `{}` |
| `terminatorId` | String | The ID of a single terminator node, which serves as an endpoint for the workflow execution. | None |
| `terminatorIds` | Array | An array of terminator node IDs to specify multiple endpoints for the workflow. | None |
| `outputNodeId` | String | The ID of the primary output node to be considered when generating final results. | None |
## Detailed Explanation of Settings
- `nodes`
  - Input Type: Array
  - Description: An array of objects, where each object represents a node in the workflow. Each node can specify its operation, dependencies, input configuration, and more. The execution order is derived from the relationships defined by the edges.
  - Default Value: None (required setting)
- `edges`
  - Input Type: Array
  - Description: Defines how nodes are connected. Each edge object specifies a source node and a target node, along with any keys needed to map data between them. This drives both the dependency graph's visualization and its execution sequencing.
  - Default Value: None (required setting)
- `context`
  - Input Type: Object
  - Description: A flexible object that can carry execution flags such as `previewMode`, `forceSync`, and `resumeFromState`. It determines how the workflow executes in the current context, enabling features such as previews, resumption, and synchronous execution.
  - Default Value: `{}` (empty object)
- `terminatorId`
  - Input Type: String
  - Description: Specifies a legacy single-node terminator that signals where workflow execution should conclude. If both `terminatorId` and `terminatorIds` are provided, `terminatorIds` takes precedence.
  - Default Value: None (optional setting)
- `terminatorIds`
  - Input Type: Array
  - Description: An array of IDs for multiple terminator nodes, allowing the workflow to have several endpoints and produce multiple final outputs.
  - Default Value: None (optional setting)
- `outputNodeId`
  - Input Type: String
  - Description: Designates a specific node as the primary output, which matters when multiple terminators are provided. It determines the key results returned from the workflow execution.
  - Default Value: None (optional setting)
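The precedence rule between the two terminator settings can be sketched as a small helper. The function name `resolveTerminators` is illustrative, not part of the documented API; the rule itself (`terminatorIds` wins when both are supplied) is as documented above.

```typescript
// When both settings are provided, terminatorIds takes precedence; a lone
// legacy terminatorId is normalized into a one-element array.
function resolveTerminators(
  terminatorId?: string,
  terminatorIds?: string[],
): string[] {
  if (terminatorIds && terminatorIds.length > 0) return terminatorIds;
  return terminatorId ? [terminatorId] : [];
}
```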
## Use Cases & Examples

### Use Cases
- Data Consolidation: A business may use the `executeWorkflow` function to consolidate datasets from various sources, applying transformations and aggregating the results into a single comprehensive report.
- Automated Reporting: The workflow can generate reports on a periodic schedule, extracting, processing, and formatting data into visual dashboards without manual intervention.
- Machine Learning Model Execution: The workflow can chain the steps of a machine learning pipeline, linking data preprocessing, model training, and evaluation nodes in a defined sequence.
### Configuration Example
Use Case: Automated Reporting
In this scenario, a data analyst wants to create an automated reporting workflow that aggregates sales data from multiple sources, applies necessary transformations, and generates a summary report.
```json
{
  "nodes": [
    {
      "id": "extractSalesData",
      "node_type": "data/extraction",
      "config": {
        "source": "sales_db"
      }
    },
    {
      "id": "transformData",
      "node_type": "data/transformation",
      "config": {
        "operations": ["groupBy: department", "sum: sales"]
      }
    },
    {
      "id": "generateReport",
      "node_type": "report/generator",
      "config": {
        "format": "pdf",
        "destination": "s3://reports/sales_summary"
      }
    }
  ],
  "edges": [
    {
      "source": "extractSalesData",
      "target": "transformData",
      "source_key": "data",
      "target_key": "input_data"
    },
    {
      "source": "transformData",
      "target": "generateReport",
      "source_key": "transformed_data",
      "target_key": "report_data"
    }
  ],
  "context": {
    "previewMode": false,
    "resumeFromState": {},
    "forceSync": false
  },
  "terminatorId": "generateReport",
  "terminatorIds": ["generateReport"],
  "outputNodeId": "generateReport"
}
```

In this example:
- The workflow starts by extracting sales data from a database.
- It then transforms this data to calculate total sales by department.
- Finally, it generates a summary report in PDF format and saves it to an AWS S3 bucket.
This structured approach ensures that results are accurate and comprehensive while minimizing manual data handling.