Queryrunner Documentation
Overview
The Queryrunner is a critical component of the Vantage analytics and data platform designed for executing data queries and returning processed results. It acts as the intermediary between raw datasets and analytics logic, allowing users to specify a series of transformation steps that manipulate the data according to their needs. The Queryrunner supports optional caching to enhance performance for repeated queries.
Purpose
The primary purpose of the Queryrunner is to validate input data and configuration, execute specified query steps through a pipeline, and return processed results. It ensures robustness by providing error handling and supports caching to improve execution time for identical queries.
Data Expectations
The Queryrunner expects the following inputs:
- Dataset: An array of data objects that the query will act upon.
- QueryConfig: An object that includes a
stepsarray defining the operations to be performed on the dataset during the query execution.
All inputs must conform to the specified structures to avoid runtime errors.
Settings
The Queryrunner has several configurable settings that can be adjusted to refine its behavior. Below, each setting is described in detail.
1. Dataset
- Input Type: Array
- Description: The dataset setting is the primary input that the Queryrunner operates on. It represents the collection of data that will undergo querying.
- Effect of Changes: Changing the dataset directly affects the results returned by the Queryrunner. A different dataset will yield different outputs depending on the query steps defined.
- Default Value: No default; it is a required parameter.
2. QueryConfig
- Input Type: Object
- Description: The queryConfig setting determines how the dataset is transformed. It includes a
stepsproperty that is an array of transformation steps that dictate what actions to perform on the dataset. - Effect of Changes: Modifying the steps (adding, removing, or altering them) will change the manner in which the dataset is processed and thus affect the results returned.
- Default Value: No default; it is a required parameter. Must include a
stepsarray.
3. CacheKey
- Input Type: String (optional)
- Description: This setting is used to create a unique key for the cache. If present, the Queryrunner will attempt to store and retrieve results based on this key.
- Effect of Changes: Changing the cache key will dictate whether the results of a query can be retrieved from the cache. A unique key leads to separate cache entries.
- Default Value:
undefined(caching is disabled if not specified).
4. CacheTags
- Input Type: Array (optional)
- Description: Used to apply tags for cache management. This allows the results sourced from the cache to be categorized.
- Effect of Changes: Adding or modifying cache tags will influence how cached results are organized and invalidated, impacting cache retrieval efficiency.
- Default Value:
['vantage:query']if not provided; it helps classify cached results under a default tag.
How it Works
The Queryrunner operates by following these steps:
- Input Validation: The function first checks that the dataset and queryConfig are provided and valid. Errors are thrown if they are missing.
- Node Creation: It builds nodes for execution from the defined query steps using the helper function
buildNodes. - Pipeline Execution: It runs the transformation pipeline on the dataset using the
runPipelinemethod with the nodes created. - Caching: If a cache key is provided, it attempts to store the results in cache for subsequent requests.
- Error Handling: Any errors encountered during execution are caught, and detailed error messages are returned.
Use Cases & Examples
Use Case 1: Data Transformation for Reporting
Business Scenario: A company wants to analyze sales data to generate monthly revenue reports. They need to aggregate sales by month and calculate averages to report on their performance.
Use Case 2: Data Cleansing
Technical Scenario: A data analyst has a dataset with inconsistent encoding and missing values. They need to standardize text formatting and fill in the gaps before analysis.
Example Configuration for Data Cleansing
Scenario Solution: Using Queryrunner to cleanse a dataset to prepare it for further analysis.
Sample Configuration Data
{
"dataset": [
{"id": 1, "name": "Alice", "amount": 100},
{"id": 2, "name": null, "amount": 150},
{"id": 3, "name": "BOB", "amount": 120}
],
"queryConfig": {
"steps": [
{
"type": "replaceMissing",
"params": {"field": "name", "default": "Unknown"}
},
{
"type": "textToLower",
"params": {"field": "name"}
}
]
},
"cacheKey": "cleanse_sales_data_march2023",
"cacheTags": ["sales_report"]
}Explanation: In the configuration above, the dataset contains sales records where some names are missing. The queryConfig includes steps to replace missing names with "Unknown" and convert all names to lowercase. The caching helps improve performance for repeated requests for this specific cleansing process.
Final Thoughts
The Queryrunner is designed to be a versatile and powerful tool within the Vantage platform, allowing users to configure and execute complex queries efficiently while ensuring data integrity and performance. Adjusting its settings provides flexibility to accommodate various data manipulation needs.