5 min read

Queryrunner Documentation

Overview

The Queryrunner is a critical component of the Vantage analytics and data platform designed for executing data queries and returning processed results. It acts as the intermediary between raw datasets and analytics logic, allowing users to specify a series of transformation steps that manipulate the data according to their needs. The Queryrunner supports optional caching to enhance performance for repeated queries.

Purpose

The primary purpose of the Queryrunner is to validate input data and configuration, execute specified query steps through a pipeline, and return processed results. It ensures robustness by providing error handling and supports caching to improve execution time for identical queries.

Data Expectations

The Queryrunner expects the following inputs:

Dataset: An array of data objects that the query will act upon.
QueryConfig: An object that includes a steps array defining the operations to be performed on the dataset during the query execution.

All inputs must conform to the specified structures to avoid runtime errors.

Settings

The Queryrunner has several configurable settings that can be adjusted to refine its behavior. Below, each setting is described in detail.

1. Dataset

Input Type: Array
Description: The dataset setting is the primary input that the Queryrunner operates on. It represents the collection of data that will undergo querying.
Effect of Changes: Changing the dataset directly affects the results returned by the Queryrunner. A different dataset will yield different outputs depending on the query steps defined.
Default Value: No default; it is a required parameter.

2. QueryConfig

Input Type: Object
Description: The queryConfig setting determines how the dataset is transformed. It includes a steps property that is an array of transformation steps that dictate what actions to perform on the dataset.
Effect of Changes: Modifying the steps (adding, removing, or altering them) will change the manner in which the dataset is processed and thus affect the results returned.
Default Value: No default; it is a required parameter. Must include a steps array.

3. CacheKey

Input Type: String (optional)
Description: This setting is used to create a unique key for the cache. If present, the Queryrunner will attempt to store and retrieve results based on this key.
Effect of Changes: Changing the cache key will dictate whether the results of a query can be retrieved from the cache. A unique key leads to separate cache entries.
Default Value: undefined (caching is disabled if not specified).

4. CacheTags

Input Type: Array (optional)
Description: Used to apply tags for cache management. This allows the results sourced from the cache to be categorized.
Effect of Changes: Adding or modifying cache tags will influence how cached results are organized and invalidated, impacting cache retrieval efficiency.
Default Value: ['vantage:query'] if not provided; it helps classify cached results under a default tag.

How it Works

The Queryrunner operates by following these steps:

Input Validation: The function first checks that the dataset and queryConfig are provided and valid. Errors are thrown if they are missing.
Node Creation: It builds nodes for execution from the defined query steps using the helper function buildNodes.
Pipeline Execution: It runs the transformation pipeline on the dataset using the runPipeline method with the nodes created.
Caching: If a cache key is provided, it attempts to store the results in cache for subsequent requests.
Error Handling: Any errors encountered during execution are caught, and detailed error messages are returned.

Use Cases & Examples

Use Case 1: Data Transformation for Reporting

Business Scenario: A company wants to analyze sales data to generate monthly revenue reports. They need to aggregate sales by month and calculate averages to report on their performance.

Use Case 2: Data Cleansing

Technical Scenario: A data analyst has a dataset with inconsistent encoding and missing values. They need to standardize text formatting and fill in the gaps before analysis.

Example Configuration for Data Cleansing

Scenario Solution: Using Queryrunner to cleanse a dataset to prepare it for further analysis.

Sample Configuration Data

json

{
  "dataset": [
    {"id": 1, "name": "Alice", "amount": 100},
    {"id": 2, "name": null, "amount": 150},
    {"id": 3, "name": "BOB", "amount": 120}
  ],
  "queryConfig": {
    "steps": [
      {
        "type": "replaceMissing",
        "params": {"field": "name", "default": "Unknown"}
      },
      {
        "type": "textToLower",
        "params": {"field": "name"}
      }
    ]
  },
  "cacheKey": "cleanse_sales_data_march2023",
  "cacheTags": ["sales_report"]
}

Explanation: In the configuration above, the dataset contains sales records where some names are missing. The queryConfig includes steps to replace missing names with "Unknown" and convert all names to lowercase. The caching helps improve performance for repeated requests for this specific cleansing process.

Final Thoughts

The Queryrunner is designed to be a versatile and powerful tool within the Vantage platform, allowing users to configure and execute complex queries efficiently while ensuring data integrity and performance. Adjusting its settings provides flexibility to accommodate various data manipulation needs.

← PreviousPipeline Next →JSON Extract