5 min read

Queryrunner Documentation

Overview

The Queryrunner is a critical component of the Vantage analytics and data platform designed for executing data queries and returning processed results. It acts as the intermediary between raw datasets and analytics logic, allowing users to specify a series of transformation steps that manipulate the data according to their needs. The Queryrunner supports optional caching to enhance performance for repeated queries.

Purpose

The primary purpose of the Queryrunner is to validate input data and configuration, execute specified query steps through a pipeline, and return processed results. It ensures robustness by providing error handling and supports caching to improve execution time for identical queries.

Data Expectations

The Queryrunner expects the following inputs:

  1. Dataset: An array of data objects that the query will act upon.
  2. QueryConfig: An object that includes a steps array defining the operations to be performed on the dataset during the query execution.

All inputs must conform to the specified structures to avoid runtime errors.

Settings

The Queryrunner has several configurable settings that can be adjusted to refine its behavior. Below, each setting is described in detail.

1. Dataset

2. QueryConfig

3. CacheKey

4. CacheTags

How it Works

The Queryrunner operates by following these steps:

  1. Input Validation: The function first checks that the dataset and queryConfig are provided and valid. Errors are thrown if they are missing.
  2. Node Creation: It builds nodes for execution from the defined query steps using the helper function buildNodes.
  3. Pipeline Execution: It runs the transformation pipeline on the dataset using the runPipeline method with the nodes created.
  4. Caching: If a cache key is provided, it attempts to store the results in cache for subsequent requests.
  5. Error Handling: Any errors encountered during execution are caught, and detailed error messages are returned.

Use Cases & Examples

Use Case 1: Data Transformation for Reporting

Business Scenario: A company wants to analyze sales data to generate monthly revenue reports. They need to aggregate sales by month and calculate averages to report on their performance.

Use Case 2: Data Cleansing

Technical Scenario: A data analyst has a dataset with inconsistent encoding and missing values. They need to standardize text formatting and fill in the gaps before analysis.

Example Configuration for Data Cleansing

Scenario Solution: Using Queryrunner to cleanse a dataset to prepare it for further analysis.

Sample Configuration Data

json
{
  "dataset": [
    {"id": 1, "name": "Alice", "amount": 100},
    {"id": 2, "name": null, "amount": 150},
    {"id": 3, "name": "BOB", "amount": 120}
  ],
  "queryConfig": {
    "steps": [
      {
        "type": "replaceMissing",
        "params": {"field": "name", "default": "Unknown"}
      },
      {
        "type": "textToLower",
        "params": {"field": "name"}
      }
    ]
  },
  "cacheKey": "cleanse_sales_data_march2023",
  "cacheTags": ["sales_report"]
}

Explanation: In the configuration above, the dataset contains sales records where some names are missing. The queryConfig includes steps to replace missing names with "Unknown" and convert all names to lowercase. The caching helps improve performance for repeated requests for this specific cleansing process.

Final Thoughts

The Queryrunner is designed to be a versatile and powerful tool within the Vantage platform, allowing users to configure and execute complex queries efficiently while ensuring data integrity and performance. Adjusting its settings provides flexibility to accommodate various data manipulation needs.