Updated Mar 2, 2026

urlParser Documentation

Overview

The urlParser is a Vantage component that parses URLs from a user-specified data column. It breaks each URL string into its protocol, hostname, pathname, search string, hash, port, origin, and query parameters, and adds a validity flag. This is useful for analysis tasks that need to extract or manipulate individual URL components.

Purpose

The urlParser transforms raw URL data into a structured format for downstream data processing and analysis. It validates each URL and extracts its parts, so users can work with clean, well-structured data.

Settings

The urlParser is configured through two settings, listed below.

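1. urlColumn

The name of the data column that holds the URLs to parse. URLs supplied through manualUrls are also stored under this column name.

2. manualUrls

A fallback list of URLs, one per line. It is used only when no valid URL data is available from the input.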
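To illustrate how the two settings relate, here is a minimal TypeScript sketch of the manual fallback described under How It Works: the manualUrls text is split into lines, trimmed, and wrapped into records keyed by urlColumn. The `UrlParserSettings` type and the `manualUrlsToRows` function are hypothetical names chosen for illustration, not part of the Vantage API, and skipping blank lines is an assumption.

```typescript
// Hypothetical shape of the two settings; the names mirror the configuration keys.
interface UrlParserSettings {
  urlColumn: string;
  manualUrls: string;
}

type Row = Record<string, unknown>;

// Turn the newline-separated manualUrls text into one record per URL,
// storing each URL under the configured urlColumn.
function manualUrlsToRows(settings: UrlParserSettings): Row[] {
  return settings.manualUrls
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // assumption: blank lines are skipped
    .map((url) => ({ [settings.urlColumn]: url }));
}

const rows = manualUrlsToRows({
  urlColumn: "website",
  manualUrls: "https://example.com/a\n  https://example.org/b  \n",
});
console.log(rows);
```

Each resulting record has the same shape as a row of regular input data, so the parsing step does not need to know where the URLs came from.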

How It Works

  1. Input Handling: The function first checks if the input data is provided in the correct format. If the data is nested, it attempts to extract the actual array of URLs.

  2. Manual URL Fallback: If no valid URL data is available, it checks the manualUrls field. If URLs are specified here, they are split by lines, trimmed of whitespace, and formatted into objects containing the URL under the specified urlColumn.

  3. URL Parsing: The function iterates over the array of URLs:

    • It checks that each value in the URL column is present and has the correct format (i.e., is a string).
    • It uses the URL constructor to decompose the URL into its components (protocol, hostname, pathname, etc.).
    • It adds a url_valid flag indicating whether the provided URL is valid.
    • If parsing fails, it sets default empty values for the components.
  4. Output: The function returns an array of objects, each enriched with the parsed URL components and the url_valid flag.
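The parsing step above can be sketched in TypeScript using the standard URL constructor, which is available in browsers and Node.js and throws on input it cannot parse. This is a minimal sketch under stated assumptions, not the component's actual source: `parseRow`, `ParsedFields`, `emptyFields`, and the `query_params` field name are hypothetical, and only `url_valid` is named in the description above.

```typescript
type Row = Record<string, unknown>;

// Hypothetical output fields; only url_valid is named in the documentation.
interface ParsedFields {
  protocol: string;
  hostname: string;
  pathname: string;
  search: string;
  hash: string;
  port: string;
  origin: string;
  query_params: Record<string, string>;
  url_valid: boolean;
}

// Default empty values used when a URL is missing or cannot be parsed.
function emptyFields(): ParsedFields {
  return {
    protocol: "", hostname: "", pathname: "", search: "",
    hash: "", port: "", origin: "", query_params: {}, url_valid: false,
  };
}

function parseRow(row: Row, urlColumn: string): Row & ParsedFields {
  const value = row[urlColumn];
  if (typeof value !== "string") {
    return { ...row, ...emptyFields() }; // not a string: treat as invalid
  }
  try {
    const url = new URL(value); // throws a TypeError on malformed input
    const params: Record<string, string> = {};
    url.searchParams.forEach((v, k) => {
      params[k] = v; // a repeated key keeps its last value
    });
    return {
      ...row,
      protocol: url.protocol,
      hostname: url.hostname,
      pathname: url.pathname,
      search: url.search,
      hash: url.hash,
      port: url.port,
      origin: url.origin,
      query_params: params,
      url_valid: true,
    };
  } catch {
    return { ...row, ...emptyFields() };
  }
}

const parsed = [
  { website: "https://example.com:8080/docs/page?utm_source=news&id=7#top" },
  { website: "not a url" },
  { website: 42 },
].map((row) => parseRow(row, "website"));
console.log(parsed);
```

Note that the URL constructor reports the protocol with a trailing colon ("https:") and leaves port empty when the URL uses the default port for its scheme.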

Data Expectations

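Input Data

An array of records in which the column named by urlColumn holds the URL strings. If no valid URL data is available, the URLs listed in manualUrls are used instead.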

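Output Data

An array of objects, each extended with the parsed components (protocol, hostname, pathname, search, hash, port, origin, and query parameters) and the url_valid flag. When a URL cannot be parsed, the components are set to default empty values.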

Use Cases & Examples

Use Case 1: Web Analytics

A marketing team needs to analyze traffic coming from various URLs to gauge the effectiveness of different campaigns. By using urlParser, they can extract campaign-related parameters from URLs stored in their database.
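As an illustration of this use case, the sketch below pulls campaign tags out of a URL's query string. It assumes campaign parameters follow the common `utm_` naming convention, which the documentation does not specify; `campaignParams` is a hypothetical helper, not a urlParser function.

```typescript
// Collect query parameters whose names start with "utm_" (an assumed
// naming convention for campaign tags; adjust the pattern as needed).
function campaignParams(rawUrl: string): Record<string, string> {
  const result: Record<string, string> = {};
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return result; // invalid URLs contribute no parameters
  }
  url.searchParams.forEach((value, key) => {
    if (/^utm_/.test(key)) {
      result[key] = value;
    }
  });
  return result;
}

const tags = campaignParams(
  "https://shop.example.com/sale?utm_source=newsletter&utm_campaign=spring&ref=home"
);
console.log(tags);
```

The same filter could be applied to the query parameters that urlParser has already extracted, rather than re-parsing the raw URL.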

Use Case 2: Data Validation

A data engineer is integrating multiple data sources that contain URLs. They use urlParser to verify URL validity and to generate metrics on which sources contain errors or malformed URLs.
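A per-source error metric of this kind could be computed from the url_valid flag roughly as follows. This is a sketch under assumptions: the `source` field, the `CheckedRecord` and `SourceStats` types, and the `invalidBySource` function are hypothetical and only illustrate the aggregation.

```typescript
// Hypothetical input: records that already carry the url_valid flag,
// plus a "source" field naming the system each record came from.
interface CheckedRecord {
  source: string;
  url_valid: boolean;
}

interface SourceStats {
  total: number;
  invalid: number;
}

// Count total and invalid URLs for each source.
function invalidBySource(records: CheckedRecord[]): Record<string, SourceStats> {
  const stats: Record<string, SourceStats> = {};
  for (const record of records) {
    if (!stats[record.source]) {
      stats[record.source] = { total: 0, invalid: 0 };
    }
    stats[record.source].total += 1;
    if (!record.url_valid) {
      stats[record.source].invalid += 1;
    }
  }
  return stats;
}

const report = invalidBySource([
  { source: "crm", url_valid: true },
  { source: "crm", url_valid: false },
  { source: "web", url_valid: true },
]);
console.log(report);
```

Dividing invalid by total for each source gives an error rate that can be compared across data sources.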

Example Configuration

Scenario

A data analyst has a dataset with raw URLs in a column named "website" and needs to parse and analyze them for further reporting.

Configuration Data

```json
{
  "urlColumn": "website",
  "manualUrls": ""
}
```

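In this configuration, urlColumn points the parser at the "website" column, and manualUrls is left empty, so no manual fallback URLs are supplied.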

Output

After running the urlParser with this configuration on a dataset, each output record carries the url_valid flag and the structured components of the URL found in the "website" column.

Such structured data allows the analyst to easily generate reports and insights based on URL patterns and their associated attributes.

AI Integrations and Billing Impacts

AI Integrations

The urlParser currently has no direct AI integrations. Its structured output can, however, be fed into AI models, for example for predictive analytics on parsed URLs.

Billing Impacts

Using the urlParser does not by itself incur additional billing costs. Charges may apply when it runs inside workflows that exceed plan limits for data processing or outputs, so users who process large datasets should monitor usage against their Vantage subscription level.

This page describes the settings, behavior, and output of the urlParser component for use in data-processing workflows on the Vantage platform.