5 min read · Updated Mar 2, 2026

URL Reader Documentation

Overview

The urlReader is a component of the Vantage analytics platform that fetches content from specified URLs. For each page it extracts the title, meta description, plain text content, links, status code, and content type, so users can gather structured information from web pages.

Purpose

The urlReader automates the retrieval and parsing of web page content for data analytics and reporting. It is particularly useful for web scraping, content analysis, and SEO work.

Settings

1. urlColumn

2. manualUrls

3. maxUrls

4. timeoutMs

5. extractLinks

6. extractMeta

7. userAgent

How It Works

When executed, the urlReader follows these steps:

  1. Input Handling: It reads URLs either from the input column named by urlColumn or from the list supplied directly in manualUrls.
  2. Data Preparation: If no input data is available and manualUrls is defined, the component uses the manual URLs. If neither is available, it outputs an empty dataset.
  3. URL Fetching: It fetches each URL, up to the maxUrls limit, and applies the timeout defined by timeoutMs to each request.
  4. Data Extraction: It extracts the page title, meta description, plain text content, links, status code, and content type using internal helper functions. The extracted data is then structured into rows of output data.
  5. Error Handling: If a request fails due to a timeout or another error, it captures the error message and continues processing remaining URLs.
  6. Output Structure: The component returns an array of objects containing the extracted data for each URL.
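
The steps above can be sketched in a few lines of Python. This is an illustrative outline, not the component's actual implementation: the function names (resolve_urls, read_urls), the injected fetch callable, and the output keys url, error, statusCode, and title are all assumptions made for the example. Only the setting names come from this document.

```python
from typing import Callable, Dict, List


def resolve_urls(rows: List[Dict], settings: Dict) -> List[str]:
    """Steps 1-2: prefer URLs from input rows, else fall back to manualUrls."""
    column = settings["urlColumn"]
    if rows:
        urls = [str(row[column]).strip() for row in rows if row.get(column)]
    else:
        manual = settings.get("manualUrls", "")
        urls = [line.strip() for line in manual.splitlines() if line.strip()]
    # Step 3 (first half): never process more than maxUrls entries.
    return urls[: settings["maxUrls"]]


def read_urls(rows: List[Dict], settings: Dict,
              fetch: Callable[[str, float], Dict]) -> List[Dict]:
    """Steps 3-6: fetch each URL, capture failures, return one dict per URL."""
    timeout_s = settings["timeoutMs"] / 1000  # milliseconds to seconds
    output = []
    for url in resolve_urls(rows, settings):
        try:
            page = fetch(url, timeout_s)  # hypothetical fetch-and-extract helper
            output.append({"url": url, "error": None, **page})
        except Exception as exc:
            # Step 5: record the message and keep going with the next URL.
            output.append({"url": url, "error": str(exc)})
    return output


def fake_fetch(url: str, timeout_s: float) -> Dict:
    """Stand-in for a real HTTP fetch so the sketch runs offline."""
    if "slow" in url:
        raise TimeoutError("request timed out")
    return {"statusCode": 200, "title": "Example"}


settings = {
    "urlColumn": "page_url",
    "manualUrls": "https://example.com/a\nhttps://example.com/slow\nhttps://example.com/c",
    "maxUrls": 2,
    "timeoutMs": 8000,
}
results = read_urls([], settings, fake_fetch)
for row in results:
    print(row)
```

Passing the fetch function in as a parameter keeps the control flow testable without network access. With maxUrls set to 2, the third manual URL is never requested, and the simulated timeout on the second URL becomes an error row rather than stopping the run.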

Expected Data

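The urlReader expects its input in one of two forms:

  1. Rows of input data in which the column named by urlColumn holds the URL to fetch.
  2. A newline-separated list of URLs supplied through manualUrls, used when no input data is available.

The output contains one object per URL with the page title, meta description, plain text content, links, status code, and content type, or the captured error message when the request failed.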

Use Cases & Examples

Use Case 1: SEO Analysis

A digital marketing agency needs to evaluate the SEO performance of multiple competitors' web pages. Using the urlReader, they can extract essential elements such as page titles and meta descriptions for analysis.
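
For readers who want to see what this kind of extraction involves, here is a minimal sketch using Python's standard html.parser module. It does not reproduce the urlReader's internal helpers, which this document does not describe. The class name SeoFieldParser, the function extract_seo_fields, and the output keys title and metaDescription are hypothetical.

```python
from html.parser import HTMLParser


class SeoFieldParser(HTMLParser):
    """Collects the <title> text and the description <meta> tag's content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self._in_title:
            self.title += data


def extract_seo_fields(html: str) -> dict:
    parser = SeoFieldParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "metaDescription": parser.meta_description,
    }


sample = (
    "<html><head><title> Acme Widgets </title>"
    '<meta name="description" content="Hand-built widgets.">'
    "</head><body><p>Hi</p></body></html>"
)
print(extract_seo_fields(sample))
```

A page with no description tag yields an empty metaDescription string, which is itself a useful signal in an SEO audit.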

Use Case 2: Content Aggregation

A content aggregator platform wants to fetch and summarize news articles from various news sites. The urlReader can be configured to pull data and generate summaries using the plain text content.
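
As a rough illustration of working with plain text content, the sketch below strips markup from an HTML string and produces a short preview. It rests on several assumptions: the urlReader's own text extraction may behave differently, the names TextExtractor, plain_text, and preview are hypothetical, and truncation is only a naive stand-in for real summarization, which would happen in a separate downstream step.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace left behind by the removed markup.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def plain_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def preview(text: str, max_chars: int = 80) -> str:
    """Naive preview: cut at max_chars, then back up to the last space."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rsplit(" ", 1)[0] + "..."


article = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Storm  update</h1><script>var x = 1;</script>"
    "<p>Rain is expected\n tonight.</p></body></html>"
)
body = plain_text(article)
print(body)
print(preview(body, 12))
```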

Example Configuration for SEO Analysis Use Case

To configure the urlReader for SEO analysis, the following settings might be used:

```json
{
  "urlColumn": "page_url",
  "manualUrls": "https://example.com/article1\nhttps://example.com/article2",
  "maxUrls": 20,
  "timeoutMs": 8000,
  "extractLinks": false,
  "extractMeta": true,
  "userAgent": "SEOAnalyzer/1.0"
}
```

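In this example:

  1. urlColumn points the component at the page_url column of the input data.
  2. manualUrls supplies two newline-separated URLs, which are used only when no input data is available.
  3. maxUrls caps the run at 20 URLs, and timeoutMs gives each request 8000 ms (8 seconds) before it is treated as failed.
  4. extractLinks is false and extractMeta is true, which suits an audit focused on titles and meta descriptions rather than link lists.
  5. userAgent sets the User-Agent string to SEOAnalyzer/1.0.

With this configuration, the urlReader gathers SEO data from the specified URLs so the agency can compare page elements across competitors.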
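
To make the request-level settings concrete, the following sketch shows one way timeoutMs and userAgent could map onto an HTTP request using Python's standard urllib.request module. How the urlReader actually issues requests is not documented here, so build_request, fetch_page, and the returned keys statusCode, contentType, and body are assumptions for illustration only.

```python
import json
import urllib.request

# The example configuration from this section, parsed from its JSON form.
CONFIG = json.loads(r"""
{
  "urlColumn": "page_url",
  "manualUrls": "https://example.com/article1\nhttps://example.com/article2",
  "maxUrls": 20,
  "timeoutMs": 8000,
  "extractLinks": false,
  "extractMeta": true,
  "userAgent": "SEOAnalyzer/1.0"
}
""")


def build_request(url: str, settings: dict) -> urllib.request.Request:
    """Attach the configured User-Agent header to the outgoing request."""
    return urllib.request.Request(url, headers={"User-Agent": settings["userAgent"]})


def fetch_page(url: str, settings: dict) -> dict:
    """Fetch one page, giving up after timeoutMs milliseconds."""
    timeout_s = settings["timeoutMs"] / 1000  # urlopen expects seconds
    with urllib.request.urlopen(build_request(url, settings), timeout=timeout_s) as resp:
        return {
            "statusCode": resp.status,
            "contentType": resp.headers.get("Content-Type", ""),
            "body": resp.read(),
        }


manual_urls = CONFIG["manualUrls"].splitlines()
print(manual_urls)
print(build_request(manual_urls[0], CONFIG).get_header("User-agent"))
```

Note that urllib.request.urlopen raises an HTTPError for 4xx and 5xx responses, so a caller that wants to record those status codes, rather than treat them as failures, would need to catch that exception and read the code from it.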