WebNodeEditor Documentation
Purpose
The WebNodeEditor is a component for extracting and analyzing data from web URLs. Users configure which URLs are processed and how data is extracted from them: they can select a URL column from their data, enter URLs manually, and apply AI analysis to the page content.
Settings
1. URL Column
- Type: Dropdown
- Description: Allows the user to select a column that contains URLs from the upstream data. This setting directly influences the URLs that will be processed by the component.
- Default Value: url
2. Manual URLs
- Type: Text Area
- Description: Users can input URLs manually, one per line. This is particularly useful when no upstream data containing URLs is available. The component will use these URLs for processing.
- Default Value: Empty
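The one-URL-per-line format can be parsed in a few lines. The sketch below is illustrative only: `parse_manual_urls` is a hypothetical helper, and trimming whitespace and skipping blank lines are assumptions about how the component treats the input, not documented behavior.

```python
def parse_manual_urls(text: str) -> list[str]:
    """Split a text-area value into URLs, one per line (hypothetical helper)."""
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        if candidate:  # assumption: blank lines are ignored
            urls.append(candidate)
    return urls


manual = "https://competitor1.com\n\n  https://competitor2.com  \n"
print(parse_manual_urls(manual))
```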
3. Max URLs (for URL Reader and AI URL Reader)
- Type: Numeric Slider
- Description: Specifies the maximum number of URLs the component will process in a single operation. This setting affects performance and resource usage.
- Default Value: 20 for URL Reader and 10 for AI URL Reader (adjustable from 1 to 100 for both readers).
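As a rough sketch of how such a cap might be applied: the function below clamps the requested limit to the slider range and keeps the first URLs in input order. The name `select_urls`, the reader keys, and the "keep the first N" policy are assumptions for illustration; the documentation does not say which URLs are dropped when the limit is exceeded.

```python
# Defaults taken from the setting description above.
READER_DEFAULTS = {"url_reader": 20, "ai_url_reader": 10}


def select_urls(urls, max_urls=None, reader="url_reader"):
    """Return at most max_urls URLs (hypothetical helper)."""
    limit = READER_DEFAULTS[reader] if max_urls is None else max_urls
    limit = max(1, min(100, limit))  # clamp to the slider range 1..100
    return urls[:limit]  # assumption: the first N URLs are kept


candidates = [f"https://example.com/page/{i}" for i in range(50)]
print(len(select_urls(candidates)))                          # 20
print(len(select_urls(candidates, reader="ai_url_reader")))  # 10
```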
4. Timeout (for URL Reader)
- Type: Numeric Slider
- Description: Sets the maximum time (in milliseconds) the component will wait for a URL response before timing out. This is crucial for managing requests to URLs that may be slow to respond.
- Default Value: 10000 (10 seconds)
5. Extract Links (for URL Reader)
- Type: Boolean Checkbox
- Description: When checked, the component extracts all links from each processed web page. This enriches the dataset with the additional URLs found on the page.
- Default Value: true
6. Extract Meta Description (for URL Reader)
- Type: Boolean Checkbox
- Description: When enabled, this extracts the meta description of the pages alongside standard data, providing further contextual insights.
- Default Value: true
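To make the two extraction options concrete, here is a minimal sketch using Python's standard `html.parser` module. It is not the component's implementation: the class name `PageExtractor` and its flags are hypothetical stand-ins for the Extract Links and Extract Meta Description settings.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects link targets and the meta description (hypothetical sketch)."""

    def __init__(self, extract_links=True, extract_meta=True):
        super().__init__()
        self.extract_links = extract_links
        self.extract_meta = extract_meta
        self.links = []
        self.meta_description = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self.extract_links and tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif (self.extract_meta and tag == "meta"
              and (attr.get("name") or "").lower() == "description"):
            self.meta_description = attr.get("content")


sample_html = """
<html><head><meta name="description" content="Demo page"></head>
<body><a href="https://example.com/a">A</a> <a href="/b">B</a></body></html>
"""
extractor = PageExtractor()
extractor.feed(sample_html)
print(extractor.links)
print(extractor.meta_description)
```

Relative links such as `/b` are returned as written; resolving them against the page URL would be a separate step.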
7. User Agent (for URL Reader)
- Type: Text Input
- Description: Sets the User-Agent string the component sends with its HTTP requests. Some servers block requests based on user agent detection, so adjusting this value can help avoid such blocks.
- Default Value: VantageBot/1.0
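The Timeout and User Agent settings both map onto ordinary HTTP request parameters. The sketch below shows one way to express them with Python's standard `urllib.request`; `build_request` is a hypothetical helper, and the component itself may use a different HTTP client.

```python
import urllib.request


def build_request(url, user_agent="VantageBot/1.0", timeout_ms=10000):
    """Prepare a request and a timeout in seconds (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    timeout_seconds = timeout_ms / 1000  # the setting is in milliseconds
    return request, timeout_seconds


request, timeout_seconds = build_request("https://example.com")
# To actually fetch the page:
# urllib.request.urlopen(request, timeout=timeout_seconds)
```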
8. Analysis Type (for AI URL Reader)
- Type: Dropdown
- Description: Specifies the type of AI analysis to perform on the web page content. Options include summary, entity extraction, question answering, and custom prompts.
- Default Value: summarize
9. Custom Prompt (for AI URL Reader)
- Type: Text Area
- Description: Lets users define a custom prompt for AI analysis, tailoring how the content is processed. The prompt is used when analysisType is set to custom.
- Default Value: Empty
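The relationship between Analysis Type and Custom Prompt can be sketched as follows. This is an assumption-laden illustration: `build_prompt` and the summary instruction text are hypothetical, and only the `summarize` and `custom` values are handled because the identifiers for the other analysis types are not given here.

```python
def build_prompt(analysis_type, page_text, custom_prompt=""):
    """Combine an instruction with the page text (hypothetical helper)."""
    if analysis_type == "custom":
        if not custom_prompt.strip():
            raise ValueError("a custom prompt is required when analysisType is 'custom'")
        instruction = custom_prompt.strip()
    elif analysis_type == "summarize":
        instruction = "Summarize the following web page."  # hypothetical wording
    else:
        raise ValueError(f"analysis type not covered by this sketch: {analysis_type}")
    return f"{instruction}\n\n{page_text}"


print(build_prompt("custom", "Page text here.", "List every product name mentioned."))
```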
10. Max Content Length (for AI URL Reader)
- Type: Numeric Slider
- Description: Defines the maximum number of characters of the page text that will be sent for AI processing. This is a critical parameter for managing the payload to the AI service.
- Default Value: 15000 (adjustable from 5000 to 50000).
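A character limit like this is typically enforced by truncating the page text before it is sent. The sketch below assumes the beginning of the text is kept and the rest is discarded; `truncate_content` is a hypothetical helper, and the component may select the retained text differently.

```python
def truncate_content(page_text, max_content_length=15000):
    """Limit the text sent for AI processing (hypothetical helper)."""
    limit = max(5000, min(50000, max_content_length))  # slider range
    return page_text[:limit]  # assumption: the start of the page is kept


long_page = "x" * 60000
print(len(truncate_content(long_page)))         # 15000
print(len(truncate_content(long_page, 20000)))  # 20000
```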
11. Output Column (for AI URL Reader)
- Type: Text Input
- Description: Names the column where the results of the AI analysis will be stored in the dataset. This allows effective referencing of the results in downstream workflows.
- Default Value: ai_analysis
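As an illustration of what the Output Column setting means for downstream steps, the sketch below attaches one analysis result per row under the configured column name. The row layout (a list of dictionaries) and the helper `attach_analysis` are assumptions made for the example.

```python
def attach_analysis(rows, results, output_column="ai_analysis"):
    """Return copies of the rows with the AI result added (hypothetical helper)."""
    return [{**row, output_column: result} for row, result in zip(rows, results)]


rows = [{"url": "https://article1.com"}, {"url": "https://article2.com"}]
results = ["Summary of article 1", "Summary of article 2"]
for row in attach_analysis(rows, results, output_column="ai_summary"):
    print(row)
```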
Use Cases & Examples
Use Case 1: Web Data Mining
A marketing team wants to analyze multiple competitor websites to gather insights. They can use the WebNodeEditor to configure specific URLs, extract meta descriptions, and compile information in a structured format.
Use Case 2: AI-Enhanced Reporting
A data analyst needs to generate a summary of multiple articles for a quarterly report. By selecting the AI analysis type as summarization and entering the desired URLs, they can automate content extraction and summarization.
Example Configuration for Use Case 1
Objective: To extract meta descriptions and links from competitor websites.
Configuration Sample:
{
  "urlColumn": "competitor_url",
  "manualUrls": "https://competitor1.com\nhttps://competitor2.com",
  "maxUrls": 10,
  "timeoutMs": 5000,
  "extractLinks": true,
  "extractMeta": true,
  "userAgent": "VantageBot/1.0"
}
Example Configuration for Use Case 2
Objective: To analyze articles for summaries.
Configuration Sample:
{
  "urlColumn": "article_url",
  "manualUrls": "https://article1.com\nhttps://article2.com",
  "analysisType": "summarize",
  "maxUrls": 5,
  "maxContentLength": 20000,
  "outputColumn": "ai_summary"
}
AI Integrations
The WebNodeEditor can perform AI-driven analysis of web pages, controlled by the Analysis Type and Custom Prompt settings. This makes it useful for users who want enriched data beyond simple web scraping.
Billing Impacts
Using the WebNodeEditor, especially its AI integration features, may incur additional costs that depend on the volume of data processed and the number of AI computations performed. Monitor usage to keep the costs of web data processing and AI services under control.
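One simple way to reason about volume is an upper bound on the text sent for AI processing in a single operation: at most Max URLs pages, each capped at Max Content Length characters. The sketch below computes that bound. It says nothing about actual pricing, which is not specified here, and `max_characters_sent` is a hypothetical helper.

```python
def max_characters_sent(max_urls, max_content_length):
    """Upper bound on characters sent to the AI service in one operation."""
    return max_urls * max_content_length


# AI URL Reader defaults: 10 URLs, 15000 characters each.
print(max_characters_sent(10, 15000))  # 150000
# Use Case 2 configuration: 5 URLs, 20000 characters each.
print(max_characters_sent(5, 20000))   # 100000
```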