aiUrlReader Documentation
Purpose
The aiUrlReader logic is designed to fetch web page content from specified URLs and send that content to a user's configured AI integration for various types of analyses. This includes summarization, extraction of key facts, question answering (Q&A), and performing custom analyses based on user-defined prompts.
Settings
The aiUrlReader logic has several configurable settings that dictate its behavior:
- urlColumn
  - Input Type: String
  - Description: This setting defines the name of the column that contains URLs in the input data. The logic will fetch pages from these URLs for processing.
  - Default Value: "url"
  - Effect of Change: Modifying this setting allows the user to specify a different column name from which the logic will read the URLs, thereby allowing it to work with different data formats.
- manualUrls
  - Input Type: String
  - Description: This setting allows users to manually specify a list of URLs to process when no upstream data is available. URLs should be newline-separated.
  - Default Value: "" (empty string)
  - Effect of Change: Entering manual URLs here allows the logic to still function even if no URLs are provided in the main input. If the upstream data is absent, it falls back on this input.
- analysisType
  - Input Type: Dropdown (choices: "summarize", "extract", "qa", "custom")
  - Description: This setting determines the type of analysis the AI will perform on the fetched web content.
  - Default Value: "summarize"
  - Effect of Change: Switching this option alters the AI prompt and the nature of the analysis performed. For instance, selecting "extract" will generate a structured list of key facts instead of a summary.
- customPrompt
  - Input Type: String
  - Description: A user-defined prompt that can be used when the analysisType is set to "qa" or "custom". This allows for personalized instructions to the AI.
  - Default Value: "" (empty string)
  - Effect of Change: If defined, the customPrompt will be used in the conversation with the AI, enabling a flexible question-and-answer format or a tailored analytical approach.
- maxUrls
  - Input Type: Numeric
  - Description: This setting specifies the maximum number of URLs to process in a single execution of the logic.
  - Default Value: 10
  - Effect of Change: Increasing this number lets a single execution process more URLs, while decreasing it limits the number of processed URLs and conserves resources.
- maxContentLength
  - Input Type: Numeric
  - Description: This setting controls the maximum length, in characters, of the text content fetched from any single page. Any content that exceeds this limit is truncated.
  - Default Value: 15000
  - Effect of Change: Adjusting this value will impact how much content is sent to the AI integration, with higher values allowing for more comprehensive content analysis but potentially using more resources.
- timeoutMs
  - Input Type: Numeric
  - Description: Defines the maximum time, in milliseconds, allowed for fetching the content of a URL before the request is aborted.
  - Default Value: 10000 (10 seconds)
  - Effect of Change: Setting a lower timeout ensures quicker failure on slow requests, while a higher value may allow time for fetching content from slower sites, impacting overall execution time.
- outputColumn
  - Input Type: String
  - Description: This setting specifies the name of the output column where the AI analysis results will be stored.
  - Default Value: "ai_analysis"
  - Effect of Change: Changing this name will alter where the output data can be accessed in the resulting data structure, making it crucial to match with downstream usage.
How it Works
The aiUrlReader operates in the following steps:
- Input Handling: It accepts input data, looking for URLs in the specified urlColumn. If no URLs are found in the input, it will default to using manualUrls if provided.
- Integration Retrieval: It retrieves the preferred AI integration for conducting the analysis, fetching necessary credentials dynamically based on the user's session.
- Content Fetching: For each URL, the logic attempts to fetch the web page's HTML content. It enforces a timeout and handles possible fetch errors gracefully.
- Content Processing: After fetching, the content is stripped of HTML tags and the title of the page is extracted. The processed content is then used to create a specific prompt for the AI.
- AI Analysis Execution: The constructed prompt is sent to the AI integration, which provides the analysis results based on the configured analysisType.
- Result Compilation: Finally, the results, including page titles and analysis outputs, are compiled into a structured response.
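The Input Handling, Content Fetching, and Content Processing steps above can be illustrated with a minimal Python sketch. This is not the aiUrlReader implementation: every function name here (resolve_urls, fetch_html, extract_title, strip_html, build_prompt) is hypothetical, the instruction texts are invented placeholders, and the fallback behaviour for an empty customPrompt is an assumption. The call to the AI integration is left out because its interface is not described in this document.

```python
import re
from html import unescape
from urllib.request import Request, urlopen


def resolve_urls(rows, url_column="url", manual_urls="", max_urls=10):
    """Input Handling: read URLs from the rows, else fall back to manual_urls."""
    urls = [str(row[url_column]).strip() for row in rows if row.get(url_column)]
    urls = [u for u in urls if u]
    if not urls:
        urls = [line.strip() for line in manual_urls.splitlines() if line.strip()]
    return urls[:max_urls]  # assumption: the cap keeps the first max_urls entries


def fetch_html(url, timeout_ms=10000):
    """Content Fetching: return (html, error); failures become a message."""
    try:
        request = Request(url, headers={"User-Agent": "aiUrlReader-sketch"})
        with urlopen(request, timeout=timeout_ms / 1000) as response:
            return response.read().decode("utf-8", errors="replace"), None
    except (OSError, ValueError) as exc:  # URLError, timeouts, malformed URLs
        return None, str(exc)


def extract_title(html):
    """Content Processing: text of the first <title> element, or ""."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return unescape(match.group(1)).strip() if match else ""


def strip_html(html, max_length=15000):
    """Content Processing: drop tags, collapse whitespace, truncate."""
    # Remove script/style bodies first so their contents do not leak into the text.
    text = re.sub(r"<(script|style)\b[^>]*>.*?</\1>", " ", html,
                  flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", unescape(text)).strip()
    return text[:max_length]


# Hypothetical instruction texts; the real prompts are not given in this document.
INSTRUCTIONS = {
    "summarize": "Summarize the following web page.",
    "extract": "Extract the key facts from the following web page as a structured list.",
}


def build_prompt(analysis_type, title, content, custom_prompt=""):
    """Content Processing: pick an instruction and attach the title and text."""
    if analysis_type in ("qa", "custom") and custom_prompt:
        instruction = custom_prompt
    else:
        # Assumption: fall back to the summarize instruction when nothing else applies.
        instruction = INSTRUCTIONS.get(analysis_type, INSTRUCTIONS["summarize"])
    return f"{instruction}\n\nTitle: {title}\n\nContent:\n{content}"
```

Regex-based tag stripping is used here only to keep the sketch short. It is adequate for illustration but will mishandle unusual markup, so a real implementation would more likely rely on an HTML parser.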
Use Cases & Examples
Use Case 1: Summarizing News Articles
A media company might use the aiUrlReader to automatically generate concise summaries of the day's news articles. By feeding URLs from a news aggregator into this logic, they can streamline the content curation process.
Use Case 2: Extracting Product Data
An e-commerce platform may utilize aiUrlReader to scrape product information from competitor websites. The information can include product names, prices, and reviews for comparative analysis.
Use Case 3: Interactive Q&A for Educational Content
An educational institution could configure aiUrlReader to provide a question-answer setup where students can input URLs of educational resources, and then ask specific questions related to the content on those pages.
Configuration Example for Use Case 1: Summarizing News Articles
{
  "urlColumn": "article_url",
  "manualUrls": "",
  "analysisType": "summarize",
  "customPrompt": "",
  "maxUrls": 5,
  "maxContentLength": 10000,
  "timeoutMs": 5000,
  "outputColumn": "summary_output"
}
In this example:
- urlColumn is set to "article_url", which corresponds to the relevant input data field containing the article URLs.
- maxUrls is configured to 5, limiting the number of articles processed in each execution.
- maxContentLength is limited to 10000 to focus on shorter articles, allowing for prompt responses.
- The AI analyses will provide summaries stored under "summary_output".
This configuration allows the media company to summarize up to 5 articles from their input data in each execution.
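As a rough illustration of how a configuration like this could be checked before use, the Python sketch below merges a JSON settings object over the documented defaults and applies a few sanity checks. The load_settings helper is hypothetical, and the validation rules (rejecting unknown keys, requiring positive integers for the numeric settings) are assumptions rather than documented aiUrlReader behaviour.

```python
import json

ALLOWED_ANALYSIS_TYPES = {"summarize", "extract", "qa", "custom"}

# Defaults as listed in the Settings section.
DEFAULTS = {
    "urlColumn": "url",
    "manualUrls": "",
    "analysisType": "summarize",
    "customPrompt": "",
    "maxUrls": 10,
    "maxContentLength": 15000,
    "timeoutMs": 10000,
    "outputColumn": "ai_analysis",
}


def load_settings(raw_json):
    """Merge a JSON settings object over the defaults and run basic checks."""
    overrides = json.loads(raw_json)
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown settings: {unknown}")
    settings = {**DEFAULTS, **overrides}
    if settings["analysisType"] not in ALLOWED_ANALYSIS_TYPES:
        raise ValueError(f"unsupported analysisType: {settings['analysisType']!r}")
    # Assumption: the numeric settings are expected to be positive integers.
    for key in ("maxUrls", "maxContentLength", "timeoutMs"):
        value = settings[key]
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer")
    return settings


example = """
{
  "urlColumn": "article_url",
  "manualUrls": "",
  "analysisType": "summarize",
  "customPrompt": "",
  "maxUrls": 5,
  "maxContentLength": 10000,
  "timeoutMs": 5000,
  "outputColumn": "summary_output"
}
"""

settings = load_settings(example)
print(settings["maxUrls"], settings["outputColumn"])  # prints: 5 summary_output
```

Merging over defaults means a configuration only needs to list the settings it changes; an empty object yields the documented default values.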