
textExtraction Documentation

Overview

The textExtraction component of the Vantage analytics and data platform uses AI to extract structured data from unstructured text columns. It supports several extraction types: named entities, contact information, financial data, key-value pairs, and extraction driven by a custom prompt. Converting free text into structured fields lets organizations analyze information that would otherwise stay buried in prose, and base decisions on it.

Purpose

The primary purpose of this node is to turn unstructured text into structured formats, so that the information hidden in that text can be analyzed like any other data. It is particularly useful when processing large volumes of text, such as contracts, emails, or customer feedback.

Settings

The textExtraction component features the following configurable settings:

sourceColumn
- Input Type: String
- Description: The name of the column that holds the unstructured text to extract data from. Change it to point the component at a different text field.
- Default Value: "text"
extractionType
- Input Type: Dropdown
- Description: Determines the type of extraction to perform. Changing it changes the AI model's behavior, and therefore the structure and focus of the extracted data. The options are:
  - entities: Extract named entities.
  - contacts: Extract contact information.
  - financial: Extract financial data.
  - key_value: Extract key-value pairs.
  - custom: Extract using a user-defined prompt (see customPrompt).
- Default Value: "entities"
outputColumn
- Input Type: String
- Description: The name of the column where the extracted data is stored. Distinct names keep results apart when several extractions run on the same data.
- Default Value: "extracted"
customPrompt
- Input Type: String
- Description: A user-defined prompt for the AI extraction, for tailoring the extraction to a specific context or requirement. If this field is left empty, the component falls back to the default behavior of the selected extractionType.
- Default Value: "" (empty string)
outputFormat
- Input Type: Dropdown
- Description: The format of the extracted output. Changing it affects how the extracted data is represented and used in subsequent steps. The options are:
  - json: Structured JSON output.
  - csv: Comma-separated values (flattened).
  - text: Plain, human-readable text.
- Default Value: "json"
batchSize
- Input Type: Numeric
- Description: The number of rows processed in each batch during extraction. Larger batches can speed up processing but increase the risk of hitting API limits or timeouts; smaller batches are more reliable but slower.
- Default Value: 25
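To make the settings and their interactions concrete, here is a minimal local model of them as a Python dataclass. `TextExtractionConfig`, `DEFAULT_PROMPTS`, and `resolve_prompt` are hypothetical names, not part of Vantage; the prompt strings are placeholders, because the prompts the component actually sends are not documented. The sketch also assumes that a non-empty customPrompt takes precedence over any extractionType, and that choosing "custom" with an empty prompt is an error.

```python
from dataclasses import dataclass

# Allowed values, taken from the dropdown options documented above.
EXTRACTION_TYPES = {"entities", "contacts", "financial", "key_value", "custom"}
OUTPUT_FORMATS = {"json", "csv", "text"}

# Placeholder prompts for illustration only; the prompts Vantage actually
# sends for each extractionType are not documented.
DEFAULT_PROMPTS = {
    "entities": "Extract the named entities mentioned in the text.",
    "contacts": "Extract names, emails, phones, addresses, and websites.",
    "financial": "Extract financial data such as amounts, dates, and invoice numbers.",
    "key_value": "Extract the key-value pairs stated in the text.",
}


@dataclass
class TextExtractionConfig:
    # Hypothetical local model of the settings; field names and defaults
    # match the documentation so a JSON config can be unpacked directly.
    sourceColumn: str = "text"
    extractionType: str = "entities"
    outputColumn: str = "extracted"
    customPrompt: str = ""
    outputFormat: str = "json"
    batchSize: int = 25

    def __post_init__(self) -> None:
        # Reject values outside the documented options.
        if self.extractionType not in EXTRACTION_TYPES:
            raise ValueError(f"unknown extractionType: {self.extractionType!r}")
        if self.outputFormat not in OUTPUT_FORMATS:
            raise ValueError(f"unknown outputFormat: {self.outputFormat!r}")
        if self.batchSize < 1:
            raise ValueError("batchSize must be a positive integer")

    def resolve_prompt(self) -> str:
        # Assumption: a non-empty customPrompt takes precedence; an empty one
        # falls back to the default for the selected extractionType.
        if self.customPrompt.strip():
            return self.customPrompt
        if self.extractionType == "custom":
            # Assumption: "custom" with no prompt is treated as an error.
            raise ValueError("extractionType 'custom' requires a customPrompt")
        return DEFAULT_PROMPTS[self.extractionType]
```

Keeping the camelCase field names is deliberate: a configuration object such as the one in the example later on this page can be passed straight in with `TextExtractionConfig(**config)`.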

How It Works

This node operates by receiving input data, executing AI-powered extraction based on the designated settings, and returning structured data. Here's a simplified flow of execution:

Data Retrieval: The component retrieves text data from the specified sourceColumn.
Input Validation: It checks whether the input data is valid and in an appropriate format for processing.
Configuration Setup: The settings are applied, determining the type of extraction, output format, and batch processing method.
AI Integration Selection: The component selects the AI integration to use for the extraction based on the user's context.
Data Processing: It processes the data in batches, sending extraction prompts to the AI model and receiving structured responses.
Output Compilation: Finally, it writes the results to outputColumn, handling errors along the way and making sure all output conforms to the expected schema.
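The flow above can be sketched in Python as follows. This is an illustration, not Vantage's implementation: `run_extraction` and the `ModelFn` callable are hypothetical, the AI call is replaced by any function you pass in, and the sketch assumes one model request per batch and that a failed batch is recorded per row rather than aborting the run.

```python
from typing import Callable

# Hypothetical model interface: given a prompt and a batch of texts, return one
# result dict per text, in the same order. Not a Vantage or provider API.
ModelFn = Callable[[str, list[str]], list[dict]]


def run_extraction(rows: list[dict], source_column: str, output_column: str,
                   prompt: str, batch_size: int, model: ModelFn) -> list[dict]:
    # Data retrieval: read the text from each row's source column.
    texts = [str(row.get(source_column) or "") for row in rows]

    results: list[dict] = []
    # Data processing: send the texts in fixed-size batches, one model call
    # per batch (an assumption about how batching maps to requests).
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            extracted = model(prompt, batch)
            if len(extracted) != len(batch):
                raise ValueError("model returned a different number of results")
        except Exception as exc:
            # Error handling: record the failure for each row instead of aborting.
            extracted = [{"error": str(exc)} for _ in batch]
        results.extend(extracted)

    # Output compilation: copy each input row and add its result.
    return [{**row, output_column: result} for row, result in zip(rows, results)]
```

Isolating failures per batch means one timeout costs at most `batch_size` rows, which is one reason smaller batches trade speed for reliability.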

Expected Data

The textExtraction component expects the following type of input data:

Format: An array of objects, where each object represents a row of data.
Structure: Each object must contain a key matching sourceColumn, which holds the unstructured text from which data will be extracted.

Example input data format:

```json
[
    {"text": "Contact John Doe at john@example.com from Acme Corp."},
    {"text": "On June 1st, we received a payment of $1,000 for invoice #123."}
]
```
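Before sending data to the component, you can check that it matches the expected shape. The `validate_rows` helper below is a hypothetical sketch, not the component's own validation; requiring the source value to be a string is an assumption based on the column holding "unstructured text".

```python
import json


def validate_rows(rows: object, source_column: str = "text") -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    if not isinstance(rows, list):
        return ["input must be an array (list) of row objects"]
    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {i}: expected an object, got {type(row).__name__}")
        elif source_column not in row:
            problems.append(f"row {i}: missing key {source_column!r}")
        elif not isinstance(row[source_column], str):
            # Assumption: the source value must be a string of text.
            problems.append(f"row {i}: {source_column!r} is not a string")
    return problems


# The example input from this page.
sample = json.loads('''[
    {"text": "Contact John Doe at john@example.com from Acme Corp."},
    {"text": "On June 1st, we received a payment of $1,000 for invoice #123."}
]''')

print(validate_rows(sample))              # []
print(validate_rows(sample, "feedback"))  # ["row 0: missing key 'feedback'", "row 1: missing key 'feedback'"]
```

The second call shows the most common mistake: leaving sourceColumn at its default of "text" when the data uses a different column name.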

AI Integrations

This node uses AI service providers for its extraction tasks, drawing on their natural language processing capabilities. The integration is selected based on the user's context, so that the best available AI service handles the processing.
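The selection logic itself is not documented. As an illustration only, here is one plausible shape for it: a common interface that every integration implements, plus a function that picks the first preferred integration available in the user's context. `ExtractionProvider`, `select_provider`, and the preference ordering are all hypothetical.

```python
from typing import Protocol


class ExtractionProvider(Protocol):
    # Hypothetical interface for an AI integration; not a Vantage API.
    def extract(self, prompt: str, texts: list[str]) -> list[dict]: ...


def select_provider(available: dict[str, ExtractionProvider],
                    preference: list[str]) -> ExtractionProvider:
    # "available" stands in for the integrations enabled in the user's context;
    # "preference" is an assumed ranking of which service is considered best.
    for name in preference:
        if name in available:
            return available[name]
    raise LookupError("no AI integration is available in this context")
```

Coding the extraction steps against a shared interface like this keeps batching and output handling independent of which provider ends up serving the request.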

Billing Impacts

Using the textExtraction component may incur billing based on the following factors:

AI Calls: Each extraction task typically counts as an API call to the AI service and contributes to usage costs.
Batch Processing Size: Larger batches send more text per call and may consume credits more rapidly.
Data Volume: The amount of text processed affects overall cost, since AI providers commonly charge by the volume of text processed (for example, per token).

Users should consult their specific pricing policy to understand how the usage of textExtraction will impact billing.
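To estimate request counts before a large run, the number of batches follows directly from the row count and batchSize. The helper below is a hypothetical sketch; treating each batch as exactly one AI request is an assumption, so confirm it against your pricing policy.

```python
def estimate_batches(row_count: int, batch_size: int = 25) -> int:
    # Number of batches needed to cover all rows (ceiling division). If each
    # batch is sent as one AI request (an assumption), this is also the
    # number of API calls.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if row_count < 0:
        raise ValueError("row_count cannot be negative")
    return (row_count + batch_size - 1) // batch_size


print(estimate_batches(1000))      # 40 with the default batchSize of 25
print(estimate_batches(1000, 20))  # 50
print(estimate_batches(1001))      # 41
```

Fewer, larger batches reduce the number of calls, but if pricing is driven by the volume of text processed, the total cost may change little, because every row's text is still sent once.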

Use Cases & Examples

Use Cases

Customer Feedback Analysis: Organizations can use textExtraction to pull contact details and mentioned entities out of customer reviews and feedback; other details, such as sentiment, can be targeted with a custom prompt.
Data Processing for Legal Documents: Law firms can automate the extraction of important legal terms and parties from contracts and agreements to streamline workflows and ensure compliance.
Financial Reporting: Businesses can utilize the component to extract relevant financial data from transaction records or invoices, allowing for efficient analysis and reporting.

Example Configuration

Use Case: Automating Customer Feedback Analysis

Configuration:

```json
{
    "sourceColumn": "feedback",
    "extractionType": "contacts",
    "outputColumn": "extracted_data",
    "customPrompt": "",
    "outputFormat": "json",
    "batchSize": 20
}
```

Input Data:

```json
[
    {"feedback": "I contacted support through john.doe@example.com and they helped me quickly."},
    {"feedback": "Jane Smith from Acme Corp is always available at janesmith@acmecorp.com."}
]
```

Expected Output:

```json
[
    {"extracted_data": {"names": [], "emails": ["john.doe@example.com"], "phones": [], "addresses": [], "websites": []}},
    {"extracted_data": {"names": ["Jane Smith"], "emails": ["janesmith@acmecorp.com"], "phones": [], "addresses": [], "websites": []}}
]
```

With this configuration, contact details are extracted from customer feedback automatically, which supports customer service follow-up and contact management.
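The example uses outputFormat "json". To illustrate what the csv (flattened) and text options might look like for the same contacts records, here is a sketch. The exact layout Vantage produces is not documented: the one-column-per-field CSV, the "; " separator inside a cell, and the "-" placeholder for empty fields are assumptions, and all function names are hypothetical.

```python
import csv
import io
import json

# The five fields shown in the example contacts output.
CONTACT_FIELDS = ["names", "emails", "phones", "addresses", "websites"]


def normalize_contacts(record: dict) -> dict:
    # Ensure every contacts field is present and holds a list of strings.
    def as_list(value: object) -> list:
        if not value:
            return []
        return [value] if isinstance(value, str) else list(value)
    return {field: as_list(record.get(field)) for field in CONTACT_FIELDS}


def to_json(record: dict) -> str:
    return json.dumps(normalize_contacts(record))


def to_csv(records: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(CONTACT_FIELDS)
    for record in records:
        rec = normalize_contacts(record)
        # Flatten each list into a single cell (separator is an assumption).
        writer.writerow(["; ".join(rec[f]) for f in CONTACT_FIELDS])
    return buf.getvalue()


def to_text(record: dict) -> str:
    rec = normalize_contacts(record)
    return "\n".join(f"{f}: {', '.join(rec[f]) or '-'}" for f in CONTACT_FIELDS)


row = {"names": ["Jane Smith"], "emails": ["janesmith@acmecorp.com"]}
print(to_csv([row]))
# names,emails,phones,addresses,websites
# Jane Smith,janesmith@acmecorp.com,,,
print(to_text(row))
```

Normalizing first means every format sees the same five fields, even when the model omits an empty one; the CSV trade-off is that list structure is lost once values are joined into a single cell.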
