Updated Mar 2, 2026

TextExtractionNodeEditor Documentation

Purpose

The TextExtractionNodeEditor is a component for extracting structured data from unstructured text. It uses AI to find items such as names, email addresses, dates, and financial figures in the input text. It returns them as structured output that can be used in analytics and other data operations. The editor provides an interface for configuring how extraction is performed, what kind of data is targeted, and how the output is formatted.

Data Expectations

The TextExtractionNodeEditor expects a configuration object containing the following fields:

Settings

Source Column

  1. Setting Name: sourceColumn
  2. Input Type: Dropdown
  3. Description: Selects which column of the upstream data is analyzed for text extraction. Changing this setting changes the source text the extraction runs on.
  4. Default Value: text

Extraction Type

  1. Setting Name: extractionType
  2. Input Type: Dropdown
  3. Description: Selects the type of data to extract: named entities, contact information, financial data, key-value pairs, or a custom prompt. The extraction type determines the extraction logic used and therefore the depth and nature of the extracted data.
  4. Default Value: entities

Custom Prompt

  1. Setting Name: customPrompt
  2. Input Type: Textarea
  3. Description: Lets users write their own extraction instruction. Include the placeholder {{text}} to mark where the source text should be inserted. Use this setting when the predefined extraction types are insufficient.
  4. Default Value: An empty string ''
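A minimal sketch of the placeholder substitution follows. It assumes the node performs a literal replacement of every {{text}} occurrence with the row's source text. The helper name render_prompt is hypothetical. The node's actual handling of a prompt that has no placeholder is not documented here, so this sketch simply leaves such a prompt unchanged.

```python
# Hypothetical sketch: assumes literal replacement of the {{text}} placeholder.
PLACEHOLDER = "{{text}}"


def render_prompt(template: str, source_text: str) -> str:
    """Return the custom prompt with every {{text}} replaced by source_text.

    A template without the placeholder is returned unchanged (an assumption;
    the node's real behavior in that case is not documented).
    """
    return template.replace(PLACEHOLDER, source_text)


# Example in the spirit of Use Case 3 (legal contracts).
prompt = render_prompt(
    "List each defined term and its definition from this contract:\n{{text}}",
    '"Effective Date" means the date both parties sign this Agreement.',
)
print(prompt)
```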

Output Format

  1. Setting Name: outputFormat
  2. Input Type: Button Group (Select)
  3. Description: Sets the format of the extracted data: JSON for structured data, CSV for flat files, or plain text for simple output. The choice determines how downstream applications can consume the data.
  4. Default Value: json
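The sketch below serializes the same extracted records in each of the three formats, to show how they differ for downstream consumers. Only json is a documented setting value. The identifiers csv and text, the function name format_output, and the assumption that extracted data arrives as a list of flat dictionaries are all hypothetical.

```python
import csv
import io
import json


def format_output(records: list[dict], output_format: str) -> str:
    """Serialize extracted records as JSON, CSV, or plain text (hypothetical)."""
    if output_format == "json":
        return json.dumps(records)
    if output_format == "csv":
        buffer = io.StringIO()
        # The union of keys across records, in first-seen order, becomes the header.
        fieldnames = list(dict.fromkeys(key for record in records for key in record))
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    if output_format == "text":
        # One "key: value" pair per line, with records separated by a blank line.
        return "\n\n".join(
            "\n".join(f"{key}: {value}" for key, value in record.items())
            for record in records
        )
    raise ValueError(f"unsupported output format: {output_format!r}")


records = [{"name": "Ada Lovelace", "email": "ada@example.com"}]
for fmt in ("json", "csv", "text"):
    print(format_output(records, fmt))
```

JSON preserves nesting and types. CSV flattens everything into one header row plus one line per record. Plain text is the easiest for a person to read but the hardest to parse downstream.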

Output Column

  1. Setting Name: outputColumn
  2. Input Type: Text Field
  3. Description: Specifies the name of the column in which the extracted data is stored. Changing it changes where the extracted information is written in the dataset.
  4. Default Value: extracted
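The following minimal sketch shows what writing to the output column means. It assumes that rows are dictionaries and that the AI returns one extracted value per row. The function attach_output is hypothetical. It copies each row so the source column stays intact. It also overwrites any existing column with the same name, which may not match the node's actual behavior.

```python
def attach_output(
    rows: list[dict], extracted: list[str], output_column: str = "extracted"
) -> list[dict]:
    """Return new rows with each extracted value stored under output_column."""
    if len(rows) != len(extracted):
        raise ValueError("expected exactly one extracted value per input row")
    # Build new dicts so the upstream rows, including the source column, are not mutated.
    return [{**row, output_column: value} for row, value in zip(rows, extracted)]


rows = [{"text": "Contact Ada at ada@example.com"}]
result = attach_output(rows, ['{"email": "ada@example.com"}'])
print(result)
```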

Batch Size

  1. Setting Name: batchSize
  2. Input Type: Numeric Input
  3. Description: Sets the number of records processed in a single AI call. A larger batch size means fewer calls and may improve performance, but it can reduce accuracy. Because it controls how many calls are made, this value directly affects processing efficiency and the potential cost of AI calls.
  4. Default Value: 25
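The efficiency side of this trade-off is simple arithmetic. Assume rows are split into consecutive batches of batchSize. The number of AI calls is then the row count divided by the batch size, rounded up. The sketch below uses hypothetical helper names.

```python
import math
from collections.abc import Iterator


def batches(records: list, batch_size: int = 25) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size records, one per AI call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def ai_call_count(num_records: int, batch_size: int = 25) -> int:
    """Number of AI calls needed: records / batch_size, rounded up."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_records / batch_size)


print(ai_call_count(1000))      # 40
print(ai_call_count(1000, 30))  # 34
sizes = [len(batch) for batch in batches(list(range(1000)), 30)]
print(sizes[-1])                # 10
```

With the default of 25, 1,000 rows need 40 calls. At 30, they need 34, with a final partial batch of 10 rows. Fewer calls can mean less per-call overhead, but larger batches may reduce accuracy.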

Use Cases & Examples

Use Case 1: Extracting Customer Information

A retail company wants to enhance its customer database by extracting names, email addresses, and phone numbers from a series of customer feedback forms containing unstructured text responses.

Use Case 2: Financial Data Extraction

A financial services firm requires the extraction of monetary amounts, currency types, and dates from transaction records that are stored in text format.

Use Case 3: Custom Data Requirements

A legal firm needs to extract specific definitions and clauses from contracts written in plain text. The predefined extraction types do not cover these needs, so the firm uses the custom prompt functionality.

Example Configuration

To address the retail company's need (Use Case 1), the TextExtractionNodeEditor can be configured as follows:

Configuration Sample

```json
{
    "sourceColumn": "customerFeedback",
    "extractionType": "contacts",
    "outputColumn": "contactInfo",
    "customPrompt": "",
    "outputFormat": "json",
    "batchSize": 30
}
```

In this configuration, the node extracts names, email addresses, and phone numbers from the customerFeedback column. It writes the structured results, in JSON format, to a new column named contactInfo. The batch size is raised from the default of 25 to 30, so each AI call processes 30 feedback records.
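The documented settings and defaults can be collected into a small typed object. The sketch below uses hypothetical names (TextExtractionConfig, load_config). It parses the sample configuration and fills any omitted setting with its documented default. Field names keep the camelCase JSON keys so they map one to one. Rejecting unknown keys and non-positive batch sizes are choices made by this sketch, not documented behavior of the node.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TextExtractionConfig:
    """Documented settings with their default values (camelCase mirrors the JSON keys)."""
    sourceColumn: str = "text"
    extractionType: str = "entities"
    customPrompt: str = ""
    outputFormat: str = "json"
    outputColumn: str = "extracted"
    batchSize: int = 25


def load_config(raw: str) -> TextExtractionConfig:
    """Parse a JSON configuration; omitted settings fall back to their defaults."""
    data = json.loads(raw)
    known = {f.name for f in fields(TextExtractionConfig)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    config = TextExtractionConfig(**data)
    if config.batchSize < 1:
        raise ValueError("batchSize must be a positive integer")
    return config


SAMPLE = """{
    "sourceColumn": "customerFeedback",
    "extractionType": "contacts",
    "outputColumn": "contactInfo",
    "customPrompt": "",
    "outputFormat": "json",
    "batchSize": 30
}"""

print(load_config(SAMPLE))
print(load_config('{"sourceColumn": "customerFeedback"}').outputColumn)  # extracted
```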

AI Integrations and Billing Impact

The TextExtractionNodeEditor uses AI to perform extraction, and each extraction run may incur costs under the AI provider's billing structure. Costs are typically driven by the number of rows processed and the complexity of the extraction task. Users should monitor their usage and weigh the cost implications of higher batch sizes or custom prompts, both of which may require more complex AI processing.
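For rough budgeting, the sketch below uses a hypothetical linear model. It combines a fee per AI call, which depends on batchSize, with a fee per row processed. Both rates are placeholders to be replaced with your provider's actual pricing. The model also ignores prompt complexity, which real billing may reflect. The function name estimate_cost is hypothetical.

```python
import math


def estimate_cost(
    num_rows: int, batch_size: int, price_per_call: float, price_per_row: float
) -> float:
    """Rough cost under a hypothetical per-call plus per-row pricing model.

    The rates are placeholders, not real provider prices; prompt complexity
    is not modeled.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    calls = math.ceil(num_rows / batch_size)
    return calls * price_per_call + num_rows * price_per_row


# Compare batch sizes for 1,000 rows using made-up rates.
for size in (10, 25, 30):
    print(size, round(estimate_cost(1000, size, price_per_call=0.01, price_per_row=0.001), 4))
```

Under this model, only the per-call term shrinks as the batch size grows. Whether larger batches save money in practice depends on how your provider actually bills.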