readFiles Documentation

Overview

The readFiles node retrieves files from OneDrive for the Vantage analytics and data platform. It lists or searches for files and returns their metadata and download URLs. It is particularly suited to feeding downstream data processing nodes, such as Image Analysis or PDF Extraction.

Purpose

The node interfaces with OneDrive to either list the files in a specified folder or search for files matching a query. It outputs a dataset of file metadata and can optionally include each file's content as a base64-encoded string.

Settings

The readFiles node has five settings that control what it retrieves and what it outputs:

  1. fileType

    • Input Type: Dropdown
    • Description: Specifies the type of files to retrieve. Options include:
      • all: Retrieves files of any type.
      • images: Filters to retrieve only image files.
      • pdfs: Filters to retrieve only PDF files.
      • documents: Filters to retrieve documents including Word files and plain text.
    • Default Value: all
    • Effect of Changing: Selecting a different file type restricts the query to only include files matching that type, thus affecting the final output dataset.
  2. folderId

    • Input Type: String
    • Description: Denotes the specific OneDrive folder ID from which to retrieve files. If left empty, the root folder is queried.
    • Default Value: "" (empty string)
    • Effect of Changing: Specifying a folder will limit the results to files contained in that folder, while an empty value will retrieve files from the root directory.
  3. searchQuery

    • Input Type: String
    • Description: A free-text query that allows users to perform a search for files based on their names or metadata.
    • Default Value: "" (empty string)
    • Effect of Changing: Providing a search query activates the search API, limiting the retrieved files to those that match the search criteria.
  4. maxFiles

    • Input Type: Numeric
    • Description: Defines the maximum number of files to return in the output. This setting helps to manage response sizes and performance.
    • Default Value: 50
    • Effect of Changing: Changes the upper bound on the number of files returned. The value is capped at 200 to keep response sizes manageable.
  5. includeContent

    • Input Type: Boolean
    • Description: Indicates whether the actual content of the files should be included in the output as base64-encoded strings.
    • Default Value: false
    • Effect of Changing: Setting this to true adds each file's content to the output as a base64-encoded string, which increases network traffic and memory usage, especially for large files.
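
The defaults and limits described above can be sketched as a small normalization step. This is an illustrative Python sketch, not the node's actual implementation: the function name normalize_config is hypothetical, and clamping out-of-range maxFiles values (rather than rejecting them) is an assumption.

```python
from typing import Any

ALLOWED_FILE_TYPES = {"all", "images", "pdfs", "documents"}
MAX_FILES_LIMIT = 200  # upper bound stated for the maxFiles setting

# Default values as documented for each setting.
DEFAULTS = {
    "fileType": "all",
    "folderId": "",
    "searchQuery": "",
    "maxFiles": 50,
    "includeContent": False,
}


def normalize_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill in documented defaults and enforce the documented limits."""
    config = {**DEFAULTS, **raw}
    if config["fileType"] not in ALLOWED_FILE_TYPES:
        raise ValueError(f"unsupported fileType: {config['fileType']!r}")
    # Assumption: values above 200 are clamped rather than rejected,
    # and values below 1 are raised to 1.
    config["maxFiles"] = max(1, min(int(config["maxFiles"]), MAX_FILES_LIMIT))
    config["includeContent"] = bool(config["includeContent"])
    return config
```

For example, normalize_config({"maxFiles": 500}) would yield a configuration with maxFiles set to 200 and every other setting at its default.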

How It Works

  1. The readFiles node begins execution by validating necessary configurations, including the presence of the client ID required for integration with OneDrive.
  2. It establishes a connection to OneDrive using the credentials provided in the context.
  3. If a search query is provided, the node searches for files matching it; otherwise it lists the files in the specified folder, or in the root folder when folderId is empty.
  4. After retrieving the items, it discards non-file items (such as folders) and keeps only the files that match the fileType setting.
  5. If includeContent is set to true, the node fetches each file's binary content and encodes it as base64, which is suitable for further processing in analytics workflows.
  6. The final output is a structured dataset containing the necessary metadata of the files along with optional content.
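
Steps 4 to 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the node's source: the extension lists per fileType, the item shape (a "folder" key marks folders, as in OneDrive item listings), the output field names, and the fetch_bytes callback are all hypothetical.

```python
import base64
from typing import Any, Callable, Optional

# Assumption: which extensions each fileType covers is not documented;
# these lists are illustrative only.
EXTENSIONS = {
    "images": (".png", ".jpg", ".jpeg", ".gif"),
    "pdfs": (".pdf",),
    "documents": (".doc", ".docx", ".txt"),
}


def matches_type(name: str, file_type: str) -> bool:
    """Return True if the file name is accepted by the fileType setting."""
    if file_type == "all":
        return True
    return name.lower().endswith(EXTENSIONS[file_type])


def select_files(
    items: list[dict[str, Any]],
    file_type: str,
    max_files: int,
    include_content: bool = False,
    fetch_bytes: Optional[Callable[[dict[str, Any]], bytes]] = None,
) -> list[dict[str, Any]]:
    """Drop folders, apply the fileType filter, and cap the result size."""
    rows: list[dict[str, Any]] = []
    for item in items:
        if "folder" in item:  # step 4: skip non-file items
            continue
        if not matches_type(item["name"], file_type):
            continue
        row = {
            "id": item["id"],
            "name": item["name"],
            "downloadUrl": item.get("downloadUrl"),
        }
        if include_content and fetch_bytes is not None:
            # step 5: binary content becomes a base64 string
            row["content"] = base64.b64encode(fetch_bytes(item)).decode("ascii")
        rows.append(row)
        if len(rows) >= max_files:
            break
    return rows
```

Passing the download step in as fetch_bytes keeps the sketch free of network calls, so the filtering logic can be exercised with in-memory data.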

Data Expectations

The readFiles node expects a configuration with the following structure. The // comments are annotations for this page only; strict JSON does not allow comments.

json
{
  "fileType": "all",         // Accepted values: all | images | pdfs | documents
  "folderId": "",            // String value: specific folder ID or empty for root
  "searchQuery": "",         // String value: free-text search query
  "maxFiles": 50,           // Numeric value: maximum files to retrieve
  "includeContent": false    // Boolean value: whether to include file content
}

Use Cases & Examples

Use Case 1: Research Document Retrieval

A research team needs to retrieve all PDF documents from a specific project folder in OneDrive for analysis.

Use Case 2: Image Processing Automation

An automation process requires fetching the latest images uploaded to a shared team folder for image analysis.

Use Case 3: Content Extraction from Text Files

A data analyst needs to extract data from various text files stored in a OneDrive directory to consolidate information for reporting.

Example Configuration

To retrieve images from a shared folder (Use Case 2):

json
{
  "fileType": "images",
  "folderId": "12345xyz",
  "searchQuery": "",
  "maxFiles": 20,
  "includeContent": true
}

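In this configuration:

  • fileType is images, so only image files are returned.
  • folderId is 12345xyz, so only that folder is queried.
  • searchQuery is empty, so the node lists the folder's contents instead of searching.
  • maxFiles is 20, so at most 20 files are returned.
  • includeContent is true, so each file's content is included as a base64-encoded string.

This setup returns only the files the image analysis workflow needs, along with their content.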
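
To make the example more concrete, the sketch below shows how such a configuration might map onto a Microsoft Graph request for OneDrive. The documentation does not state which endpoints the node calls, so this is an assumption: build_request_url is a hypothetical helper, the /me/drive paths assume a signed-in user context, and searching from the drive root when searchQuery is set is one possible choice.

```python
from typing import Any
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_request_url(config: dict[str, Any]) -> str:
    """Pick a list or search endpoint from the node configuration."""
    top = f"$top={config['maxFiles']}"
    query = config.get("searchQuery", "")
    folder_id = config.get("folderId", "")
    if query:
        # Assumption: a search query takes precedence over folderId.
        # Single quotes are doubled, then the text is percent-encoded.
        escaped = quote(query.replace("'", "''"), safe="")
        return f"{GRAPH_BASE}/me/drive/root/search(q='{escaped}')?{top}"
    if folder_id:
        encoded_id = quote(folder_id, safe="")
        return f"{GRAPH_BASE}/me/drive/items/{encoded_id}/children?{top}"
    # Empty folderId: list the root folder.
    return f"{GRAPH_BASE}/me/drive/root/children?{top}"
```

With the example configuration, this helper would produce a request for the children of folder 12345xyz limited to 20 items; the fileType filter and content download would then be applied to the response.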