
Exportdataset

Overview

This node imports tabular data from OneDrive and extracts it from several file formats: CSV, TSV, JSON, Excel, SQLite, and SQL dumps. It can process a single file or multiple files and returns structured datasets along with their metadata.

Purpose

The node connects to OneDrive, retrieves the specified files, and converts their content into a format usable for data analytics and reporting. Users can work with data stored in OneDrive without manual downloads or file conversions.

Settings

This node has two configurable settings:

1. `fileId`

Input Type: String or Array of Strings
Description: This setting specifies the OneDrive item ID(s) of the files to be exported. It can accept a single ID or an array of IDs.
Impact of Change: Changing this value alters which files are fetched from OneDrive. If the ID(s) are incorrect or not provided, the logic will return an error.
Default Value: This setting does not have a default value; it is required for the logic to execute successfully.

2. `includeHeaders`

Input Type: Boolean
Description: This setting determines whether the first row of the dataset is treated as headers (the names of the columns). When set to true, the first row is used for the column definitions in the output.
Impact of Change: Setting this to false means the first row is not used as headers, so the output is raw data without explicit column names, which may complicate data interpretation.
Default Value: true
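
The effect of `includeHeaders` can be illustrated with a minimal Python sketch. This is not the node's implementation: the function name `parse_tabular` and the shape of the returned dictionary are hypothetical, and the sketch assumes that with `includeHeaders` set to false the first row stays in the data as an ordinary row.

```python
import csv
import io


def parse_tabular(text, include_headers=True, delimiter=","):
    """Parse delimited text into columns and rows (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if include_headers and rows:
        # The first row supplies the column names; the rest is data.
        columns, data = rows[0], rows[1:]
    else:
        # Assumption: no column names, and the first row is kept as data.
        columns, data = None, rows
    return {
        "columns": columns,
        "rows": data,
        "rowCount": len(data),
        "columnCount": len(rows[0]) if rows else 0,
    }


sample = "region,sales\nnorth,120\nsouth,95\n"
with_headers = parse_tabular(sample, include_headers=True)
without_headers = parse_tabular(sample, include_headers=False)
```

In this sketch `with_headers` carries the column names `region` and `sales` with two data rows, while `without_headers` has no column names and three rows. Passing `delimiter="\t"` would cover TSV input the same way.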

How It Works

This node operates by taking in specific inputs and configurations, establishing a connection to OneDrive, and processing the specified files according to their types:

Connection Setup: A connection to OneDrive is established using the integration connection function, which takes the context as its parameter.
Input Validation: The node checks whether `fileId` is provided and, if it is not, returns an error stating that the file ID is required. It then normalizes the input so that a single ID and an array of IDs are handled the same way.
File Processing: For each file specified:
- The metadata is retrieved.
- Depending on the MIME type or file extension, it applies different parsing strategies:
  - CSV/TSV: Downloads the content, parses it as tabular data, and counts rows and columns.
  - Excel: Attempts to convert the file into CSV format before processing.
  - JSON: Parses the content into a JSON object.
  - SQLite and SQL dumps: Downloads the binary content and wraps it as structured data.
- If the dataset size exceeds a predefined threshold (OFFLOAD_SIZE_THRESHOLD_BYTES), the node indicates that offloading may be required.
Output: Depending on the number of processed files:
- If one file is processed, it returns the data and metadata in a structured format.
- If multiple files are processed, it returns an array of results along with success counts and file count.
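
The validation, normalization, and output steps above can be sketched in Python. This is an illustration under stated assumptions rather than the node's actual code: the function names, the result field names (`results`, `successCount`, `fileCount`, `success`), and the threshold value are all hypothetical, and the sketch raises an exception where the node is described as returning an error.

```python
# Assumed placeholder; the real threshold value is not stated in this document.
OFFLOAD_SIZE_THRESHOLD_BYTES = 5 * 1024 * 1024


def normalize_file_ids(file_id):
    """Return a list of IDs from either a single ID or a list of IDs."""
    if file_id is None or file_id == "" or file_id == []:
        raise ValueError("fileId is required")
    return [file_id] if isinstance(file_id, str) else list(file_id)


def needs_offload(size_bytes):
    """Flag datasets whose size exceeds the offload threshold."""
    return size_bytes > OFFLOAD_SIZE_THRESHOLD_BYTES


def shape_output(results):
    """Return a single result directly; wrap several results with counts."""
    if len(results) == 1:
        return results[0]
    return {
        "results": results,
        "successCount": sum(1 for r in results if r.get("success")),
        "fileCount": len(results),
    }


# Example: two files, each small enough not to need offloading.
file_ids = normalize_file_ids(["file-id-1", "file-id-2"])
results = [
    {"fileId": fid, "success": True, "offload": needs_offload(1024)}
    for fid in file_ids
]
summary = shape_output(results)
```

Normalizing to a list up front lets the per-file loop stay the same for one file or many; only the final shaping step distinguishes the two cases.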

Data Expectations

The logic expects:

Valid OneDrive item IDs that can be retrieved from the OneDrive service.
Files in one of the supported formats mentioned earlier (CSV, TSV, JSON, Excel, SQLite, SQL dumps).
The context needed to establish a connection to OneDrive, including authentication credentials if required.
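
As a rough illustration of how a file might be matched to one of the supported formats by MIME type or file extension, consider the following Python sketch. The lookup tables and the function name `detect_format` are assumptions made for the example; the node's actual lists of extensions and MIME types are not documented here and may differ.

```python
import os

# Hypothetical lookup tables; the node's real mappings may differ.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".sqlite": "sqlite",
    ".sql": "sql",
}
MIME_FORMATS = {
    "text/csv": "csv",
    "text/tab-separated-values": "tsv",
    "application/json": "json",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "excel",
    "application/vnd.ms-excel": "excel",
}


def detect_format(name, mime_type=None):
    """Return a format label, or None when the file is not supported."""
    # Prefer the MIME type when it is one we recognize.
    if mime_type in MIME_FORMATS:
        return MIME_FORMATS[mime_type]
    # Otherwise fall back to the (case-insensitive) file extension.
    extension = os.path.splitext(name)[1].lower()
    return EXTENSION_FORMATS.get(extension)


csv_format = detect_format("sales.CSV")
json_format = detect_format("report", "application/json")
unsupported = detect_format("notes.txt")
```

A `None` result corresponds to a file outside the supported formats, which a caller could turn into a per-file error before any download is attempted.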

Use Cases & Examples

Use Case 1: Data Analysis

A data analyst may need to analyze monthly sales reports stored in OneDrive in CSV format. The analyst can configure the exportDataset node with the appropriate file ID to fetch this data for analysis.

Use Case 2: Report Generation

A business can automate the generation of quarterly financial reports by pulling the data directly from an Excel spreadsheet stored in OneDrive. By setting includeHeaders to true, the report will have column names accurately represented.

Use Case 3: Database Migration

A developer needs to migrate datasets stored as SQL dumps from OneDrive into a new database. The exportDataset node can be configured to retrieve these SQL files, which simplifies the migration.

Example Configuration

Here's an example configuration for a hypothetical use case where a finance team wants to pull data from multiple CSV files for analysis:

```json
{
  "inputs": {
    "fileId": ["file-id-1", "file-id-2", "file-id-3"],
    "includeHeaders": true
  },
  "config": {
    "fileId": null,
    "includeHeaders": true
  },
  "context": {
    "isWeb": true
  }
}
```

In this configuration:

The finance team specifies the IDs of three CSV files.
They set includeHeaders to true, so the exported data carries column names that make the columns easier to interpret.
