Message Search Documentation
Overview
The messageSearch module is a data transformation component of the Vantage analytics and data platform. It filters and searches messages by flexible criteria: keywords, regex patterns, sender information, channels, and date ranges. It does not integrate with any AI capabilities or make external API calls; it operates only on the input data it is given.
Settings
The messageSearch configuration includes the following settings, each of which controls one aspect of how the search operates:
1. searchColumns
- Input Type: Array of strings
- Description: Specifies which columns in the message data should be searched. By default, it searches within the "body" and "subject" of the messages. Changing this setting can affect which parts of the messages are included in the search, allowing for greater flexibility based on the data structure.
- Default Value:
["body", "subject"]
2. keyword
- Input Type: String
- Description: Contains the keyword or keywords used for the search, which are evaluated in a case-insensitive manner. Multiple keywords are separated by commas, as in the example configuration ("confidential, terms"). If left empty, no keyword search occurs. Changing this value affects how many results are returned, depending on whether the keyword(s) are present in the searchColumns.
- Default Value:
"" (empty)
3. regexPattern
- Input Type: String
- Description: An optional regex pattern to use for filtering messages. If specified, it overrides the keyword search. With a valid pattern, only messages that match the regex are returned. An invalid pattern logs a warning but does not halt execution.
- Default Value:
"" (empty)
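The "warn but do not halt" behavior described for regexPattern can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name compilePattern is hypothetical, the case-insensitive "i" flag is an assumption, and the documentation does not say what the search does after an invalid pattern is rejected.

```javascript
// Hypothetical helper: compile an optional pattern, warning instead of throwing.
function compilePattern(regexPattern) {
  if (!regexPattern) return null; // empty setting: no regex filtering
  try {
    // Assumption: the "i" flag. The documentation does not state which flags are used.
    return new RegExp(regexPattern, "i");
  } catch (err) {
    // Invalid pattern: log a warning and carry on without a regex.
    console.warn(`messageSearch: invalid regexPattern "${regexPattern}": ${err.message}`);
    return null;
  }
}

const valid = compilePattern("invoice\\s+#\\d+");
const invalid = compilePattern("([unclosed"); // logs a warning
console.log(valid.test("Re: Invoice #4021")); // true
console.log(invalid); // null
```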
4. channelFilter
- Input Type: String
- Description: Filters messages based on the communication channel (e.g., email, WhatsApp). Only messages that match this specified channel will be included in the results. Adjusting this setting allows users to narrow results to specific communication methods.
- Default Value:
"" (empty)
5. channelColumn
- Input Type: String
- Description: Specifies which column contains the channel information within the data set. This setting allows users to customize the filtering process based on how their data is structured. If the default value is not appropriate for the data, changing this setting can yield the correct channel information for filtering.
- Default Value:
"message_channel"
6. senderFilter
- Input Type: String
- Description: Filters messages by the sender, where the input is evaluated for a substring match within the sender's information. Messages from senders that do not contain this specified substring are excluded. Modifying this setting directly impacts which senders' messages are returned.
- Default Value:
"" (empty)
7. senderColumn
- Input Type: String
- Description: Indicates which column holds the sender information. This provides flexibility in adapting the component to different data schemas. Adjusting this value ensures that the correct sender column is referenced during the filtering process.
- Default Value:
"from"
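The channel and sender settings can be illustrated together. The sketch below is an assumption-laden example rather than the module's implementation: the function name matchesChannelAndSender is hypothetical, the channel is compared as a whole value, and both comparisons ignore case, none of which the documentation states explicitly.

```javascript
// Hypothetical helper showing how the four channel/sender settings interact.
function matchesChannelAndSender(row, settings = {}) {
  const {
    channelFilter = "",
    channelColumn = "message_channel", // documented default
    senderFilter = "",
    senderColumn = "from",             // documented default
  } = settings;

  if (channelFilter) {
    const channel = String(row[channelColumn] ?? "");
    // Assumption: whole-value, case-insensitive comparison.
    if (channel.toLowerCase() !== channelFilter.toLowerCase()) return false;
  }
  if (senderFilter) {
    const sender = String(row[senderColumn] ?? "");
    // Documented: substring match. Assumption: case-insensitive.
    if (!sender.toLowerCase().includes(senderFilter.toLowerCase())) return false;
  }
  return true; // empty filters impose no restriction
}

const row = { message_channel: "email", from: "john.doe@example.com" };
console.log(matchesChannelAndSender(row, { channelFilter: "email", senderFilter: "john.doe" })); // true
console.log(matchesChannelAndSender(row, { channelFilter: "WhatsApp" })); // false
```

If the data stores the sender under another name, for example a hypothetical "sender_address" column, only senderColumn needs to change.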
8. dateColumn
- Input Type: String
- Description: This setting specifies the column that contains date values. The correct configuration of this column is crucial for date filtering to work effectively. Changing this value enables the logic to be compatible with various data structures.
- Default Value:
"date"
9. dateFrom
- Input Type: String (ISO date)
- Description: Allows users to set the starting point for date filtering. Only messages with dates equal to or after this ISO date will be returned. Leaving this empty means that no lower date boundary will be applied. Adjusting this value can narrow results within a specific time frame.
- Default Value:
"" (empty)
10. dateTo
- Input Type: String (ISO date)
- Description: Sets the end point for date filtering, whereby only messages with dates equal to or before this ISO date will be returned. This, combined with dateFrom, can pinpoint a specific range of messages. Leaving it empty means no upper date boundary is applied.
- Default Value:
"" (empty)
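The inclusive date range formed by dateFrom and dateTo can be sketched as below. The helper names parseBoundary and inDateRange are hypothetical, and the handling of rows with a missing or unparseable date (excluded whenever a boundary is set) is an assumption, since the documentation does not cover that case.

```javascript
// Hypothetical helper: turn an ISO string into a Date, or null if empty/invalid.
function parseBoundary(value) {
  if (!value) return null;
  const d = new Date(value);
  return Number.isNaN(d.getTime()) ? null : d;
}

// Hypothetical helper: inclusive range check on the configured date column.
function inDateRange(row, { dateColumn = "date", dateFrom = "", dateTo = "" } = {}) {
  const from = parseBoundary(dateFrom);
  const to = parseBoundary(dateTo);
  if (!from && !to) return true; // no boundaries configured

  // Assumption: rows without a usable date are excluded once a boundary is set.
  const value = parseBoundary(row[dateColumn]);
  if (!value) return false;

  if (from && value < from) return false; // before the lower bound
  if (to && value > to) return false;     // after the upper bound
  return true;                            // boundaries themselves are included
}

const range = { dateFrom: "2023-01-01T00:00:00Z", dateTo: "2023-12-31T23:59:59Z" };
console.log(inDateRange({ date: "2023-06-15T12:00:00Z" }, range)); // true
console.log(inDateRange({ date: "2024-01-01T00:00:00Z" }, range)); // false
```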
11. matchMode
- Input Type: String (dropdown)
- Description: Determines how keywords are matched against the searched texts. When set to "any", only one of the keywords needs to match to return a message. When set to "all", all keywords must be found within the text for messages to be included. Adjusting this mode alters the strictness of keyword matching.
- Default Value:
"any"
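The difference between the two match modes can be shown with a small sketch. The function name matchesKeywords is hypothetical; the comma separator is inferred from the example configuration, and plain substring matching (rather than whole-word matching) is an assumption.

```javascript
// Hypothetical helper: case-insensitive keyword matching in "any" or "all" mode.
function matchesKeywords(text, keyword, matchMode = "any") {
  // Split on commas, trim whitespace, and drop empty entries.
  const keywords = keyword
    .split(",")
    .map((k) => k.trim().toLowerCase())
    .filter((k) => k.length > 0);
  if (keywords.length === 0) return true; // empty setting: no keyword search

  const haystack = text.toLowerCase();
  const hit = (k) => haystack.includes(k); // assumption: substring match
  return matchMode === "all" ? keywords.every(hit) : keywords.some(hit);
}

console.log(matchesKeywords("The terms are CONFIDENTIAL", "confidential, terms", "all")); // true
console.log(matchesKeywords("Only the terms", "confidential, terms", "all")); // false
console.log(matchesKeywords("Only the terms", "confidential, terms", "any")); // true
```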
How It Works
The messageSearch logic processes an input dataset, filtering messages based on the specified configuration. It follows these steps:
- Input Data Validation: The logic checks if the input data exists and if it is an array. If there's no data, it returns an empty output.
- Keyword Processing: Keywords are split and trimmed, creating an array that is used for filtering.
- Regex Compilation: If a regex pattern is provided, it compiles the regex for use in filtering.
- Date Parsing: The dateFrom and dateTo settings are parsed into Date objects for comparison.
- Filtering Logic: Each message in the input data is checked against the specified searchColumns, channelFilter, senderFilter, and date constraints.
- Return Results: Messages that meet the filtering criteria are collected and returned as output.
The processing is logged to the console at various stages for transparency in execution.
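The steps above can be put together in one end-to-end sketch. It is an illustration under stated assumptions, not the module's source: the name messageSearchSketch is hypothetical, the searched columns are joined into a single text, keywords are comma-separated, the regex is case-insensitive, the channel comparison is whole-value and case-insensitive, and rows with unusable dates are dropped when a date boundary is set.

```javascript
// Documented defaults for every setting.
const DEFAULTS = {
  searchColumns: ["body", "subject"], keyword: "", regexPattern: "",
  channelFilter: "", channelColumn: "message_channel",
  senderFilter: "", senderColumn: "from",
  dateColumn: "date", dateFrom: "", dateTo: "", matchMode: "any",
};

function messageSearchSketch(data, settings = {}) {
  const s = { ...DEFAULTS, ...settings };

  // 1. Input data validation
  if (!Array.isArray(data) || data.length === 0) {
    console.log("messageSearch: no input data");
    return [];
  }

  // 2. Keyword processing (assumption: comma-separated)
  const keywords = s.keyword.split(",").map((k) => k.trim().toLowerCase()).filter(Boolean);

  // 3. Regex compilation (invalid patterns warn, execution continues)
  let regex = null;
  if (s.regexPattern) {
    try {
      regex = new RegExp(s.regexPattern, "i"); // assumption: "i" flag
    } catch (err) {
      console.warn(`messageSearch: invalid regexPattern: ${err.message}`);
    }
  }

  // 4. Date parsing
  const toDate = (v) => {
    if (!v) return null;
    const d = new Date(v);
    return Number.isNaN(d.getTime()) ? null : d;
  };
  const from = toDate(s.dateFrom);
  const to = toDate(s.dateTo);

  // 5. Filtering logic
  const lower = (v) => String(v ?? "").toLowerCase();
  const results = data.filter((row) => {
    const text = s.searchColumns.map((c) => String(row[c] ?? "")).join(" ");
    if (regex) {
      if (!regex.test(text)) return false; // regex overrides keywords
    } else if (keywords.length > 0) {
      const hit = (k) => text.toLowerCase().includes(k);
      const ok = s.matchMode === "all" ? keywords.every(hit) : keywords.some(hit);
      if (!ok) return false;
    }
    if (s.channelFilter && lower(row[s.channelColumn]) !== lower(s.channelFilter)) return false;
    if (s.senderFilter && !lower(row[s.senderColumn]).includes(lower(s.senderFilter))) return false;
    if (from || to) {
      const d = toDate(row[s.dateColumn]);
      if (!d) return false; // assumption: undated rows are excluded
      if (from && d < from) return false;
      if (to && d > to) return false;
    }
    return true;
  });

  // 6. Return results
  console.log(`messageSearch: ${results.length} of ${data.length} messages matched`);
  return results;
}

const sample = [
  { subject: "Contract terms", body: "This is confidential.", message_channel: "email", from: "john.doe@example.com", date: "2023-03-10T09:00:00Z" },
  { subject: "Lunch", body: "Nothing confidential here.", message_channel: "email", from: "john.doe@example.com", date: "2023-03-11T09:00:00Z" },
];
const found = messageSearchSketch(sample, { keyword: "confidential, terms", matchMode: "all" });
console.log(found.length); // 1
```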
Use Cases & Examples
Use Case 1: Customer Support Analysis
A customer service department wants to analyze messages sent by their support agents on a specific communication platform (e.g., email). They need to filter messages containing specific keywords related to product issues.
Use Case 2: Compliance Monitoring
A compliance team needs to examine messages to ensure that all communications adhere to regulations. They want to filter by channel and date range, focusing on messages sent by specific employees.
Use Case 3: Marketing Campaign Review
A marketing team wishes to review all messages related to a recent campaign to ensure that team members followed up as planned. They want to filter messages by campaign keywords and by the campaign's specific timeframe.
Example Configuration
To monitor compliance in a financial firm, the following configuration can be used:
{
"searchColumns": ["body", "subject"],
"keyword": "confidential, terms",
"channelFilter": "email",
"senderFilter": "john.doe",
"dateFrom": "2023-01-01T00:00:00Z",
"dateTo": "2023-12-31T23:59:59Z",
"matchMode": "all"
}
Explanation of the Configuration:
- searchColumns: Searches both the body and subject of the messages.
- keyword: Looks for messages that contain both "confidential" and "terms."
- channelFilter: Filters to include only email messages.
- senderFilter: Only includes messages whose sender information contains the substring "john.doe" (that is, messages sent by John Doe).
- dateFrom: Starts filtering messages from January 1, 2023.
- dateTo: Ends filtering messages on December 31, 2023.
- matchMode: Ensures both keywords must be present in the messages returned.
This configuration helps the compliance team review only the relevant messages, facilitating easier monitoring of communication practices.