AWS S3 Integration Documentation
Overview
The AwsS3 integration is a component of the Vantage analytics and data platform that connects to Amazon Web Services (AWS) Simple Storage Service (S3). It supports listing buckets, listing objects within a bucket, and retrieving object metadata, so users can inspect and analyze data stored in S3.
Purpose
The primary purpose of the AwsS3 integration is to enable users to connect to their S3 account, execute various operations on their S3 buckets, and retrieve object metadata. This is essential for organizations that heavily rely on AWS S3 for data storage and need to analyze or interact with this data programmatically.
Settings
Below are the detailed settings available in the AwsS3 integration:
1. Access Key ID
- Input Type: String
- What It Does: The Access Key ID identifies the AWS credential in use, typically a long-term access key belonging to an IAM user. It is sent with each API request so that AWS can look up the matching secret key and verify the request signature.
- Default Value: None (this is a required setting).
2. Secret Access Key
- Input Type: String
- What It Does: The Secret Access Key is paired with the Access Key ID and is used to sign API requests. It should always be kept confidential and never committed to source control.
- Default Value: None (this is a required setting).
3. Region
- Input Type: String / Dropdown
- What It Does: This setting specifies the AWS region where the S3 buckets exist. If set to 'global', the integration uses 'us-east-1', a commonly used region for S3. Changing the region changes the endpoint used to connect to S3, which can affect latency depending on geographic location.
- Default Value: 'us-east-1'.
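As a minimal sketch of how these three settings might be assembled without hardcoding secrets, the snippet below reads the credentials from the standard AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) and applies the 'global' fallback described above. The helper names resolveRegion and loadSettingsFromEnv are hypothetical and are not part of the integration; the setting keys follow the configuration sample later in this document.

```javascript
// Hypothetical helpers: not part of the AwsS3 integration's documented API.
const DEFAULT_REGION = 'us-east-1';

// Map a missing or 'global' region to the documented default.
function resolveRegion(region) {
  if (!region || region === 'global') {
    return DEFAULT_REGION;
  }
  return region;
}

// Build the settings object from environment variables.
// Throws early if a required credential is missing.
function loadSettingsFromEnv(env = process.env) {
  const accessKeyId = env.AWS_ACCESS_KEY_ID;
  const secretAccessKey = env.AWS_SECRET_ACCESS_KEY;
  if (!accessKeyId || !secretAccessKey) {
    throw new Error('AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are required');
  }
  return {
    access_key_id: accessKeyId,
    secret_access_key: secretAccessKey,
    region: resolveRegion(env.AWS_REGION),
  };
}
```

The resulting object could then be passed to the AwsS3 constructor in place of literal strings, assuming the constructor accepts the keys shown in the configuration sample.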
How It Works
The AwsS3 integration is built on the AWS SDK for JavaScript. It creates an instance of S3Client from the configured Access Key ID, Secret Access Key, and Region, and exposes the following methods:
- Client Initialization: The getS3Client method checks if the s3Client already exists; if not, it initializes it with the credentials and region.
- Authorization: The authorize method is designed to handle the authorization process, although the AWS SDK manages most authorization tasks internally.
- Testing Connection: The testConnection method sends a ListBucketsCommand to verify whether the integration can successfully communicate with AWS S3.
- Listing Buckets: The listBuckets method returns a list of all S3 buckets associated with the account, including each bucket's name, creation date, and owner.
- Listing Objects: The listObjects method retrieves objects from a specified bucket, filtering by prefix and handling pagination through continuation tokens.
- Getting Object Info: The getObjectInfo method retrieves detailed metadata about a specific object within a specified bucket.
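The client caching, connection test, and token-based pagination described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the integration's actual source: the class name AwsS3Sketch, the injectable sdk argument, the boolean return value of testConnection, and the listAllObjects helper are all hypothetical. It also assumes version 3 of the AWS SDK for JavaScript (the @aws-sdk/client-s3 package) and ListObjectsV2Command, since that is the list operation that uses continuation tokens.

```javascript
// Hypothetical sketch: names follow the prose above, not the actual Vantage source.
class AwsS3Sketch {
  // sdk is injectable (an assumption of this sketch) so the logic can be
  // exercised with stubs; by default the AWS SDK v3 S3 package is loaded lazily.
  constructor(settings, sdk = null) {
    this.settings = settings;
    this.sdk = sdk;
    this.s3Client = null;
  }

  getSdk() {
    if (!this.sdk) {
      this.sdk = require('@aws-sdk/client-s3');
    }
    return this.sdk;
  }

  // Create the client on first use and reuse it afterwards.
  getS3Client() {
    if (!this.s3Client) {
      const { S3Client } = this.getSdk();
      this.s3Client = new S3Client({
        region: this.settings.region,
        credentials: {
          accessKeyId: this.settings.access_key_id,
          secretAccessKey: this.settings.secret_access_key,
        },
      });
    }
    return this.s3Client;
  }

  // Resolve to true if a ListBuckets call succeeds, false otherwise.
  async testConnection() {
    const { ListBucketsCommand } = this.getSdk();
    try {
      await this.getS3Client().send(new ListBucketsCommand({}));
      return true;
    } catch (error) {
      return false;
    }
  }

  // Collect every object under a prefix by following continuation tokens.
  async listAllObjects({ bucket_name, prefix = '' }) {
    const { ListObjectsV2Command } = this.getSdk();
    const objects = [];
    let token;
    do {
      const page = await this.getS3Client().send(
        new ListObjectsV2Command({
          Bucket: bucket_name,
          Prefix: prefix,
          ContinuationToken: token,
        })
      );
      objects.push(...(page.Contents || []));
      token = page.IsTruncated ? page.NextContinuationToken : undefined;
    } while (token);
    return objects;
  }
}
```

Looping until IsTruncated is false keeps the caller from having to track tokens, at the cost of holding the full listing in memory; for very large prefixes, returning one page at a time may be the better design.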
Data Expectations
The AwsS3 integration expects the following data:
- Credentials: Valid Access Key ID and Secret Access Key with proper permissions to interact with S3.
- Bucket Information:
  - For listing objects: The name of the existing S3 bucket.
  - For retrieving object info: The name of the bucket and the key of the object within that bucket.
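Inputs like these can be checked locally before any request is sent. The sketch below is one possible approach, not something the integration is documented to do: the function names are hypothetical, and the checks cover only a subset of the S3 rules (bucket names of 3 to 63 characters using lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit; object keys of at most 1,024 bytes in UTF-8).

```javascript
// Hypothetical validators; they cover only a subset of the S3 naming rules.
const BUCKET_NAME_PATTERN = /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/;
const MAX_KEY_BYTES = 1024;

function isValidBucketName(name) {
  return (
    typeof name === 'string' &&
    BUCKET_NAME_PATTERN.test(name) &&
    !name.includes('..') // adjacent dots are not allowed
  );
}

function isValidObjectKey(key) {
  return (
    typeof key === 'string' &&
    key.length > 0 &&
    Buffer.byteLength(key, 'utf8') <= MAX_KEY_BYTES
  );
}

// Return a list of problems; an empty list means the input looks usable.
function validateGetObjectInfoInput({ bucket_name, object_key }) {
  const errors = [];
  if (!isValidBucketName(bucket_name)) {
    errors.push('bucket_name is missing or not a valid S3 bucket name');
  }
  if (!isValidObjectKey(object_key)) {
    errors.push('object_key is missing or longer than 1,024 bytes');
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue at once.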
Use Cases & Examples
Use Cases
- Data Migration: An organization may need to migrate historical data from on-premises storage to AWS S3 for cost-effectiveness and scalability. After the upload, they can use the integration to confirm that the data arrived, for example by listing the migrated objects and checking their metadata.
- Data Analysis: A data engineering team may want to analyze large datasets stored in S3. This integration allows them to programmatically list objects and locate specific data files for processing without manual effort.
- Backup Verification: A company performing backups to S3 can use this integration to list and verify the contents of their backup buckets, ensuring data is appropriately backed up.
Example Configuration
Use Case: Data Analysis for a Large Dataset
Configuration Sample:
const awsS3Integration = new AwsS3({
  access_key_id: 'AKIAxxxxxx', // Replace with your access key
  secret_access_key: 'your_secret_access_key', // Replace with your secret key
  region: 'us-west-2' // Specify the region where the bucket resides
});

// Listing all buckets
awsS3Integration.listBuckets().then(response => {
  console.log(response);
}).catch(error => {
  console.error(error);
});

// Listing objects in a specific bucket
const bucketName = 'my-data-bucket';
const prefix = 'datasets/';
awsS3Integration.listObjects({ bucket_name: bucketName, prefix: prefix }).then(response => {
  console.log(response);
}).catch(error => {
  console.error(error);
});

// Retrieving metadata for a specific object
const objectKey = 'datasets/my_large_file.csv';
awsS3Integration.getObjectInfo({ bucket_name: bucketName, object_key: objectKey }).then(response => {
  console.log(response);
}).catch(error => {
  console.error(error);
});
AI Integrations
While the AwsS3 integration does not include direct AI functionalities, it serves as a foundational component that can be integrated with machine learning and analytics services. Users can retrieve data from S3, which can then be processed by AI models or analytics pipelines hosted within the Vantage platform.
Billing Impact
Using the AwsS3 integration itself does not incur additional costs within Vantage. However, users must remain aware of the AWS billing model for S3, which includes storage costs, data transfer costs, and request costs (e.g., API calls). Proper configuration and management of the integration can help minimize unnecessary expenses incurred from excessive API calls or data transfers.
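To make the request-cost point concrete, the following sketch estimates how many list requests a full listing needs. It relies on the fact that a single S3 list call returns at most 1,000 keys; the function names are hypothetical, and the price per 1,000 requests is left as a caller-supplied value because AWS pricing varies by region and changes over time.

```javascript
// Hypothetical helpers for rough planning; not part of the integration.
const MAX_KEYS_PER_LIST_REQUEST = 1000; // S3 returns at most 1,000 keys per list call

// Number of list requests needed to page through objectCount objects.
function estimateListRequests(objectCount, pageSize = MAX_KEYS_PER_LIST_REQUEST) {
  if (!Number.isInteger(objectCount) || objectCount < 0) {
    throw new RangeError('objectCount must be a non-negative integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > MAX_KEYS_PER_LIST_REQUEST) {
    throw new RangeError('pageSize must be an integer between 1 and 1000');
  }
  // Even an empty listing costs one request.
  return Math.max(1, Math.ceil(objectCount / pageSize));
}

// pricePerThousandRequests is an assumption supplied by the caller;
// check the current AWS price list for the actual figure.
function estimateListCost(objectCount, pricePerThousandRequests) {
  return (estimateListRequests(objectCount) / 1000) * pricePerThousandRequests;
}
```

For example, a complete listing of 2,500,000 objects takes at least 2,500 requests, so narrowing the prefix or caching results between runs reduces the request count directly.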