🚴‍♂️ Data Lifecycle Management

Introduction

Welcome to TrusstGPT's data lifecycle management documentation. This guide explains how TrusstGPT handles, stores, and manages data inputs and outputs using Amazon DynamoDB and Amazon Redshift Serverless.

Data Inputs

1. Audio Recordings

Audio recordings from customer conversations are ingested and processed using Trusst Lissten, a feature within TrusstGPT. These recordings are transcribed and translated before storage.

2. Transcripts and Text Data

Transcripts from various sources, such as call transcripts, verbatim feedback, survey results, emails, and live chat transcripts, are ingested into TrusstGPT.

3. Unstructured Documents

Unstructured documents, including claim forms and snail mail, are processed and converted into structured data for further analysis.

Data Storage

1. DynamoDB

DynamoDB is used for storing metadata and indexing information related to the data ingested by TrusstGPT. This allows for efficient querying and retrieval of data.

Configuration

  • Time to Live (TTL): To manage the data lifecycle, we configure DynamoDB's TTL setting so that items are deleted automatically once the expiry timestamp stored on each item has passed. This helps control storage costs and supports compliance with data retention policies.

    • Placeholder: DynamoDB Time to Live Configuration

  • Indexing: We use Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs) to enable efficient querying on attributes other than the table's primary key.

    • Placeholder: DynamoDB Indexing
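DynamoDB TTL works from a per-item attribute that holds the expiry time as a Unix epoch timestamp in seconds. The following is a minimal sketch in Python of how such an attribute could be computed and how TTL could be enabled with boto3. The table name `trusst-metadata`, the attribute name `expires_at`, the key `record_id`, and the 90-day retention period are hypothetical values chosen for illustration, not TrusstGPT's actual configuration.

```python
from datetime import datetime, timedelta, timezone

TABLE_NAME = "trusst-metadata"  # hypothetical table name
TTL_ATTRIBUTE = "expires_at"    # hypothetical TTL attribute name
RETENTION_DAYS = 90             # illustrative retention period


def expiry_epoch(created_at: datetime, retention_days: int = RETENTION_DAYS) -> int:
    """Return the expiry time as Unix epoch seconds, the format DynamoDB TTL reads."""
    return int((created_at + timedelta(days=retention_days)).timestamp())


def build_item(record_id: str, created_at: datetime) -> dict:
    """Build an item in DynamoDB's low-level attribute-value format, including the TTL attribute."""
    return {
        "record_id": {"S": record_id},
        "created_at": {"S": created_at.isoformat()},
        TTL_ATTRIBUTE: {"N": str(expiry_epoch(created_at))},
    }


def enable_ttl(table_name: str = TABLE_NAME) -> None:
    """Enable TTL on the table. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3

    boto3.client("dynamodb").update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
    )


# Build an example item for a record created on 1 January 2024 (UTC).
item = build_item("call-0001", datetime(2024, 1, 1, tzinfo=timezone.utc))
print(item[TTL_ATTRIBUTE])
```

DynamoDB removes expired items in the background rather than at the exact expiry time, so a query may still return items whose timestamp has passed; filtering on the TTL attribute at read time avoids this.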

2. Amazon Redshift Serverless

Amazon Redshift Serverless is used for storing large volumes of structured data and performing complex analytical queries. This is particularly useful for generating reports and insights from the ingested data.

Configuration

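  • Data Retention and Automatic Backup: By default, Amazon Redshift takes an automated snapshot about every eight hours or after every 5 GB per node of data changes, whichever comes first, to ensure durability and availability. This schedule is the documented default for provisioned clusters; Redshift Serverless instead creates recovery points automatically, roughly every 30 minutes, and retains them for 24 hours.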
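Alongside the automatic backups, a manual snapshot with an explicit retention period can be requested when a longer-lived restore point is needed. The sketch below builds the arguments for boto3's `redshift-serverless` `create_snapshot` call. The namespace name `trusst-analytics`, the snapshot naming scheme, and the 30-day retention are assumptions made for illustration.

```python
from datetime import datetime, timezone


def build_snapshot_request(namespace: str, taken_at: datetime, retention_days: int) -> dict:
    """Build keyword arguments for the redshift-serverless create_snapshot call."""
    return {
        "namespaceName": namespace,
        # Hypothetical naming scheme: <namespace>-<UTC date>-<UTC time>
        "snapshotName": f"{namespace}-{taken_at:%Y%m%d-%H%M}",
        "retentionPeriod": retention_days,  # how long to keep the snapshot, in days
    }


def create_snapshot(request: dict) -> dict:
    """Send the request. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3

    return boto3.client("redshift-serverless").create_snapshot(**request)


request = build_snapshot_request(
    "trusst-analytics",  # hypothetical namespace name
    datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
    retention_days=30,   # illustrative retention period
)
print(request)
```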

Data Archival

For long-term storage, data can be archived to Amazon S3, ensuring that it is available when needed but not consuming expensive storage resources in DynamoDB or Redshift Serverless.

  • Placeholder: Amazon S3 Data Archival
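One way to archive DynamoDB data to S3 is the table export feature, which requires point-in-time recovery to be enabled on the table. The sketch below builds the arguments for boto3's `export_table_to_point_in_time` call. The table ARN, bucket name, and prefix are hypothetical, and this is one possible approach rather than a description of how TrusstGPT performs archival.

```python
def build_export_request(table_arn: str, bucket: str, prefix: str) -> dict:
    """Build keyword arguments for DynamoDB's export_table_to_point_in_time call."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix.strip("/"),    # normalise away leading/trailing slashes
        "ExportFormat": "DYNAMODB_JSON",  # "ION" is the other supported format
    }


def export_table(request: dict) -> dict:
    """Start the export. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3

    return boto3.client("dynamodb").export_table_to_point_in_time(**request)


request = build_export_request(
    # Hypothetical ARN, bucket, and prefix, for illustration only.
    table_arn="arn:aws:dynamodb:ap-southeast-2:123456789012:table/trusst-metadata",
    bucket="trusst-archive",
    prefix="/dynamodb/2024-01/",
)
print(request)
```

For Redshift data, the `UNLOAD` command plays a similar role by writing query results to an S3 location.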

Data Deletion

Data that is no longer needed can be permanently deleted from both DynamoDB and Redshift Serverless. This is managed through configurable data retention and TTL policies.
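On the DynamoDB side, TTL handles deletion automatically. For Redshift Serverless, a retention policy can be applied with a scheduled `DELETE` statement. The sketch below computes the retention cutoff and builds the arguments for the Redshift Data API's `execute_statement` call. The workgroup, database, table, and `created_at` column names are hypothetical, as is the 90-day retention period.

```python
import re
from datetime import date, timedelta

# Accept only plain or schema-qualified identifiers, since the table name
# is interpolated into the SQL text rather than passed as a parameter.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def retention_cutoff(today: date, retention_days: int) -> date:
    """Return the date before which rows fall outside the retention window."""
    return today - timedelta(days=retention_days)


def build_delete_statement(workgroup: str, database: str, table: str,
                           today: date, retention_days: int) -> dict:
    """Build keyword arguments for the redshift-data execute_statement call."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    cutoff = retention_cutoff(today, retention_days)
    return {
        "WorkgroupName": workgroup,
        "Database": database,
        # created_at is a hypothetical column holding each row's ingestion date.
        "Sql": f"DELETE FROM {table} WHERE created_at < CAST(:cutoff AS DATE)",
        "Parameters": [{"name": "cutoff", "value": cutoff.isoformat()}],
    }


def run_statement(statement: dict) -> dict:
    """Submit the statement. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3

    return boto3.client("redshift-data").execute_statement(**statement)


statement = build_delete_statement(
    workgroup="trusst-workgroup",  # hypothetical workgroup name
    database="analytics",          # hypothetical database name
    table="public.transcripts",    # hypothetical table name
    today=date(2024, 3, 31),
    retention_days=90,             # illustrative retention period
)
print(statement["Sql"], statement["Parameters"])
```

Passing the cutoff as a named parameter keeps the date out of the SQL text; only the validated table name is interpolated.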

Conclusion

Effective data lifecycle management is crucial for ensuring data integrity, compliance, and cost management. By leveraging DynamoDB and Amazon Redshift Serverless, TrusstGPT provides robust solutions for storing, managing, and analysing data. For detailed configurations and more information, please refer to the respective AWS documentation linked throughout this guide.

For further assistance, please contact our support team via the Trusst Customer Support Portal.
