Data Lifecycle Management
Introduction
Welcome to TrusstGPT's data lifecycle management documentation. This guide will help you understand how data inputs and outputs are handled, stored, and managed within TrusstGPT, leveraging DynamoDB and Amazon Redshift Serverless.
Data Inputs
1. Audio Recordings
Audio recordings from customer conversations are ingested and processed using Trusst Lissten, a feature within TrusstGPT. These recordings are transcribed and translated before storage.
2. Transcripts and Text Data
Transcripts from various sources, such as call transcripts, verbatim feedback, survey results, emails, and live chat transcripts, are ingested into TrusstGPT.
3. Unstructured Documents
Unstructured documents, including claim forms and postal mail, are processed and converted into structured data for further analysis.
Data Storage
1. DynamoDB
DynamoDB is used for storing metadata and indexing information related to the data ingested by TrusstGPT. This allows for efficient querying and retrieval of data.
Configuration
Time to Live (TTL): To manage the lifecycle of data, we configure DynamoDB's TTL settings to automatically delete items after a specified period. This helps in managing storage costs and ensuring compliance with data retention policies.
Placeholder: DynamoDB Time to Live Configuration
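As a minimal sketch of the TTL setup described above: DynamoDB expects the TTL attribute to be a Number holding a Unix epoch timestamp in seconds. The table name, attribute name, item key, and 90-day retention period below are hypothetical and are not taken from the TrusstGPT configuration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical names; the guide does not specify the table or attribute.
TABLE_NAME = "trusst-metadata"
TTL_ATTRIBUTE = "expires_at"


def ttl_epoch_seconds(created_at: datetime, retention_days: int) -> int:
    """Return the expiry time as Unix epoch seconds, the format DynamoDB TTL expects."""
    return int((created_at + timedelta(days=retention_days)).timestamp())


def ttl_request(table_name: str = TABLE_NAME, attribute: str = TTL_ATTRIBUTE) -> dict:
    """Build keyword arguments for boto3's DynamoDB client.update_time_to_live."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {"Enabled": True, "AttributeName": attribute},
    }


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
item = {
    "pk": "CALL#123",  # hypothetical partition key value
    TTL_ATTRIBUTE: ttl_epoch_seconds(created, 90),  # assumed 90-day retention
}

# With boto3 installed and credentials configured, TTL could be enabled with:
#   boto3.client("dynamodb").update_time_to_live(**ttl_request())
```

DynamoDB removes expired items in the background rather than at the exact expiry time, so queries that must never return expired data may need to filter on the TTL attribute as well.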
Indexing: We utilise Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs) so that data can be queried efficiently on attributes other than the table's primary key.
Placeholder: DynamoDB Indexing
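The sketch below shows how one LSI and one GSI might be declared when a table is created. All table, attribute, and index names are hypothetical. Note that an LSI shares the table's partition key and can only be defined at table creation, whereas a GSI may use a different partition key and can be added later.

```python
def table_definition(table_name: str) -> dict:
    """Build keyword arguments for boto3's DynamoDB client.create_table.

    Every name here is an illustrative assumption, not the TrusstGPT schema.
    """
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
            {"AttributeName": "created_at", "AttributeType": "S"},
            {"AttributeName": "source_type", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "sk", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, alternative sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by-created-at",
                "KeySchema": [
                    {"AttributeName": "pk", "KeyType": "HASH"},
                    {"AttributeName": "created_at", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        # GSI: different partition key, queried independently of the table key.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by-source-type",
                "KeySchema": [
                    {"AttributeName": "source_type", "KeyType": "HASH"},
                    {"AttributeName": "created_at", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


definition = table_definition("trusst-metadata")
# With boto3 available: boto3.client("dynamodb").create_table(**definition)
```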
2. Amazon Redshift Serverless
Amazon Redshift Serverless is used for storing large volumes of structured data and performing complex analytical queries. This is particularly useful for generating reports and insights from the ingested data.
Configuration
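Data Retention and Automatic Backup: By default, Amazon Redshift takes a snapshot about every eight hours or after every 5 GB of data changes per node, whichever comes first, to ensure durability and availability. Note that this default describes provisioned Redshift clusters; Redshift Serverless instead creates recovery points automatically (approximately every 30 minutes, retained for 24 hours), and a recovery point can be converted into a snapshot when longer retention is required.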
Configuring Amazon Redshift Snapshots and Backups
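To illustrate the "whichever comes first" rule described above, the sketch below checks both triggers and also builds the arguments for a manual snapshot. The function names, namespace name, snapshot name, and retention period are hypothetical; this is not how Redshift schedules snapshots internally, only a model of the stated rule.

```python
EIGHT_HOURS_SECONDS = 8 * 60 * 60
CHANGE_THRESHOLD_GB_PER_NODE = 5.0


def snapshot_due(seconds_since_last: float, changed_gb_per_node: float) -> bool:
    """Return True once either trigger is reached, i.e. whichever comes first."""
    return (
        seconds_since_last >= EIGHT_HOURS_SECONDS
        or changed_gb_per_node >= CHANGE_THRESHOLD_GB_PER_NODE
    )


def manual_snapshot_request(namespace: str, name: str, retention_days: int) -> dict:
    """Build keyword arguments for boto3's redshift-serverless client.create_snapshot."""
    return {
        "namespaceName": namespace,
        "snapshotName": name,
        "retentionPeriod": retention_days,  # in days
    }


# Hypothetical values for illustration only.
request = manual_snapshot_request("trusst-namespace", "pre-migration", 30)
# With boto3 available:
#   boto3.client("redshift-serverless").create_snapshot(**request)
```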
Data Archival
For long-term storage, data can be archived to Amazon S3, ensuring that it is available when needed but not consuming expensive storage resources in DynamoDB or Redshift Serverless.
Placeholder: Amazon S3 Data Archival
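One possible way to archive DynamoDB data to S3 is the table export feature, which requires point-in-time recovery to be enabled on the table. The sketch below builds a date-partitioned S3 prefix and the export arguments; the bucket name, table ARN, and prefix layout are assumptions made for illustration.

```python
from datetime import date


def archive_prefix(dataset: str, day: date) -> str:
    """Return a date-partitioned S3 prefix (hypothetical layout)."""
    return f"archive/{dataset}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def dynamodb_export_request(table_arn: str, bucket: str, day: date) -> dict:
    """Build keyword arguments for boto3's DynamoDB client.export_table_to_point_in_time."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": archive_prefix("dynamodb", day),
        "ExportFormat": "DYNAMODB_JSON",
    }


# Hypothetical ARN and bucket name.
export = dynamodb_export_request(
    "arn:aws:dynamodb:ap-southeast-2:123456789012:table/trusst-metadata",
    "trusst-archive-bucket",
    date(2024, 3, 5),
)
# With boto3 available:
#   boto3.client("dynamodb").export_table_to_point_in_time(**export)
```

Redshift data can be archived along similar lines with the UNLOAD command, which writes query results to an S3 location.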
Data Deletion
Data that is no longer needed can be permanently deleted from both DynamoDB and Redshift Serverless. This is managed through configurable data retention and TTL policies.
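On the DynamoDB side, TTL handles deletion automatically. For Redshift Serverless, a retention policy could be applied with a scheduled DELETE issued through the Redshift Data API, as in the sketch below. The workgroup, database, table, and column names are hypothetical, as is the 90-day retention period.

```python
from datetime import date, timedelta


def retention_cutoff(today: date, retention_days: int) -> date:
    """Return the date before which records fall outside the retention period."""
    return today - timedelta(days=retention_days)


def delete_request(workgroup: str, database: str, table: str, cutoff: date) -> dict:
    """Build keyword arguments for boto3's redshift-data client.execute_statement.

    The cutoff is passed as a named parameter. A table name cannot be bound as a
    parameter, so it is validated before being placed in the SQL text.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return {
        "WorkgroupName": workgroup,
        "Database": database,
        # "ingested_at" is an assumed column name.
        "Sql": f"DELETE FROM {table} WHERE ingested_at < :cutoff",
        "Parameters": [{"name": "cutoff", "value": cutoff.isoformat()}],
    }


cutoff = retention_cutoff(date(2024, 3, 31), 90)
request = delete_request("trusst-workgroup", "analytics", "transcripts", cutoff)
# With boto3 available:
#   boto3.client("redshift-data").execute_statement(**request)
```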
Conclusion
Effective data lifecycle management is crucial for ensuring data integrity, compliance, and cost management. By leveraging DynamoDB and Amazon Redshift Serverless, TrusstGPT provides robust solutions for storing, managing, and analysing data. For detailed configurations and more information, please refer to the respective AWS documentation linked throughout this guide.
For further assistance, please contact our support team via the Trusst Customer Support Portal.